[00:08:50] New patchset: Asher; "extend check_mysql_slave_delay to support pt-heartbeat monitoring by server_id in a multi-tier replication tree + nrpe hooks." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1920
[00:09:05] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1920
[00:19:48] !log powercycling formey
[00:19:50] Logged the message, Master
[00:21:09] hi. what does "Node-count limit exceeded" mean?
[00:21:25] where are you seeing this?
[00:21:53] Ryan_Lane, did it swap die?
[00:21:57] maybe
[00:22:02] at the bottom of https://pt.wiktionary.org/w/index.php?title=Wikcion%E1rio:Lista_de_l%EDnguas
[00:22:03] I have no clue
[00:22:09] Reedy: did you do something?
[00:22:13] No
[00:22:14] I've seen it a few times in the last few weeks
[00:22:17] ah. ok
[00:22:18] really?
[00:22:21] swap death?
[00:22:24] what was causing it?
[00:22:34] One time Tomasz killed process for me when I noticed them
[00:22:43] viewvc.cgi processes that max the cpu and cause a lot of swapping
[00:22:46] run for 20-30 minutes+
[00:23:05] wonder if there's some exploit in viewvc
[00:23:43] hi, is something up with server lag?
[00:23:44] The last 24 hrs I've had issues where I make edits, click to edit the page further and I'm getting old revisions lacking the most recent edits (which appear on history and isn't fixed by purge)
[00:23:45] PROBLEM - Puppet freshness on srv191 is CRITICAL: Puppet has not run in the last 10 hours
[00:26:46] New review: Asher; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1920
[00:32:30] Change abandoned: Ryan Lane; "something went wrong." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1920
[00:33:17] New patchset: Asher; "extend check_mysql_slave_delay to support pt-heartbeat monitoring by server_id in a multi-tier replication tree + nrpe hooks." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1921
[00:33:47] New review: Asher; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1921
[00:33:47] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1921
[00:33:49] New review: Asher; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1921
[00:33:52] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1921
[00:53:14] !log stopping pdns on virt1 again to test dns
[00:53:16] Logged the message, Master
[01:00:23] !log brought pdns on virt1 back up
[01:00:24] Logged the message, Master
[02:04:47] !log LocalisationUpdate completed (1.18) at Sat Jan 14 02:04:46 UTC 2012
[02:04:49] Logged the message, Master
[02:05:16] Ontime
[02:14:51] !log LocalisationUpdate failed: SVN update of extensions failed
[02:14:53] Logged the message, Master
[02:15:58] !log That was me testing something
[02:16:00] Logged the message, Master
[02:27:14] Reedy: can you file rt tickets?
[02:27:23] yeah
[02:27:37] why?
[02:28:26] I need !bug 33509 to be filed there, since that's not a typical problem we should deal with in bugzilla
[02:28:56] Ah
[02:28:56] yeah
[02:29:07] New patchset: Bhartshorne; "added query duration statistics" [operations/software] (master) - https://gerrit.wikimedia.org/r/1922
[02:29:54] New review: Bhartshorne; "(no comment)" [operations/software] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/1922
[02:29:57] done
[02:30:58] long query was long
[02:31:30] thanks
[02:31:40] wonder who could possibly handle this
[02:32:40] Depends where the entry is actually hosted
[02:32:50] I know Rob did some for the labs domains earlier today
[02:33:12] Hi, anyone can tell me how much data memcached can deal on mw-servers?
[02:33:21] alot
[02:33:37] errm
[02:33:39] gigabytes then?
[02:33:49] http://noc.wikimedia.org/conf/highlight.php?file=mc.php
[02:33:51] 79 hosts
[02:34:53] Ooh, now I need a manual to translate that
[02:35:00] $wgMemCachedInstanceSize = 2000;
[02:35:17] http://svn.wikimedia.org/viewvc/mediawiki/trunk/debs/memcached/debian/memcached.conf?view=markup
[02:35:20] 2G each
[02:35:34] so at least 158GB
[02:35:38] ah yeah, thats it :)
[02:36:08] possibly more
[02:36:11] but can't be far off
[02:37:06] 158GB should be enough :)
[02:37:56] gn8 folks
[02:39:44] Reedy: instra.com, I think it's pretty old
[02:39:51] now I need to lower the cache for mysqld and get memcached installed...
[02:40:06] Reedy: we host the site so no changes are needed besides DNS registrar bureaucracy
[02:40:24] Most of WMF stuff now are on godaddy
[02:40:37] this one is so old that it isn't
[02:41:17] They should have had a password since the change was made from Florida to San Franisco
[02:41:27] *cisco
[02:41:46] heh
[02:43:20] PROBLEM - Host sq46 is DOWN: PING CRITICAL - Packet loss = 100%
[02:52:59] RECOVERY - Puppet freshness on srv191 is OK: puppet ran at Sat Jan 14 02:52:36 UTC 2012
[02:56:19] Reedy, $wgMemCachedTimeout is in seconds?
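The 158GB figure quoted above follows from the two numbers given in the channel: 79 hosts and 2000 MB per instance. A minimal sketch of that arithmetic, assuming one equally sized memcached instance per host and decimal units (1 GB = 1000 MB); the function name `pool_capacity_mb` is made up for this illustration.

```python
# Figures quoted in the channel: 79 hosts in mc.php, 2000 MB per instance.
HOSTS = 79
INSTANCE_SIZE_MB = 2000


def pool_capacity_mb(hosts: int, instance_size_mb: int) -> int:
    """Nominal pool size, assuming one equally sized instance per host."""
    return hosts * instance_size_mb


total_mb = pool_capacity_mb(HOSTS, INSTANCE_SIZE_MB)
print(f"{total_mb} MB is about {total_mb / 1000:.0f} GB")  # 158000 MB is about 158 GB
```

This is a nominal upper bound on cached data; as noted in the channel, the real pool could be somewhat larger if some hosts are configured differently.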
[02:58:20] umph, google is my friend -- sorry
[02:58:26] Heh
[02:58:35] Most likely
[02:58:50] did not expect to find sth via google
[02:59:11] http://www.mediawiki.org/wiki/Manual:$wgMemCachedTimeout
[02:59:38] Most of our globals are documented to some level or another
[03:00:00] that is exemplary :)
[03:00:47] I wonder why $wgMemCachedInstanceSize isn't
[03:02:07] it's not in core
[03:02:13] I wonder if it's legacy/invented
[03:02:24] as it'll be set in the memcached config rather than by mw
[03:03:44] well at least size is defined in the memcached.conf as 2000
[03:12:49] I am wondering if there is a max "timeToLive" for a memcached object -- cannot find $wgMemc's Class -- no manualpage :(
[03:14:41] Look at the code
[03:15:32] max ttl will be set in memcached itself I'd guess
[03:15:38] Well, it tells me I can set a ttl, but I am just wondering if there is a max value behind that
[03:16:42] usually most people aren't caching data that is that static
[03:16:52] http://www.google.co.uk/webhp?sourceid=chrome-instant&ix=tea&ie=UTF-8&ion=1#sclient=psy-ab&hl=en&site=webhp&source=hp&q=memcached%20max%20ttl&pbx=1&oq=&aq=&aqi=&aql=&gs_sm=&gs_upl=&fp=4c26890357e371b1&ion=1&ion=1&bav=on.2,or.r_gc.r_pw.r_cp.,cf.osb&fp=4c26890357e371b1&ion=1&biw=1366&bih=667
[03:16:56] Google suggests 30 days
[03:17:41] whew, 30 days is alot
[04:15:32] RECOVERY - Disk space on es1004 is OK: DISK OK
[04:16:52] RECOVERY - MySQL disk space on es1004 is OK: DISK OK
[04:39:28] PROBLEM - MySQL slave status on es1004 is CRITICAL: CRITICAL: Slave running: expected Yes, got No
[04:59:57] PHP fatal error in /usr/local/apache/common-local/php-1.18/includes/objectcache/MemcachedClient.php line 912: Allowed memory size of 125829120 bytes exhausted (tried to allocate 533926 bytes)
[04:59:57] on http://commons.wikimedia.org/w/index.php?title=Special:RecentChanges&limit=5000
[05:00:09] get some RAM! ;)
[05:00:49] 2500 works
[05:01:27] you shouldn't be permitted to get 5000 i thought?
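The fatal error above reports a limit of 125829120 bytes, which is exactly 120 MiB. The sketch below is a generic Python illustration, not MediaWiki code, of producing a long listing in fixed-size batches so that only one batch is held in memory at a time. The names `in_batches` and `render` and the batch size of 500 are hypothetical choices for the example.

```python
from typing import Iterable, Iterator, List

MIB = 1024 * 1024
LIMIT_BYTES = 125829120  # value taken from the PHP error message
print(LIMIT_BYTES / MIB)  # 120.0


def in_batches(rows: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most batch_size rows."""
    batch: List[str] = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final, possibly shorter batch
        yield batch


def render(rows: Iterable[str], batch_size: int = 500) -> Iterator[str]:
    """Yield one output chunk per batch instead of one large string."""
    for batch in in_batches(rows, batch_size):
        yield "\n".join(batch) + "\n"


# Usage: each chunk could be written to the client as soon as it is ready.
for chunk in render((f"change {i}" for i in range(5)), batch_size=2):
    print(chunk, end="")
```

Whether the real request could be restructured this way depends on how the page is built, which the log does not show.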
[05:01:34] unless you have some special rights?
[05:01:47] bot or apihighlimits or sysop?
[05:01:51] I didn't get them - got the error instead ;)
[05:01:52] sysop
[05:02:15] but.. em.. 5000 list entries... not that much?!
[05:02:44] most ppl are limited to 500 i think...
[05:02:47] not 5 million ;)
[05:02:53] don't think so
[05:02:58] 500 is the GUI limit ;)
[05:02:59] although that might not apply to this particular query
[05:03:10] you can get more by URL manipulation
[05:03:21] nearly sure
[05:03:38] 500 is the limit for the api across the board unless you have apihighlimit(s?)
[05:03:45] hmm.. really?
[05:03:46] which sysops and bots get automatically
[05:03:54] but your link wasn't the api
[05:04:07] ah
[05:04:08] sure
[05:04:15] anyway, i got the same error as a lowly autoconfirmed
[05:05:33] in any way: that error isn't a graceful deny ;)
[05:06:24] sure, it's trying to actually fulfill and it dies
[05:06:30] yup
[05:06:52] probably it's doing too much in memory and should be instead flushing to the client periodically
[07:29:29] PROBLEM - Puppet freshness on virt1 is CRITICAL: Puppet has not run in the last 10 hours
[10:06:24] PROBLEM - Disk space on es1004 is CRITICAL: DISK CRITICAL - free space: /a 445028 MB (3% inode=99%):
[10:08:24] PROBLEM - MySQL disk space on es1004 is CRITICAL: DISK CRITICAL - free space: /a 436164 MB (3% inode=99%):
[10:09:34] PROBLEM - Puppet freshness on sodium is CRITICAL: Puppet has not run in the last 10 hours
[10:37:04] RECOVERY - MySQL slave status on es1004 is OK: OK:
[16:30:30] zz
[16:54:29] New review: Petrb; "(no comment)" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/1918
[17:39:00] PROBLEM - Puppet freshness on virt1 is CRITICAL: Puppet has not run in the last 10 hours
[19:06:39] I'm trying to reset the password for an account on testwiki. The email is sent OK, and I can log in with the temp password and set a new password OK, but when I try to use that new password, I'm told "Login Error for ACP1 Incorrect password entered. Please try again." What is ACP1, and how can I log into my account/
[19:09:11] alot_of_mike: hi, does it happen on other sites too? or only on test wiki
[19:09:27] Isn't that the account creation project?
[19:09:42] yes, sounds like acct creation project
[19:09:50] I'm testing on meta now - the message is slightly different (doesn't mention ACP1) but the effect is the same - I cannot log in with the new password
[19:10:05] alot_of_mike: ok, just try to reset pw on meta
[19:10:07] what username?
[19:10:15] Reedy: Perlwikibot testing
[19:10:48] petan: I did. I still get a password error after changing the password from the temp one, it just doesn't mention ACP1
[19:11:04] ok, let's wait what Reed y find out
[19:12:13] If it makes a difference, the account is locked via centralauth. It used to keep users from logging in, but hasn't done that in years, unless behaviour was changed since I've been gone.
[19:14:26] It's blocked indefinitely on enwiki/testwiki
[19:15:04] on testwiki too O.o
[19:15:35] Reedy: Should that affect logging in? My recollection is that it should not.
[19:15:50] I've no idea
[19:15:59] oh, I did that
[19:16:00] In theory it shouldn't
[19:16:02] I wonder why
[19:17:00] * alot_of_mike figured it out
[19:17:09] we don't need to edit for the tests that use this account :)
[19:17:16] But we do need to log in... :\
[19:18:28] I can try unblocking/unlocking it to see if that helps
[19:18:50] not sure why it needs to be locked and blocked from editing
[19:18:54] I wouldn't bother, I'm quite certain it has no effect on logging in.
[19:19:16] Reedy: I didn't have sysop on all wikis, and the account shouldn't be permitted to edit, so I had pathoschild lock it.
[19:22:55] huh, apparently being locked /does/ keep you from logging in now. I guess it'd be less evil if that were explained when your login fails
[19:23:15] which is why we asked the devs to change it way back long ago -_-
[20:07:01] !log stopping pdns on virt1
[20:07:03] Logged the message, Master
[20:19:23] PROBLEM - Puppet freshness on sodium is CRITICAL: Puppet has not run in the last 10 hours
[20:29:48] !log stopping opendj on virt1
[20:29:50] Logged the message, Master
[21:10:20] PROBLEM - LDAP on virt1 is CRITICAL: Connection refused
[21:12:50] PROBLEM - LDAPS on virt1 is CRITICAL: Connection refused
[23:23:17] how come I can't edit from latest diffs anymore
[23:23:25] *edit sections
[23:37:06] !log shutting down virt1
[23:37:07] Logged the message, Master
[23:41:42] PROBLEM - Host virt1 is DOWN: PING CRITICAL - Packet loss = 100%
[23:50:02] PROBLEM - Host virt1.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100%