[00:06:08] gn8 folks
[00:11:13] New patchset: Ryan Lane; "Add gluster cluster" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2663
[00:12:03] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2663
[00:12:03] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2663
[00:13:25] Evening guys, are you aware of server issues with the Simple English Wikipedia? I've tried 3 times to submit an edit now, and keep getting a message from my browser informing me that the server did not send any data. Error 324 (Empty Response)
[00:16:09] New patchset: Ryan Lane; "Fix labstore node syntax" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2664
[00:16:31] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2664
[00:16:31] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2664
[00:16:32] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2664
[00:16:33] !log andrew synchronized php-1.19/extensions/Vector/modules/ext.vector.collapsibleNav.js 'Deploy r111804'
[00:16:35] Logged the message, Master
[00:19:24] New patchset: Ryan Lane; "RAWR. How did this upstart job not get fixed?" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2665
[00:19:45] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2665
[00:19:46] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2665
[00:20:51] New patchset: Lcarr; "Creating new class for new nagios host (aka neon)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2666
[00:21:12] !log andrew synchronized php-1.19/extensions/Vector/modules/ext.vector.collapsibleNav.js 'Deploy r111806'
[00:21:15] Logged the message, Master
[00:27:03] Just to let you know, if you have anyone on VirginMedia in the UK, they might be having grief getting onto some or all Wikipedias, looks like Virgin's interchange for Amsterdam is down.
[00:27:35] I've just tried tracerouting to simple.wikipedia.org, the amsterdam interchange is the last one I hit, the next three hops all die
[00:27:42] New patchset: Lcarr; "Creating new class for new nagios host (aka neon)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2666
[00:28:17] BarkingFish: oh yeah ? are you having issues on editing or on reads as well ?
[00:28:27] LeslieCarr: Both
[00:28:39] can you paste in a traceroute please ?
[00:28:46] sure
[00:28:54] thanks
[00:28:56] I'll put it up on pastebin.com save flooding here
[00:31:44] LeslieCarr: http://pastebin.com/LKkQN3Aj
[00:36:23] NedFlanders: i am seeing ok connectivity - can you give me your ip or your gateway plz ?
[00:36:38] oh wait a second
[00:36:44] i'm seeing a bit of p-loss
[00:36:51] sure, i just have to go find my public IP :)
[00:43:15] !log aaron synchronized php-1.19/resources/mediawiki.special/mediawiki.special.preferences.js 'deployed r111808'
[00:43:16] Logged the message, Master
[00:44:33] !log andrew synchronized php-1.19/resources/Resources.php 'Deploy r111809'
[00:44:35] Logged the message, Master
[00:51:04] !log andrew synchronized php-1.19/resources/mediawiki/mediawiki.user.js 'Attempt to repush r111695'
[00:51:06] Logged the message, Master
[00:58:43] !log Running scap to ensure a consistent environment
[00:58:45] Logged the message, junior
[01:00:27] !log andrew synchronizing Wikimedia installation... :
[01:00:29] Logged the message, Master
[01:03:45] sync done.
[01:19:22] sorry again, LeslieCarr - I pinged out.
[01:19:44] no problem
[01:20:12] looks like everything is good this end now, I can get to simple, I can get to the smaller wikis I access too, it could just be something which will resolve- it's still slow, but I am getting in.
[01:20:51] if it's still giving issues , tomorrow alternately jump on here/berate your isp :)
[01:21:28] oh I'll berate them alright. Their customer service line will be so blue I'll give the operator a heart attack
[01:22:11] I should be on cable, but the cabinet which supplied it got vandalised a fortnight ago, doors kicked off and wires pulled out. So we're on public Wifi again.
[01:38:46] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:39:58] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 0.038 seconds
[01:50:51] Reedy: did you deploy the r111688 changs?
[01:51:41] Sat Feb 18 1:29:28 UTC 2012 srv202 mediawikiwiki SpoofUser::getConflicts 10.0.6.26 1146 Table 'centralauth.user' doesn't exist (10.0.6.26)
[01:56:10] PROBLEM - Misc_Db_Lag on storage3 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 626s
[01:56:19] PROBLEM - MySQL replication status on storage3 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 635s
[01:59:05] !log aaron synchronized php-1.19/extensions/CentralAuth/CentralAuth.php 'disabled AntiSpoof hooks which broken account creation with DB errors'
[01:59:07] Logged the message, Master
[02:13:07] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:15:40] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 0.064 seconds
[02:18:33] !log LocalisationUpdate completed (1.18) at Sat Feb 18 02:18:33 UTC 2012
[02:18:35] Logged the message, Master
[02:21:16] AaronSchulz: would look like I didn't as I was mainly using the maintenance scripts on fenari...
[02:22:30] lol
[02:30:31] PROBLEM - Puppet freshness on cadmium is CRITICAL: Puppet has not run in the last 10 hours
[02:35:13] !log LocalisationUpdate completed (1.19) at Sat Feb 18 02:35:12 UTC 2012
[02:35:15] Logged the message, Master
[02:38:01] RECOVERY - Puppet freshness on search1002 is OK: puppet ran at Sat Feb 18 02:37:45 UTC 2012
[02:56:46] RECOVERY - MySQL replication status on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 0s
[02:57:31] RECOVERY - Misc_Db_Lag on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 0s
[03:23:48] PROBLEM - RAID on spence is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[03:24:24] PROBLEM - DPKG on spence is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[03:24:42] PROBLEM - Disk space on spence is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[04:57:24] en Wiki and Commons are not loading. When trying to access them both sites just seem to hang
[05:01:13] Bidgee: Both are loading fine for me, is it still happening?
[05:01:47] Slowly coming back but painfully slow
[05:02:33] That might be a ISP based issue, because well, you are the only one atm
[05:03:46] Another editor on #wikimedia-au also stated how slow it was about an hour or so ago [14:42:11] Is Wikipedia being slow or is it my connection?
[05:09:48] tracert from Bidgee if/when someone looks at this: http://pastie.org/3406026
[05:56:54] PROBLEM - Puppet freshness on owa3 is CRITICAL: Puppet has not run in the last 10 hours
[05:59:54] PROBLEM - Puppet freshness on search1003 is CRITICAL: Puppet has not run in the last 10 hours
[06:02:54] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours
[06:02:55] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours
[06:45:34] * Aaron|home reads http://www.preservenet.com/freeways/FreewaysEmbarcadero.html
[08:10:21] PROBLEM - Puppet freshness on bast1001 is CRITICAL: Puppet has not run in the last 10 hours
[08:13:12] PROBLEM - Lucene on search3 is CRITICAL: Connection timed out
[08:13:21] PROBLEM - LVS Lucene on search-pool1.svc.pmtpa.wmnet is CRITICAL: Connection timed out
[08:14:24] RECOVERY - LVS Lucene on search-pool1.svc.pmtpa.wmnet is OK: TCP OK - 0.002 second response time on port 8123
[08:14:25] PROBLEM - DPKG on db1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:15:36] RECOVERY - DPKG on db1047 is OK: All packages OK
[08:38:24] PROBLEM - Puppet freshness on fenari is CRITICAL: Puppet has not run in the last 10 hours
[08:43:21] PROBLEM - Puppet freshness on search1001 is CRITICAL: Puppet has not run in the last 10 hours
[08:53:06] PROBLEM - LVS Lucene on search-pool1.svc.pmtpa.wmnet is CRITICAL: Connection timed out
[08:54:09] RECOVERY - LVS Lucene on search-pool1.svc.pmtpa.wmnet is OK: TCP OK - 0.006 second response time on port 8123
[08:54:54] RECOVERY - Lucene on search3 is OK: TCP OK - 0.006 second response time on port 8123
[09:05:17] New patchset: Mark Bergsma; "Remove Ryan's ssh key" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2667
[09:05:40] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2608
[09:05:40] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2667
[09:05:52] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2667
[09:05:52] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2667
[09:10:54] New patchset: Mark Bergsma; "Revert "Remove Ryan's ssh key"" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2668
[09:11:15] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2668
[09:12:43] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2668
[09:12:44] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2668
[09:14:24] PROBLEM - Disk space on srv219 is CRITICAL: DISK CRITICAL - free space: / 0 MB (0% inode=62%): /var/lib/ureadahead/debugfs 0 MB (0% inode=62%):
[09:21:00] RECOVERY - Disk space on srv219 is OK: DISK OK
[09:23:42] RECOVERY - Puppet freshness on fenari is OK: puppet ran at Sat Feb 18 09:23:29 UTC 2012
[10:40:40] !log catrope synchronized php-1.19/resources/mediawiki/mediawiki.js 'touch'
[10:40:42] Logged the message, Master
[10:41:10] !log catrope synchronized php-1.19/resources/mediawiki/mediawiki.user.js 'touch'
[10:41:12] Logged the message, Master
[10:41:33] !log catrope synchronized php-1.19/resources/startup.js 'touch'
[10:41:35] Logged the message, Master
[10:47:30] PROBLEM - Disk space on db30 is CRITICAL: DISK CRITICAL - free space: / 287 MB (3% inode=87%):
[10:47:48] PROBLEM - MySQL disk space on db30 is CRITICAL: DISK CRITICAL - free space: / 287 MB (3% inode=87%):
[11:54:37] PROBLEM - MySQL Idle Transactions on db1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[11:58:57] RECOVERY - MySQL Idle Transactions on db1047 is OK: OK longest blocking idle transaction sleeps for 0 seconds
[12:31:39] PROBLEM - Puppet freshness on cadmium is CRITICAL: Puppet has not run in the last 10 hours
[13:39:56] hi, can i ask something about http://ja.wikipedia.beta.wmflabs.org here?
[13:43:43] whym: I gess #wikimedia-labs fits better
[13:45:05] hoo: actually I tried it before... but maybe I'll try again in a different time
[13:47:09] depends what the question is, ask it in here and we may be better able to assist you
[13:47:52] PROBLEM - MySQL Idle Transactions on db1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[13:50:25] RECOVERY - MySQL Idle Transactions on db1047 is OK: OK longest blocking idle transaction sleeps for 0 seconds
[14:12:55] PROBLEM - MySQL Slave Delay on db16 is CRITICAL: CRIT replication delay 236 seconds
[14:18:10] RECOVERY - MySQL Slave Delay on db16 is OK: OK replication delay 3 seconds
[14:31:42] !log reedy synchronized php-1.19/includes/actions/HistoryAction.php 'r111828'
[14:31:44] Logged the message, Master
[15:17:05] !log reedy synchronized php-1.19/extensions/AntiSpoof/
[15:17:07] Logged the message, Master
[15:18:12] !log reedy synchronized php-1.19/extensions/CentralAuth
[15:18:14] Logged the message, Master
[15:47:37] New patchset: Pyoungmeister; "adding searchidx1001 to site.pp" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2669
[15:47:59] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2669
[15:57:22] PROBLEM - Puppet freshness on owa3 is CRITICAL: Puppet has not run in the last 10 hours
[16:00:22] PROBLEM - Puppet freshness on search1003 is CRITICAL: Puppet has not run in the last 10 hours
[16:03:22] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours
[16:03:22] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours
[16:06:13] PROBLEM - RAID on searchidx2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[16:07:25] RECOVERY - RAID on searchidx2 is OK: OK: State is Optimal, checked 4 logical device(s)
[16:40:34] PROBLEM - MySQL Idle Transactions on db1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[16:41:46] RECOVERY - MySQL Idle Transactions on db1047 is OK: OK longest blocking idle transaction sleeps for 0 seconds
[18:11:16] PROBLEM - Puppet freshness on bast1001 is CRITICAL: Puppet has not run in the last 10 hours
[18:27:55] PROBLEM - MySQL Idle Transactions on db1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[18:29:07] RECOVERY - MySQL Idle Transactions on db1047 is OK: OK longest blocking idle transaction sleeps for 0 seconds
[18:40:04] PROBLEM - Disk space on srv222 is CRITICAL: DISK CRITICAL - free space: / 0 MB (0% inode=62%): /var/lib/ureadahead/debugfs 0 MB (0% inode=62%):
[18:43:49] PROBLEM - Puppet freshness on search1001 is CRITICAL: Puppet has not run in the last 10 hours
[18:44:34] PROBLEM - Disk space on srv224 is CRITICAL: DISK CRITICAL - free space: / 232 MB (3% inode=62%): /var/lib/ureadahead/debugfs 232 MB (3% inode=62%):
[18:45:55] RECOVERY - Disk space on srv224 is OK: DISK OK
[18:49:49] PROBLEM - Disk space on srv224 is CRITICAL: DISK CRITICAL - free space: / 208 MB (2% inode=62%): /var/lib/ureadahead/debugfs 208 MB (2% inode=62%):
[18:50:34] RECOVERY - Disk space on srv222 is OK: DISK OK
[18:51:10] RECOVERY - Disk space on srv224 is OK: DISK OK
[19:34:49] PROBLEM - Puppet freshness on fenari is CRITICAL: Puppet has not run in the last 10 hours
[20:45:07] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2669
[20:45:08] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2669
[20:56:50] PROBLEM - Lucene on search1002 is CRITICAL: Connection refused
[22:32:09] PROBLEM - Puppet freshness on cadmium is CRITICAL: Puppet has not run in the last 10 hours
[23:11:15] New patchset: Pyoungmeister; "new conf for eqiad" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2670
[23:11:41] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2670
[23:14:04] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2670
[23:14:04] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2670
[23:23:21] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:24:33] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 3.648 seconds
[23:53:52] gn8 folks
[23:58:09] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:59:30] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 9.033 seconds