[00:03:15] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:11:42] New patchset: Aaron Schulz; "global-multiwrite -> global-swift for now." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/41814 [00:12:53] Change merged: Aaron Schulz; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/41814 [00:16:28] !log aaron synchronized wmf-config/CommonSettings.php [00:16:37] Logged the message, Master [00:17:30] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 6.443 seconds [00:24:33] PROBLEM - Puppet freshness on stat1 is CRITICAL: Puppet has not run in the last 10 hours [00:42:45] !log aaron synchronized php-1.21wmf6/extensions/ConfirmEdit [00:42:52] Logged the message, Master [00:47:09] New patchset: Catrope; "Fix failing Nagios check for Parsoid Varnish" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/41819 [00:48:03] * AaronSchulz detects RoanKattouw lurking through pipes [00:50:58] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:01:36] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 6.079 seconds [01:10:51] PROBLEM - Puppet freshness on silver is CRITICAL: Puppet has not run in the last 10 hours [01:10:52] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours [01:15:12] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 287 seconds [01:17:00] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 20 seconds [01:19:10] New patchset: Aaron Schulz; "Use nas for captchas for testwikis for now."
[operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/41821 [01:19:19] Change merged: Aaron Schulz; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/41821 [01:20:15] !log aaron synchronized wmf-config/CommonSettings.php [01:20:24] Logged the message, Master [01:38:09] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:52:24] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.027 seconds [01:53:31] New review: Ori.livneh; "Thank you." [operations/mediawiki-config] (master); V: 2 C: 2; - https://gerrit.wikimedia.org/r/41812 [01:53:31] Change merged: Ori.livneh; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/41812 [02:24:12] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:26:46] PROBLEM - MySQL disk space on db78 is CRITICAL: DISK CRITICAL - free space: /a 115778 MB (3% inode=99%): [02:32:05] !log LocalisationUpdate completed (1.21wmf6) at Tue Jan 1 02:32:04 UTC 2013 [02:32:13] Logged the message, Master [02:33:40] https://en.wikipedia.org/w/api.php?action=query&meta=siteinfo&siprop=dbrepllag&sishowalldb= lag is rising on one. 
[02:36:12] PROBLEM - Apache HTTP on mw36 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:36:21] PROBLEM - Apache HTTP on mw33 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:36:21] PROBLEM - Apache HTTP on mw27 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:36:21] PROBLEM - Apache HTTP on mw32 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:36:21] PROBLEM - Apache HTTP on srv237 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:36:21] PROBLEM - Apache HTTP on mw52 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:36:21] PROBLEM - Apache HTTP on srv204 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:36:22] PROBLEM - Apache HTTP on mw37 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:36:22] PROBLEM - Apache HTTP on mw29 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:36:30] PROBLEM - MySQL Slave Delay on db59 is CRITICAL: CRIT replication delay 267 seconds [02:36:39] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.024 seconds [02:36:40] PROBLEM - Apache HTTP on mw31 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:36:40] PROBLEM - Apache HTTP on mw17 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:36:40] PROBLEM - Apache HTTP on mw28 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:36:48] PROBLEM - Apache HTTP on mw26 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:36:48] PROBLEM - Apache HTTP on mw35 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:36:48] PROBLEM - Apache HTTP on mw39 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:36:48] PROBLEM - Apache HTTP on mw25 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:36:48] PROBLEM - Apache HTTP on mw30 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:36:58] PROBLEM - Apache HTTP on mw55 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:36:58] PROBLEM - Apache HTTP 
on mw34 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:36:58] PROBLEM - Apache HTTP on mw51 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:36:58] PROBLEM - Apache HTTP on mw24 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:36:58] PROBLEM - Apache HTTP on mw48 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:37:06] PROBLEM - Apache HTTP on mw40 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:37:06] PROBLEM - Apache HTTP on mw49 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:37:06] PROBLEM - Apache HTTP on mw58 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:37:06] PROBLEM - Apache HTTP on mw54 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:37:15] PROBLEM - Apache HTTP on mw44 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:37:16] PROBLEM - Apache HTTP on mw59 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:37:23] Hmm, it went back down. [02:37:24] PROBLEM - Apache HTTP on mw47 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:37:33] PROBLEM - Apache HTTP on mw43 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:37:34] PROBLEM - Apache HTTP on mw21 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:37:42] PROBLEM - Apache HTTP on mw38 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:37:43] PROBLEM - Apache HTTP on mw22 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:37:51] PROBLEM - Apache HTTP on mw42 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:37:51] PROBLEM - Apache HTTP on mw46 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:38:00] RECOVERY - Apache HTTP on mw29 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.062 second response time [02:38:00] RECOVERY - Apache HTTP on srv237 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.172 second response time [02:38:00] RECOVERY - Apache HTTP on srv204 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.198 second 
response time [02:38:00] RECOVERY - Apache HTTP on mw32 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 4.272 second response time [02:38:00] RECOVERY - Apache HTTP on mw33 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 6.689 second response time [02:38:18] RECOVERY - Apache HTTP on mw17 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.094 second response time [02:38:18] RECOVERY - MySQL Slave Delay on db59 is OK: OK replication delay 0 seconds [02:38:18] PROBLEM - MySQL Slave Delay on db1007 is CRITICAL: CRIT replication delay 188 seconds [02:38:18] RECOVERY - Apache HTTP on mw28 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 1.973 second response time [02:38:18] RECOVERY - Apache HTTP on mw31 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 2.690 second response time [02:38:27] RECOVERY - Apache HTTP on mw35 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.039 second response time [02:38:27] RECOVERY - Apache HTTP on mw26 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.055 second response time [02:38:27] RECOVERY - Apache HTTP on mw25 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.049 second response time [02:38:27] RECOVERY - Apache HTTP on mw30 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.061 second response time [02:38:27] RECOVERY - Apache HTTP on mw39 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.269 second response time [02:38:36] RECOVERY - Apache HTTP on mw55 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.058 second response time [02:38:36] RECOVERY - Apache HTTP on mw34 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.056 second response time [02:38:36] RECOVERY - Apache HTTP on mw24 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.041 second response time [02:38:36] RECOVERY - Apache HTTP on mw48 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 2.945 second response time [02:38:45] RECOVERY - Apache HTTP on mw49 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.057 second response time [02:38:45] 
RECOVERY - Apache HTTP on mw58 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.056 second response time [02:38:45] RECOVERY - Apache HTTP on mw40 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 4.016 second response time [02:38:45] RECOVERY - Apache HTTP on mw54 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 6.833 second response time [02:38:54] RECOVERY - Apache HTTP on mw59 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.056 second response time [02:38:54] PROBLEM - MySQL Replication Heartbeat on db1007 is CRITICAL: CRIT replication delay 186 seconds [02:38:54] RECOVERY - Apache HTTP on mw44 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 3.594 second response time [02:39:12] RECOVERY - Apache HTTP on mw21 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.057 second response time [02:39:21] RECOVERY - Apache HTTP on mw22 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.043 second response time [02:39:21] RECOVERY - Apache HTTP on mw43 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.061 second response time [02:39:21] RECOVERY - Apache HTTP on mw38 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 1.962 second response time [02:39:30] RECOVERY - Apache HTTP on mw42 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.059 second response time [02:39:30] RECOVERY - Apache HTTP on mw46 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.059 second response time [02:39:39] RECOVERY - Apache HTTP on mw36 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.045 second response time [02:39:48] RECOVERY - Apache HTTP on mw27 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.065 second response time [02:39:48] RECOVERY - Apache HTTP on mw52 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.057 second response time [02:39:48] RECOVERY - Apache HTTP on mw37 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.067 second response time [02:40:24] RECOVERY - Apache HTTP on mw51 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 7.128 second response time 
[02:40:51] RECOVERY - Apache HTTP on mw47 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 8.517 second response time [02:47:45] PROBLEM - MySQL Replication Heartbeat on db1007 is CRITICAL: CRIT replication delay 198 seconds [02:48:48] PROBLEM - MySQL Slave Delay on db1007 is CRITICAL: CRIT replication delay 220 seconds [02:52:51] RECOVERY - MySQL Replication Heartbeat on db1007 is OK: OK replication delay 8 seconds [02:54:03] RECOVERY - MySQL Slave Delay on db1007 is OK: OK replication delay 0 seconds [02:59:54] PROBLEM - Puppet freshness on ms1004 is CRITICAL: Puppet has not run in the last 10 hours [03:00:03] RECOVERY - Puppet freshness on neon is OK: puppet ran at Tue Jan 1 02:59:38 UTC 2013 [03:07:51] PROBLEM - Puppet freshness on ms-be1006 is CRITICAL: Puppet has not run in the last 10 hours [03:07:51] PROBLEM - Puppet freshness on ms-be1007 is CRITICAL: Puppet has not run in the last 10 hours [03:07:51] PROBLEM - Puppet freshness on analytics1007 is CRITICAL: Puppet has not run in the last 10 hours [03:07:51] PROBLEM - Puppet freshness on ms-be1005 is CRITICAL: Puppet has not run in the last 10 hours [03:29:54] RECOVERY - MySQL disk space on db78 is OK: DISK OK [03:30:30] RECOVERY - Puppet freshness on cp1028 is OK: puppet ran at Tue Jan 1 03:30:19 UTC 2013 [03:32:00] RECOVERY - Puppet freshness on mw55 is OK: puppet ran at Tue Jan 1 03:31:55 UTC 2013 [04:30:37] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours [04:30:37] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [05:51:38] PROBLEM - Puppet freshness on db1047 is CRITICAL: Puppet has not run in the last 10 hours [05:51:38] PROBLEM - Puppet freshness on ms-fe1003 is CRITICAL: Puppet has not run in the last 10 hours [05:51:38] PROBLEM - Puppet freshness on ms-fe1004 is CRITICAL: Puppet has not run in the last 10 hours [05:51:38] PROBLEM - Puppet freshness on ms-be1010 is CRITICAL: Puppet has not run in the last 10 hours 
[05:51:38] PROBLEM - Puppet freshness on sq48 is CRITICAL: Puppet has not run in the last 10 hours [05:51:39] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours [06:58:59] PROBLEM - Puppet freshness on solr2 is CRITICAL: Puppet has not run in the last 10 hours [07:00:56] PROBLEM - Puppet freshness on vanadium is CRITICAL: Puppet has not run in the last 10 hours [07:10:59] PROBLEM - Puppet freshness on solr3 is CRITICAL: Puppet has not run in the last 10 hours [07:10:59] PROBLEM - Puppet freshness on solr1003 is CRITICAL: Puppet has not run in the last 10 hours [07:11:53] PROBLEM - Puppet freshness on solr1001 is CRITICAL: Puppet has not run in the last 10 hours [08:00:05] PROBLEM - Puppet freshness on brewster is CRITICAL: Puppet has not run in the last 10 hours [08:41:11] PROBLEM - Puppet freshness on sq81 is CRITICAL: Puppet has not run in the last 10 hours [08:54:05] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours [08:57:14] PROBLEM - SSH on srv191 is CRITICAL: Server answer: [09:03:05] PROBLEM - Puppet freshness on ssl3001 is CRITICAL: Puppet has not run in the last 10 hours [09:12:14] PROBLEM - MySQL Slave Delay on db1035 is CRITICAL: CRIT replication delay 194 seconds [09:13:18] PROBLEM - MySQL Replication Heartbeat on db1035 is CRITICAL: CRIT replication delay 235 seconds [09:22:26] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:36:41] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.046 seconds [09:40:08] PROBLEM - Puppet freshness on ms1002 is CRITICAL: Puppet has not run in the last 10 hours [10:06:41] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:17:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 7.272 seconds [10:26:11] PROBLEM - Puppet freshness on stat1 is 
CRITICAL: Puppet has not run in the last 10 hours [10:51:05] RECOVERY - MySQL Replication Heartbeat on db1035 is OK: OK replication delay 0 seconds [10:51:14] RECOVERY - MySQL Slave Delay on db1035 is OK: OK replication delay 0 seconds [10:51:59] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:05:51] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.027 seconds [11:11:51] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours [11:11:51] PROBLEM - Puppet freshness on silver is CRITICAL: Puppet has not run in the last 10 hours [11:16:30] PROBLEM - MySQL Replication Heartbeat on db1007 is CRITICAL: CRIT replication delay 188 seconds [11:16:48] PROBLEM - MySQL Slave Delay on db1007 is CRITICAL: CRIT replication delay 192 seconds [11:25:30] RECOVERY - MySQL Replication Heartbeat on db1007 is OK: OK replication delay 0 seconds [11:25:30] RECOVERY - MySQL Slave Delay on db1007 is OK: OK replication delay 0 seconds [11:37:39] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:43:12] PROBLEM - NTP on srv191 is CRITICAL: NTP CRITICAL: No response from NTP server [11:53:51] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.033 seconds [12:27:00] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:36:45] PROBLEM - Puppet freshness on virt0 is CRITICAL: Puppet has not run in the last 10 hours [12:36:46] PROBLEM - Puppet freshness on virt1000 is CRITICAL: Puppet has not run in the last 10 hours [12:37:30] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.625 seconds [12:46:57] PROBLEM - Memcached on virt0 is CRITICAL: Connection refused [13:00:45] PROBLEM - Puppet freshness on ms1004 is CRITICAL: Puppet has not run in the last 10 hours [13:08:51] 
PROBLEM - Puppet freshness on analytics1007 is CRITICAL: Puppet has not run in the last 10 hours [13:08:51] PROBLEM - Puppet freshness on ms-be1005 is CRITICAL: Puppet has not run in the last 10 hours [13:08:51] PROBLEM - Puppet freshness on ms-be1007 is CRITICAL: Puppet has not run in the last 10 hours [13:08:51] PROBLEM - Puppet freshness on ms-be1006 is CRITICAL: Puppet has not run in the last 10 hours [13:12:36] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:25:03] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 2.021 seconds [13:35:51] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours [13:58:39] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:11:06] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.516 seconds [14:32:11] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [14:32:11] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours [14:45:23] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:56:02] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.524 seconds [15:01:35] PROBLEM - Apache HTTP on srv191 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:01:58] Can someone restart gerrit-wm? 
Apparently it's not passing messages on to #mediawiki [15:31:26] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:42:14] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 9.389 seconds [15:43:08] PROBLEM - Host srv266 is DOWN: PING CRITICAL - Packet loss = 100% [15:52:29] PROBLEM - Puppet freshness on db1047 is CRITICAL: Puppet has not run in the last 10 hours [15:52:29] PROBLEM - Puppet freshness on ms-be1010 is CRITICAL: Puppet has not run in the last 10 hours [15:52:29] PROBLEM - Puppet freshness on ms-fe1004 is CRITICAL: Puppet has not run in the last 10 hours [15:52:29] PROBLEM - Puppet freshness on ms-fe1003 is CRITICAL: Puppet has not run in the last 10 hours [15:52:29] PROBLEM - Puppet freshness on sq48 is CRITICAL: Puppet has not run in the last 10 hours [15:52:29] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours [16:17:41] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:21:10] Is there an easy way to find n such that "Wikimedia wikis' uptime for (editors/readers) in 2012 was n%"? [16:21:53] (I realise there are many simplifications involved, but it would give a reasonable overview) [16:24:00] woosters: Perhaps you might know [16:24:02] [16:21] Jarry1250 Is there an easy way to find n such that "Wikimedia wikis' uptime for (editors/readers) in 2012 was n%"?
[16:29:23] PROBLEM - MySQL Replication Heartbeat on db1035 is CRITICAL: CRIT replication delay 188 seconds [16:29:59] PROBLEM - MySQL Slave Delay on db1035 is CRITICAL: CRIT replication delay 205 seconds [16:30:08] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.021 seconds [16:49:29] RECOVERY - MySQL Slave Delay on db1035 is OK: OK replication delay 0 seconds [16:50:32] RECOVERY - MySQL Replication Heartbeat on db1035 is OK: OK replication delay 0 seconds [17:00:27] PROBLEM - Puppet freshness on solr2 is CRITICAL: Puppet has not run in the last 10 hours [17:02:32] PROBLEM - Puppet freshness on vanadium is CRITICAL: Puppet has not run in the last 10 hours [17:03:17] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:05:14] PROBLEM - MySQL Slave Delay on db1007 is CRITICAL: CRIT replication delay 188 seconds [17:05:32] PROBLEM - MySQL Replication Heartbeat on db1007 is CRITICAL: CRIT replication delay 199 seconds [17:12:26] PROBLEM - Puppet freshness on solr3 is CRITICAL: Puppet has not run in the last 10 hours [17:12:26] PROBLEM - Puppet freshness on solr1003 is CRITICAL: Puppet has not run in the last 10 hours [17:13:29] PROBLEM - Puppet freshness on solr1001 is CRITICAL: Puppet has not run in the last 10 hours [17:15:44] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.825 seconds [17:17:58] hi jarry1250 [17:18:22] Hi :) [17:18:31] Sorry, I forgot to copy my second line earlier [17:18:33] [16:21] Jarry1250 (I realise there are many simplifications involved, but it would give a reasonable overview) [17:18:34] there is a way but available only to us [17:18:54] we have to get into watchmouse and run some scripts [17:20:58] woosters: Right you are. Is that a 2 minute job or a 5 hour sort of thing?
[17:21:57] a few minutes [17:23:28] Mmm, well, I'd be interested to know anyway; I was looking for a figure to put in my "2012 in review" story for the Signpost Technology report [17:26:58] let me dig into it this evening and send it to u [17:28:12] woosters: Of course. You have my email I think, if I'm not lurking on IRC [17:28:22] yep [17:31:20] RECOVERY - MySQL Slave Delay on db1007 is OK: OK replication delay 0 seconds [17:32:23] RECOVERY - MySQL Replication Heartbeat on db1007 is OK: OK replication delay 0 seconds [17:50:50] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:01:29] PROBLEM - Puppet freshness on brewster is CRITICAL: Puppet has not run in the last 10 hours [18:01:29] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 1.526 seconds [18:21:08] RECOVERY - Memcached on virt0 is OK: TCP OK - 0.004 second response time on port 11000 [18:26:32] PROBLEM - Puppet freshness on srv191 is CRITICAL: Puppet has not run in the last 10 hours [18:36:44] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:42:26] PROBLEM - Puppet freshness on sq81 is CRITICAL: Puppet has not run in the last 10 hours [18:49:21] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.025 seconds [18:55:29] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours [19:04:29] PROBLEM - Puppet freshness on ssl3001 is CRITICAL: Puppet has not run in the last 10 hours [19:04:56] PROBLEM - Parsoid on kuo is CRITICAL: Connection refused [19:22:47] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:26:32] PROBLEM - Apache HTTP on srv223 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:35:14] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.948 seconds
[19:37:47] PROBLEM - Apache HTTP on srv220 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:39:26] RECOVERY - Apache HTTP on srv220 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 6.311 second response time [19:39:44] RECOVERY - Parsoid on kuo is OK: HTTP OK HTTP/1.1 200 OK - 1221 bytes in 0.057 seconds [19:41:32] PROBLEM - Puppet freshness on ms1002 is CRITICAL: Puppet has not run in the last 10 hours [19:44:59] PROBLEM - Apache HTTP on srv220 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:47:50] PROBLEM - Apache HTTP on srv222 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:48:08] PROBLEM - Apache HTTP on srv224 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:51:17] RECOVERY - Apache HTTP on srv222 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 8.308 second response time [19:51:35] RECOVERY - Apache HTTP on srv224 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.064 second response time [19:51:51] !log shot hung converts on all image scalers [19:51:53] RECOVERY - Apache HTTP on srv220 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.061 second response time [19:52:02] Logged the message, Master [19:52:30] and gone again. happy new year to all. [19:53:23] RECOVERY - Apache HTTP on srv223 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.079 second response time [20:08:41] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:21:17] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 9.475 seconds [20:27:26] PROBLEM - Puppet freshness on stat1 is CRITICAL: Puppet has not run in the last 10 hours [20:46:51] apergos: https://upload.wikimedia.org/wikipedia/commons/thumb/9/90/Cutterspractical00vinc.pdf/page6-595px-Cutterspractical00vinc.pdf.jpg [20:46:53] related? 
[20:49:28] don't know [20:50:46] try a slightly different size once or twice [20:56:23] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:07:13] Jarry1250: Sounds like interesting stats (re: uptime/year). [21:10:38] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.029 seconds [21:13:29] PROBLEM - Puppet freshness on silver is CRITICAL: Puppet has not run in the last 10 hours [21:13:29] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours [21:42:26] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:54:44] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.028 seconds [22:28:20] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:38:32] PROBLEM - Puppet freshness on virt1000 is CRITICAL: Puppet has not run in the last 10 hours [22:38:32] PROBLEM - Puppet freshness on virt0 is CRITICAL: Puppet has not run in the last 10 hours [22:42:26] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.048 seconds [22:53:15] Ryan_Lane: did you see virt0's memcache is still bouncy? 
[23:02:32] PROBLEM - Puppet freshness on ms1004 is CRITICAL: Puppet has not run in the last 10 hours [23:10:29] PROBLEM - Puppet freshness on analytics1007 is CRITICAL: Puppet has not run in the last 10 hours [23:10:29] PROBLEM - Puppet freshness on ms-be1005 is CRITICAL: Puppet has not run in the last 10 hours [23:10:29] PROBLEM - Puppet freshness on ms-be1007 is CRITICAL: Puppet has not run in the last 10 hours [23:10:29] PROBLEM - Puppet freshness on ms-be1006 is CRITICAL: Puppet has not run in the last 10 hours [23:15:44] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:28:11] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 7.605 seconds [23:37:29] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours