[00:03:25] PROBLEM - Puppet freshness on virt0 is CRITICAL: Puppet has not run in the last 10 hours
[00:34:10] PROBLEM - MySQL Replication Heartbeat on db32 is CRITICAL: CRIT replication delay 181 seconds
[00:35:13] PROBLEM - MySQL Slave Delay on db32 is CRITICAL: CRIT replication delay 193 seconds
[00:35:22] PROBLEM - Puppet freshness on mc1003 is CRITICAL: Puppet has not run in the last 10 hours
[00:50:07]
[01:28:46] PROBLEM - MySQL disk space on neon is CRITICAL: Connection refused by host
[01:29:04] PROBLEM - ircecho_service_running on neon is CRITICAL: Connection refused by host
[02:07:30] PROBLEM - Puppet freshness on knsq28 is CRITICAL: Puppet has not run in the last 10 hours
[02:29:31] !log LocalisationUpdate completed (1.21wmf9) at Sun Feb 17 02:29:31 UTC 2013
[02:29:36] Logged the message, Master
[02:33:48] New review: Hoo man; "Patch Set 3:" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/48979
[02:35:33] New patchset: Hoo man; "Fix CORS for wikidata testing instances" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/48979
[02:37:11] RECOVERY - MySQL disk space on neon is OK: DISK OK
[02:37:20] RECOVERY - ircecho_service_running on neon is OK: PROCS OK: 2 processes with args ircecho
[02:37:25] New review: Hoo man; "Patch Set 4:" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/48979
[03:03:24] So now, of course, I'm starting to get mail to ops@ and can't help but worry about that failed backup. :-)
[03:29:59] RECOVERY - MySQL Slave Delay on db32 is OK: OK replication delay 0 seconds
[03:30:17] RECOVERY - MySQL Replication Heartbeat on db32 is OK: OK replication delay 3 seconds
[03:49:45] PROBLEM - ircecho_service_running on neon is CRITICAL: Connection refused by host
[03:49:45] PROBLEM - MySQL disk space on neon is CRITICAL: Connection refused by host
[04:06:15] PROBLEM - Puppet freshness on db1002 is CRITICAL: Puppet has not run in the last 10 hours
[04:07:18] PROBLEM - Puppet freshness on cp1023 is CRITICAL: Puppet has not run in the last 10 hours
[04:20:30] RECOVERY - ircecho_service_running on neon is OK: PROCS OK: 2 processes with args ircecho
[04:22:09] RECOVERY - MySQL disk space on neon is OK: DISK OK
[04:50:24] Coren: are you in SF?
[04:50:48] jeremyb_: Nope. Working from the East coast (Specifically, near Montréal)
[04:51:01] Coren: have you been to foulab?
[04:51:10] or met drdee?
[04:51:39] Neither, though now that I look at it I should put it on my todo. :-)
[04:53:33] Coren: :-)
[04:53:41] Coren: you should also come to NYC sometime!
[04:53:53] http://lists.wikimedia.org/pipermail/wikimedia_nyc/2013-January/000265.html
[04:54:03] that's 6.5 days away
[04:54:50] Heh, no way I can make it atm; I'll have to catch the next opportunity.
[04:55:03] ok :(
[04:55:45] * jeremyb_ runs off to sleep
[04:55:54] Especially not the weekend before I "officially" start working. I have this feeling of dread and terror at the workload already. :-)
[05:16:18] PROBLEM - MySQL disk space on neon is CRITICAL: Connection refused by host
[05:16:45] PROBLEM - ircecho_service_running on neon is CRITICAL: Connection refused by host
[05:50:53] RECOVERY - MySQL disk space on neon is OK: DISK OK
[05:51:20] RECOVERY - ircecho_service_running on neon is OK: PROCS OK: 2 processes with args ircecho
[06:00:02] PROBLEM - MySQL Slave Delay on db33 is CRITICAL: CRIT replication delay 183 seconds
[06:01:32] PROBLEM - MySQL Replication Heartbeat on db33 is CRITICAL: CRIT replication delay 208 seconds
[06:05:09] RECOVERY - MySQL Replication Heartbeat on db33 is OK: OK replication delay 0 seconds
[06:05:17] RECOVERY - MySQL Slave Delay on db33 is OK: OK replication delay 0 seconds
[06:05:26] PROBLEM - Puppet freshness on srv246 is CRITICAL: Puppet has not run in the last 10 hours
[06:06:02] PROBLEM - LVS Lucene on search-pool4.svc.eqiad.wmnet is CRITICAL: Connection timed out
[06:11:17] RECOVERY - LVS Lucene on search-pool4.svc.eqiad.wmnet is OK: TCP OK - 9.022 second response time on port 8123
[06:25:51] PROBLEM - LVS Lucene on search-pool4.svc.eqiad.wmnet is CRITICAL: Connection timed out
[06:27:29] RECOVERY - LVS Lucene on search-pool4.svc.eqiad.wmnet is OK: TCP OK - 9.018 second response time on port 8123
[06:52:04] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[06:53:43] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 6.802 seconds
[07:22:25] PROBLEM - LVS Lucene on search-pool4.svc.eqiad.wmnet is CRITICAL: Connection timed out
[07:27:40] RECOVERY - LVS Lucene on search-pool4.svc.eqiad.wmnet is OK: TCP OK - 0.027 second response time on port 8123
[07:32:29] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:45:04] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.037 seconds
[08:13:07] PROBLEM - Puppet freshness on db62 is CRITICAL: Puppet has not run in the last 10 hours
[08:15:49] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:28:43] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.063 seconds
[08:37:06] PROBLEM - Puppet freshness on mc1006 is CRITICAL: Puppet has not run in the last 10 hours
[08:39:03] PROBLEM - Puppet freshness on snapshot4 is CRITICAL: Puppet has not run in the last 10 hours
[08:44:00] PROBLEM - Puppet freshness on amslvs1 is CRITICAL: Puppet has not run in the last 10 hours
[08:52:06] PROBLEM - Puppet freshness on ms1004 is CRITICAL: Puppet has not run in the last 10 hours
[09:09:03] PROBLEM - Puppet freshness on labstore2 is CRITICAL: Puppet has not run in the last 10 hours
[09:34:32] PROBLEM - Backend Squid HTTP on sq41 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:34:58] PROBLEM - Frontend Squid HTTP on sq41 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:36:10] PROBLEM - SSH on sq41 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:50:43] PROBLEM - Puppet freshness on kaulen is CRITICAL: Puppet has not run in the last 10 hours
[09:52:49] RECOVERY - Frontend Squid HTTP on sq41 is OK: HTTP OK HTTP/1.0 200 OK - 631 bytes in 0.008 seconds
[09:59:43] PROBLEM - Puppet freshness on palladium is CRITICAL: Puppet has not run in the last 10 hours
[09:59:43] PROBLEM - Puppet freshness on sq85 is CRITICAL: Puppet has not run in the last 10 hours
[10:00:37] PROBLEM - Puppet freshness on db1026 is CRITICAL: Puppet has not run in the last 10 hours
[10:00:37] PROBLEM - Puppet freshness on knsq23 is CRITICAL: Puppet has not run in the last 10 hours
[10:04:40] PROBLEM - Puppet freshness on virt0 is CRITICAL: Puppet has not run in the last 10 hours
[10:36:34] PROBLEM - Puppet freshness on mc1003 is CRITICAL: Puppet has not run in the last 10 hours
[11:09:08] PROBLEM - ircecho_service_running on neon is CRITICAL: Connection refused by host
[11:09:25] PROBLEM - MySQL disk space on neon is CRITICAL: Connection refused by host
[11:14:04] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 197 seconds
[11:14:13] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 201 seconds
[11:24:52] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 0 seconds
[11:25:01] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 0 seconds
[11:40:48] RECOVERY - MySQL disk space on neon is OK: DISK OK
[11:41:15] RECOVERY - ircecho_service_running on neon is OK: PROCS OK: 2 processes with args ircecho
[12:08:33] PROBLEM - Puppet freshness on knsq28 is CRITICAL: Puppet has not run in the last 10 hours
[12:48:54] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 203 seconds
[12:49:39] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 220 seconds
[12:51:09] PROBLEM - MySQL disk space on neon is CRITICAL: Connection refused by host
[12:51:27] PROBLEM - ircecho_service_running on neon is CRITICAL: Connection refused by host
[13:10:48] RECOVERY - Puppet freshness on cp1023 is OK: puppet ran at Sun Feb 17 13:10:21 UTC 2013
[13:12:18] RECOVERY - Puppet freshness on db1002 is OK: puppet ran at Sun Feb 17 13:12:01 UTC 2013
[13:25:12] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 0 seconds
[13:25:48] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 0 seconds
[13:56:15] RECOVERY - MySQL disk space on neon is OK: DISK OK
[13:56:24] RECOVERY - ircecho_service_running on neon is OK: PROCS OK: 2 processes with args ircecho
[14:06:36] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 196 seconds
[14:07:21] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 212 seconds
[14:13:59] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 0 seconds
[14:14:26] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 0 seconds
[14:53:10] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 185 seconds
[14:54:29] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 199 seconds
[15:24:06] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 28 seconds
[15:24:33] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 0 seconds
[16:06:42] PROBLEM - Puppet freshness on srv246 is CRITICAL: Puppet has not run in the last 10 hours
[16:17:15] New patchset: Aude; "Add changesAsJson setting for Wikibase" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/49489
[16:18:42] PROBLEM - MySQL Replication Heartbeat on db33 is CRITICAL: CRIT replication delay 201 seconds
[16:19:01] PROBLEM - MySQL Slave Delay on db33 is CRITICAL: CRIT replication delay 192 seconds
[16:20:30] PROBLEM - MySQL disk space on neon is CRITICAL: Connection refused by host
[16:20:57] PROBLEM - ircecho_service_running on neon is CRITICAL: Connection refused by host
[16:28:00] RECOVERY - MySQL Slave Delay on db33 is OK: OK replication delay 22 seconds
[16:29:30] RECOVERY - MySQL Replication Heartbeat on db33 is OK: OK replication delay 0 seconds
[16:44:12] PROBLEM - MySQL Replication Heartbeat on db32 is CRITICAL: CRIT replication delay 187 seconds
[16:44:30] PROBLEM - MySQL Slave Delay on db32 is CRITICAL: CRIT replication delay 192 seconds
[16:45:43] RECOVERY - MySQL disk space on neon is OK: DISK OK
[16:46:00] RECOVERY - ircecho_service_running on neon is OK: PROCS OK: 2 processes with args ircecho
[16:48:06] PROBLEM - LVS Lucene on search-pool4.svc.eqiad.wmnet is CRITICAL: Connection timed out
[16:49:36] RECOVERY - LVS Lucene on search-pool4.svc.eqiad.wmnet is OK: TCP OK - 0.028 second response time on port 8123
[16:51:24] RECOVERY - MySQL Replication Heartbeat on db32 is OK: OK replication delay 0 seconds
[16:51:42] RECOVERY - MySQL Slave Delay on db32 is OK: OK replication delay 0 seconds
[17:57:41] PROBLEM - ircecho_service_running on neon is CRITICAL: Connection refused by host
[17:58:26] PROBLEM - MySQL disk space on neon is CRITICAL: Connection refused by host
[18:14:20] PROBLEM - Puppet freshness on db62 is CRITICAL: Puppet has not run in the last 10 hours
[18:28:58] RECOVERY - MySQL disk space on neon is OK: DISK OK
[18:29:25] RECOVERY - ircecho_service_running on neon is OK: PROCS OK: 2 processes with args ircecho
[18:38:43] PROBLEM - Puppet freshness on mc1006 is CRITICAL: Puppet has not run in the last 10 hours
[18:40:49] PROBLEM - Puppet freshness on snapshot4 is CRITICAL: Puppet has not run in the last 10 hours
[18:40:54] sup fools
[18:42:59] any system admin around to investigate the problem with the messages at #wikimedia-tech ?
[18:45:46] PROBLEM - Puppet freshness on amslvs1 is CRITICAL: Puppet has not run in the last 10 hours
[18:53:43] PROBLEM - Puppet freshness on ms1004 is CRITICAL: Puppet has not run in the last 10 hours
[19:10:49] PROBLEM - Puppet freshness on labstore2 is CRITICAL: Puppet has not run in the last 10 hours
[19:22:49] PROBLEM - Puppet freshness on sq41 is CRITICAL: Puppet has not run in the last 10 hours
[19:52:22] PROBLEM - Puppet freshness on kaulen is CRITICAL: Puppet has not run in the last 10 hours
[20:01:22] PROBLEM - Puppet freshness on palladium is CRITICAL: Puppet has not run in the last 10 hours
[20:01:22] PROBLEM - Puppet freshness on sq85 is CRITICAL: Puppet has not run in the last 10 hours
[20:02:25] PROBLEM - Puppet freshness on db1026 is CRITICAL: Puppet has not run in the last 10 hours
[20:02:26] PROBLEM - Puppet freshness on knsq23 is CRITICAL: Puppet has not run in the last 10 hours
[20:06:19] PROBLEM - Puppet freshness on virt0 is CRITICAL: Puppet has not run in the last 10 hours
[20:26:07] PROBLEM - MySQL Replication Heartbeat on db33 is CRITICAL: CRIT replication delay 186 seconds
[20:26:52] PROBLEM - MySQL Slave Delay on db33 is CRITICAL: CRIT replication delay 193 seconds
[20:28:40] RECOVERY - MySQL Slave Delay on db33 is OK: OK replication delay 0 seconds
[20:29:43] RECOVERY - MySQL Replication Heartbeat on db33 is OK: OK replication delay 0 seconds
[20:33:35] New review: Ori.livneh; "Patch Set 3:" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/49200
[20:38:16] PROBLEM - Puppet freshness on mc1003 is CRITICAL: Puppet has not run in the last 10 hours
[21:14:43] PROBLEM - ircecho_service_running on neon is CRITICAL: Connection refused by host
[21:15:37] PROBLEM - MySQL disk space on neon is CRITICAL: Connection refused by host
[21:45:28] RECOVERY - ircecho_service_running on neon is OK: PROCS OK: 2 processes with args ircecho
[21:46:49] RECOVERY - MySQL disk space on neon is OK: DISK OK
[22:09:53] PROBLEM - Puppet freshness on knsq28 is CRITICAL: Puppet has not run in the last 10 hours
[22:10:38] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 186 seconds
[22:11:06] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 196 seconds
[22:57:53] PROBLEM - MySQL disk space on neon is CRITICAL: Connection refused by host
[22:58:20] PROBLEM - ircecho_service_running on neon is CRITICAL: Connection refused by host
[23:13:37] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 0 seconds
[23:14:04] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 0 seconds
[23:30:43] RECOVERY - MySQL disk space on neon is OK: DISK OK
[23:30:52] RECOVERY - ircecho_service_running on neon is OK: PROCS OK: 2 processes with args ircecho