[00:01:07] ottomata: yeah kind of :-]
[00:01:14] ottomata: I am on 3rd floor
[00:02:33] RECOVERY - MySQL disk space on neon is OK: DISK OK
[00:03:03] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours
[00:03:04] RECOVERY - Host wtp1001 is UP: PING OK - Packet loss = 0%, RTA = 0.41 ms
[00:03:04] RECOVERY - ircecho_service_running on neon is OK: PROCS OK: 3 processes with args ircecho
[00:03:33] PROBLEM - Puppet freshness on db27 is CRITICAL: Puppet has not run in the last 10 hours
[00:05:13] PROBLEM - Parsoid on wtp1001 is CRITICAL: Connection refused
[00:06:05] roankattouw: new wtp1001 has been provisioned
[00:06:13] Yay
[00:06:14] Thanks
[00:06:16] I'll bring it up
[00:06:25] New patchset: Dzahn; "move all monitoring related stuff into new monitoring directory and include that in site.pp and remove ./misc/nagios.pp (merge with nagios.pp)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/51792
[00:07:43] RECOVERY - Puppet freshness on db27 is OK: puppet ran at Sat Mar 2 00:07:36 UTC 2013
[00:07:53] RECOVERY - Puppet freshness on spence is OK: puppet ran at Sat Mar 2 00:07:44 UTC 2013
[00:08:03] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours
[00:08:03] RECOVERY - Puppet freshness on spence is OK: puppet ran at Sat Mar 2 00:07:56 UTC 2013
[00:08:33] PROBLEM - Puppet freshness on db27 is CRITICAL: Puppet has not run in the last 10 hours
[00:08:43] RECOVERY - Puppet freshness on db27 is OK: puppet ran at Sat Mar 2 00:08:41 UTC 2013
[00:09:03] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours
[00:09:13] RECOVERY - Puppet freshness on spence is OK: puppet ran at Sat Mar 2 00:09:07 UTC 2013
[00:09:33] PROBLEM - Puppet freshness on db27 is CRITICAL: Puppet has not run in the last 10 hours
[00:09:53] RECOVERY - Puppet freshness on db27 is OK: puppet ran at Sat Mar 2 00:09:51 UTC 2013
[00:10:03] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours
[00:10:13] RECOVERY - Puppet freshness on spence is OK: puppet ran at Sat Mar 2 00:10:11 UTC 2013
[00:10:33] PROBLEM - Puppet freshness on db27 is CRITICAL: Puppet has not run in the last 10 hours
[00:10:53] RECOVERY - Puppet freshness on db27 is OK: puppet ran at Sat Mar 2 00:10:50 UTC 2013
[00:11:03] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours
[00:11:13] RECOVERY - Puppet freshness on spence is OK: puppet ran at Sat Mar 2 00:11:10 UTC 2013
[00:11:33] PROBLEM - Puppet freshness on db27 is CRITICAL: Puppet has not run in the last 10 hours
[00:11:53] RECOVERY - Puppet freshness on db27 is OK: puppet ran at Sat Mar 2 00:11:44 UTC 2013
[00:12:04] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours
[00:12:13] RECOVERY - Puppet freshness on spence is OK: puppet ran at Sat Mar 2 00:12:03 UTC 2013
[00:12:33] PROBLEM - Puppet freshness on db27 is CRITICAL: Puppet has not run in the last 10 hours
[00:12:34] RECOVERY - Puppet freshness on db27 is OK: puppet ran at Sat Mar 2 00:12:30 UTC 2013
[00:13:03] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours
[00:13:23] RECOVERY - Puppet freshness on spence is OK: puppet ran at Sat Mar 2 00:13:21 UTC 2013
[00:13:33] PROBLEM - Puppet freshness on db27 is CRITICAL: Puppet has not run in the last 10 hours
[00:13:43] RECOVERY - Puppet freshness on db27 is OK: puppet ran at Sat Mar 2 00:13:41 UTC 2013
[00:14:06] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours
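A note on the "Puppet freshness" alerts that dominate this log: the check compares the time since a node's last recorded Puppet run against a threshold (the alert text says 10 hours), and the rapid PROBLEM/RECOVERY flapping on spence and db27 suggests the recorded run time and the check's view of "now" keep crossing that boundary. Below is a minimal Python sketch of that kind of age-vs-threshold check; the function and constant names are hypothetical, not the actual Nagios/Icinga plugin used here.

```python
import time

# Assumed threshold, taken from the alert text above ("in the last 10 hours").
FRESHNESS_THRESHOLD_SECONDS = 10 * 3600

def check_puppet_freshness(last_run_epoch, now=None):
    """Return a (state, message) pair in the spirit of a Nagios-style freshness check."""
    now = time.time() if now is None else now
    age = now - last_run_epoch
    if age <= FRESHNESS_THRESHOLD_SECONDS:
        return "OK", "puppet ran %.1f hours ago" % (age / 3600.0)
    return "CRITICAL", "Puppet has not run in the last 10 hours"

# Example: a node whose last run was 11 hours ago trips the alert.
print(check_puppet_freshness(time.time() - 11 * 3600))  # ('CRITICAL', ...)
```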
[00:14:33] PROBLEM - Puppet freshness on db27 is CRITICAL: Puppet has not run in the last 10 hours
[00:15:43] RECOVERY - Puppet freshness on spence is OK: puppet ran at Sat Mar 2 00:15:41 UTC 2013
[00:16:03] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours
[00:24:14] PROBLEM - Varnish HTTP bits on palladium is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:24:43] PROBLEM - SSH on palladium is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:24:51] New patchset: Pyoungmeister; "using db71 instead of db71" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/51810
[00:25:43] RECOVERY - SSH on palladium is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
[00:26:38] New patchset: Reedy; "using db71 instead of db70" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/51810
[00:26:47] notpeter: I was sitting here wtf'ing
[00:26:56] since when db71 !== db70
[00:27:02] FAIL
[00:27:06] db71 !== db71
[00:27:44] Reedy: i was trying to create a causality loop to travel back in time
[00:28:27] ottomata: still around?
[00:28:31] yupheyaa
[00:28:38] how about an emery reboot now
[00:28:42] you in ops bunker?
[00:28:45] yeah
[00:28:56] should be fine, let's just send erik z an email with the time it is down
[00:28:57] OK I figured out the salt thing
[00:28:57] that should be fine
[00:29:14] !log installing package upgrades on emery
[00:29:15] if you let me know I can email the analytics list
[00:29:19] Logged the message, Master
[00:29:25] perfect, serveradmin log will suffice
[00:29:30] i'll email with that
[00:29:35] times from that
[00:29:37] ok? cool:)
[00:29:39] PROBLEM - mysqld processes on db69 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld
[00:29:58] New patchset: Demon; "Add wikibugs IRC bot to #mediawiki-visualeditor" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37570
[00:35:35] !log rebooting emery for kernel upgrade
[00:35:41] Logged the message, Master
[00:37:21] PROBLEM - Host kaulen is DOWN: PING CRITICAL - Packet loss = 100%
[00:37:45] icinga-wm: not true?!
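The "!log ..." lines above, each answered by "Logged the message, Master", are the channel's server admin log workflow: an operator types the command and a bot records the message so it ends up in the serveradmin log referenced at 00:29:25. The sketch below is a rough Python rendition of that parse-and-record step with made-up names; it is not the actual adminbot, only the shape of the interaction visible in the log.

```python
from datetime import datetime, timezone

def handle_irc_line(nick, text, sal_entries):
    """If the line is a '!log <message>' command, record it and return the ack text."""
    if not text.startswith("!log "):
        return None
    sal_entries.append({
        "time": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "nick": nick,
        "message": text[len("!log "):].strip(),   # stand-in for writing to the admin log
    })
    return "Logged the message, Master"           # ack text as seen in the channel

# Example mirroring the exchange above:
sal = []
print(handle_irc_line("ottomata", "!log rebooting emery for kernel upgrade", sal))
```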
[00:37:49] RECOVERY - Host kaulen is UP: PING OK - Packet loss = 0%, RTA = 26.56 ms
[00:38:04] mutante: I was about to say the same thing
[00:38:09] BZ was timing out for a minute or so
[00:39:44] hmm, not under heavy load
[00:39:49] PROBLEM - SSH on emery is CRITICAL: Connection refused
[00:39:59] PROBLEM - udp2log log age for aft on emery is CRITICAL: Connection refused by host
[00:40:19] that about emery is me and known
[00:40:23] rebooting
[00:40:39] PROBLEM - udp2log log age for emery on emery is CRITICAL: Connection refused by host
[00:41:02] RobH: https://wikitech.wikimedia.org/wiki/Sartoris#Bringing_up_new_minions
[00:41:59] New patchset: Mark Bergsma; "Restart bits varnish on cache allocation failures" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/51814
[00:46:00] New patchset: Mark Bergsma; "Restart bits varnish on cache allocation failures" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/51814
[00:48:19] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/51814
[00:48:58] RoanKattouw: something seems to be up with Bugzilla
[00:49:00] * Jasper_Deng can't connect
[00:49:23] now I can load it, but it took quite some time
[00:51:09] RECOVERY - Parsoid on wtp1001 is OK: HTTP OK: HTTP/1.1 200 OK - 1308 bytes in 0.004 second response time
[00:51:12] mutante: ^ 2nd time
[00:51:49] New patchset: Jforrester; "Add wikibugs IRC bot to #mediawiki-visualeditor" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37570
[00:52:45] couple of incoming network spikes..
[00:52:46] New review: Jforrester; "Rebased, not that it really matters as wikibugs doesn't follow this currently." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37570
[00:53:22] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/51810
[00:54:11] RECOVERY - Varnish HTTP bits on palladium is OK: HTTP OK: HTTP/1.1 200 OK - 637 bytes in 0.000 second response time
[00:54:19] RECOVERY - Host search-pool5.svc.eqiad.wmnet is UP: PING OK - Packet loss = 0%, RTA = 0.31 ms
[00:56:06] New patchset: Ottomata; "Adding puppet Limn module." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/49710
[01:00:31] Anyone feel like +2ing https://gerrit.wikimedia.org/r/#/c/37570/ ? It's not currently used to configure wikibugs, but that hasn't stopped people +2ing similar commits for what we want it to do when it's got working...
[01:06:25] PROBLEM - Puppet freshness on db27 is CRITICAL: Puppet has not run in the last 10 hours
[01:06:55] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours
[01:08:55] PROBLEM - mysqld processes on db71 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld
[01:11:11] hey maplebed
[01:11:21] evening!
[01:11:46] I was hoping someone would be around.
[01:11:50] * maplebed -> pm
[01:13:01] RoanKattouw: cool, thx dude
[01:13:34] Also, wtp1001 is now up
[01:13:44] Crap I should adjust its weight
[01:13:51] glad chris got the install handled for ya
[01:14:42] Yeah
[01:17:11] !log powercycling emery
[01:17:17] Logged the message, Master
[01:17:21] Reedy: so wfPickRandom can pick something with 0 weight
[01:17:26] mutante: BZ is timing out again
[01:19:15] Reedy: i see it, but then it's back before you even get to look at it .. and it's not even doing much
[01:19:30] SSH hangs also
[01:19:37] wfm
[01:20:52] we have "sar" setup there
[01:21:18] over 90% idle for the last hour .. type sar
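On the remark at 01:17:21 that wfPickRandom "can pick something with 0 weight": wfPickRandom is a MediaWiki (PHP) helper for weighted random selection, and the surrounding context (db70/db71, "adjust its weight") is about servers being weighted in or out of rotation. The sketch below is a Python rendition of the usual cumulative-sum approach, not the actual MediaWiki code, showing how a boundary condition can let a zero-weight entry win, and one way to rule that out.

```python
import random

def pick_random_buggy(weights):
    """Cumulative-sum weighted pick. If the draw can land exactly on a boundary
    (e.g. r == 0.0 with a '<=' comparison), a zero-weight entry can still be returned."""
    total = sum(weights.values())
    r = random.uniform(0, total)      # may return exactly 0.0
    running = 0
    for key, weight in weights.items():
        running += weight
        if r <= running:              # '<=' lets r == 0 match a leading zero-weight key
            return key
    return key

def pick_random_safe(weights):
    """Drop zero-weight entries up front and use a strict comparison."""
    candidates = {k: w for k, w in weights.items() if w > 0}
    if not candidates:
        raise ValueError("no entries with positive weight")
    total = sum(candidates.values())
    r = random.uniform(0, total)
    running = 0.0
    for key, weight in candidates.items():
        running += weight
        if r < running:
            return key
    return key                        # floating-point edge case: fall back to the last candidate

# Hypothetical example: a server pulled out of rotation with weight 0 should never be picked.
weights = {"db70": 0, "db71": 100}
```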
[01:27:55] RECOVERY - SSH on emery is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0)
[01:28:05] RECOVERY - udp2log log age for aft on emery is OK: OK: all log files active
[01:28:06] RECOVERY - udp2log log age for emery on emery is OK: OK: all log files active
[01:38:06] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 192 seconds
[01:38:25] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 199 seconds
[01:47:34] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 0 seconds
[01:48:03] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 0 seconds
[02:17:51] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours
[02:27:35] !log LocalisationUpdate completed (1.21wmf10) at Sat Mar 2 02:27:35 UTC 2013
[02:27:41] Logged the message, Master
[02:31:41] PROBLEM - Puppet freshness on virt0 is CRITICAL: Puppet has not run in the last 10 hours
[02:33:21] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:35:21] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 9.494 second response time
[02:41:21] PROBLEM - MySQL Replication Heartbeat on db32 is CRITICAL: CRIT replication delay 181 seconds
[02:41:41] PROBLEM - MySQL Slave Delay on db32 is CRITICAL: CRIT replication delay 185 seconds
[02:52:41] RECOVERY - MySQL Slave Delay on db32 is OK: OK replication delay 14 seconds
[02:53:21] RECOVERY - MySQL Replication Heartbeat on db32 is OK: OK replication delay 1 seconds
[03:08:21] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:26:21] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 8.120 second response time
[03:31:34] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours
[03:32:34] PROBLEM - MySQL disk space on neon is CRITICAL: Connection refused by host
[03:33:04] PROBLEM - ircecho_service_running on neon is CRITICAL: Connection refused by host
[03:41:24] PROBLEM - MySQL Replication Heartbeat on db32 is CRITICAL: CRIT replication delay 186 seconds
[03:43:24] RECOVERY - MySQL Replication Heartbeat on db32 is OK: OK replication delay 0 seconds
[03:57:14] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:07:04] PROBLEM - MySQL Slave Delay on db33 is CRITICAL: CRIT replication delay 185 seconds
[04:07:34] PROBLEM - MySQL Replication Heartbeat on db1020 is CRITICAL: CRIT replication delay 199 seconds
[04:07:54] PROBLEM - MySQL Slave Delay on db1020 is CRITICAL: CRIT replication delay 206 seconds
[04:07:54] PROBLEM - MySQL Replication Heartbeat on db33 is CRITICAL: CRIT replication delay 210 seconds
[04:08:24] RECOVERY - Puppet freshness on spence is OK: puppet ran at Sat Mar 2 04:08:21 UTC 2013
[04:08:34] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours
[04:09:34] RECOVERY - Puppet freshness on spence is OK: puppet ran at Sat Mar 2 04:09:29 UTC 2013
[04:09:34] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours
[04:10:04] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.155 second response time
[04:10:34] RECOVERY - Puppet freshness on spence is OK: puppet ran at Sat Mar 2 04:10:27 UTC 2013
[04:10:34] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours
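On the "MySQL Slave Delay" and "MySQL Replication Heartbeat" alerts above: every CRIT reports a delay of roughly 180 seconds or more and clears once the delay drops back toward zero, which is consistent with a simple lag-threshold check. A small Python sketch of that classification step follows; the 180-second threshold is an assumption inferred from the log, not the production check configuration.

```python
# Assumed critical threshold: all CRITs in this log report >= 181 seconds of lag.
CRIT_SECONDS = 180

def classify_replication_lag(delay_seconds):
    """Mirror the alert text above: CRIT at high lag, OK once the slave catches up."""
    if delay_seconds is None:
        return "UNKNOWN", "replication not running or heartbeat missing"
    if delay_seconds >= CRIT_SECONDS:
        return "CRITICAL", "CRIT replication delay %d seconds" % delay_seconds
    return "OK", "OK replication delay %d seconds" % delay_seconds

# Examples shaped like db53 above: 192 s trips the alert, 0 s recovers it.
print(classify_replication_lag(192))   # ('CRITICAL', ...)
print(classify_replication_lag(0))     # ('OK', ...)
```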
[04:10:34] RECOVERY - MySQL Replication Heartbeat on db1020 is OK: OK replication delay 3 seconds
[04:10:54] RECOVERY - MySQL Slave Delay on db1020 is OK: OK replication delay 2 seconds
[04:10:55] RECOVERY - MySQL Replication Heartbeat on db33 is OK: OK replication delay 16 seconds
[04:11:04] RECOVERY - MySQL Slave Delay on db33 is OK: OK replication delay 17 seconds
[04:11:25] RECOVERY - Puppet freshness on spence is OK: puppet ran at Sat Mar 2 04:11:21 UTC 2013
[04:11:34] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours
[04:12:15] RECOVERY - Puppet freshness on spence is OK: puppet ran at Sat Mar 2 04:12:11 UTC 2013
[04:12:34] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours
[04:12:54] RECOVERY - Puppet freshness on spence is OK: puppet ran at Sat Mar 2 04:12:44 UTC 2013
[04:13:04] RECOVERY - ircecho_service_running on neon is OK: PROCS OK: 3 processes with args ircecho
[04:13:30] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours
[04:13:42] RECOVERY - MySQL disk space on neon is OK: DISK OK
[04:14:00] PROBLEM - Puppet freshness on db27 is CRITICAL: Puppet has not run in the last 10 hours
[04:26:00] PROBLEM - MySQL Replication Heartbeat on db33 is CRITICAL: CRIT replication delay 189 seconds
[04:26:10] PROBLEM - MySQL Slave Delay on db33 is CRITICAL: CRIT replication delay 191 seconds
[04:26:20] PROBLEM - MySQL Replication Heartbeat on db36 is CRITICAL: CRIT replication delay 217 seconds
[04:26:40] PROBLEM - MySQL Slave Delay on db36 is CRITICAL: CRIT replication delay 234 seconds
[04:28:20] RECOVERY - MySQL Replication Heartbeat on db36 is OK: OK replication delay 0 seconds
[04:28:40] RECOVERY - MySQL Slave Delay on db36 is OK: OK replication delay 0 seconds
[05:46:05] PROBLEM - Apache HTTP on mw1153 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:46:55] RECOVERY - Apache HTTP on mw1153 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 1.432 second response time
[06:05:56] RECOVERY - MySQL Replication Heartbeat on db33 is OK: OK replication delay 0 seconds
[06:06:07] RECOVERY - MySQL Slave Delay on db33 is OK: OK replication delay 0 seconds
[06:07:32] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours
[06:07:52] PROBLEM - Puppet freshness on db27 is CRITICAL: Puppet has not run in the last 10 hours
[06:29:12] RECOVERY - Puppet freshness on spence is OK: puppet ran at Sat Mar 2 06:29:07 UTC 2013
[06:29:32] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours
[06:29:42] RECOVERY - Puppet freshness on db27 is OK: puppet ran at Sat Mar 2 06:29:39 UTC 2013
[06:29:53] PROBLEM - Puppet freshness on db27 is CRITICAL: Puppet has not run in the last 10 hours
[06:30:02] RECOVERY - Puppet freshness on db27 is OK: puppet ran at Sat Mar 2 06:29:59 UTC 2013
[06:30:32] RECOVERY - Puppet freshness on spence is OK: puppet ran at Sat Mar 2 06:30:30 UTC 2013
[06:30:52] PROBLEM - Puppet freshness on db27 is CRITICAL: Puppet has not run in the last 10 hours
[06:31:32] RECOVERY - Puppet freshness on db27 is OK: puppet ran at Sat Mar 2 06:31:22 UTC 2013
[06:31:33] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours
[06:31:52] RECOVERY - Puppet freshness on spence is OK: puppet ran at Sat Mar 2 06:31:44 UTC 2013
[06:31:52] PROBLEM - Puppet freshness on db27 is CRITICAL: Puppet has not run in the last 10 hours
[06:32:02] RECOVERY - Puppet freshness on db27 is OK: puppet ran at Sat Mar 2 06:31:55 UTC 2013
[06:32:32] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours
[06:32:52] PROBLEM - Puppet freshness on db27 is CRITICAL: Puppet has not run in the last 10 hours
[06:33:12] RECOVERY - Puppet freshness on spence is OK: puppet ran at Sat Mar 2 06:33:07 UTC 2013
[06:33:32] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours
[06:56:55] PROBLEM - Puppet freshness on db62 is CRITICAL: Puppet has not run in the last 10 hours
[07:02:55] PROBLEM - Puppet freshness on ms1004 is CRITICAL: Puppet has not run in the last 10 hours
[07:19:38] PROBLEM - Puppet freshness on db27 is CRITICAL: Puppet has not run in the last 10 hours
[07:21:08] PROBLEM - ircecho_service_running on neon is CRITICAL: Connection refused by host
[07:21:28] PROBLEM - MySQL disk space on neon is CRITICAL: Connection refused by host
[07:55:28] RECOVERY - MySQL disk space on neon is OK: DISK OK
[07:55:58] RECOVERY - ircecho_service_running on neon is OK: PROCS OK: 3 processes with args ircecho
[08:07:40] RECOVERY - Puppet freshness on db27 is OK: puppet ran at Sat Mar 2 08:07:35 UTC 2013
[08:08:08] RECOVERY - Puppet freshness on spence is OK: puppet ran at Sat Mar 2 08:08:06 UTC 2013
[08:08:30] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours
[08:08:38] PROBLEM - Puppet freshness on db27 is CRITICAL: Puppet has not run in the last 10 hours
[08:08:48] RECOVERY - Puppet freshness on spence is OK: puppet ran at Sat Mar 2 08:08:43 UTC 2013
[08:08:58] RECOVERY - Puppet freshness on db27 is OK: puppet ran at Sat Mar 2 08:08:53 UTC 2013
[08:09:28] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours
[08:09:38] PROBLEM - Puppet freshness on db27 is CRITICAL: Puppet has not run in the last 10 hours
[08:13:18] PROBLEM - Puppet freshness on mw1048 is CRITICAL: Puppet has not run in the last 10 hours
[08:13:18] PROBLEM - Puppet freshness on mw1050 is CRITICAL: Puppet has not run in the last 10 hours
[08:15:59] RECOVERY - Puppet freshness on db27 is OK: puppet ran at Sat Mar 2 08:15:49 UTC 2013
[08:16:38] PROBLEM - Puppet freshness on db27 is CRITICAL: Puppet has not run in the last 10 hours
[08:32:19] PROBLEM - Puppet freshness on cp1011 is CRITICAL: Puppet has not run in the last 10 hours
[08:37:19] PROBLEM - Puppet freshness on knsq17 is CRITICAL: Puppet has not run in the last 10 hours
[08:38:19] PROBLEM - Puppet freshness on hume is CRITICAL: Puppet has not run in the last 10 hours
[08:39:19] PROBLEM - Puppet freshness on db38 is CRITICAL: Puppet has not run in the last 10 hours
[09:11:26] PROBLEM - Puppet freshness on db27 is CRITICAL: Puppet has not run in the last 10 hours
[09:12:16] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours
[09:12:36] RECOVERY - Puppet freshness on spence is OK: puppet ran at Sat Mar 2 09:12:32 UTC 2013
[09:13:06] PROBLEM - ircecho_service_running on neon is CRITICAL: Connection refused by host
[09:13:16] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours
[09:13:26] PROBLEM - MySQL disk space on neon is CRITICAL: Connection refused by host
[09:41:26] RECOVERY - Puppet freshness on db27 is OK: puppet ran at Sat Mar 2 09:41:16 UTC 2013
[09:41:26] PROBLEM - Puppet freshness on db27 is CRITICAL: Puppet has not run in the last 10 hours
[09:43:06] RECOVERY - Puppet freshness on spence is OK: puppet ran at Sat Mar 2 09:43:01 UTC 2013
[09:43:16] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours
[10:24:05] RECOVERY - ircecho_service_running on neon is OK: PROCS OK: 3 processes with args ircecho
[10:24:25] RECOVERY - MySQL disk space on neon is OK: DISK OK
[10:30:26] PROBLEM - MySQL Slave Delay on db32 is CRITICAL: CRIT replication delay 183 seconds
[10:31:05] PROBLEM - MySQL Replication Heartbeat on db32 is CRITICAL: CRIT replication delay 198 seconds
[11:03:44] PROBLEM - Puppet freshness on db27 is CRITICAL: Puppet has not run in the last 10 hours
[11:04:14] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours
[11:09:24] RECOVERY - MySQL Slave Delay on db32 is OK: OK replication delay 30 seconds
[11:09:44] RECOVERY - MySQL Replication Heartbeat on db32 is OK: OK replication delay 0 seconds
[11:41:30] PROBLEM - MySQL disk space on neon is CRITICAL: Connection refused by host
[11:41:40] PROBLEM - ircecho_service_running on neon is CRITICAL: Connection refused by host
[12:15:29] RECOVERY - MySQL disk space on neon is OK: DISK OK
[12:15:38] PROBLEM - Puppet freshness on db27 is CRITICAL: Puppet has not run in the last 10 hours
[12:15:38] RECOVERY - ircecho_service_running on neon is OK: PROCS OK: 3 processes with args ircecho
[12:15:58] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours
[12:32:28] PROBLEM - Puppet freshness on virt0 is CRITICAL: Puppet has not run in the last 10 hours
[12:45:38] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 186 seconds
[12:45:58] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 192 seconds
[12:53:38] RECOVERY - Puppet freshness on db27 is OK: puppet ran at Sat Mar 2 12:53:34 UTC 2013
[12:54:38] PROBLEM - Puppet freshness on db27 is CRITICAL: Puppet has not run in the last 10 hours
[13:02:38] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 0 seconds
[13:02:58] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 0 seconds
[13:05:38] RECOVERY - Puppet freshness on spence is OK: puppet ran at Sat Mar 2 13:05:36 UTC 2013
[13:05:58] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours
[13:37:25] PROBLEM - MySQL Slave Delay on db32 is CRITICAL: CRIT replication delay 185 seconds
[13:38:25] RECOVERY - MySQL Slave Delay on db32 is OK: OK replication delay 0 seconds
[13:45:50] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours
[13:46:20] PROBLEM - MySQL Slave Delay on db44 is CRITICAL: CRIT replication delay 197 seconds
[13:46:40] PROBLEM - MySQL Replication Heartbeat on db44 is CRITICAL: CRIT replication delay 210 seconds
[14:01:40] RECOVERY - MySQL Replication Heartbeat on db44 is OK: OK replication delay 6 seconds
[14:02:20] RECOVERY - MySQL Slave Delay on db44 is OK: OK replication delay 18 seconds
[14:29:39] Hi
[16:08:09] RECOVERY - Puppet freshness on db27 is OK: puppet ran at Sat Mar 2 16:08:03 UTC 2013
[16:08:29] PROBLEM - Puppet freshness on db27 is CRITICAL: Puppet has not run in the last 10 hours
[16:09:08] RECOVERY - Puppet freshness on db27 is OK: puppet ran at Sat Mar 2 16:09:05 UTC 2013
[16:09:28] PROBLEM - Puppet freshness on db27 is CRITICAL: Puppet has not run in the last 10 hours
[16:10:08] RECOVERY - Puppet freshness on db27 is OK: puppet ran at Sat Mar 2 16:10:00 UTC 2013
[16:10:28] PROBLEM - Puppet freshness on db27 is CRITICAL: Puppet has not run in the last 10 hours
[16:10:58] RECOVERY - Puppet freshness on db27 is OK: puppet ran at Sat Mar 2 16:10:50 UTC 2013
[16:11:28] PROBLEM - Puppet freshness on db27 is CRITICAL: Puppet has not run in the last 10 hours
[16:11:46] RECOVERY - Puppet freshness on db27 is OK: puppet ran at Sat Mar 2 16:11:32 UTC 2013
[16:11:56] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours
[16:12:26] PROBLEM - Puppet freshness on db27 is CRITICAL: Puppet has not run in the last 10 hours
[16:12:57] PROBLEM - MySQL Replication Heartbeat on db44 is CRITICAL: CRIT replication delay 214 seconds
[16:13:16] PROBLEM - MySQL Slave Delay on db44 is CRITICAL: CRIT replication delay 232 seconds
[16:13:46] PROBLEM - MySQL disk space on neon is CRITICAL: Connection refused by host
[16:14:16] PROBLEM - ircecho_service_running on neon is CRITICAL: Connection refused by host
[16:31:16] RECOVERY - MySQL Slave Delay on db44 is OK: OK replication delay 0 seconds
[16:31:56] RECOVERY - MySQL Replication Heartbeat on db44 is OK: OK replication delay 0 seconds
[16:57:46] PROBLEM - Puppet freshness on db62 is CRITICAL: Puppet has not run in the last 10 hours
[16:59:16] RECOVERY - ircecho_service_running on neon is OK: PROCS OK: 3 processes with args ircecho
[16:59:45] RECOVERY - MySQL disk space on neon is OK: DISK OK
[17:03:35] PROBLEM - Puppet freshness on ms1004 is CRITICAL: Puppet has not run in the last 10 hours
[17:55:50] PROBLEM - Host knsq24 is DOWN: PING CRITICAL - Packet loss = 100%
[17:55:50] PROBLEM - Host knsq16 is DOWN: PING CRITICAL - Packet loss = 100%
[17:56:40] PROBLEM - Host hooft is DOWN: PING CRITICAL - Packet loss = 100%
[17:57:20] RECOVERY - Host knsq16 is UP: PING WARNING - Packet loss = 80%, RTA = 90.51 ms
[17:57:30] RECOVERY - Host hooft is UP: PING OK - Packet loss = 0%, RTA = 88.60 ms
[17:57:40] RECOVERY - Host knsq24 is UP: PING OK - Packet loss = 0%, RTA = 90.46 ms
[18:14:02] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours
[18:14:32] PROBLEM - Puppet freshness on db27 is CRITICAL: Puppet has not run in the last 10 hours
[18:14:32] PROBLEM - Puppet freshness on mw1048 is CRITICAL: Puppet has not run in the last 10 hours
[18:14:32] PROBLEM - Puppet freshness on mw1050 is CRITICAL: Puppet has not run in the last 10 hours
[18:17:52] PROBLEM - Host bits.esams.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100%
[18:18:12] PROBLEM - Host mediawiki-lb.esams.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100%
[18:18:22] RECOVERY - Host bits.esams.wikimedia.org is UP: PING OK - Packet loss = 0%, RTA = 162.76 ms
[18:18:42] RECOVERY - Host mediawiki-lb.esams.wikimedia.org is UP: PING OK - Packet loss = 0%, RTA = 83.95 ms
[18:32:32] PROBLEM - Puppet freshness on cp1011 is CRITICAL: Puppet has not run in the last 10 hours
[18:37:32] PROBLEM - Puppet freshness on knsq17 is CRITICAL: Puppet has not run in the last 10 hours
[18:38:32] PROBLEM - Puppet freshness on hume is CRITICAL: Puppet has not run in the last 10 hours
[18:39:33] PROBLEM - Puppet freshness on db38 is CRITICAL: Puppet has not run in the last 10 hours
[18:51:04] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 188 seconds
[18:51:14] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 190 seconds
[18:56:04] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 0 seconds
[18:56:14] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 0 seconds
[19:30:02] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 185 seconds
[19:30:12] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 186 seconds
[19:45:01] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 0 seconds
[19:45:11] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 0 seconds
[19:45:46] New review: Jeremyb; "(1 comment)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/51814
[19:52:01] PROBLEM - MySQL Replication Heartbeat on db33 is CRITICAL: CRIT replication delay 182 seconds
[19:52:01] PROBLEM - MySQL Slave Delay on db33 is CRITICAL: CRIT replication delay 183 seconds
[20:02:03] PROBLEM - Puppet freshness on db27 is CRITICAL: Puppet has not run in the last 10 hours
[20:02:03] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours
[20:06:01] PROBLEM - MySQL Slave Delay on db1020 is CRITICAL: CRIT replication delay 192 seconds
[20:06:11] PROBLEM - MySQL Replication Heartbeat on db1020 is CRITICAL: CRIT replication delay 201 seconds
[20:08:21] RECOVERY - Puppet freshness on spence is OK: puppet ran at Sat Mar 2 20:08:19 UTC 2013
[20:09:01] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours
[20:09:31] RECOVERY - Puppet freshness on db27 is OK: puppet ran at Sat Mar 2 20:09:28 UTC 2013
[20:10:01] PROBLEM - Puppet freshness on db27 is CRITICAL: Puppet has not run in the last 10 hours
[20:10:21] RECOVERY - Puppet freshness on spence is OK: puppet ran at Sat Mar 2 20:10:14 UTC 2013
[20:11:02] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours
[20:11:21] RECOVERY - Puppet freshness on db27 is OK: puppet ran at Sat Mar 2 20:11:11 UTC 2013
[20:12:02] PROBLEM - Puppet freshness on db27 is CRITICAL: Puppet has not run in the last 10 hours
[20:12:02] RECOVERY - Puppet freshness on spence is OK: puppet ran at Sat Mar 2 20:11:54 UTC 2013
[20:12:02] RECOVERY - Puppet freshness on db27 is OK: puppet ran at Sat Mar 2 20:11:57 UTC 2013
[20:13:01] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours
[20:13:01] PROBLEM - Puppet freshness on db27 is CRITICAL: Puppet has not run in the last 10 hours
[20:13:31] RECOVERY - Puppet freshness on cp1011 is OK: puppet ran at Sat Mar 2 20:13:24 UTC 2013
[20:13:31] RECOVERY - Puppet freshness on spence is OK: puppet ran at Sat Mar 2 20:13:28 UTC 2013
[20:13:31] RECOVERY - Puppet freshness on db27 is OK: puppet ran at Sat Mar 2 20:13:30 UTC 2013
[20:13:42] RECOVERY - Puppet freshness on knsq17 is OK: puppet ran at Sat Mar 2 20:13:34 UTC 2013
[20:14:02] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours
[20:14:02] PROBLEM - Puppet freshness on db27 is CRITICAL: Puppet has not run in the last 10 hours
[20:14:23] RECOVERY - Puppet freshness on hume is OK: puppet ran at Sat Mar 2 20:14:14 UTC 2013
[20:14:41] RECOVERY - Puppet freshness on mw1050 is OK: puppet ran at Sat Mar 2 20:14:35 UTC 2013
[20:14:41] RECOVERY - Puppet freshness on db38 is OK: puppet ran at Sat Mar 2 20:14:37 UTC 2013
[20:14:41] RECOVERY - Puppet freshness on mw1048 is OK: puppet ran at Sat Mar 2 20:14:40 UTC 2013
[20:15:11] RECOVERY - Puppet freshness on spence is OK: puppet ran at Sat Mar 2 20:15:02 UTC 2013
[20:15:12] RECOVERY - Puppet freshness on db27 is OK: puppet ran at Sat Mar 2 20:15:04 UTC 2013
[20:16:01] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours
[20:16:02] PROBLEM - Puppet freshness on db27 is CRITICAL: Puppet has not run in the last 10 hours
[20:16:31] RECOVERY - Puppet freshness on spence is OK: puppet ran at Sat Mar 2 20:16:27 UTC 2013
[20:16:32] RECOVERY - Puppet freshness on db27 is OK: puppet ran at Sat Mar 2 20:16:29 UTC 2013
[20:17:01] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours
[20:17:02] PROBLEM - Puppet freshness on db27 is CRITICAL: Puppet has not run in the last 10 hours
[20:17:41] RECOVERY - Puppet freshness on spence is OK: puppet ran at Sat Mar 2 20:17:30 UTC 2013
[20:17:41] RECOVERY - Puppet freshness on db27 is OK: puppet ran at Sat Mar 2 20:17:32 UTC 2013
[20:18:01] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours
[20:18:02] PROBLEM - Puppet freshness on db27 is CRITICAL: Puppet has not run in the last 10 hours
[20:18:21] RECOVERY - Puppet freshness on spence is OK: puppet ran at Sat Mar 2 20:18:17 UTC 2013
[20:18:21] RECOVERY - Puppet freshness on db27 is OK: puppet ran at Sat Mar 2 20:18:18 UTC 2013
[20:18:54] <3 puppet freshness
[20:19:01] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours
[20:19:01] PROBLEM - Puppet freshness on db27 is CRITICAL: Puppet has not run in the last 10 hours
[20:19:01] RECOVERY - Puppet freshness on spence is OK: puppet ran at Sat Mar 2 20:18:54 UTC 2013
[20:19:01] RECOVERY - Puppet freshness on db27 is OK: puppet ran at Sat Mar 2 20:18:55 UTC 2013
[20:20:01] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours
[20:20:02] PROBLEM - Puppet freshness on db27 is CRITICAL: Puppet has not run in the last 10 hours
[20:38:21] PROBLEM - Host amssq37 is DOWN: PING CRITICAL - Packet loss = 100%
[20:38:21] PROBLEM - Host amssq32 is DOWN: PING CRITICAL - Packet loss = 100%
[20:38:21] PROBLEM - Host knsq20 is DOWN: PING CRITICAL - Packet loss = 100%
[20:38:21] PROBLEM - Host knsq19 is DOWN: PING CRITICAL - Packet loss = 100%
[20:38:21] PROBLEM - Host mediawiki-lb.esams.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100%
[20:38:22] PROBLEM - Host knsq26 is DOWN: PING CRITICAL - Packet loss = 100%
[20:38:22] PROBLEM - Host knsq18 is DOWN: PING CRITICAL - Packet loss = 100%
[20:39:01] PROBLEM - Host amssq31 is DOWN: PING CRITICAL - Packet loss = 100%
[20:39:01] PROBLEM - Host ms6 is DOWN: PING CRITICAL - Packet loss = 100%
[20:39:01] PROBLEM - Host amslvs4 is DOWN: PING CRITICAL - Packet loss = 100%
[20:39:01] PROBLEM - Host amssq57 is DOWN: PING CRITICAL - Packet loss = 100%
[20:39:01] PROBLEM - Host amssq36 is DOWN: PING CRITICAL - Packet loss = 100%
[20:39:11] PROBLEM - Host wiktionary-lb.esams.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100%
[20:39:12] PROBLEM - Host amssq53 is DOWN: PING CRITICAL - Packet loss = 100%
[20:39:21] PROBLEM - Host ssl3003 is DOWN: PING CRITICAL - Packet loss = 100%
[20:39:21] PROBLEM - Host knsq17 is DOWN: PING CRITICAL - Packet loss = 100%
[20:39:21] PROBLEM - Host knsq23 is DOWN: PING CRITICAL - Packet loss = 100%
[20:39:21] PROBLEM - Host nescio is DOWN: PING CRITICAL - Packet loss = 100%
[20:39:31] PROBLEM - Host wikimedia-lb.esams.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100%
[20:39:31] PROBLEM - Host amslvs1 is DOWN: PING CRITICAL - Packet loss = 100%
[20:39:32] PROBLEM - Host wikipedia-lb.esams.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100%
[20:40:01] PROBLEM - Host 91.198.174.6 is DOWN: PING CRITICAL - Packet loss = 100%
[20:40:02] PROBLEM - Host foundation-lb.esams.wikimedia.org_https is DOWN: PING CRITICAL - Packet loss = 100%
[20:40:11] PROBLEM - Host wikimedia-lb.esams.wikimedia.org_https is DOWN: CRITICAL - Plugin timed out after 15 seconds
[20:40:12] PROBLEM - Host wiktionary-lb.esams.wikimedia.org_https is DOWN: CRITICAL - Plugin timed out after 15 seconds
[20:40:21] RECOVERY - Host amslvs1 is UP: PING OK - Packet loss = 0%, RTA = 100.48 ms
[20:40:21] RECOVERY - Host wikipedia-lb.esams.wikimedia.org is UP: PING OK - Packet loss = 0%, RTA = 100.50 ms
[20:40:21] RECOVERY - Host ms6 is UP: PING WARNING - Packet loss = 93%, RTA = 103.71 ms
[20:40:21] PROBLEM - Host mediawiki-lb.esams.wikimedia.org_https is DOWN: PING CRITICAL - Packet loss = 100%
[20:40:31] RECOVERY - Host wiktionary-lb.esams.wikimedia.org is UP: PING WARNING - Packet loss = 73%, RTA = 100.22 ms
[20:40:32] RECOVERY - Host 91.198.174.6 is UP: PING WARNING - Packet loss = 73%, RTA = 99.48 ms
[20:40:51] PROBLEM - Host wikibooks-lb.esams.wikimedia.org_https is DOWN: PING CRITICAL - Packet loss = 100%
[20:41:01] RECOVERY - Puppet freshness on db27 is OK: puppet ran at Sat Mar 2 20:40:50 UTC 2013
[20:41:01] PROBLEM - Puppet freshness on db27 is CRITICAL: Puppet has not run in the last 10 hours
[20:41:31] RECOVERY - Host wikimedia-lb.esams.wikimedia.org is UP: PING WARNING - Packet loss = 93%, RTA = 104.22 ms
[20:41:31] RECOVERY - Host amssq32 is UP: PING WARNING - Packet loss = 93%, RTA = 104.39 ms
[20:42:01] RECOVERY - Host amssq31 is UP: PING WARNING - Packet loss = 93%, RTA = 103.51 ms
[20:42:11] RECOVERY - Host ssl3003 is UP: PING OK - Packet loss = 0%, RTA = 96.85 ms
[20:42:31] RECOVERY - Host knsq20 is UP: PING WARNING - Packet loss = 80%, RTA = 101.96 ms
[20:42:31] RECOVERY - Host amssq53 is UP: PING WARNING - Packet loss = 50%, RTA = 93.15 ms
[20:42:51] RECOVERY - Host knsq26 is UP: PING OK - Packet loss = 0%, RTA = 92.87 ms
[20:42:52] RECOVERY - Host amssq37 is UP: PING OK - Packet loss = 0%, RTA = 93.52 ms
[20:42:52] RECOVERY - Host knsq19 is UP: PING OK - Packet loss = 0%, RTA = 93.51 ms
[20:42:52] RECOVERY - Host amssq36 is UP: PING OK - Packet loss = 0%, RTA = 93.48 ms
[20:42:52] RECOVERY - Host amssq57 is UP: PING OK - Packet loss = 0%, RTA = 93.26 ms
[20:42:52] RECOVERY - Host amslvs4 is UP: PING OK - Packet loss = 0%, RTA = 93.31 ms
[20:43:01] RECOVERY - Host knsq18 is UP: PING OK - Packet loss = 0%, RTA = 93.74 ms
[20:43:01] RECOVERY - Host mediawiki-lb.esams.wikimedia.org is UP: PING OK - Packet loss = 0%, RTA = 92.28 ms
[20:43:11] RECOVERY - Host knsq23 is UP: PING OK - Packet loss = 0%, RTA = 92.61 ms
[20:43:11] RECOVERY - Host nescio is UP: PING OK - Packet loss = 0%, RTA = 92.48 ms
[20:43:11] RECOVERY - Host knsq17 is UP: PING OK - Packet loss = 0%, RTA = 93.21 ms
[20:45:11] RECOVERY - Host foundation-lb.esams.wikimedia.org_https is UP: PING OK - Packet loss = 0%, RTA = 93.62 ms
[20:45:26] RECOVERY - Host wiktionary-lb.esams.wikimedia.org_https is UP: PING OK - Packet loss = 0%, RTA = 92.50 ms
[20:45:26] RECOVERY - Host wikimedia-lb.esams.wikimedia.org_https is UP: PING OK - Packet loss = 0%, RTA = 92.77 ms
[20:45:31] RECOVERY - Host mediawiki-lb.esams.wikimedia.org_https is UP: PING OK - Packet loss = 0%, RTA = 91.75 ms
[20:46:01] RECOVERY - Host wikibooks-lb.esams.wikimedia.org_https is UP: PING OK - Packet loss = 0%, RTA = 92.72 ms
[20:49:01] RECOVERY - MySQL Slave Delay on db1020 is OK: OK replication delay 7 seconds
[20:49:11] RECOVERY - MySQL Replication Heartbeat on db1020 is OK: OK replication delay 5 seconds
[20:51:41] RECOVERY - Puppet freshness on spence is OK: puppet ran at Sat Mar 2 20:51:34 UTC 2013
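On the esams blip above: the host checks report packet loss and round-trip time, and the PING OK / PING WARNING / PING CRITICAL prefixes track how bad those two numbers are, with 100% loss marking the host DOWN and partial loss surfacing as a WARNING even while the host counts as UP. The sketch below shows that classification in Python; the threshold values are assumptions loosely fitted to the output above, not the real check_ping configuration.

```python
def classify_ping(packet_loss_pct, rta_ms,
                  loss_warn=20, loss_crit=100, rta_warn=300.0, rta_crit=500.0):
    """Classify a ping result the way the host-check output above reads.
    Thresholds are assumed values for illustration, not production settings."""
    if rta_ms is None or packet_loss_pct >= loss_crit or rta_ms >= rta_crit:
        return "CRITICAL", "PING CRITICAL - Packet loss = %d%%" % packet_loss_pct
    if packet_loss_pct >= loss_warn or rta_ms >= rta_warn:
        return "WARNING", "PING WARNING - Packet loss = %d%%, RTA = %.2f ms" % (packet_loss_pct, rta_ms)
    return "OK", "PING OK - Packet loss = %d%%, RTA = %.2f ms" % (packet_loss_pct, rta_ms)

# Examples shaped like the esams recovery above:
print(classify_ping(100, None))     # host unreachable      -> CRITICAL
print(classify_ping(93, 104.22))    # flaky but answering   -> WARNING
print(classify_ping(0, 92.28))      # fully recovered       -> OK
```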
[20:52:02] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours
[22:20:34] PROBLEM - Puppet freshness on db27 is CRITICAL: Puppet has not run in the last 10 hours
[22:32:54] PROBLEM - Puppet freshness on virt0 is CRITICAL: Puppet has not run in the last 10 hours
[22:57:54] PROBLEM - MySQL disk space on neon is CRITICAL: Connection refused by host
[22:58:14] PROBLEM - ircecho_service_running on neon is CRITICAL: Connection refused by host
[23:31:54] RECOVERY - MySQL disk space on neon is OK: DISK OK
[23:32:15] RECOVERY - ircecho_service_running on neon is OK: PROCS OK: 3 processes with args ircecho