[00:22:39] New patchset: Ori.livneh; "Fix contact link HTML" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59021
[00:37:06] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59021
[00:40:39] !log rebooting labstore3 -> silly lockd module
[00:43:41] Ryan_Lane: filed bug upstream (in case you wanna vote it up) https://code.google.com/p/gerrit/issues/detail?id=1865
[00:43:58] PROBLEM - Host labstore3 is DOWN: PING CRITICAL - Packet loss = 100%
[00:44:17] Krinkle: probably good for chad and christian to know
[00:44:21] http://activity.openstack.org/data/display/OPNSTK2/OpenStack+Compute+%28Nova%29+-+Activity+Dashboard
[00:44:27] http://activity.openstack.org/
[00:44:31] ^^ pretty interesting site
[00:45:08] RECOVERY - Host labstore3 is UP: PING OK - Packet loss = 0%, RTA = 26.57 ms
[00:45:42] http://activity.openstack.org/dash/
[01:09:46] !log rebooted labstore3 again -> we hates lockd. We hates it!
[01:10:00] PROBLEM - Host labstore3 is DOWN: PING CRITICAL - Packet loss = 100%
[01:12:10] RECOVERY - Host labstore3 is UP: PING OK - Packet loss = 0%, RTA = 26.57 ms
[01:55:57] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours
[01:55:57] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours
[01:55:57] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours
[02:11:56] PROBLEM - Puppet freshness on gallium is CRITICAL: No successful Puppet run in the last 10 hours
[02:12:27] !log LocalisationUpdate completed (1.22wmf1) at Sat Apr 13 02:12:27 UTC 2013
[02:40:48] PROBLEM - Puppet freshness on virt1005 is CRITICAL: No successful Puppet run in the last 10 hours
[02:52:38] RECOVERY - Parsoid on wtp1004 is OK: HTTP OK: HTTP/1.1 200 OK - 1368 bytes in 8.870 second response time
[02:55:38] PROBLEM - Parsoid on wtp1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:13:20] PROBLEM - Puppet freshness on db1058 is CRITICAL: No successful Puppet run in the last 10 hours
[04:28:30] PROBLEM - DPKG on labstore3 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[04:31:40] PROBLEM - NTP on cp1041 is CRITICAL: NTP CRITICAL: Offset unknown
[04:36:16] RECOVERY - DPKG on labstore3 is OK: All packages OK
[04:37:56] RECOVERY - NTP on cp1041 is OK: NTP OK: Offset 0.1442428827 secs
[04:44:45] New review: MZMcBride; "I think the commit message here is slightly misleading. When true, this variable outputs the content..." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/57649
[04:53:56] !log rebooted labstore3 again; kernel and lvm2 upgrade
[04:54:10] New review: MZMcBride; "If "MediaWiki:Contact-url" doesn't exist, the HTML output will look like this:" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/57649
[04:55:56] PROBLEM - Host labstore3 is DOWN: PING CRITICAL - Packet loss = 100%
[05:06:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:07:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.123 second response time
[05:17:10] RECOVERY - Host labstore3 is UP: PING OK - Packet loss = 0%, RTA = 27.02 ms
[05:19:10] PROBLEM - DPKG on labstore3 is CRITICAL: Connection refused by host
[05:19:20] PROBLEM - RAID on labstore3 is CRITICAL: Connection refused by host
[05:19:20] PROBLEM - Disk space on labstore3 is CRITICAL: Connection refused by host
[05:19:30] PROBLEM - SSH on labstore3 is CRITICAL: Connection refused
[05:27:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:28:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.129 second response time
[05:31:30] PROBLEM - NTP on labstore3 is CRITICAL: NTP CRITICAL: No response from NTP server
[05:56:26] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:57:16] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time
[06:24:48] PROBLEM - Puppet freshness on cp3003 is CRITICAL: No successful Puppet run in the last 10 hours
[07:37:36] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours
[08:28:27] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:29:17] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time
[09:18:23] PROBLEM - Puppet freshness on ms-fe3001 is CRITICAL: No successful Puppet run in the last 10 hours
[09:29:40] New patchset: Krinkle; "noc: Add missing entries to createTxtFileSymlinks.sh" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59033
[09:29:41] New patchset: Krinkle; "noc: Refactor highlight.php to be simpler and less more secure" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59034
[09:46:50] New review: Faidon; "You probably already thought of this but just in case: this will remove them from puppet but won't a..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/58844
[10:11:40] PROBLEM - Puppet freshness on search1015 is CRITICAL: No successful Puppet run in the last 10 hours
[10:13:40] PROBLEM - Puppet freshness on search1016 is CRITICAL: No successful Puppet run in the last 10 hours
[10:41:46] New patchset: Krinkle; "noc: Refactor highlight.php to be simpler and less more secure" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59034
[10:44:15] New patchset: Krinkle; "noc: Refactor highlight.php to be simpler and less more secure" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59034
[11:05:27] New patchset: Krinkle; "noc: Refactor highlight.php to be simpler and less more secure" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59034
[11:06:22] New review: Krinkle; "Added unit tests." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59034
[11:06:52] New review: Krinkle; "Use PHP_SAPI as check instead of duck typing PHPUnit environment." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59034
[11:56:00] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours
[11:56:00] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours
[11:56:00] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours
[11:56:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:56:40] PROBLEM - Varnish traffic logger on cp1041 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[11:57:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 4.108 second response time
[12:01:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:02:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time
[12:02:40] RECOVERY - Varnish traffic logger on cp1041 is OK: PROCS OK: 3 processes with command name varnishncsa
[12:07:06] New review: Hashar; "And the poor #wikimedia-dev will end up being spammed :( I guess I will hang out in #wikimedia-tech..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57752
[12:10:25] New review: Peachey88; "You can /ignore them the same in #wikimedia-dev the same as you do in #mediawiki" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57752
[12:12:25] PROBLEM - Puppet freshness on gallium is CRITICAL: No successful Puppet run in the last 10 hours
[12:37:06] PROBLEM - Varnish traffic logger on cp1041 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[12:41:46] PROBLEM - Puppet freshness on virt1005 is CRITICAL: No successful Puppet run in the last 10 hours
[13:03:06] RECOVERY - Varnish traffic logger on cp1041 is OK: PROCS OK: 3 processes with command name varnishncsa
[13:24:06] PROBLEM - Varnish traffic logger on cp1041 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[13:45:13] PROBLEM - Host labstore3 is DOWN: PING CRITICAL - Packet loss = 100%
[13:55:43] RECOVERY - Host labstore3 is UP: PING OK - Packet loss = 0%, RTA = 26.85 ms
[14:03:33] RECOVERY - Varnish traffic logger on cp1041 is OK: PROCS OK: 3 processes with command name varnishncsa
[14:14:00] PROBLEM - Puppet freshness on db1058 is CRITICAL: No successful Puppet run in the last 10 hours
[14:26:30] PROBLEM - Varnish traffic logger on cp1041 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[14:33:47] RECOVERY - Varnish traffic logger on cp1041 is OK: PROCS OK: 3 processes with command name varnishncsa
[14:40:07] RECOVERY - DPKG on labstore3 is OK: All packages OK
[14:40:17] RECOVERY - SSH on labstore3 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
[14:40:17] RECOVERY - Disk space on labstore3 is OK: DISK OK
[14:40:27] RECOVERY - RAID on labstore3 is OK: OK: State is Optimal, checked 1 logical device(s)
[14:43:57] RECOVERY - NTP on labstore3 is OK: NTP OK: Offset -7.176399231e-05 secs
[14:48:07] PROBLEM - DPKG on labstore3 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[15:00:27] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:02:17] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time
[15:11:12] PROBLEM - Host labstore3 is DOWN: PING CRITICAL - Packet loss = 100%
[15:11:23] !log Fixed boot problem on labstore3; back to operating.
[15:11:32] RECOVERY - Host labstore3 is UP: PING OK - Packet loss = 0%, RTA = 26.58 ms
[15:39:41] PROBLEM - Varnish traffic logger on cp1041 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[16:03:41] RECOVERY - Varnish traffic logger on cp1041 is OK: PROCS OK: 3 processes with command name varnishncsa
[16:12:48] PROBLEM - Varnish traffic logger on cp1041 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[16:24:58] PROBLEM - Puppet freshness on cp3003 is CRITICAL: No successful Puppet run in the last 10 hours
[16:32:48] RECOVERY - Varnish traffic logger on cp1041 is OK: PROCS OK: 3 processes with command name varnishncsa
[16:47:26] BBIAB
[17:25:07] PROBLEM - Varnish traffic logger on cp1041 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[17:33:06] RECOVERY - Varnish traffic logger on cp1041 is OK: PROCS OK: 3 processes with command name varnishncsa
[17:38:13] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours
[17:50:33] RECOVERY - DPKG on labstore3 is OK: All packages OK
[17:56:43] New patchset: Liangent; "(bug 47204) Remove zh-mo from $wgDisabledVariants for zhwiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59055
[18:20:28] PROBLEM - Varnish traffic logger on cp1041 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[18:33:32] RECOVERY - Varnish traffic logger on cp1041 is OK: PROCS OK: 3 processes with command name varnishncsa
[18:54:54] New patchset: Ori.livneh; "Monitor MediaWiki fatals and exceptions in Ganglia" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59059
[19:04:00] New review: Ori.livneh; "This setup is currently running on vanadium, unpuppetized." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59059
[19:18:33] PROBLEM - Puppet freshness on ms-fe3001 is CRITICAL: No successful Puppet run in the last 10 hours
[19:38:28] PROBLEM - Varnish traffic logger on cp1041 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[20:03:55] RECOVERY - Varnish traffic logger on cp1041 is OK: PROCS OK: 3 processes with command name varnishncsa
[20:11:45] PROBLEM - Puppet freshness on search1015 is CRITICAL: No successful Puppet run in the last 10 hours
[20:13:45] PROBLEM - Puppet freshness on search1016 is CRITICAL: No successful Puppet run in the last 10 hours
[20:30:35] RECOVERY - Parsoid on wtp1004 is OK: HTTP OK: HTTP/1.1 200 OK - 1368 bytes in 3.473 second response time
[20:33:45] PROBLEM - Parsoid on wtp1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:55:54] PROBLEM - Varnish traffic logger on cp1041 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[21:03:55] RECOVERY - Varnish traffic logger on cp1041 is OK: PROCS OK: 3 processes with command name varnishncsa
[21:24:55] PROBLEM - Varnish traffic logger on cp1041 is CRITICAL: PROCS CRITICAL: 1 process with command name varnishncsa
[21:33:55] RECOVERY - Varnish traffic logger on cp1041 is OK: PROCS OK: 3 processes with command name varnishncsa
[21:43:58] PROBLEM - Varnish traffic logger on cp1041 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[21:56:08] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours
[21:56:08] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours
[21:56:08] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours
[22:02:58] RECOVERY - Varnish traffic logger on cp1041 is OK: PROCS OK: 3 processes with command name varnishncsa
[22:12:58] PROBLEM - Puppet freshness on gallium is CRITICAL: No successful Puppet run in the last 10 hours
[22:16:18] PROBLEM - Varnish traffic logger on cp1041 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[22:33:18] RECOVERY - Varnish traffic logger on cp1041 is OK: PROCS OK: 3 processes with command name varnishncsa
[22:41:54] PROBLEM - Puppet freshness on virt1005 is CRITICAL: No successful Puppet run in the last 10 hours
[23:53:44] !log rebooting labstore3. autofs is being silly
[23:55:35] PROBLEM - Host labstore3 is DOWN: PING CRITICAL - Packet loss = 100%
[23:56:45] RECOVERY - Host labstore3 is UP: PING OK - Packet loss = 0%, RTA = 26.55 ms