[00:07:04] what is the sense of having the ops maillist archive private when the mails in it are being archived by public archivers? [00:07:26] well it would delay things [00:07:50] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Fri Feb 1 00:07:43 UTC 2013 [00:08:10] PROBLEM - Puppet freshness on ms2 is CRITICAL: Puppet has not run in the last 10 hours [00:08:30] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Fri Feb 1 00:08:24 UTC 2013 [00:09:10] PROBLEM - Puppet freshness on ms2 is CRITICAL: Puppet has not run in the last 10 hours [00:09:11] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Fri Feb 1 00:09:09 UTC 2013 [00:10:11] PROBLEM - Puppet freshness on ms2 is CRITICAL: Puppet has not run in the last 10 hours [00:10:38] Prodego: how do you mean that? [00:10:40] PROBLEM - Puppet freshness on dataset1001 is CRITICAL: Puppet has not run in the last 10 hours [00:11:10] Danny_B: I assume the archivers don't archive instantly [00:16:20] PROBLEM - Auth DNS on labs-ns1.wikimedia.org is CRITICAL: CRITICAL - Plugin timed out while executing system call [00:16:48] PROBLEM - Auth DNS on labs-ns1.wikimedia.org is CRITICAL: CRITICAL - Plugin timed out while executing system call [00:18:11] RECOVERY - Auth DNS on labs-ns1.wikimedia.org is OK: DNS OK: 7.483 seconds response time. nagiostest.beta.wmflabs.org returns 208.80.153.219 [00:18:26] RECOVERY - Auth DNS on labs-ns1.wikimedia.org is OK: DNS OK: 0.049 seconds response time. nagiostest.beta.wmflabs.org returns 208.80.153.219 [00:20:12] Prodego: eg. mail-archive has up to date wikitech-l [00:27:18] PROBLEM - Auth DNS on labs-ns1.wikimedia.org is CRITICAL: CRITICAL - Plugin timed out while executing system call [00:29:09] RECOVERY - Auth DNS on labs-ns1.wikimedia.org is OK: DNS OK: 5.015 seconds response time. nagiostest.beta.wmflabs.org returns [00:34:19] RECOVERY - Parsoid on cerium is OK: HTTP OK: HTTP/1.1 200 OK - 1308 bytes in 0.007 second response time [00:34:53] RECOVERY - Parsoid on cerium is OK: HTTP OK HTTP/1.1 200 OK - 1221 bytes in 0.057 seconds [00:38:12] Danny_B, it is? link? [00:45:26] RobH: Seeing as you're on duty this week, would you be able to find someone who could review https://gerrit.wikimedia.org/r/#/c/44164/ ? 
Antoine and I have +1ed it but we don't have +2 in puppet [00:45:58] I can give it a shot, but most folks are traveling to Fosdem [00:46:03] Yeah, I know [00:46:38] RoanKattouw, solution, Mediawiki:Gerrit project ownership or whatever it is :) [00:46:53] I don't necessarily want +2 on puppet :) [00:47:25] heh [00:51:46] +2 on puppet cannot get anyone anything but trouble =P [00:56:06] PROBLEM - Puppet freshness on ms2 is CRITICAL: Puppet has not run in the last 10 hours [01:26:36] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 0 seconds [01:27:07] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 0 seconds [01:27:13] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 0 seconds [01:27:22] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 0 seconds [01:33:59] PROBLEM - Host mw1085 is DOWN: PING CRITICAL - Packet loss = 100% [01:34:04] PROBLEM - Host mw1085 is DOWN: PING CRITICAL - Packet loss = 100% [01:34:45] RECOVERY - Host mw1085 is UP: PING OK - Packet loss = 0%, RTA = 0.49 ms [01:34:53] RECOVERY - Host mw1085 is UP: PING OK - Packet loss = 0%, RTA = 26.53 ms [01:47:55] PROBLEM - Puppet freshness on db1047 is CRITICAL: Puppet has not run in the last 10 hours [01:56:14] PROBLEM - Host mw1085 is DOWN: PING CRITICAL - Packet loss = 100% [01:56:45] RECOVERY - Host mw1085 is UP: PING OK - Packet loss = 0%, RTA = 0.29 ms [02:04:55] New patchset: Dereckson; "Maintenance for http://fr.planet.wikimedia.org/" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47047 [02:05:13] PROBLEM - Puppet freshness on ms2 is CRITICAL: Puppet has not run in the last 10 hours [02:09:43] PROBLEM - Puppet freshness on db1047 is CRITICAL: Puppet has not run in the last 10 hours [02:13:14] PROBLEM - Auth DNS on labs-ns1.wikimedia.org is CRITICAL: CRITICAL - Plugin timed out while executing system call [02:16:04] RECOVERY - Auth DNS on labs-ns1.wikimedia.org is OK: DNS OK: 3.692 seconds response time. nagiostest.beta.wmflabs.org returns 208.80.153.219 [02:20:19] New patchset: Dereckson; "Fixed a small typo in Planet config files." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47048 [02:27:10] !log LocalisationUpdate completed (1.21wmf8) at Fri Feb 1 02:27:09 UTC 2013 [02:27:11] Logged the message, Master [02:50:13] PROBLEM - HTTP on formey is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:51:02] RECOVERY - HTTP on formey is OK: HTTP OK: HTTP/1.1 200 OK - 3596 bytes in 0.056 second response time [02:58:58] secure.wm.org no longer works? [02:59:41] works for me [02:59:59] hmm, can't create the url for wikimania 2013 [03:00:09] could you suggest pls? [03:00:40] why would you want to make links to it? it only exists for legacy URLs, as a redirect service [03:00:57] just use https://wikimania2013.wikimedia.org/ [03:01:21] enough for me to know it is redir only [03:01:34] so i can delete some stuff from common.js [03:03:15] https://secure.wikimedia.org/wikipedia/wikimania2013/ [03:15:44] PROBLEM - Puppet freshness on ms2 is CRITICAL: Puppet has not run in the last 10 hours [03:16:19] PROBLEM - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:17:58] RECOVERY - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is OK: HTTP OK HTTP/1.1 200 OK - 247 bytes in 0.054 seconds [03:36:17] bad kibble [03:36:22] :o ! [03:36:27] * kibble is good kibble. [03:41:33] ha, Reedy! [03:41:43] You're no better!! [03:42:25] how's the batch move, Reedy ? 
[03:42:30] Dunno [03:46:15] PROBLEM - Auth DNS on labs-ns1.wikimedia.org is CRITICAL: CRITICAL - Plugin timed out while executing system call [03:48:11] PROBLEM - Auth DNS on labs-ns1.wikimedia.org is CRITICAL: CRITICAL - Plugin timed out while executing system call [03:50:06] RECOVERY - Auth DNS on labs-ns1.wikimedia.org is OK: DNS OK: 5.439 seconds response time. nagiostest.beta.wmflabs.org returns [03:51:38] RECOVERY - Auth DNS on labs-ns1.wikimedia.org is OK: DNS OK: 0.041 seconds response time. nagiostest.beta.wmflabs.org returns 208.80.153.219 [03:54:11] PROBLEM - HTTP on formey is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:55:50] RECOVERY - HTTP on formey is OK: HTTP OK HTTP/1.1 200 OK - 3596 bytes in 0.006 seconds [04:15:11] PROBLEM - Auth DNS on labs-ns1.wikimedia.org is CRITICAL: CRITICAL - Plugin timed out while executing system call [04:16:50] RECOVERY - Auth DNS on labs-ns1.wikimedia.org is OK: DNS OK: 6.947 seconds response time. nagiostest.beta.wmflabs.org returns 208.80.153.219 [04:23:41] PROBLEM - Puppet freshness on ms2 is CRITICAL: Puppet has not run in the last 10 hours [04:29:54] PROBLEM - Auth DNS on labs-ns1.wikimedia.org is CRITICAL: CRITICAL - Plugin timed out while executing system call [04:31:33] RECOVERY - Auth DNS on labs-ns1.wikimedia.org is OK: DNS OK: 7.471 seconds response time. nagiostest.beta.wmflabs.org returns 208.80.153.219 [05:06:05] paravoid: andrewbogott_afk: labs pmtpa bastion's broken. refusing my loging [05:06:08] login* [05:06:14] Permission denied (publickey). [05:06:19] (see also labs-l) [05:09:21] ssmollett: ^ [05:20:11] andrewbogott: ping [05:20:30] jeremyb: broken! [05:20:41] andrewbogott: yah :) [05:20:54] I spent a while on this earlier but didn't come up with much… having another go now. [05:21:10] i see there was some scrollback earlier in #-labs [05:21:20] anyway, wanted to make sure someone knew at least [05:21:25] * jeremyb heads to sleep :) [05:21:37] Thanks. Most ops are traveling today, unfortunately. [05:21:41] orly [05:21:43] allhands? [05:22:11] fosdem [05:22:32] ohhhhh [05:22:41] i didn't realize so many people were going [05:22:41] I have a reasonably good idea of what's broken, just not of how to fix it. [05:36:38] PROBLEM - Auth DNS on labs-ns1.wikimedia.org is CRITICAL: CRITICAL - Plugin timed out while executing system call [05:38:28] RECOVERY - Auth DNS on labs-ns1.wikimedia.org is OK: DNS OK: 3.570 seconds response time. nagiostest.beta.wmflabs.org returns [05:39:56] paravoid, are you awake by chance? [05:47:06] LeslieCarr, how about you? 
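The "Auth DNS on labs-ns1.wikimedia.org" check above keeps flapping between plugin timeouts and lookups that take several seconds against nagiostest.beta.wmflabs.org. A rough way to reproduce the check by hand is a timed, non-recursive query aimed straight at that nameserver. This is only a sketch: the server and record names are taken from the alert text, the 10-second cap just mirrors the plugin timeout, and the exact check_dns arguments Icinga uses are not shown in the log.

    # Query the authoritative server the monitor is complaining about,
    # timing the lookup (names from the alert text; check options assumed).
    time dig @labs-ns1.wikimedia.org nagiostest.beta.wmflabs.org A +norecurse +time=10 +tries=1

Answers that hover near the 10-second mark would point at an overloaded or slow-to-respond nameserver rather than one that is actually down, which matches the immediate RECOVERY messages that follow each CRITICAL.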
[05:54:29] PROBLEM - MySQL Replication Heartbeat on db33 is CRITICAL: CRIT replication delay 181 seconds [05:54:38] PROBLEM - MySQL Slave Delay on db33 is CRITICAL: CRIT replication delay 183 seconds [05:55:42] PROBLEM - MySQL Replication Heartbeat on db33 is CRITICAL: CRIT replication delay 197 seconds [05:56:00] PROBLEM - MySQL Slave Delay on db33 is CRITICAL: CRIT replication delay 201 seconds [05:56:44] PROBLEM - Puppet freshness on ms2 is CRITICAL: Puppet has not run in the last 10 hours [06:06:24] PROBLEM - Host mw1085 is DOWN: PING CRITICAL - Packet loss = 100% [06:06:54] RECOVERY - Host mw1085 is UP: PING OK - Packet loss = 0%, RTA = 0.30 ms [06:11:46] New patchset: Andrew Bogott; "Up ulimit for glusterd again" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47054 [06:12:22] Change merged: Andrew Bogott; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47054 [06:20:55] PROBLEM - NTP on mw1085 is CRITICAL: NTP CRITICAL: Offset unknown [06:25:54] RECOVERY - NTP on mw1085 is OK: NTP OK: Offset 0.001241207123 secs [06:28:34] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Fri Feb 1 06:28:30 UTC 2013 [06:28:45] PROBLEM - Puppet freshness on ms2 is CRITICAL: Puppet has not run in the last 10 hours [06:28:54] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Fri Feb 1 06:28:44 UTC 2013 [06:29:45] PROBLEM - Puppet freshness on ms2 is CRITICAL: Puppet has not run in the last 10 hours [06:37:31] PROBLEM - Puppet freshness on virt1000 is CRITICAL: Puppet has not run in the last 10 hours [06:39:39] PROBLEM - Puppet freshness on virt1000 is CRITICAL: Puppet has not run in the last 10 hours [06:40:10] PROBLEM - Auth DNS on labs-ns1.wikimedia.org is CRITICAL: CRITICAL - Plugin timed out while executing system call [06:41:01] RECOVERY - Auth DNS on labs-ns1.wikimedia.org is OK: DNS OK: 3.671 seconds response time. nagiostest.beta.wmflabs.org returns 208.80.153.219 [06:45:21] PROBLEM - HTTP on formey is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:47:00] RECOVERY - HTTP on formey is OK: HTTP OK HTTP/1.1 200 OK - 3596 bytes in 0.011 seconds [06:47:31] RECOVERY - MySQL Replication Heartbeat on db33 is OK: OK replication delay 0 seconds [06:47:40] RECOVERY - MySQL Slave Delay on db33 is OK: OK replication delay 0 seconds [06:47:45] RECOVERY - MySQL Replication Heartbeat on db33 is OK: OK replication delay 0 seconds [06:48:04] RECOVERY - MySQL Slave Delay on db33 is OK: OK replication delay 0 seconds [06:56:10] PROBLEM - Auth DNS on labs-ns1.wikimedia.org is CRITICAL: CRITICAL - Plugin timed out while executing system call [06:57:00] RECOVERY - Auth DNS on labs-ns1.wikimedia.org is OK: DNS OK: 5.008 seconds response time. 
nagiostest.beta.wmflabs.org returns 208.80.153.219 [07:02:50] PROBLEM - HTTP on formey is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:03:41] RECOVERY - HTTP on formey is OK: HTTP OK: HTTP/1.1 200 OK - 3596 bytes in 0.055 second response time [07:04:30] PROBLEM - Puppet freshness on analytics1007 is CRITICAL: Puppet has not run in the last 10 hours [07:06:57] PROBLEM - HTTP on formey is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:08:11] PROBLEM - Puppet freshness on ms2 is CRITICAL: Puppet has not run in the last 10 hours [07:08:36] RECOVERY - HTTP on formey is OK: HTTP OK HTTP/1.1 200 OK - 3596 bytes in 0.005 seconds [07:15:59] PROBLEM - Auth DNS on labs-ns1.wikimedia.org is CRITICAL: CRITICAL - Plugin timed out while executing system call [07:16:49] PROBLEM - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:16:59] RECOVERY - Auth DNS on labs-ns1.wikimedia.org is OK: DNS OK: 6.520 seconds response time. nagiostest.beta.wmflabs.org returns 208.80.153.219 [07:17:39] RECOVERY - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 247 bytes in 0.002 second response time [07:22:29] PROBLEM - MySQL Replication Heartbeat on db33 is CRITICAL: CRIT replication delay 182 seconds [07:22:33] PROBLEM - MySQL Slave Delay on db33 is CRITICAL: CRIT replication delay 182 seconds [07:22:39] PROBLEM - MySQL Slave Delay on db33 is CRITICAL: CRIT replication delay 183 seconds [07:22:42] PROBLEM - MySQL Replication Heartbeat on db33 is CRITICAL: CRIT replication delay 183 seconds [07:24:21] RECOVERY - MySQL Replication Heartbeat on db33 is OK: OK replication delay 0 seconds [07:24:29] RECOVERY - MySQL Replication Heartbeat on db33 is OK: OK replication delay 0 seconds [07:24:39] RECOVERY - MySQL Slave Delay on db33 is OK: OK replication delay 0 seconds [07:25:00] PROBLEM - Auth DNS on labs-ns1.wikimedia.org is CRITICAL: CRITICAL - Plugin timed out while executing system call [07:25:49] PROBLEM - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:26:00] RECOVERY - MySQL Slave Delay on db33 is OK: OK replication delay 0 seconds [07:26:39] RECOVERY - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 247 bytes in 0.001 second response time [07:26:59] RECOVERY - Auth DNS on labs-ns1.wikimedia.org is OK: DNS OK: 9.123 seconds response time. 
nagiostest.beta.wmflabs.org returns 208.80.153.219 [07:33:09] PROBLEM - MySQL Replication Heartbeat on db32 is CRITICAL: CRIT replication delay 183 seconds [07:33:12] PROBLEM - MySQL Slave Delay on db32 is CRITICAL: CRIT replication delay 184 seconds [07:33:19] PROBLEM - MySQL Slave Delay on db32 is CRITICAL: CRIT replication delay 190 seconds [07:34:15] PROBLEM - MySQL Replication Heartbeat on db32 is CRITICAL: CRIT replication delay 209 seconds [07:35:09] RECOVERY - MySQL Replication Heartbeat on db32 is OK: OK replication delay 0 seconds [07:35:24] RECOVERY - MySQL Slave Delay on db32 is OK: OK replication delay 0 seconds [07:35:54] RECOVERY - MySQL Replication Heartbeat on db32 is OK: OK replication delay 0 seconds [07:36:39] RECOVERY - MySQL Slave Delay on db32 is OK: OK replication delay 1 seconds [07:45:39] PROBLEM - Puppet freshness on ms1004 is CRITICAL: Puppet has not run in the last 10 hours [07:45:39] PROBLEM - Puppet freshness on vanadium is CRITICAL: Puppet has not run in the last 10 hours [07:45:39] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [07:45:39] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours [07:45:39] PROBLEM - Puppet freshness on msfe1002 is CRITICAL: Puppet has not run in the last 10 hours [07:47:36] PROBLEM - Puppet freshness on professor is CRITICAL: Puppet has not run in the last 10 hours [08:20:13] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Fri Feb 1 08:07:39 UTC 2013 [08:20:13] PROBLEM - Puppet freshness on ms2 is CRITICAL: Puppet has not run in the last 10 hours [08:20:13] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Fri Feb 1 08:08:04 UTC 2013 [08:20:13] PROBLEM - Puppet freshness on ms2 is CRITICAL: Puppet has not run in the last 10 hours [08:20:14] PROBLEM - Puppet freshness on professor is CRITICAL: Puppet has not run in the last 10 hours [08:21:21] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Fri Feb 1 08:21:12 UTC 2013 [08:22:50] PROBLEM - Puppet freshness on ms2 is CRITICAL: Puppet has not run in the last 10 hours [08:27:21] PROBLEM - Auth DNS on labs-ns1.wikimedia.org is CRITICAL: CRITICAL - Plugin timed out while executing system call [08:29:00] RECOVERY - Auth DNS on labs-ns1.wikimedia.org is OK: DNS OK: 0.048 seconds response time. 
nagiostest.beta.wmflabs.org returns [08:38:02] PROBLEM - MySQL Slave Delay on db32 is CRITICAL: CRIT replication delay 181 seconds [08:38:02] PROBLEM - MySQL Replication Heartbeat on db32 is CRITICAL: CRIT replication delay 181 seconds [08:38:45] PROBLEM - MySQL Slave Delay on db32 is CRITICAL: CRIT replication delay 205 seconds [08:38:54] PROBLEM - MySQL Replication Heartbeat on db32 is CRITICAL: CRIT replication delay 207 seconds [08:40:01] RECOVERY - MySQL Slave Delay on db32 is OK: OK replication delay 0 seconds [08:40:02] RECOVERY - MySQL Replication Heartbeat on db32 is OK: OK replication delay 0 seconds [08:40:33] RECOVERY - MySQL Slave Delay on db32 is OK: OK replication delay 0 seconds [08:40:33] RECOVERY - MySQL Replication Heartbeat on db32 is OK: OK replication delay 0 seconds [09:24:22] helllo [09:25:24] New patchset: Hashar; "contint: install mercurial package on gallium" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/46931 [09:28:33] PROBLEM - Puppet freshness on dataset1001 is CRITICAL: Puppet has not run in the last 10 hours [09:33:08] New patchset: Hashar; "(bug 44041) adapt role::cache::mobile for beta" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44709 [09:38:18] PROBLEM - MySQL Replication Heartbeat on db32 is CRITICAL: CRIT replication delay 191 seconds [09:39:17] RECOVERY - MySQL Replication Heartbeat on db32 is OK: OK replication delay 0 seconds [10:00:54] New patchset: Hashar; "insert 'realm' in role::cache::configuration::active_nodes" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47067 [10:01:25] New patchset: Hashar; "(bug 44041) adapt role::cache::mobile for beta" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44709 [10:01:42] New review: Hashar; "Rebased on top of https://gerrit.wikimedia.org/r/47067" [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/44709 [10:03:16] PROBLEM - Puppet freshness on ms2 is CRITICAL: Puppet has not run in the last 10 hours [10:04:01] mark: hi :-] So the role::cache::configuration::active_nodes missed the realm. 
I have added it in with https://gerrit.wikimedia.org/r/47067 [10:04:05] andrebased my infamous patchset [10:05:03] New review: Hashar; "Makes puppet happier for the wikimedia frontend configuration:" [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/47067 [10:10:20] bahhh [10:11:03] I lost my instance [10:11:34] deployment-varnish-t3 login: Feb 1 10:07:14 deployment-varnish-t3 nslcd[1074]: [a1deaa] error writing to client: Broken pipe [10:11:35] Feb 1 10:07:14 deployment-varnish-t3 nslcd[1074]: [c6c33a] error writing to client: Broken pipe [10:11:36] Feb 1 10:07:14 deployment-varnish-t3 nslcd[1074]: [e685fb] error writing to client: Broken pipe [10:11:36] PROBLEM - Puppet freshness on dataset1001 is CRITICAL: Puppet has not run in the last 10 hours [10:11:37] youhouu [10:13:46] PROBLEM - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:14:36] RECOVERY - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 247 bytes in 0.001 second response time [10:21:53] PROBLEM - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is CRITICAL: Connection timed out [10:21:54] RECOVERY - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 247 bytes in 0.001 second response time [10:46:44] New patchset: Hashar; "insert 'realm' in role::cache::configuration::active_nodes" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47067 [10:46:44] New patchset: Hashar; "(bug 44041) adapt role::cache::mobile for beta" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44709 [10:47:44] New review: Hashar; "* removed an unrelated template (labs-upload.conf)" [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/47067 [10:59:21] ah [10:59:37] the varnish backends use an LVS entry as a backend and there is none in labs :-D [10:59:39] *grin* [11:12:07] PROBLEM - Puppet freshness on ms2 is CRITICAL: Puppet has not run in the last 10 hours [11:35:48] PROBLEM - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is CRITICAL: Connection timed out [11:36:37] RECOVERY - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 247 bytes in 0.001 second response time [11:44:47] PROBLEM - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:45:35] RECOVERY - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 247 bytes in 0.002 second response time [11:49:23] PROBLEM - Puppet freshness on db1047 is CRITICAL: Puppet has not run in the last 10 hours [12:04:38] !log authdns update adding db1051-60 to zone files [12:04:41] Logged the message, Master [12:07:45] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Fri Feb 1 12:07:38 UTC 2013 [12:08:06] PROBLEM - Puppet freshness on ms2 is CRITICAL: Puppet has not run in the last 10 hours [12:08:15] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Fri Feb 1 12:08:12 UTC 2013 [12:09:05] PROBLEM - Puppet freshness on ms2 is CRITICAL: Puppet has not run in the last 10 hours [12:09:15] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Fri Feb 1 12:09:10 UTC 2013 [12:10:05] PROBLEM - Puppet freshness on ms2 is CRITICAL: Puppet has not run in the last 10 hours [12:10:35] PROBLEM - Puppet freshness on db1047 is CRITICAL: Puppet has not run in the last 10 hours [12:46:20] PROBLEM - SSH on lvs1001 is CRITICAL: Server answer: [12:47:22] RECOVERY - SSH on lvs1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [13:06:20] New patchset: JanZerebecki; "replace the ugly HTML 
redirect from the old planet with a proper HTTP redirect ( RT-4410 )" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47073 [13:11:52] PROBLEM - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:12:42] RECOVERY - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 247 bytes in 0.001 second response time [13:17:05] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47073 [13:20:55] PROBLEM - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:21:40] RECOVERY - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 247 bytes in 0.004 second response time [13:22:12] New patchset: Hashar; "(bug 44251) hardcode $wgDBuser = 'wikiuser'" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/47074 [13:29:51] PROBLEM - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is CRITICAL: Connection timed out [13:30:40] RECOVERY - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 247 bytes in 0.001 second response time [13:33:09] PROBLEM - MySQL Replication Heartbeat on db32 is CRITICAL: CRIT replication delay 189 seconds [13:33:36] PROBLEM - MySQL Slave Delay on db32 is CRITICAL: CRIT replication delay 201 seconds [13:33:40] PROBLEM - MySQL Replication Heartbeat on db32 is CRITICAL: CRIT replication delay 203 seconds [13:33:50] PROBLEM - MySQL Slave Delay on db32 is CRITICAL: CRIT replication delay 210 seconds [13:45:50] RECOVERY - MySQL Slave Delay on db32 is OK: OK replication delay 19 seconds [13:45:54] RECOVERY - MySQL Slave Delay on db32 is OK: OK replication delay 22 seconds [13:46:40] RECOVERY - MySQL Replication Heartbeat on db32 is OK: OK replication delay 0 seconds [13:47:15] RECOVERY - MySQL Replication Heartbeat on db32 is OK: OK replication delay 0 seconds [13:56:33] PROBLEM - Puppet freshness on ms2 is CRITICAL: Puppet has not run in the last 10 hours [14:00:27] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 188 seconds [14:00:43] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 198 seconds [14:00:45] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 196 seconds [14:01:03] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 208 seconds [14:09:04] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 19 seconds [14:09:09] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 19 seconds [14:09:43] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 5 seconds [14:09:45] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 9 seconds [14:24:33] jzerebecki: congrats on resolving an RT ticket:) thanks [14:25:35] yay! thank you. [14:33:46] PROBLEM - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:34:35] RECOVERY - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 247 bytes in 0.001 second response time [14:45:12] !log authdns update "adding mw1161-1200 to eqiad mgmt and production zone files [14:45:15] Logged the message, Master [14:47:46] PROBLEM - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:48:17] apergos: ^^ ? [14:48:32] * jeremyb isn't really up to date... 
not sure if that's important [14:48:36] RECOVERY - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 247 bytes in 0.001 second response time [14:48:40] I see those, and I have no idea about them [14:49:55] hey [14:50:05] just logged in, don't have much time [14:50:10] ceph cluster crapped itself out again [14:50:11] sigh [14:50:18] ah hello [14:50:25] shall be safe to ignore [14:50:26] what do I need to do for these cases (and how can you tell)? [14:50:50] ignore until I send an email explaining our architecture, basic debugging steps etc. [14:50:54] heh [14:50:55] ok then [14:51:00] which should be before we put it into production [14:51:10] good idea :-D [14:51:28] are you in brussels then? [14:51:33] if we put into prod and I haven't done that, feel free to call me at ungodly hours and scream at me :) [14:51:37] yes [14:52:04] ah great, how was the trip? [14:52:19] enjoy the beer faidon :-] [14:52:38] tiring [14:52:42] too early [14:52:45] ugh [14:52:55] hope you get some rest time [14:53:10] that's what I plan to do now [14:53:26] see you [14:53:43] so, how do we know it's not swift that broke? [14:53:56] or can we? (from just reading the msg above) [14:55:11] sleep well! [14:59:25] PROBLEM - LVS Lucene on search-pool4.svc.eqiad.wmnet is CRITICAL: Connection timed out [15:00:15] RECOVERY - LVS Lucene on search-pool4.svc.eqiad.wmnet is OK: TCP OK - 0.000 second response time on port 8123 [15:05:02] PROBLEM - Puppet freshness on ms2 is CRITICAL: Puppet has not run in the last 10 hours [15:07:23] PROBLEM - SSH on lvs6 is CRITICAL: Server answer: [15:08:23] RECOVERY - SSH on lvs6 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [15:17:02] hiiiiii morning cmjohnson1 [15:17:11] ho ottomata [15:17:12] pIInnnnnggggggg on analytisc1007 [15:17:14] that is all! [15:17:21] heh [16:42:23] ok another q [16:42:23] cmjohnson1 (and maybe mark) [16:42:23] I just asked on this RT: [16:42:23] https://rt.wikimedia.org/Ticket/Display.html?id=4328 [16:42:23] analytics could do with a crappy machine for a bastion host [16:42:23] analytics1000 [16:42:23] or osmething [16:42:23] right now our public IP is on analytics1001 [16:42:23] which is a beefy cisco, [16:42:23] can we use db42 for that? [16:42:23] I need to reinstall OS on analytics1001 soon (probably today) [16:42:23] so I could move the IP as part of that pocess [16:42:23] process [16:42:24] i suggest to rename db42 to 'lair' if it becomes a bastion box :) [16:42:24] <^demon> Or maybe bastNNNN like bast1001. Similar names are nice :) [16:42:24] ottomata: i don't see why we couldn't but let's wait for mark or robh to confirm that it is ok [16:42:24] cool [16:42:24] their one off ibm servers [16:42:24] drdee: do you know dkg? [16:42:24] yeah anything will do [16:42:24] he IRCs from a box named lair [16:42:24] iirc [16:42:24] jeremby: nope [16:42:24] * jeremyb wonders who jeremby is :P [16:42:24] your twin brother :D [16:42:25] <^demon> An evil twin? 
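The question raised a little earlier, "how do we know it's not swift that broke?", invites a concrete first step given how often LVS HTTP IPv4 on ms-fe.eqiad.wmnet flaps in this log. A minimal sketch: compare the LVS service address with a frontend queried directly. The backend host name here is illustrative and the URL path the Icinga check actually fetches is not shown in the log, so plain "/" is assumed.

    # Compare the LVS VIP with one proxy queried directly
    # (ms-fe1001 is an example name, not confirmed from the log).
    for host in ms-fe.eqiad.wmnet ms-fe1001.eqiad.wmnet; do
        curl -s -o /dev/null --max-time 10 -w "$host %{http_code} %{time_total}s\n" "http://$host/"
    done

If the VIP times out while individual frontends answer quickly, suspicion shifts to LVS/PyBal or to whichever storage backend (swift, or the ceph cluster mentioned above) the pooled proxies are waiting on.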
* jeremyb will bbl [16:42:25] bizarro jb [16:42:26] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:42:27] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:42:28] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.570 second response time [16:42:28] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 6.201 seconds [16:50:54] PROBLEM - MySQL Replication Heartbeat on db32 is CRITICAL: CRIT replication delay 187 seconds [16:50:54] PROBLEM - MySQL Slave Delay on db32 is CRITICAL: CRIT replication delay 190 seconds [16:50:54] PROBLEM - MySQL Slave Delay on db32 is CRITICAL: CRIT replication delay 191 seconds [16:50:55] PROBLEM - MySQL Replication Heartbeat on db32 is CRITICAL: CRIT replication delay 196 seconds [16:50:56] RECOVERY - MySQL Slave Delay on db32 is OK: OK replication delay 21 seconds [16:50:56] PROBLEM - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:50:56] RECOVERY - MySQL Replication Heartbeat on db32 is OK: OK replication delay 0 seconds [16:50:56] RECOVERY - MySQL Replication Heartbeat on db32 is OK: OK replication delay 0 seconds [16:50:56] RECOVERY - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 247 bytes in 0.001 second response time [16:50:56] RECOVERY - MySQL Slave Delay on db32 is OK: OK replication delay 0 seconds [16:51:11] PROBLEM - LVS Lucene on search-pool4.svc.eqiad.wmnet is CRITICAL: Connection timed out [16:53:02] RECOVERY - LVS Lucene on search-pool4.svc.eqiad.wmnet is OK: TCP OK - 0.000 second response time on port 8123 [16:53:07] Change abandoned: Alex Monk; "This event is over..." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/46547 [17:04:31] PROBLEM - Puppet freshness on analytics1007 is CRITICAL: Puppet has not run in the last 10 hours [17:06:41] PROBLEM - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is CRITICAL: Connection timed out [17:07:40] RECOVERY - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 247 bytes in 0.001 second response time [17:20:50] PROBLEM - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:21:39] RECOVERY - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 247 bytes in 0.001 second response time [17:25:59] PROBLEM - Puppet freshness on labstore2 is CRITICAL: Puppet has not run in the last 10 hours [17:35:49] PROBLEM - MySQL Recent Restart on db1011 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [17:36:38] PROBLEM - MySQL Recent Restart on db1011 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [17:36:40] RECOVERY - MySQL Recent Restart on db1011 is OK: OK 370 seconds since restart [17:38:17] RECOVERY - MySQL Recent Restart on db1011 is OK: OK 460 seconds since restart [17:43:33] PROBLEM - MySQL Slave Delay on db32 is CRITICAL: CRIT replication delay 188 seconds [17:43:40] PROBLEM - MySQL Slave Delay on db32 is CRITICAL: CRIT replication delay 192 seconds [17:43:53] PROBLEM - MySQL Replication Heartbeat on db32 is CRITICAL: CRIT replication delay 198 seconds [17:44:02] New patchset: Michał
Łazowik; "Wikidata language code subdomain redirect to ItemByTitle special page" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/47088 [17:44:52] PROBLEM - MySQL Replication Heartbeat on db32 is CRITICAL: CRIT replication delay 218 seconds [17:45:10] PROBLEM - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:47:07] PROBLEM - Puppet freshness on msfe1002 is CRITICAL: Puppet has not run in the last 10 hours [17:47:07] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [17:47:07] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours [17:47:07] PROBLEM - Puppet freshness on ms1004 is CRITICAL: Puppet has not run in the last 10 hours [17:47:08] PROBLEM - Puppet freshness on vanadium is CRITICAL: Puppet has not run in the last 10 hours [17:48:37] RECOVERY - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is OK: HTTP OK HTTP/1.1 200 OK - 247 bytes in 0.054 seconds [17:49:00] .... [17:49:04] PROBLEM - Puppet freshness on professor is CRITICAL: Puppet has not run in the last 10 hours [17:49:09] New patchset: Michał Łazowik; "Wikidata language code subdomain redirect to ItemByTitle special page" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/47088 [17:49:11] come to check about page, get clear page. [17:49:16] easiest alert ever. [17:50:29] New patchset: Michał Łazowik; "Wikidata language code subdomain redirect to ItemByTitle special page" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/47088 [17:51:33] RECOVERY - MySQL Slave Delay on db32 is OK: OK replication delay 0 seconds [17:51:54] RECOVERY - MySQL Replication Heartbeat on db32 is OK: OK replication delay 0 seconds [17:51:55] RECOVERY - MySQL Replication Heartbeat on db32 is OK: OK replication delay 0 seconds [17:51:59] New patchset: Michał Łazowik; "Wikidata language code subdomain redirect to ItemByTitle special page" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/47088 [17:52:40] RECOVERY - MySQL Slave Delay on db32 is OK: OK replication delay 0 seconds [18:03:03] PROBLEM - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:03:53] RECOVERY - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 247 bytes in 0.001 second response time [18:08:53] PROBLEM - Puppet freshness on labstore2 is CRITICAL: Puppet has not run in the last 10 hours [18:08:58] New review: Denny Vrandecic; "This really means "Looks good to me", i.e. it seems to do what it should, i.e. rewriting http://en.w..."
[operations/apache-config] (master) C: 1; - https://gerrit.wikimedia.org/r/47088 [18:12:03] PROBLEM - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:12:53] PROBLEM - Puppet freshness on professor is CRITICAL: Puppet has not run in the last 10 hours [18:12:54] RECOVERY - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 247 bytes in 0.001 second response time [18:13:50] RoanKattouw: So I tested your apache config locally [18:13:53] and it does indeed work [18:13:58] so im gonna +2/merge your shit [18:15:22] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44164 [18:15:40] New review: RobH; "tested apache config stuff, works, reviewed rest, seems legit" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44164 [18:15:56] RobH: Thanks man [18:16:44] its merged, want me to push a puppet update on the CI server now so we can see if it breaks it? [18:17:08] ....gallium is slow. [18:17:43] and puppet is processing on gallium right now, explains why its slow. [18:19:24] <^demon> gallium needs a reboot too :\ [18:20:09] ^demon: does it? [18:20:13] i can reboot it now if it needs it. [18:20:19] <^demon> *** System restart required *** [18:20:27] meh. [18:20:36] <^demon> Antoine's not around, so I'm leery of doing it w/o him. [18:20:38] I read that 'guillaume needa a reboot too' [18:20:40] <^demon> Afraid Zuul will freak out. [18:20:43] guess it's time for a break [18:20:48] yea we will wait then. [18:20:55] puppet update applying for the ci update. [18:21:06] And right there, I'm happy that no one has root on me. [18:21:25] eh yup [18:21:27] <^demon> sudo -u guillom reboot [18:21:47] s/reboot/make me a sandwich/ ? [18:22:02] <^demon> make[1]: No target "me" found. [18:22:05] guillom:~# sudo /etc/init.d/network stop [18:22:09] * RobH watches guillom go offline [18:22:27] I probably should, considering it's 7:30pm here [18:22:30] hehe [18:22:37] i gave you a perfect exit line! [18:22:38] yeah [18:22:38] ;] [18:22:45] * apergos goes foraging for food [18:23:49] RoanKattouw: those changes are live on gallium (atleast the apache changes and such are live, i didnt test the actual tests) [18:25:08] <^demon> RobH: 18:24:50 up 99 days, 2:41, 2 users, load average: 1.12, 2.47, 2.79 [18:26:48] ^demon: thats never not good right? ;P [18:35:04] PROBLEM - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is CRITICAL: Connection timed out [18:35:52] RECOVERY - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 247 bytes in 0.001 second response time [18:40:26] hey guys, labs question - the kripke instance seems to be down, ssh-ing into it doesn't work because I can't reach bastion [18:41:32] and more importantly, something odd seems to be happening to DNS because http://reportcard.wmflabs.org isn't accessible from there but http://208.80.153.208/ is [18:42:09] milimetric, ask in #wikimedia-labs maybe [18:42:12] they care more over there [18:42:14] heheh [18:42:33] thank you :) [18:52:00] PROBLEM - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:52:45] woosters: who is Ryan's backup on labs stuff? [18:52:50] RECOVERY - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 247 bytes in 0.001 second response time [18:53:22] robla: andrewbogott handled some fun disasters yesterday [18:54:11] milimetric: I see the dns problem too; that's next on my list after the problem I'm troubleshooting now :) [18:54:38] whee! 
thanks Andrew [18:56:02] andrewbogott: This isn't the same problem I had yesterday, where my IP address got disassociated...is it? [18:56:14] doubt it, but may have a common underlying cause [18:56:33] Hm, maybe it's returned. [19:06:19] heya woosters, if you got a sec, whatcha think? [19:06:19] https://rt.wikimedia.org/Ticket/Display.html?id=4469 [19:06:21] s'ok [19:06:21] ? [19:16:00] PROBLEM - Parsoid Varnish on titanium is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:16:01] PROBLEM - Parsoid on mexia is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:16:38] PROBLEM - Parsoid Varnish on titanium is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:17:32] PROBLEM - Parsoid on mexia is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:20:07] robla - andrewbogott and mikewang will provide support [19:20:32] ottomata - will review it and get back to u [19:21:50] RECOVERY - Parsoid on mexia is OK: HTTP OK: HTTP/1.1 200 OK - 1308 bytes in 0.054 second response time [19:22:48] RECOVERY - Parsoid on mexia is OK: HTTP OK HTTP/1.1 200 OK - 1221 bytes in 0.005 seconds [19:24:01] PROBLEM - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is CRITICAL: Connection timed out [19:24:57] RECOVERY - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 247 bytes in 0.002 second response time [19:25:00] k danke [19:25:36] New patchset: Catrope; "One more s/praseodymium/titanium/ for Parsoid deployment stuff" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47097 [19:25:44] RobH: ---^^ [19:29:23] PROBLEM - Puppet freshness on dataset1001 is CRITICAL: Puppet has not run in the last 10 hours [19:31:56] PROBLEM - Parsoid Varnish on cerium is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable [19:31:56] PROBLEM - Parsoid Varnish on cerium is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 676 bytes in 0.028 second response time [19:32:24] PROBLEM - Parsoid Varnish on titanium is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable [19:32:27] PROBLEM - Parsoid Varnish on titanium is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 674 bytes in 0.029 second response time [19:32:56] PROBLEM - Parsoid Varnish on celsus is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 676 bytes in 1.088 second response time [19:33:27] PROBLEM - Parsoid on tola is CRITICAL: Connection refused [19:33:35] PROBLEM - Parsoid Varnish on celsus is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable [19:33:35] PROBLEM - Parsoid Varnish on constable is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable [19:33:57] PROBLEM - Parsoid Varnish on constable is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 674 bytes in 0.053 second response time [19:33:57] PROBLEM - Parsoid on mexia is CRITICAL: Connection refused [19:33:57] PROBLEM - Parsoid on cerium is CRITICAL: Connection refused [19:33:57] PROBLEM - Parsoid on constable is CRITICAL: Connection refused [19:33:57] PROBLEM - LVS HTTP IPv4 on parsoid.svc.pmtpa.wmnet is CRITICAL: Connection refused [19:33:57] PROBLEM - Parsoid on wtp1001 is CRITICAL: Connection refused [19:33:57] PROBLEM - Parsoid on kuo is CRITICAL: Connection refused [19:34:06] PROBLEM - LVS HTTP IPv4 on parsoid.svc.eqiad.wmnet is CRITICAL: Connection refused [19:34:07] PROBLEM - Parsoid on celsus is CRITICAL: Connection refused [19:34:07] PROBLEM - Parsoid on lardner is CRITICAL: Connection refused [19:34:16] PROBLEM - Parsoid on xenon is CRITICAL: Connection refused [19:34:17] PROBLEM - Parsoid on 
wtp1 is CRITICAL: Connection refused [19:34:26] PROBLEM - Parsoid on caesium is CRITICAL: Connection refused [19:34:56] RECOVERY - Parsoid on mexia is OK: HTTP OK: HTTP/1.1 200 OK - 1308 bytes in 0.058 second response time [19:34:57] RECOVERY - Parsoid on cerium is OK: HTTP OK: HTTP/1.1 200 OK - 1308 bytes in 0.006 second response time [19:34:57] RECOVERY - Parsoid on constable is OK: HTTP OK: HTTP/1.1 200 OK - 1308 bytes in 0.057 second response time [19:34:57] RECOVERY - LVS HTTP IPv4 on parsoid.svc.pmtpa.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 1308 bytes in 0.056 second response time [19:34:57] RECOVERY - Parsoid on wtp1001 is OK: HTTP OK: HTTP/1.1 200 OK - 1308 bytes in 0.006 second response time [19:34:57] RECOVERY - Parsoid on kuo is OK: HTTP OK: HTTP/1.1 200 OK - 1308 bytes in 0.057 second response time [19:35:06] RECOVERY - LVS HTTP IPv4 on parsoid.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 1308 bytes in 0.002 second response time [19:35:07] RECOVERY - Parsoid on lardner is OK: HTTP OK: HTTP/1.1 200 OK - 1308 bytes in 0.055 second response time [19:35:17] RECOVERY - Parsoid on xenon is OK: HTTP OK: HTTP/1.1 200 OK - 1308 bytes in 0.003 second response time [19:35:17] RECOVERY - Parsoid on wtp1 is OK: HTTP OK: HTTP/1.1 200 OK - 1308 bytes in 0.055 second response time [19:35:24] PROBLEM - Parsoid on celsus is CRITICAL: Connection refused [19:35:27] RECOVERY - Parsoid on caesium is OK: HTTP OK: HTTP/1.1 200 OK - 1308 bytes in 0.002 second response time [19:35:27] RECOVERY - Parsoid on tola is OK: HTTP OK: HTTP/1.1 200 OK - 1308 bytes in 0.055 second response time [19:36:06] RECOVERY - Parsoid on celsus is OK: HTTP OK: HTTP/1.1 200 OK - 1308 bytes in 2.432 second response time [19:37:11] RECOVERY - Parsoid on celsus is OK: HTTP OK HTTP/1.1 200 OK - 1221 bytes in 0.047 seconds [19:37:57] PROBLEM - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is CRITICAL: Connection timed out [19:38:56] RECOVERY - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 247 bytes in 0.001 second response time [19:42:58] RoanKattouw: indeed it didn't [19:43:20] !log authdns-update [19:43:22] Logged the message, RobH [19:44:05] ottomata: So the DNS is updated, analytics1001.eqiad.wmnet and analytics1026.wikimedia.org [19:44:23] after the dhcpd files are updated and merged, you are good to pxe boot. [19:44:34] lemme know if you want me to merge the change on sockpuppet [19:45:02] RobH: Do you have any idea why the Parsoid Varnishes don't seem to be in Ganglia? I'm trying to figure out if I made a mistake in puppet, but it all looks the same as the Parsoid stuff, and that stuff works [19:45:31] Oh nm [19:45:33] I think I see it [19:45:35] D'oh [19:45:43] what was it? 
[19:45:54] cuz i noticed it earlier when i had to find a spare server [19:46:14] It wasn't actually in ganglia.pp [19:46:15] Patch inbound [19:47:06] PROBLEM - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:47:57] RECOVERY - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 247 bytes in 0.001 second response time [19:48:35] New patchset: Catrope; "Add Parsoid Varnish clusters to $data_sources as well as $ganglia_clusters" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47101 [19:48:53] New patchset: Cmjohnson; "updating mac address for db1052 and db1055" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47102 [19:49:24] Change merged: Cmjohnson; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47102 [19:56:50] RobH: https://gerrit.wikimedia.org/r/47101 should fix the Ganglia thing [19:57:16] PROBLEM - Host analytics1001 is DOWN: PING CRITICAL - Packet loss = 100% [19:58:02] New patchset: Catrope; "Give the mortals group shell access to the Parsoid machine" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47103 [19:58:26] PROBLEM - Host analytics1001 is DOWN: PING CRITICAL - Packet loss = 100% [20:00:08] PROBLEM - Puppet freshness on ms2 is CRITICAL: Puppet has not run in the last 10 hours [20:00:26] !log authdns-update [20:00:27] Logged the message, RobH [20:01:02] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47101 [20:02:49] RECOVERY - Host analytics1001 is UP: PING OK - Packet loss = 0%, RTA = 0.30 ms [20:03:05] RoanKattouw: change is live ganglia shows them now [20:03:07] RECOVERY - Host analytics1001 is UP: PING OK - Packet loss = 0%, RTA = 26.53 ms [20:03:20] thx for patch [20:04:06] Welcome [20:04:13] Thanks for deploying [20:07:38] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Fri Feb 1 20:07:32 UTC 2013 [20:08:08] PROBLEM - Host analytics1001 is DOWN: PING CRITICAL - Packet loss = 100% [20:08:09] PROBLEM - Puppet freshness on ms2 is CRITICAL: Puppet has not run in the last 10 hours [20:08:18] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Fri Feb 1 20:08:15 UTC 2013 [20:08:55] RoanKattouw: So now that i can see the caches in ganglia [20:08:59] PROBLEM - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is CRITICAL: Connection timed out [20:09:00] it seems celsus is the only one doing work ;] [20:09:03] Yes, it is [20:09:07] PROBLEM - Host analytics1001 is DOWN: PING CRITICAL - Packet loss = 100% [20:09:08] PROBLEM - Puppet freshness on ms2 is CRITICAL: Puppet has not run in the last 10 hours [20:09:09] Because we don't have LVS for the Varnishes yet [20:09:12] I have to work on that [20:09:18] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Fri Feb 1 20:09:12 UTC 2013 [20:09:24] are we going to roll the lvs for this into our normal lvs servers? [20:09:30] or will these require dedicated lvs servers? 
[20:09:32] It's already in there [20:09:35] ahh, cool [20:09:41] just wondering if i needed to look for more servers =] [20:09:59] RECOVERY - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 247 bytes in 0.001 second response time [20:10:08] PROBLEM - Puppet freshness on ms2 is CRITICAL: Puppet has not run in the last 10 hours [20:10:15] RobH: You do, for Parsoid itself :) [20:10:22] I'm gonna put something together for that soon [20:10:36] yep, we got in the dual cpu servers now in both sites [20:10:40] Gabriel wants to do some benchmarking on one of the hot spares in pmtpa [20:10:40] and they are being racked this week and next [20:10:46] Oh? [20:10:48] so you'll have some beefier parsoid nodes soon [20:11:02] ohh, nice ;) [20:11:04] dual cpu, double memory [20:11:22] we don't need much memory, just CPU [20:11:41] but it won't hurt of course.. [20:11:44] Are these gonna be wtp1002-wtp10NN? [20:11:46] the memory was doubled just so we dont have less per core, heh [20:11:48] And how many of them are there? [20:11:53] RoanKattouw: that was my plan, uhhh [20:11:59] how many you need? [20:12:12] I assumed you would want at least 3 per site minimum [20:12:19] with a note that I may have to give you up to 5 [20:12:22] I think gwicke said he was gonna do some benchmarks on the pmtpa machines (which are cold spares) to figure out how many we'll need [20:12:26] from our very early conversations [20:12:28] PROBLEM - Puppet freshness on dataset1001 is CRITICAL: Puppet has not run in the last 10 hours [20:12:32] cool [20:12:32] how many cores are there per socket? [20:12:39] gwicke: 8 [20:12:46] so 16 total [20:12:55] Yeah, well one thing that changed since our earlier convos is that they're now planning to run this thing as the default editor starting in the summer [20:13:11] Which suggests we may be looking for a tiny capacity increase ;) [20:13:20] well, we have 10 of the new HPM servers [20:13:21] my rough and conservative guess would be that we'd need around 5-7 machines to be safe [20:13:23] (high performacne misc) [20:13:41] and i have not allocated any of them for anythign but parsoid yet [20:13:47] that is based on very extrapolated data though [20:13:53] (cuz i got some additoinal ones with ssds for ashers stuff) [20:14:20] so, understanding that Im speaking purely in 'what do we have on site now' and not as a manager who approves project allocations [20:14:24] we have the servers. [20:14:56] gwicke: we can handle that with on site spares now (as of this week) so we should be fine [20:15:08] just let me know how many we really need once you do some testing and we'll get it done [20:15:14] I'd say we need at least five. 
We can then benchmark those to see if the performance is sufficient to keep up with peak edit rates [20:15:31] sounds reasonable, once we have them all racked and ready i'll pass them along to you guys [20:15:33] *nod* [20:16:20] Cool [20:16:50] RobH: It might be a week or so before I set them up, I need Ryan to be back so I can figure out a deployment strategy thingy before I spin up these boxes [20:17:59] PROBLEM - MySQL Replication Heartbeat on db33 is CRITICAL: CRIT replication delay 187 seconds [20:18:08] PROBLEM - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:18:16] PROBLEM - MySQL Slave Delay on db33 is CRITICAL: CRIT replication delay 190 seconds [20:18:16] PROBLEM - MySQL Replication Heartbeat on db33 is CRITICAL: CRIT replication delay 189 seconds [20:18:39] PROBLEM - MySQL Slave Delay on db33 is CRITICAL: CRIT replication delay 198 seconds [20:18:59] RECOVERY - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 247 bytes in 0.001 second response time [20:19:08] RoanKattouw: sounds good, chris is workign on racking apaches over the new misc stuff anyhow [20:19:15] OK ood [20:19:16] *good [20:19:16] we need the apache capacity in eqiad [20:19:29] So no real rush on racking them, until Ryan is back I won't be able to do anything with the new boxes anyways [20:20:17] RobH, am I duuummmbbbbb or sumpin? [20:20:24] to get analytics1001 (cisco) to reinstall [20:20:29] i should just set boot-order pxe [20:20:31] commit [20:20:33] and reboot, right? [20:20:39] (I'm also looking in BIOS now manually) [20:20:59] RECOVERY - MySQL Replication Heartbeat on db33 is OK: OK replication delay 0 seconds [20:21:38] RECOVERY - MySQL Slave Delay on db33 is OK: OK replication delay 0 seconds [20:21:43] RECOVERY - MySQL Slave Delay on db33 is OK: OK replication delay 0 seconds [20:22:01] RECOVERY - MySQL Replication Heartbeat on db33 is OK: OK replication delay 1 seconds [20:22:20] ottomata: yep, since i didnt change anythign on the switch [20:22:25] pxe should launch the installer [20:22:36] i think we've seen this before [20:22:38] don't remember outcome [20:22:41] it looks like it tries to netboot [20:22:44] but then just regular boot [20:23:08] PROBLEM - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:23:59] RECOVERY - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 247 bytes in 0.001 second response time [20:24:57] RobH, maybe if I set the first boot option to [20:24:57] MBA v6.0.11 Slot 0100 [20:24:58] ? [20:32:08] PROBLEM - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:32:58] RECOVERY - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 247 bytes in 0.001 second response time [20:33:42] hmm, rats, nope [20:33:55] growl, why are the ciscos so stubborn [20:34:07] you still around RobH? 
[20:34:10] RECOVERY - Host analytics1001 is UP: PING OK - Packet loss = 0%, RTA = 0.25 ms [20:34:11] PROBLEM - NTP on analytics1001 is CRITICAL: NTP CRITICAL: Offset unknown [20:34:28] RECOVERY - Host analytics1001 is UP: PING OK - Packet loss = 0%, RTA = 26.55 ms [20:35:50] PROBLEM - MySQL Replication Heartbeat on db32 is CRITICAL: CRIT replication delay 189 seconds [20:35:58] PROBLEM - MySQL Replication Heartbeat on db32 is CRITICAL: CRIT replication delay 190 seconds [20:36:17] PROBLEM - MySQL Slave Delay on db32 is CRITICAL: CRIT replication delay 195 seconds [20:36:31] PROBLEM - MySQL Slave Delay on db32 is CRITICAL: CRIT replication delay 201 seconds [20:36:42] ottomata: i am [20:36:49] its a stubborn ox [20:36:54] i say netboot! [20:36:56] it says: NO [20:37:30] RECOVERY - MySQL Slave Delay on db32 is OK: OK replication delay 2 seconds [20:37:31] PROBLEM - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is CRITICAL: Connection timed out [20:37:46] RECOVERY - MySQL Replication Heartbeat on db32 is OK: OK replication delay 0 seconds [20:37:50] RECOVERY - MySQL Replication Heartbeat on db32 is OK: OK replication delay 0 seconds [20:38:04] RECOVERY - MySQL Slave Delay on db32 is OK: OK replication delay 0 seconds [20:38:10] RECOVERY - NTP on analytics1001 is OK: NTP OK: Offset 0.0008429288864 secs [20:38:20] RECOVERY - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 247 bytes in 0.001 second response time [20:38:46] hrmm, lemme finish wolfing down this food and i take a gander at it [20:39:58] ok danke [20:51:30] PROBLEM - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:52:20] RECOVERY - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 247 bytes in 0.001 second response time [21:06:10] New patchset: MaxSem; "WIP: advanced Solr monitoring script" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47111 [21:20:01] PROBLEM - Puppet freshness on ms2 is CRITICAL: Puppet has not run in the last 10 hours [21:20:52] New patchset: Hashar; "(bug 44041) adapt role::cache::mobile for beta" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44709 [21:21:21] PROBLEM - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:21:31] poor mark [21:22:07] New review: Hashar; "Patchset 24 hack up the lvs configuration IPs to point to the beta Apaches." [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/44709 [21:23:00] RECOVERY - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is OK: HTTP OK HTTP/1.1 200 OK - 247 bytes in 0.058 seconds [21:24:41] Change merged: Andrew Bogott; [operations/debs/adminbot] (master) - https://gerrit.wikimedia.org/r/47032 [21:45:06] New patchset: Hashar; ".pep8 , ignore tabs!" 
[operations/debs/adminbot] (master) - https://gerrit.wikimedia.org/r/47115 [21:51:03] PROBLEM - Puppet freshness on db1047 is CRITICAL: Puppet has not run in the last 10 hours [21:53:59] New review: Hashar; "recheck" [operations/debs/adminbot] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/47115 [21:57:50] Change merged: Andrew Bogott; [operations/debs/adminbot] (master) - https://gerrit.wikimedia.org/r/47115 [22:10:12] PROBLEM - Puppet freshness on ms2 is CRITICAL: Puppet has not run in the last 10 hours [22:11:13] PROBLEM - Puppet freshness on db1047 is CRITICAL: Puppet has not run in the last 10 hours [22:20:12] PROBLEM - Puppet freshness on labstore4 is CRITICAL: Puppet has not run in the last 10 hours [22:20:38] PROBLEM - Puppet freshness on labstore4 is CRITICAL: Puppet has not run in the last 10 hours [22:21:51] New patchset: Lcarr; "ganglios requires gmetad.conf -- seeing if it is happy with a mostly empty one" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47175 [22:26:32] PROBLEM - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is CRITICAL: Connection timed out [22:27:32] RECOVERY - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 247 bytes in 0.001 second response time [22:29:05] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47175 [22:29:32] i hope ganglios doesn't actually require a full gmetad installation [22:33:01] I got a ganglia bug for you Leslie : -D [22:33:08] add in disk I/O reporting! https://bugzilla.wikimedia.org/show_bug.cgi?id=36994 [22:33:08] what ? [22:33:09] ;-d [22:33:13] hehe [22:33:17] in labs [22:33:34] ahh yeah maybe it is already in production [22:33:39] I need to poke Ryan about it [22:33:51] on a different subject, do you happen to know if LABS supports LVS ? [22:34:27] not yet [22:34:32] it should! [22:35:37] PROBLEM - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is CRITICAL: Connection timed out [22:35:44] ah that clarify it, thanks! [22:36:04] andrewbogott: "no LVS support in labs", Leslie, just a minute ago, above. [22:36:21] one day it'll all work ;) [22:36:27] RECOVERY - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 247 bytes in 0.001 second response time [22:36:35] yeah I am sure [22:40:37] PROBLEM - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is CRITICAL: Connection timed out [22:40:45] so hmm [22:40:47] bed time for me [22:40:53] have a good fun and nice week end [22:41:28] RECOVERY - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 247 bytes in 0.017 second response time [22:46:25] New patchset: Andrew Bogott; "Update changelog + many pep8 and pyflakes fixes." [operations/debs/adminbot] (master) - https://gerrit.wikimedia.org/r/47178 [22:54:37] PROBLEM - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is CRITICAL: Connection timed out [22:55:28] RECOVERY - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 247 bytes in 0.002 second response time [22:56:47] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 186 seconds [22:57:27] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 203 seconds [22:57:28] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 205 seconds [22:57:56] New patchset: Andrew Bogott; "Update changelog + many pep8 and pyflakes fixes." 
[operations/debs/adminbot] (master) - https://gerrit.wikimedia.org/r/47178 [22:58:17] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 223 seconds [22:59:56] New review: Andrew Bogott; "With all the pep8 changes this patch is pretty much un-reviewable by a human." [operations/debs/adminbot] (master); V: 2 C: 2; - https://gerrit.wikimedia.org/r/47178 [22:59:56] Change merged: Andrew Bogott; [operations/debs/adminbot] (master) - https://gerrit.wikimedia.org/r/47178 [23:11:54] New patchset: Lcarr; "Revert "ganglios requires gmetad.conf -- seeing if it is happy with a mostly empty one"" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47180 [23:12:07] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47180 [23:13:37] PROBLEM - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:14:28] RECOVERY - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 247 bytes in 0.003 second response time [23:17:11] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 0 seconds [23:17:27] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 0 seconds [23:17:28] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 0 seconds [23:17:29] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 0 seconds [23:27:37] PROBLEM - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:28:28] RECOVERY - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 247 bytes in 0.005 second response time [23:36:37] PROBLEM - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is CRITICAL: Connection timed out [23:37:28] RECOVERY - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 247 bytes in 0.003 second response time [23:37:52] New patchset: Lcarr; "try 2 for getting ganglios working" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47186 [23:40:03] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47186 [23:49:43] PROBLEM - Puppet freshness on ms2 is CRITICAL: Puppet has not run in the last 10 hours [23:52:32] PROBLEM - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is CRITICAL: Connection timed out [23:53:22] RECOVERY - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 247 bytes in 0.001 second response time [23:57:35] PROBLEM - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is CRITICAL: Connection timed out [23:58:25] RECOVERY - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 247 bytes in 0.001 second response time
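The adminbot cleanup threaded through the end of the log (".pep8 , ignore tabs!" followed by a pep8/pyflakes sweep that Andrew calls "pretty much un-reviewable by a human") can be checked locally before review. A minimal sketch, assuming the stock pep8 and pyflakes command-line tools of the time; the file name is illustrative and W191 ("indentation contains tabs") only stands in for whatever the repository's actual .pep8 file suppresses, since its contents are not shown here.

    # Style and static checks matching the cleanup change
    # (file name and ignored code are illustrative, not the repo's real config).
    pep8 --ignore=W191 adminlogbot.py
    pyflakes adminlogbot.py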