[00:03:36] (03PS2) 10BBlack: update zeroconfig URL for netmapper [operations/puppet] - 10https://gerrit.wikimedia.org/r/86205 [00:09:49] Ryan_Lane: does the salt minion change (above) fix things? [00:10:57] ori-l: fix things that we just broke? :) [00:11:00] hopefully [00:11:35] hm. shit [00:11:49] actually, the puppet change won't fix things [00:12:49] because there's no ldap entry for that. [00:12:59] well, salt doesn't depend on that, so I can salt a fix to them :D [00:47:38] (03CR) 10Yurik: [C: 031] update zeroconfig URL for netmapper [operations/puppet] - 10https://gerrit.wikimedia.org/r/86205 (owner: 10BBlack) [01:07:53] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [01:07:53] PROBLEM - Puppet freshness on praseodymium is CRITICAL: No successful Puppet run in the last 10 hours [01:08:43] PROBLEM - Puppet freshness on titanium is CRITICAL: No successful Puppet run in the last 10 hours [01:08:43] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [01:09:46] is there a log to view warnings? fatal log doesn't seem to have them [01:10:33] RECOVERY - Puppet freshness on titanium is OK: puppet ran at Fri Sep 27 01:10:32 UTC 2013 [01:10:43] PROBLEM - Puppet freshness on titanium is CRITICAL: No successful Puppet run in the last 10 hours [01:10:43] RECOVERY - Puppet freshness on praseodymium is OK: puppet ran at Fri Sep 27 01:10:42 UTC 2013 [01:10:53] PROBLEM - Puppet freshness on praseodymium is CRITICAL: No successful Puppet run in the last 10 hours [01:16:06] yurik, the one used by fatalmonitor has plenty of warnings [01:16:34] MaxSem, fatalmonitor looks at apache.log which has no stacktraces [01:16:58] and you waon't have them unless you install xdebug [01:17:08] bleh [01:17:27] i've been looking at the fatamonitor - shows lots of strcmp issues [01:17:52] all of the issues right now are array instead of string is passed [01:17:58] no idea who does it [01:18:03] don't want it to be my fault [01:19:25] I wonder if there's a possibility to always have one appserver with xdebug ready and direct traffic to it when someone needs stacktraces/other debugging [01:32:44] Those strcmp warnings are definitely new in 1.19 [01:32:46] uh [01:32:47] 1.22wmf19 [01:33:09] Must find the cause before deploying further, otherwise it's just going to get really noisy [01:34:03] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Fri Sep 27 01:34:00 UTC 2013 [01:34:53] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [01:38:03] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Fri Sep 27 01:37:53 UTC 2013 [01:38:43] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [01:40:53] RECOVERY - Puppet freshness on titanium is OK: puppet ran at Fri Sep 27 01:40:49 UTC 2013 [01:41:03] RECOVERY - Puppet freshness on praseodymium is OK: puppet ran at Fri Sep 27 01:40:59 UTC 2013 [01:41:43] PROBLEM - Puppet freshness on titanium is CRITICAL: No successful Puppet run in the last 10 hours [01:41:53] PROBLEM - Puppet freshness on praseodymium is CRITICAL: No successful Puppet run in the last 10 hours [02:03:23] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Fri Sep 27 02:03:22 UTC 2013 [02:03:43] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [02:03:53] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Fri Sep 27 02:03:52 UTC 2013 [02:04:53] PROBLEM - 
Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [02:10:43] RECOVERY - Puppet freshness on titanium is OK: puppet ran at Fri Sep 27 02:10:41 UTC 2013 [02:10:53] RECOVERY - Puppet freshness on praseodymium is OK: puppet ran at Fri Sep 27 02:10:46 UTC 2013 [02:11:43] PROBLEM - Puppet freshness on titanium is CRITICAL: No successful Puppet run in the last 10 hours [02:11:53] PROBLEM - Puppet freshness on praseodymium is CRITICAL: No successful Puppet run in the last 10 hours [02:14:34] !log LocalisationUpdate completed (1.22wmf18) at Fri Sep 27 02:14:34 UTC 2013 [02:14:53] Logged the message, Master [02:33:03] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Fri Sep 27 02:32:59 UTC 2013 [02:33:20] !log LocalisationUpdate completed (1.22wmf19) at Fri Sep 27 02:33:19 UTC 2013 [02:33:34] Logged the message, Master [02:33:43] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [02:34:13] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Fri Sep 27 02:34:09 UTC 2013 [02:34:53] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [02:41:03] (03CR) 10BBlack: [C: 032] update zeroconfig URL for netmapper [operations/puppet] - 10https://gerrit.wikimedia.org/r/86205 (owner: 10BBlack) [02:55:41] !log LocalisationUpdate ResourceLoader cache refresh completed at Fri Sep 27 02:55:41 UTC 2013 [02:55:53] Logged the message, Master [03:01:54] PROBLEM - Puppet freshness on analytics1021 is CRITICAL: No successful Puppet run in the last 10 hours [03:46:43] (03PS1) 10Yuvipanda: Explicitly reference labsvagrant class from the module [operations/puppet] - 10https://gerrit.wikimedia.org/r/86213 [03:46:44] Ryan_Lane: merge? ^ [03:47:18] waiting for jenkins [03:47:42] (03CR) 10Ryan Lane: [C: 032] Explicitly reference labsvagrant class from the module [operations/puppet] - 10https://gerrit.wikimedia.org/r/86213 (owner: 10Yuvipanda) [03:47:58] YuviPanda: ^^ [03:48:05] ty, Ryan_Lane [03:48:14] ok. 
gone for the night [04:13:30] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [04:13:30] PROBLEM - Puppet freshness on praseodymium is CRITICAL: No successful Puppet run in the last 10 hours [04:13:50] PROBLEM - Puppet freshness on titanium is CRITICAL: No successful Puppet run in the last 10 hours [04:13:50] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [04:26:50] PROBLEM - Puppet freshness on mw1072 is CRITICAL: No successful Puppet run in the last 10 hours [04:33:31] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Fri Sep 27 04:33:29 UTC 2013 [04:33:50] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [04:34:31] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Fri Sep 27 04:34:29 UTC 2013 [04:35:30] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [04:41:40] RECOVERY - Puppet freshness on titanium is OK: puppet ran at Fri Sep 27 04:41:37 UTC 2013 [04:41:50] PROBLEM - Puppet freshness on titanium is CRITICAL: No successful Puppet run in the last 10 hours [04:42:00] RECOVERY - Puppet freshness on praseodymium is OK: puppet ran at Fri Sep 27 04:41:58 UTC 2013 [04:42:30] PROBLEM - Puppet freshness on praseodymium is CRITICAL: No successful Puppet run in the last 10 hours [05:04:30] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Fri Sep 27 05:04:22 UTC 2013 [05:04:50] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [05:05:30] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Fri Sep 27 05:05:23 UTC 2013 [05:06:30] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [05:13:10] RECOVERY - Puppet freshness on titanium is OK: puppet ran at Fri Sep 27 05:13:07 UTC 2013 [05:13:10] RECOVERY - Puppet freshness on praseodymium is OK: puppet ran at Fri Sep 27 05:13:07 UTC 2013 [05:13:30] PROBLEM - Puppet freshness on praseodymium is CRITICAL: No successful Puppet run in the last 10 hours [05:13:50] PROBLEM - Puppet freshness on titanium is CRITICAL: No successful Puppet run in the last 10 hours [05:22:40] PROBLEM - SSH on lvs1001 is CRITICAL: Server answer: [05:23:40] RECOVERY - SSH on lvs1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [05:35:00] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Fri Sep 27 05:34:50 UTC 2013 [05:35:30] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [05:35:40] PROBLEM - RAID on snapshot3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:37:40] RECOVERY - RAID on snapshot3 is OK: OK: no RAID installed [05:41:30] RECOVERY - Puppet freshness on titanium is OK: puppet ran at Fri Sep 27 05:41:23 UTC 2013 [05:41:31] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Fri Sep 27 05:41:28 UTC 2013 [05:41:50] PROBLEM - Puppet freshness on titanium is CRITICAL: No successful Puppet run in the last 10 hours [05:41:50] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [05:42:00] RECOVERY - Puppet freshness on praseodymium is OK: puppet ran at Fri Sep 27 05:41:59 UTC 2013 [05:42:30] PROBLEM - Puppet freshness on praseodymium is CRITICAL: No successful Puppet run in the last 10 hours [06:01:40] PROBLEM - RAID on snapshot3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
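
A note on the strcmp warnings discussed above (01:16-01:33): the log only says an array is being passed where strcmp() expects a string and never identifies the caller, so the following is a minimal PHP 5-era reproduction of that class of warning, not the actual MediaWiki code path; the variable names and values are illustrative only.

    <?php
    // Hypothetical reproduction of the warning class fatalmonitor was showing.
    $title = array( 'Foo' );            // a value that unexpectedly became an array
    $result = strcmp( $title, 'Foo' );  // PHP 5: "Warning: strcmp() expects parameter 1
                                        //         to be string, array given"; returns NULL

    // A defensive check while hunting for the real caller:
    if ( !is_string( $title ) ) {
        trigger_error( 'strcmp() called with ' . gettype( $title ), E_USER_WARNING );
    }
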
[06:03:40] PROBLEM - Disk space on snapshot3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:04:00] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Fri Sep 27 06:03:50 UTC 2013 [06:04:40] RECOVERY - Disk space on snapshot3 is OK: DISK OK [06:04:40] RECOVERY - RAID on snapshot3 is OK: OK: no RAID installed [06:04:50] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [06:05:00] PROBLEM - DPKG on snapshot3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:06:50] RECOVERY - DPKG on snapshot3 is OK: All packages OK [06:09:24] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Fri Sep 27 06:09:17 UTC 2013 [06:09:24] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [06:11:24] RECOVERY - Puppet freshness on praseodymium is OK: puppet ran at Fri Sep 27 06:11:17 UTC 2013 [06:11:24] PROBLEM - Puppet freshness on praseodymium is CRITICAL: No successful Puppet run in the last 10 hours [06:12:14] RECOVERY - Puppet freshness on titanium is OK: puppet ran at Fri Sep 27 06:12:13 UTC 2013 [06:12:44] PROBLEM - Puppet freshness on titanium is CRITICAL: No successful Puppet run in the last 10 hours [06:32:34] PROBLEM - RAID on snapshot1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:34:04] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Fri Sep 27 06:34:02 UTC 2013 [06:34:44] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [06:35:04] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Fri Sep 27 06:35:03 UTC 2013 [06:35:24] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [06:37:34] RECOVERY - RAID on snapshot1002 is OK: OK: no RAID installed [06:40:34] PROBLEM - RAID on snapshot1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:41:14] RECOVERY - Puppet freshness on titanium is OK: puppet ran at Fri Sep 27 06:41:07 UTC 2013 [06:41:44] PROBLEM - Puppet freshness on titanium is CRITICAL: No successful Puppet run in the last 10 hours [06:42:04] RECOVERY - Puppet freshness on praseodymium is OK: puppet ran at Fri Sep 27 06:42:03 UTC 2013 [06:42:24] PROBLEM - Puppet freshness on praseodymium is CRITICAL: No successful Puppet run in the last 10 hours [06:42:34] RECOVERY - RAID on snapshot1002 is OK: OK: no RAID installed [07:03:44] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Fri Sep 27 07:03:36 UTC 2013 [07:03:44] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [07:04:34] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Fri Sep 27 07:04:32 UTC 2013 [07:05:24] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [07:08:44] PROBLEM - RAID on snapshot3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:10:44] RECOVERY - RAID on snapshot3 is OK: OK: no RAID installed [07:11:14] RECOVERY - Puppet freshness on titanium is OK: puppet ran at Fri Sep 27 07:11:07 UTC 2013 [07:11:44] PROBLEM - Puppet freshness on titanium is CRITICAL: No successful Puppet run in the last 10 hours [07:12:34] RECOVERY - Puppet freshness on praseodymium is OK: puppet ran at Fri Sep 27 07:12:28 UTC 2013 [07:13:24] PROBLEM - Puppet freshness on praseodymium is CRITICAL: No successful Puppet run in the last 10 hours [07:20:04] PROBLEM - DPKG on snapshot3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[07:20:44] PROBLEM - RAID on snapshot3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:20:54] RECOVERY - DPKG on snapshot3 is OK: All packages OK [07:21:44] RECOVERY - RAID on snapshot3 is OK: OK: no RAID installed [07:25:44] PROBLEM - RAID on snapshot3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:26:44] RECOVERY - RAID on snapshot3 is OK: OK: no RAID installed [07:30:17] hello [07:31:04] PROBLEM - DPKG on snapshot3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:31:55] RECOVERY - DPKG on snapshot3 is OK: All packages OK [07:33:04] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Fri Sep 27 07:32:59 UTC 2013 [07:33:44] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [07:33:54] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Fri Sep 27 07:33:50 UTC 2013 [07:34:24] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [07:38:46] PROBLEM - RAID on snapshot3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:38:46] PROBLEM - DPKG on snapshot3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:40:36] RECOVERY - RAID on snapshot3 is OK: OK: no RAID installed [07:40:36] RECOVERY - DPKG on snapshot3 is OK: All packages OK [07:49:35] (03PS3) 10JanZerebecki: replace SSLCACertificatePath with SSLCertificateChainFile in Apache templates [operations/puppet] - 10https://gerrit.wikimedia.org/r/84901 (owner: 10Dzahn) [07:50:46] PROBLEM - DPKG on snapshot3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:53:36] RECOVERY - DPKG on snapshot3 is OK: All packages OK [07:53:42] (03CR) 10JanZerebecki: [C: 031] "Using CACertificatePath may be a performance problem because apache will send a list of all certificates in that path as acceptable for cl" [operations/puppet] - 10https://gerrit.wikimedia.org/r/84901 (owner: 10Dzahn) [07:56:46] PROBLEM - RAID on snapshot3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:57:46] RECOVERY - RAID on snapshot3 is OK: OK: no RAID installed [07:58:46] PROBLEM - DPKG on snapshot3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:00:46] PROBLEM - RAID on snapshot3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:00:46] PROBLEM - Disk space on snapshot3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:01:36] RECOVERY - Disk space on snapshot3 is OK: DISK OK [08:01:36] RECOVERY - RAID on snapshot3 is OK: OK: no RAID installed [08:01:36] RECOVERY - DPKG on snapshot3 is OK: All packages OK [08:05:46] PROBLEM - DPKG on snapshot3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:05:46] PROBLEM - Disk space on snapshot3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:05:46] PROBLEM - RAID on snapshot3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
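
On the change reviewed above (gerrit 84901), which swaps SSLCACertificatePath for SSLCertificateChainFile: JanZerebecki's comment notes that the directory form also advertises every CA under that path as acceptable for client certificates. A minimal sketch of the two Apache directives follows; the file paths are placeholders, not the actual Wikimedia vhost configuration.

    <VirtualHost *:443>
        SSLEngine on
        SSLCertificateFile      /etc/ssl/certs/star.wikimedia.org.pem     # placeholder path
        SSLCertificateKeyFile   /etc/ssl/private/star.wikimedia.org.key   # placeholder path

        # Before: per the review comment, every certificate under this path is
        # also announced to clients as an acceptable client-certificate CA.
        #SSLCACertificatePath   /etc/ssl/certs/

        # After: only the intermediate chain for the server certificate is sent.
        SSLCertificateChainFile /etc/ssl/certs/intermediate-chain.pem     # placeholder path
    </VirtualHost>
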
[08:06:05] PROBLEM - SSH on snapshot3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:06:15] PROBLEM - Puppet freshness on titanium is CRITICAL: No successful Puppet run in the last 10 hours [08:06:25] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [08:06:35] RECOVERY - SSH on snapshot3 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [08:06:45] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [08:07:05] PROBLEM - Puppet freshness on praseodymium is CRITICAL: No successful Puppet run in the last 10 hours [08:09:35] RECOVERY - Disk space on snapshot3 is OK: DISK OK [08:09:45] RECOVERY - RAID on snapshot3 is OK: OK: no RAID installed [08:09:45] RECOVERY - DPKG on snapshot3 is OK: All packages OK [08:10:45] RECOVERY - Puppet freshness on titanium is OK: puppet ran at Fri Sep 27 08:10:35 UTC 2013 [08:10:55] RECOVERY - Puppet freshness on praseodymium is OK: puppet ran at Fri Sep 27 08:10:46 UTC 2013 [08:11:05] PROBLEM - Puppet freshness on praseodymium is CRITICAL: No successful Puppet run in the last 10 hours [08:11:15] PROBLEM - Puppet freshness on titanium is CRITICAL: No successful Puppet run in the last 10 hours [08:12:45] PROBLEM - DPKG on snapshot3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:12:45] PROBLEM - RAID on snapshot3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:13:35] RECOVERY - RAID on snapshot3 is OK: OK: no RAID installed [08:13:35] RECOVERY - DPKG on snapshot3 is OK: All packages OK [08:29:20] Ryan_Lane: european hours ? [08:34:05] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Fri Sep 27 08:33:54 UTC 2013 [08:34:25] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [08:36:15] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Fri Sep 27 08:36:06 UTC 2013 [08:36:45] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [08:56:42] paravoid, around? [08:57:18] i deployed the redirection just like you wanted :) [08:58:44] mark, the backend now returns "Enable-ESI: 1" header when it wants the result to be processed via ESI (also per paravoid suggestion) - do we now need to vary based on that header too? [08:59:05] also, please enable it in VCL [09:06:11] PROBLEM - Puppet freshness on titanium is CRITICAL: No successful Puppet run in the last 10 hours [09:06:21] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [09:06:51] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [09:07:01] PROBLEM - Puppet freshness on praseodymium is CRITICAL: No successful Puppet run in the last 10 hours [09:10:51] RECOVERY - Puppet freshness on titanium is OK: puppet ran at Fri Sep 27 09:10:42 UTC 2013 [09:10:51] RECOVERY - Puppet freshness on praseodymium is OK: puppet ran at Fri Sep 27 09:10:47 UTC 2013 [09:11:01] PROBLEM - Puppet freshness on praseodymium is CRITICAL: No successful Puppet run in the last 10 hours [09:11:11] PROBLEM - Puppet freshness on titanium is CRITICAL: No successful Puppet run in the last 10 hours [09:17:42] finally... root@lvs4003.... 
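
On the Enable-ESI exchange above (08:56-08:59): a minimal sketch, in Varnish 3 VCL, of how a backend-supplied "Enable-ESI: 1" response header could switch on ESI processing. This only illustrates the idea; the actual change (gerrit 86258, merged later in this log) scopes it to the Testing carrier range and is not reproduced here.

    sub vcl_fetch {
        if (beresp.http.Enable-ESI == "1") {
            # Parse and execute <esi:include> / <esi:remove> in this response body.
            set beresp.do_esi = true;
        }
    }
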
[09:27:38] (03PS1) 10Akosiaris: Allow from ulsfo.wmnet for puppetmasters [operations/puppet] - 10https://gerrit.wikimedia.org/r/86222 [09:29:45] (03PS1) 10ArielGlenn: wikiretriever: retrieve recent changes, pass extra params, bugfixes [operations/dumps] (ariel) - 10https://gerrit.wikimedia.org/r/86223 [09:29:48] (03CR) 10Akosiaris: [C: 032] Allow from ulsfo.wmnet for puppetmasters [operations/puppet] - 10https://gerrit.wikimedia.org/r/86222 (owner: 10Akosiaris) [09:35:01] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Fri Sep 27 09:34:52 UTC 2013 [09:35:31] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Fri Sep 27 09:35:28 UTC 2013 [09:35:51] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [09:36:21] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [09:41:14] (03CR) 10ArielGlenn: [C: 032] wikiretriever: retrieve recent changes, pass extra params, bugfixes [operations/dumps] (ariel) - 10https://gerrit.wikimedia.org/r/86223 (owner: 10ArielGlenn) [09:42:12] RECOVERY - Puppet freshness on titanium is OK: puppet ran at Fri Sep 27 09:42:06 UTC 2013 [09:42:31] RECOVERY - Puppet freshness on praseodymium is OK: puppet ran at Fri Sep 27 09:42:22 UTC 2013 [09:43:01] PROBLEM - Puppet freshness on praseodymium is CRITICAL: No successful Puppet run in the last 10 hours [09:43:11] PROBLEM - Puppet freshness on titanium is CRITICAL: No successful Puppet run in the last 10 hours [10:03:31] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Fri Sep 27 10:03:26 UTC 2013 [10:03:41] RECOVERY - DPKG on stafford is OK: All packages OK [10:03:51] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [10:04:01] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Fri Sep 27 10:03:52 UTC 2013 [10:04:21] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [10:08:41] PROBLEM - DPKG on stafford is CRITICAL: DPKG CRITICAL dpkg reports broken packages [10:11:22] RECOVERY - Puppet freshness on titanium is OK: puppet ran at Fri Sep 27 10:11:17 UTC 2013 [10:11:32] RECOVERY - Puppet freshness on praseodymium is OK: puppet ran at Fri Sep 27 10:11:22 UTC 2013 [10:11:52] PROBLEM - Puppet freshness on praseodymium is CRITICAL: No successful Puppet run in the last 10 hours [10:12:02] PROBLEM - Puppet freshness on titanium is CRITICAL: No successful Puppet run in the last 10 hours [10:12:52] (03PS1) 10Akosiaris: Adding puppet CNAME for ulsfo [operations/dns] - 10https://gerrit.wikimedia.org/r/86232 [10:13:23] (03CR) 10Akosiaris: [C: 032] Adding puppet CNAME for ulsfo [operations/dns] - 10https://gerrit.wikimedia.org/r/86232 (owner: 10Akosiaris) [10:33:52] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Fri Sep 27 10:33:42 UTC 2013 [10:34:02] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Fri Sep 27 10:33:52 UTC 2013 [10:34:22] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [10:34:42] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [10:42:25] PROBLEM - Disk space on lvs4002 is CRITICAL: Connection refused by host [10:42:25] PROBLEM - RAID on lvs4004 is CRITICAL: Connection refused by host [10:42:25] PROBLEM - DPKG on lvs4001 is CRITICAL: Connection refused by host [10:42:35] PROBLEM - Disk space on lvs4001 is CRITICAL: Connection refused by host [10:42:35] PROBLEM - RAID on lvs4003 is 
CRITICAL: Connection refused by host [10:42:45] PROBLEM - RAID on lvs4002 is CRITICAL: Connection refused by host [10:42:55] PROBLEM - DPKG on lvs4004 is CRITICAL: Connection refused by host [10:42:55] PROBLEM - RAID on lvs4001 is CRITICAL: Connection refused by host [10:43:05] PROBLEM - DPKG on lvs4003 is CRITICAL: Connection refused by host [10:43:05] PROBLEM - Disk space on lvs4004 is CRITICAL: Connection refused by host [10:43:15] PROBLEM - Disk space on lvs4003 is CRITICAL: Connection refused by host [10:43:15] PROBLEM - DPKG on lvs4002 is CRITICAL: Connection refused by host [10:54:25] PROBLEM - NTP on lvs4003 is CRITICAL: NTP CRITICAL: Offset unknown [10:54:36] PROBLEM - NTP on lvs4002 is CRITICAL: NTP CRITICAL: Offset unknown [10:54:45] PROBLEM - NTP on lvs4001 is CRITICAL: NTP CRITICAL: Offset unknown [10:55:15] PROBLEM - NTP on lvs4004 is CRITICAL: NTP CRITICAL: Offset unknown [10:57:55] RECOVERY - DPKG on lvs4004 is OK: All packages OK [10:58:05] RECOVERY - Disk space on lvs4004 is OK: DISK OK [10:58:05] RECOVERY - DPKG on lvs4003 is OK: All packages OK [10:58:15] RECOVERY - Disk space on lvs4003 is OK: DISK OK [10:58:15] RECOVERY - DPKG on lvs4002 is OK: All packages OK [10:58:18] (03PS1) 10Akosiaris: Add ulsfo to enable_proxy for apt [operations/puppet] - 10https://gerrit.wikimedia.org/r/86236 [10:58:25] RECOVERY - Disk space on lvs4002 is OK: DISK OK [10:58:25] RECOVERY - RAID on lvs4004 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0 [10:58:35] RECOVERY - RAID on lvs4003 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0 [10:58:45] RECOVERY - RAID on lvs4002 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0 [10:59:24] (03CR) 10Akosiaris: [C: 032] Add ulsfo to enable_proxy for apt [operations/puppet] - 10https://gerrit.wikimedia.org/r/86236 (owner: 10Akosiaris) [10:59:25] RECOVERY - DPKG on lvs4001 is OK: All packages OK [10:59:35] RECOVERY - Disk space on lvs4001 is OK: DISK OK [10:59:55] RECOVERY - RAID on lvs4001 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0 [11:13:29] (03PS1) 10Faidon Liambotis: Switch eqiad's ms-fe & ms-be to Swift [operations/puppet] - 10https://gerrit.wikimedia.org/r/86238 [11:13:39] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [11:13:49] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [11:13:49] PROBLEM - Puppet freshness on titanium is CRITICAL: No successful Puppet run in the last 10 hours [11:13:49] PROBLEM - Puppet freshness on praseodymium is CRITICAL: No successful Puppet run in the last 10 hours [11:14:09] RECOVERY - NTP on lvs4004 is OK: NTP OK: Offset -0.01887404919 secs [11:14:19] RECOVERY - NTP on lvs4003 is OK: NTP OK: Offset -0.02353930473 secs [11:14:39] RECOVERY - NTP on lvs4002 is OK: NTP OK: Offset -0.02709567547 secs [11:15:19] RECOVERY - NTP on lvs4001 is OK: NTP OK: Offset -0.02019965649 secs [11:24:13] (03PS1) 10Faidon Liambotis: Remove role::ceph::*, unused now [operations/puppet] - 10https://gerrit.wikimedia.org/r/86241 [11:24:34] (03CR) 10Faidon Liambotis: [C: 032] Switch eqiad's ms-fe & ms-be to Swift [operations/puppet] - 10https://gerrit.wikimedia.org/r/86238 (owner: 10Faidon Liambotis) [11:24:44] (03CR) 10Faidon Liambotis: [C: 032] Remove role::ceph::*, unused now [operations/puppet] - 10https://gerrit.wikimedia.org/r/86241 (owner: 10Faidon Liambotis) [11:28:59] Ceph is dead, long live Swift? [11:29:42] yeah... 
[11:29:45] * paravoid sad [11:32:59] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Fri Sep 27 11:32:57 UTC 2013 [11:33:39] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [11:33:59] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Fri Sep 27 11:33:53 UTC 2013 [11:34:49] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [12:11:07] PROBLEM - Puppet freshness on titanium is CRITICAL: No successful Puppet run in the last 10 hours [12:11:17] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [12:11:17] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [12:11:27] PROBLEM - Puppet freshness on praseodymium is CRITICAL: No successful Puppet run in the last 10 hours [12:15:32] paravoid, will there be a post/email/wiki page describing the Ceph fail? [12:18:02] I didn't think anyone else was interested [12:18:19] but if you are, I guess I can, yes [12:26:00] mark: rt5848...the site.pp entries have been removed...that is where bonding would be set...correct? [12:33:17] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Fri Sep 27 12:33:09 UTC 2013 [12:33:17] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [12:34:16] (03PS2) 10Dzahn: retab misc/planet.pp from tabs to 4 spaces, do the cleanup before next attempt to turn into module [operations/puppet] - 10https://gerrit.wikimedia.org/r/86126 [12:34:40] (03CR) 10Dzahn: [C: 032] retab misc/planet.pp from tabs to 4 spaces, do the cleanup before next attempt to turn into module [operations/puppet] - 10https://gerrit.wikimedia.org/r/86126 (owner: 10Dzahn) [12:35:17] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Fri Sep 27 12:35:10 UTC 2013 [12:35:28] (03PS2) 10Dzahn: planet.pp - wrong quoting, aligned arrows, ensure first and other puppet lint [operations/puppet] - 10https://gerrit.wikimedia.org/r/86130 [12:36:17] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [12:37:25] (03CR) 10Dzahn: [C: 032] planet.pp - wrong quoting, aligned arrows, ensure first and other puppet lint [operations/puppet] - 10https://gerrit.wikimedia.org/r/86130 (owner: 10Dzahn) [12:38:51] (03PS1) 10Krinkle: gerrit: Don't include '.' in the match for adjecent separator [operations/puppet] - 10https://gerrit.wikimedia.org/r/86250 [12:40:09] (03PS2) 10Krinkle: gerrit: Don't include '.' 
in the match for adjecent separator [operations/puppet] - 10https://gerrit.wikimedia.org/r/86250 [12:40:40] (03PS2) 10Dzahn: bugzilla.pp - fix unquoted resource titles and file modes (puppet-lint) [operations/puppet] - 10https://gerrit.wikimedia.org/r/86124 [12:42:37] (03CR) 10Dzahn: [C: 032] bugzilla.pp - fix unquoted resource titles and file modes (puppet-lint) [operations/puppet] - 10https://gerrit.wikimedia.org/r/86124 (owner: 10Dzahn) [12:51:40] (03CR) 10Dzahn: [C: 031] replace SSLCACertificatePath with SSLCertificateChainFile in Apache templates [operations/puppet] - 10https://gerrit.wikimedia.org/r/84901 (owner: 10Dzahn) [12:55:06] PROBLEM - Host ms-fe1001 is DOWN: PING CRITICAL - Packet loss = 100% [12:56:06] PROBLEM - Ceph on ms-fe1004 is CRITICAL: Ceph HEALTH_WARN 1 mons down, quorum 1,2 ms-fe1003,ms-fe1004 [12:56:06] PROBLEM - Ceph on ms-fe1003 is CRITICAL: Ceph HEALTH_WARN 1 mons down, quorum 1,2 ms-fe1003,ms-fe1004 [12:59:46] PROBLEM - Host ms-fe1002 is DOWN: PING CRITICAL - Packet loss = 100% [13:00:06] PROBLEM - Swift HTTP on ms-fe4 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:00:16] RECOVERY - Host ms-fe1001 is UP: PING OK - Packet loss = 0%, RTA = 0.25 ms [13:01:16] PROBLEM - Host ms-be1001 is DOWN: PING CRITICAL - Packet loss = 100% [13:02:26] PROBLEM - Ceph on ms-fe1001 is CRITICAL: Connection refused by host [13:02:26] PROBLEM - SSH on ms-fe1001 is CRITICAL: Connection refused [13:02:36] PROBLEM - DPKG on ms-fe1001 is CRITICAL: Connection refused by host [13:02:36] PROBLEM - HTTP Apache on ms-fe1001 is CRITICAL: Connection refused [13:02:46] PROBLEM - Puppet freshness on analytics1021 is CRITICAL: No successful Puppet run in the last 10 hours [13:02:48] !log dismantling & repurposing ceph cluster [13:02:56] PROBLEM - HTTP radosgw on ms-fe1001 is CRITICAL: Connection refused [13:03:01] Logged the message, Master [13:03:06] PROBLEM - Disk space on ms-fe1001 is CRITICAL: Connection refused by host [13:03:06] PROBLEM - RAID on ms-fe1001 is CRITICAL: Connection refused by host [13:03:19] wait, why is ms-fe @ pmtpa getting an load increase [13:03:48] (03CR) 10Dzahn: [C: 04-1] "so we have at least 3 different users running maint. crons, "apache", "mwdeploy" and "l10nupdate" and the latter have comments # which use" [operations/puppet] - 10https://gerrit.wikimedia.org/r/83574 (owner: 10Reedy) [13:04:56] RECOVERY - Host ms-fe1002 is UP: PING OK - Packet loss = 0%, RTA = 1.05 ms [13:05:06] PROBLEM - Swift HTTP on ms-fe1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:05:26] wait, what [13:06:05] friendly autocompletion suggestion: .. 
the fuck [13:06:26] RECOVERY - Host ms-be1001 is UP: PING OK - Packet loss = 0%, RTA = 0.77 ms [13:06:56] PROBLEM - HTTP radosgw on ms-fe1002 is CRITICAL: Connection refused [13:07:06] PROBLEM - RAID on ms-fe1002 is CRITICAL: Connection refused by host [13:07:24] PROBLEM - DPKG on ms-be1001 is CRITICAL: Connection refused by host [13:07:34] PROBLEM - DPKG on ms-fe1002 is CRITICAL: Connection refused by host [13:07:34] PROBLEM - Puppet freshness on praseodymium is CRITICAL: No successful Puppet run in the last 10 hours [13:07:44] PROBLEM - Disk space on ms-fe1002 is CRITICAL: Connection refused by host [13:07:54] PROBLEM - Puppet freshness on titanium is CRITICAL: No successful Puppet run in the last 10 hours [13:07:54] PROBLEM - SSH on ms-fe1002 is CRITICAL: Connection refused [13:07:54] PROBLEM - HTTP Apache on ms-fe1002 is CRITICAL: Connection refused [13:08:04] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [13:08:14] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [13:08:34] PROBLEM - Disk space on ms-be1001 is CRITICAL: Connection refused by host [13:08:54] PROBLEM - SSH on ms-be1001 is CRITICAL: Connection refused [13:09:04] PROBLEM - RAID on ms-be1001 is CRITICAL: Connection refused by host [13:09:54] RECOVERY - Swift HTTP on ms-fe1 is OK: HTTP OK: HTTP/1.1 200 OK - 2503 bytes in 0.059 second response time [13:09:54] RECOVERY - Swift HTTP on ms-fe4 is OK: HTTP OK: HTTP/1.1 200 OK - 2503 bytes in 0.059 second response time [13:10:44] RECOVERY - Puppet freshness on praseodymium is OK: puppet ran at Fri Sep 27 13:10:42 UTC 2013 [13:10:46] is this a joke [13:11:14] PROBLEM - Swift HTTP on ms-fe3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:11:34] PROBLEM - Puppet freshness on praseodymium is CRITICAL: No successful Puppet run in the last 10 hours [13:12:00] holy crap [13:12:04] RECOVERY - Swift HTTP on ms-fe3 is OK: HTTP OK: HTTP/1.1 200 OK - 2503 bytes in 0.062 second response time [13:12:13] http://ganglia.wikimedia.org/latest/graph.php?r=hour&z=xlarge&g=cpu_report&h=ms-be10.pmtpa.wmnet&c=Swift+pmtpa [13:12:34] RECOVERY - Puppet freshness on titanium is OK: puppet ran at Fri Sep 27 13:12:27 UTC 2013 [13:12:54] PROBLEM - Puppet freshness on titanium is CRITICAL: No successful Puppet run in the last 10 hours [13:14:13] 13:14:08 up 309 days, 4:11, 1 user, load average: 118.86, 47.69, 30.51 [13:14:17] that's the spirit [13:14:41] paravoid: ms-be1001 is down since then, is that also expected or just the fe- hosts [13:14:44] PROBLEM - NTP on ms-fe1001 is CRITICAL: NTP CRITICAL: No response from NTP server [13:14:53] mutante: it's everything [13:15:00] I'm going to do all of ms-fe10xx / ms-be10xx [13:15:07] but in the meantime, the pmtpa cluster is acting up [13:15:10] at exactly the same time [13:15:17] without me touching it [13:15:45] nod.. ah .. hm [13:15:54] RECOVERY - SSH on ms-be1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [13:17:14] PROBLEM - Host ms-fe1001 is DOWN: PING CRITICAL - Packet loss = 100% [13:18:24] RECOVERY - SSH on ms-fe1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [13:18:34] RECOVERY - Host ms-fe1001 is UP: PING OK - Packet loss = 0%, RTA = 0.27 ms [13:19:04] PROBLEM - NTP on ms-fe1002 is CRITICAL: NTP CRITICAL: No response from NTP server [13:20:02] sounds like fail-over somehow because of the timing? 
paravoid, i have no idea, but it went down again on the ganglia graph [13:20:14] PROBLEM - Swift HTTP on ms-fe2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:20:22] yeah found part of the cause and fixing [13:20:26] cool [13:20:54] PROBLEM - NTP on ms-be1001 is CRITICAL: NTP CRITICAL: No response from NTP server [13:21:04] RECOVERY - Swift HTTP on ms-fe2 is OK: HTTP OK: HTTP/1.1 200 OK - 2503 bytes in 0.061 second response time [13:21:54] RECOVERY - SSH on ms-fe1002 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [13:34:34] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Fri Sep 27 13:34:31 UTC 2013 [13:35:04] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [13:35:44] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Fri Sep 27 13:35:42 UTC 2013 [13:36:14] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [13:42:04] RECOVERY - Puppet freshness on titanium is OK: puppet ran at Fri Sep 27 13:41:54 UTC 2013 [13:42:44] RECOVERY - Puppet freshness on praseodymium is OK: puppet ran at Fri Sep 27 13:42:34 UTC 2013 [13:42:54] PROBLEM - Puppet freshness on titanium is CRITICAL: No successful Puppet run in the last 10 hours [13:43:30] (03PS1) 10Akosiaris: Remove jfsutils from base::standard-packages [operations/puppet] - 10https://gerrit.wikimedia.org/r/86252 [13:43:34] PROBLEM - Puppet freshness on praseodymium is CRITICAL: No successful Puppet run in the last 10 hours [13:56:00] (03CR) 10Faidon Liambotis: [C: 031] "preseed.cfg says that apache.cfg used to use JFS -but that's unused and replaced by mw.cfg now, also needs a cleanup." [operations/puppet] - 10https://gerrit.wikimedia.org/r/86252 (owner: 10Akosiaris) [13:58:12] that was when apaches contained external storage [13:59:45] heh [14:01:03] YOoooo akosiaris! [14:01:55] hey ottomata. [14:01:59] wassup ? [14:02:03] ohhh just checking in! [14:02:10] cps almost done? you've just done base installs, right? [14:02:23] no puppet? [14:02:24] yes. [14:02:29] and puppet [14:02:50] hm, looking in site.pp [14:03:04] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Fri Sep 27 14:02:57 UTC 2013 [14:03:07] you should be able to log in at lvs and most cp4xxx [14:03:28] i just run puppet... no extra configuration [14:03:45] oh ok, not even a site entry, got it [14:03:45] ok [14:04:00] i suppose these will be done on the next step [14:04:04] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [14:04:18] when varnish and pybal and all that will be installed/configured etc [14:04:34] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Fri Sep 27 14:04:28 UTC 2013 [14:05:09] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [14:05:30] well, usually I lke to make site.pp entries without the functional stuff [14:05:32] So I suppose we are almost done on that front [14:05:32] and apply that first [14:05:35] basically just base module [14:05:43] that sets up all the usual monitoring and account stuff [14:05:52] it seems like that happens anyway :P [14:05:56] really? [14:06:26] seems like it... they are in icinga [14:06:28] note that even a base puppet install may perhaps fail as puppet doesn't know about ulsfo at all yet ;) [14:06:39] listed as pmtpa probably? [14:06:46] hm [14:06:53] maybe... [14:07:01] but indeed puppet is running [14:07:04] akosiaris: you login with new install key though, right? 
[14:07:06] not your ssh key?
[14:07:18] now both
[14:07:35] when i ran the first puppet run by hand the new_install key
[14:08:09] node default {
[14:08:09] include standard
[14:08:10] }
[14:08:13] at the end of site.pp
[14:08:14] ah interesting
[14:08:18] i suppose that explains it ?
[14:09:26] [12598.184329] CPU15: Package power limit notification (total events = 140)
[14:09:30] [12598.294217] CPU1: Package power limit notification (total events = 136)
[14:09:33] meh
[14:09:42] bios setting ?
[14:09:51] that weird C state thing ?
[14:10:14] ahhh got it
[14:10:32] my ssh proxy wasn't using root@, and I don't have an otto account on bast4001
[14:10:34] k cool
[14:10:49] RECOVERY - Puppet freshness on titanium is OK: puppet ran at Fri Sep 27 14:10:45 UTC 2013
[14:10:59] RECOVERY - Puppet freshness on praseodymium is OK: puppet ran at Fri Sep 27 14:10:56 UTC 2013
[14:11:08] ok, akosiaris, I'm going to go ahead ane make site entires for these, and then we can fill them in with whatever
[14:11:23] sure, go ahead
[14:11:29] PROBLEM - Puppet freshness on praseodymium is CRITICAL: No successful Puppet run in the last 10 hours
[14:11:45] you should also find out why you don't have an account in bast4001
[14:11:49] PROBLEM - Puppet freshness on titanium is CRITICAL: No successful Puppet run in the last 10 hours
[14:12:23] got to go.. c ya later
[14:12:28] hm, yeah maybe I'm not in default roots
[14:12:55] mark: should I have an account on all new nodes by default?
[14:13:39] no
[14:13:53] no caching or lvs servers have local accounts
[14:13:56] just root
[14:16:20] right but bast4001?
[14:17:11] also, i'm looking at the existing lvs* and cp* nodes in site.pp
[14:17:31] mark, should we let you configure those :p or should I copy lvs100* or maybe amslvs*?
[14:17:54] not sure if I should jsut ad lvs[14]00[1-6] to the regex
[14:18:09] and add entries from lvs_service_ips in your your lvs_balancer_ips config
[14:18:41] you should certainly not copy anything
[14:18:51] copy + edit
[14:18:57] if you don't understand the current manifests you should probably not do anything ;)
[14:19:17] aye yeah that's what I was wondering, wasn't sure if you wanted us to try and get you to review and teach us, or just do it
[14:19:21] i'm fine with either
[14:19:57] hmm
[14:20:02] perhaps, go over all the lvs manifests
[14:20:05] try to understand what it all does
[14:20:16] give me the summary, and we'll discuss?
[14:20:18] k
[14:20:29] i can of course tell it but I think it works better if you dive in yourself
[14:20:39] (and if I do it myself it doesn't work at all :)
[14:20:42] aye
[14:20:59] looks a little complex but I'll at least get the gist and you can fix up what I don't get
[14:22:16] there's some documentation on lvs and pybal on wikitech, but not a lot on the puppet bits of it I think...
[14:23:07] k thanks, reading...
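
The "node default { include standard }" snippet akosiaris quotes at 14:08 is why freshly installed hosts already show up with the usual base monitoring before any role is assigned. A sketch of how that tail of manifests/site.pp relates to a later explicit entry; the ulsfo hostname regex and domain are illustrative, not taken from the repository.

    # Tail of manifests/site.pp, as quoted above: hosts with no explicit node
    # definition still match this and get the standard base classes (the usual
    # accounts and monitoring discussed at 14:05-14:08).
    node default {
        include standard
    }

    # A later, explicit entry simply takes over from the default -- the hostname
    # pattern shown here is illustrative only:
    node /^cp40(0[1-9]|1[0-9]|20)\.ulsfo\.wmnet$/ {
        include standard
        # role classes (e.g. role::cache::bits) added once the cluster is configured
    }
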
[14:24:50] it's not so difficult, having it all in puppet helps a lot [14:25:01] and a working setup to run commands :) [14:25:12] I can help you too, leslie perhaps too [14:27:09] PROBLEM - Puppet freshness on mw1072 is CRITICAL: No successful Puppet run in the last 10 hours [14:32:42] (03PS1) 10Dzahn: add nrpe to node zirconium so we can monitor etherpad and other processes [operations/puppet] - 10https://gerrit.wikimedia.org/r/86253 [14:33:49] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Fri Sep 27 14:33:41 UTC 2013 [14:34:39] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [14:35:19] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Fri Sep 27 14:35:12 UTC 2013 [14:36:09] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [14:39:32] paravoid, mark, i'm reading into the lvs stuff, should we setup the varnishes first anyway? [14:39:42] (03CR) 10Dzahn: [C: 032] add nrpe to node zirconium so we can monitor etherpad and other processes [operations/puppet] - 10https://gerrit.wikimedia.org/r/86253 (owner: 10Dzahn) [14:39:49] PROBLEM - Host ms-be1003 is DOWN: PING CRITICAL - Packet loss = 100% [14:39:49] PROBLEM - Host ms-be1004 is DOWN: PING CRITICAL - Packet loss = 100% [14:39:49] PROBLEM - Host ms-be1005 is DOWN: PING CRITICAL - Packet loss = 100% [14:39:49] PROBLEM - Host ms-be1002 is DOWN: PING CRITICAL - Packet loss = 100% [14:39:50] do you guys have specific plans for them, what they will serve, etc? [14:40:14] similar to esams really [14:40:36] upload, mobile, bits? [14:41:11] yes [14:41:23] text [14:41:23] :) [14:41:33] eventually [14:42:00] hmm, yeah looks like also 20 nodes in ams [14:42:08] do you want the same layout in ulsfo? [14:42:18] although, are cp300[12] doing anythign? 
[14:42:22] (i'm just looking at puppet) [14:43:19] RECOVERY - Puppet freshness on praseodymium is OK: puppet ran at Fri Sep 27 14:43:11 UTC 2013 [14:43:29] PROBLEM - Puppet freshness on praseodymium is CRITICAL: No successful Puppet run in the last 10 hours [14:43:39] RECOVERY - Puppet freshness on titanium is OK: puppet ran at Fri Sep 27 14:43:31 UTC 2013 [14:43:49] PROBLEM - Puppet freshness on titanium is CRITICAL: No successful Puppet run in the last 10 hours [14:44:59] RECOVERY - Host ms-be1004 is UP: PING OK - Packet loss = 0%, RTA = 0.22 ms [14:44:59] RECOVERY - Host ms-be1003 is UP: PING OK - Packet loss = 0%, RTA = 0.25 ms [14:44:59] RECOVERY - Host ms-be1005 is UP: PING OK - Packet loss = 0%, RTA = 0.22 ms [14:44:59] RECOVERY - Host ms-be1002 is UP: PING OK - Packet loss = 0%, RTA = 0.22 ms [14:47:09] PROBLEM - SSH on ms-be1004 is CRITICAL: Connection refused [14:47:09] PROBLEM - SSH on ms-be1002 is CRITICAL: Connection refused [14:47:09] PROBLEM - SSH on ms-be1005 is CRITICAL: Connection refused [14:47:09] PROBLEM - RAID on ms-be1005 is CRITICAL: Connection refused by host [14:47:19] PROBLEM - DPKG on ms-be1005 is CRITICAL: Connection refused by host [14:47:19] PROBLEM - Disk space on ms-be1002 is CRITICAL: Connection refused by host [14:47:29] PROBLEM - SSH on ms-be1003 is CRITICAL: Connection refused [14:47:29] PROBLEM - Disk space on ms-be1004 is CRITICAL: Connection refused by host [14:47:39] PROBLEM - RAID on ms-be1003 is CRITICAL: Connection refused by host [14:47:39] PROBLEM - RAID on ms-be1002 is CRITICAL: Connection refused by host [14:47:49] PROBLEM - DPKG on ms-be1004 is CRITICAL: Connection refused by host [14:47:49] PROBLEM - Disk space on ms-be1005 is CRITICAL: Connection refused by host [14:47:49] PROBLEM - DPKG on ms-be1003 is CRITICAL: Connection refused by host [14:47:59] PROBLEM - RAID on ms-be1004 is CRITICAL: Connection refused by host [14:47:59] PROBLEM - DPKG on ms-be1002 is CRITICAL: Connection refused by host [14:47:59] PROBLEM - Disk space on ms-be1003 is CRITICAL: Connection refused by host [14:48:24] * paravoid cries [14:48:59] haha, awww [14:50:09] PROBLEM - Host ms-be1006 is DOWN: PING CRITICAL - Packet loss = 100% [14:50:09] PROBLEM - Host ms-be1009 is DOWN: PING CRITICAL - Packet loss = 100% [14:50:09] PROBLEM - Host ms-be1007 is DOWN: PING CRITICAL - Packet loss = 100% [14:50:09] PROBLEM - Host ms-be1008 is DOWN: PING CRITICAL - Packet loss = 100% [14:50:49] PROBLEM - HTTP radosgw on ms-fe1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:50:59] PROBLEM - HTTP radosgw on ms-fe1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:52:14] ottomata: what do you mean by 'layout'? [14:52:53] how many do we have again? [14:52:55] i mean, which hosts servce what content [14:53:00] 20 in ulsof too [14:53:07] cp4001-cp4020 [14:53:11] that's... not enough [14:53:25] 8 upload, 8 text, 4 bits, 4 mobile [14:53:29] osm [14:53:39] but anyway [14:53:41] start with bits I'd say [14:53:54] i believe they have identical configuration [14:53:59] if they're also in the same rack, it doesn't matter [14:54:10] if they're split across racks, might make sense to split them according to function too [14:54:18] ok 4 bits [14:54:29] RECOVERY - SSH on ms-be1003 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [14:54:31] like, balance them across racks, or keep the same functions in the same racks? [14:54:39] balance across [14:54:44] I don't know the rack setup, cmjohnson1? 
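
Relating to the NRPE changes above (nrpe on zirconium at 14:32, the etherpad-lite process check at 15:10, RT #5790): the exact check command appears in the code-review comment later in this log (15:23:20). A sketch of what the NRPE side of that could look like; the command name and file path are assumptions, and only the check_procs invocation itself is taken from that review comment.

    # e.g. /etc/nagios/nrpe.d/check_etherpad_lite.cfg on zirconium
    # (path and command name are assumptions):
    command[check_etherpad_lite]=/usr/lib/nagios/plugins/check_procs -c 1:2 --ereg-argument-array='^/bin/sh /usr/share/etherpad-lite/bin/safeRun.sh'
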
[14:54:46] you know, switch failure, power failure [14:54:48] ja [14:54:53] racktables [14:54:55] you don't have access to racktables? [14:55:09] RECOVERY - SSH on ms-be1004 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [14:55:09] RECOVERY - SSH on ms-be1005 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [14:55:09] RECOVERY - SSH on ms-be1002 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [14:55:13] oo ja i do [14:55:13] looking [14:55:19] RECOVERY - Host ms-be1006 is UP: PING OK - Packet loss = 0%, RTA = 0.30 ms [14:55:19] RECOVERY - Host ms-be1007 is UP: PING OK - Packet loss = 0%, RTA = 0.31 ms [14:55:19] RECOVERY - Host ms-be1009 is UP: PING OK - Packet loss = 0%, RTA = 0.31 ms [14:55:19] RECOVERY - Host ms-be1008 is UP: PING OK - Packet loss = 0%, RTA = 0.26 ms [14:55:19] have only looked at it like once before [14:55:57] ah cool, 2 racks [14:56:07] you should be able to figure it out from the IPs [14:56:15] we're doing the separate L3 per rack again, right? [14:56:20] no [14:56:22] never did [14:56:26] separate L3 per row [14:57:04] ok, so 4001,4002 are in 3.02, and 4003,4004 in 3.03 [14:57:10] could do bits on each of those? [14:57:18] er, yes, per row I meant [14:57:19] PROBLEM - DPKG on ms-be1006 is CRITICAL: Connection refused by host [14:57:26] but in this case it's the same l3 domain? [14:57:29] PROBLEM - Disk space on ms-be1007 is CRITICAL: Connection refused by host [14:57:29] PROBLEM - Disk space on ms-be1009 is CRITICAL: Connection refused by host [14:57:29] PROBLEM - SSH on ms-be1007 is CRITICAL: Connection refused [14:57:29] PROBLEM - Disk space on ms-be1006 is CRITICAL: Connection refused by host [14:57:36] since it's "same row"? [14:57:39] PROBLEM - RAID on ms-be1007 is CRITICAL: Connection refused by host [14:57:39] PROBLEM - SSH on ms-be1009 is CRITICAL: Connection refused [14:57:49] PROBLEM - RAID on ms-be1009 is CRITICAL: Connection refused by host [14:57:49] PROBLEM - DPKG on ms-be1008 is CRITICAL: Connection refused by host [14:57:49] PROBLEM - SSH on ms-be1008 is CRITICAL: Connection refused [14:57:49] PROBLEM - RAID on ms-be1006 is CRITICAL: Connection refused by host [14:57:52] yes [14:57:59] PROBLEM - Disk space on ms-be1008 is CRITICAL: Connection refused by host [14:57:59] PROBLEM - DPKG on ms-be1007 is CRITICAL: Connection refused by host [14:58:09] PROBLEM - RAID on ms-be1008 is CRITICAL: Connection refused by host [14:58:09] PROBLEM - SSH on ms-be1006 is CRITICAL: Connection refused [14:58:11] (03PS1) 10Mark Bergsma: Enable ESI processing for the Testing carrier range [operations/puppet] - 10https://gerrit.wikimedia.org/r/86258 [14:58:19] PROBLEM - DPKG on ms-be1009 is CRITICAL: Connection refused by host [14:58:32] (03PS1) 10Yuvipanda: Rename labsvagrant to labs_vagrant [operations/puppet] - 10https://gerrit.wikimedia.org/r/86260 [14:58:39] Coren: ^, trivial [14:58:59] PROBLEM - NTP on ms-be1005 is CRITICAL: NTP CRITICAL: No response from NTP server [14:59:19] PROBLEM - NTP on ms-be1004 is CRITICAL: NTP CRITICAL: No response from NTP server [14:59:59] PROBLEM - NTP on ms-be1003 is CRITICAL: NTP CRITICAL: No response from NTP server [14:59:59] PROBLEM - NTP on ms-be1002 is CRITICAL: NTP CRITICAL: No response from NTP server [15:00:14] (03PS2) 10coren: Rename labsvagrant to labs_vagrant [operations/puppet] - 10https://gerrit.wikimedia.org/r/86260 (owner: 10Yuvipanda) [15:01:12] Coren: wait! found a typo [15:01:27] Coren: pushing. [15:01:28] Still needed to rebase. 
:-) [15:01:35] (03PS3) 10Yuvipanda: Rename labsvagrant to labs_vagrant [operations/puppet] - 10https://gerrit.wikimedia.org/r/86260 [15:01:50] oh sure :D [15:02:00] (03PS4) 10Yuvipanda: Rename labsvagrant to labs_vagrant [operations/puppet] - 10https://gerrit.wikimedia.org/r/86260 [15:02:03] rebased [15:03:47] Coren: can you also modify the role name in the wikitech UI once you merge this? [15:03:57] (I'll poke again when you're done with the meeting on -labs) [15:04:09] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Fri Sep 27 15:04:02 UTC 2013 [15:04:09] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [15:04:29] RECOVERY - SSH on ms-be1007 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [15:04:39] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Fri Sep 27 15:04:32 UTC 2013 [15:04:39] RECOVERY - SSH on ms-be1009 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [15:04:49] RECOVERY - SSH on ms-be1008 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [15:05:10] RECOVERY - SSH on ms-be1006 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [15:05:39] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [15:05:59] (03CR) 10coren: [C: 032] Rename labsvagrant to labs_vagrant [operations/puppet] - 10https://gerrit.wikimedia.org/r/86260 (owner: 10Yuvipanda) [15:06:41] YuviPanda: You /must/ remove the checkmark from any project that has it before I rename it at the UI though. [15:06:46] Tell me when that's done. [15:06:55] Coren: ok ok, they're in two and i've admin on both [15:06:56] doing [15:07:52] Coren: done [15:09:11] {{done}} [15:09:17] wooo [15:09:29] PROBLEM - NTP on ms-be1008 is CRITICAL: NTP CRITICAL: No response from NTP server [15:09:29] PROBLEM - NTP on ms-be1009 is CRITICAL: NTP CRITICAL: No response from NTP server [15:09:39] PROBLEM - NTP on ms-be1007 is CRITICAL: NTP CRITICAL: No response from NTP server [15:09:49] PROBLEM - NTP on ms-be1006 is CRITICAL: NTP CRITICAL: No response from NTP server [15:09:50] ty Coren [15:10:11] (03PS1) 10Dzahn: add etherpad-lite process monitoring via NRPE, RT #5790 [operations/puppet] - 10https://gerrit.wikimedia.org/r/86261 [15:12:19] RECOVERY - Puppet freshness on praseodymium is OK: puppet ran at Fri Sep 27 15:12:11 UTC 2013 [15:12:29] PROBLEM - Puppet freshness on praseodymium is CRITICAL: No successful Puppet run in the last 10 hours [15:13:29] RECOVERY - Puppet freshness on titanium is OK: puppet ran at Fri Sep 27 15:13:26 UTC 2013 [15:13:49] PROBLEM - Puppet freshness on titanium is CRITICAL: No successful Puppet run in the last 10 hours [15:19:46] (03PS2) 10Dzahn: add etherpad-lite process monitoring via NRPE, RT #5790 [operations/puppet] - 10https://gerrit.wikimedia.org/r/86261 [15:20:09] PROBLEM - Host ms-be1011 is DOWN: PING CRITICAL - Packet loss = 100% [15:20:09] PROBLEM - Host ms-be1010 is DOWN: PING CRITICAL - Packet loss = 100% [15:20:09] PROBLEM - Host ms-be1012 is DOWN: PING CRITICAL - Packet loss = 100% [15:22:03] ok, paravoid, mark, been reading cache and varnish puppet stuff for bits [15:22:08] here's what I understand so far [15:22:15] if I include role::cache::bits [15:22:29] a buncha configs will be selected out of lvs and cache configuration classes based on $::site and $cluste [15:22:32] $cluster [15:22:46] yes [15:22:47] so I need to add the IPs of bits varnishes to those configs, as well as lvs's for ulsfo [15:23:09] it looks like for this varnish 
directors will be empty, is that ok?
[15:23:20] (03CR) 10Dzahn: [C: 032] "/usr/lib/nagios/plugins/check_procs -c 1:2 --ereg-argument-array='^/bin/sh /usr/share/etherpad-lite/bin/safeRun.sh'" [operations/puppet] - 10https://gerrit.wikimedia.org/r/86261 (owner: 10Dzahn)
[15:23:28] it will fail then
[15:23:37] how does ams work?
[15:23:47] thae $backends config doesn't have an entiry for bits ams
[15:24:17] can i merge labs-vagrant changes on sockpuppet?
[15:24:32] Coren: ^?
[15:24:47] $varnish_directors is set based on $cluster_tier
[15:24:51] which is 1 for pmtpa/eqiad, 2 for esams
[15:24:55] and should become 2 for ulsfo as well
[15:25:01] right
[15:25:05] "backend" => flatten(values($role::cache::configuration::backends[$::realm]['bits']))
[15:25:12] 'bits' => {
[15:25:12] 'pmtpa' => flatten([$lvs::configuration::lvs_service_ips['production']['bits']['pmtpa']['bitslb']]),
[15:25:12] 'eqiad' => flatten([$lvs::configuration::lvs_service_ips['production']['bits']['eqiad']['bitslb']]),
[15:25:12] },
[15:25:19] RECOVERY - Host ms-be1010 is UP: PING OK - Packet loss = 0%, RTA = 0.25 ms
[15:25:19] RECOVERY - Host ms-be1012 is UP: PING OK - Packet loss = 0%, RTA = 0.21 ms
[15:25:19] no entry for esams there
[15:25:19] RECOVERY - Host ms-be1011 is UP: PING OK - Packet loss = 0%, RTA = 0.29 ms
[15:25:26] yes
[15:25:32] so, what does that mean
[15:25:39] the backens of esams and ulsfo are... pmtpa and eqiad
[15:25:47] ohhh ahhhhhh
[15:25:49] ok ok ok
[15:25:50] got it
[15:25:50] danke
[15:25:54] cool.
[15:25:59] ah makes sense ja
[15:26:04] ottomata: All good then?
[15:26:09] basically, the upper tier :)
[15:26:19] mutante was asking if he could merge labs_vagrant on sockpuppet Coren
[15:26:42] mutante: Oh, yes! Sorry, forgot to do it.
[15:26:48] * Coren slaps self.
[15:26:50] alright, doing so. np
[15:26:58] That'll teach me to +2 stuff while in a meeting. :-)
[15:26:59] so mark, we have multiple layers of varnishes?
[15:27:08] cache dcs -> varnish in main dcs -> apaches?
[15:27:12] yes
[15:27:15] (or whatever backend)
[15:27:16] aye cool
[15:27:19] PROBLEM - RAID on ms-be1011 is CRITICAL: Connection refused by host
[15:27:25] just curious, why not cache dcs -> apaches?
[15:27:29] PROBLEM - DPKG on ms-be1012 is CRITICAL: Connection refused by host
[15:27:29] PROBLEM - Disk space on ms-be1012 is CRITICAL: Connection refused by host
[15:27:39] PROBLEM - DPKG on ms-be1010 is CRITICAL: Connection refused by host
[15:27:49] PROBLEM - SSH on ms-be1010 is CRITICAL: Connection refused
[15:27:49] PROBLEM - SSH on ms-be1011 is CRITICAL: Connection refused
[15:27:49] PROBLEM - Disk space on ms-be1010 is CRITICAL: Connection refused by host
[15:27:49] PROBLEM - RAID on ms-be1012 is CRITICAL: Connection refused by host
[15:27:50] are apaches all private internal?
[15:27:59] PROBLEM - RAID on ms-be1010 is CRITICAL: Connection refused by host
[15:28:01] ottomata/akosiaris_away how'd you get the machines to start installing ?
[15:28:08] ha, I don't know!
[15:28:09] PROBLEM - SSH on ms-be1012 is CRITICAL: Connection refused
[15:28:09] PROBLEM - DPKG on ms-be1011 is CRITICAL: Connection refused by host
[15:28:15] i woke up and akosiaris had done them
[15:28:59] cool
[15:29:34] mark, i think I need to add an entry to lvs.pp $lvs_service_ips for bits
[15:29:37] but I don't know what to put there
[15:29:51] it will have a public service IP, I assume?
[15:29:56] 'ulsfo' => { 'bitslb' => ???
} [15:30:58] yes, you need to allocate LVS service IPs just like eqiad, pmtpa, esams have them [15:31:01] in DNS [15:31:24] I'd say, copy the ones from eqiad, in 154.80.208.in-addr.arpa [15:31:26] and ipv6 also [15:33:20] hmm k [15:33:29] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Fri Sep 27 15:33:27 UTC 2013 [15:33:39] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [15:34:39] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Fri Sep 27 15:34:32 UTC 2013 [15:34:49] RECOVERY - SSH on ms-be1010 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [15:34:49] RECOVERY - SSH on ms-be1011 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [15:35:09] RECOVERY - SSH on ms-be1012 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [15:35:09] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [15:39:29] PROBLEM - NTP on ms-be1011 is CRITICAL: NTP CRITICAL: No response from NTP server [15:39:29] PROBLEM - NTP on ms-be1010 is CRITICAL: NTP CRITICAL: No response from NTP server [15:39:31] haha [15:39:39] PROBLEM - NTP on ms-be1012 is CRITICAL: NTP CRITICAL: No response from NTP server [15:40:10] etherpad.wm is giving me 503s on a pretty consistent basis right now [15:40:19] PROBLEM - Host ms-fe1003 is DOWN: PING CRITICAL - Packet loss = 100% [15:40:19] PROBLEM - Host ms-fe1004 is DOWN: PING CRITICAL - Packet loss = 100% [15:40:47] greg-g: upgrade is coming soon, hope that helps [15:41:34] ottomata: i feel you know, waiting for icinga run ,, Could not retrieve catalog from remote server: execution expired [15:41:39] RECOVERY - Puppet freshness on titanium is OK: puppet ran at Fri Sep 27 15:41:35 UTC 2013 [15:41:39] RECOVERY - Puppet freshness on praseodymium is OK: puppet ran at Fri Sep 27 15:41:35 UTC 2013 [15:41:48] icinga run? [15:41:49] PROBLEM - Puppet freshness on titanium is CRITICAL: No successful Puppet run in the last 10 hours [15:42:26] mutante, i kinda gave up mostly, and edited the files manually, i was just removing hosts [15:42:29] PROBLEM - Puppet freshness on praseodymium is CRITICAL: No successful Puppet run in the last 10 hours [15:42:38] greg-g: RT #5789 and RT #5841 [15:42:40] but, LeslieCarr recommended playing puppet process tower defense in htop on stafford :p [15:42:48] ottomata: ..gotcha.. [15:42:54] kill puppet processes til you get some procs free, and then try to run yours [15:42:56] hehe, isee [15:42:56] heheh [15:42:58] yea [15:44:04] :) [15:45:29] RECOVERY - Host ms-fe1003 is UP: PING OK - Packet loss = 0%, RTA = 0.27 ms [15:45:29] RECOVERY - Host ms-fe1004 is UP: PING OK - Packet loss = 0%, RTA = 0.23 ms [15:47:29] PROBLEM - DPKG on ms-fe1003 is CRITICAL: Connection refused by host [15:47:29] PROBLEM - HTTP Apache on ms-fe1003 is CRITICAL: Connection refused [15:47:30] PROBLEM - SSH on ms-fe1004 is CRITICAL: Connection refused [15:47:39] PROBLEM - RAID on ms-fe1004 is CRITICAL: Connection refused by host [15:47:39] PROBLEM - HTTP Apache on ms-fe1004 is CRITICAL: Connection refused [15:47:59] PROBLEM - SSH on ms-fe1003 is CRITICAL: Connection refused [15:48:09] PROBLEM - Disk space on ms-fe1004 is CRITICAL: Connection refused by host [15:48:09] PROBLEM - RAID on ms-fe1003 is CRITICAL: Connection refused by host [15:48:19] PROBLEM - DPKG on ms-fe1004 is CRITICAL: Connection refused by host [15:48:23] mutante: thanks! 
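
Pulling together the exchange above: tier-2 sites (esams, and now ulsfo) use the tier-1 LVS service addresses as their varnish backends, so the missing piece ottomata asks about is an ulsfo block in $lvs_service_ips. A hedged sketch of that entry, following the ['bits'][site]['bitslb'] nesting implied by the lookup quoted at 15:25; the address is the bits-lb.ulsfo record from the grep output at 15:49 and may still move given the subnet change proposed just below, and the exact value shape (plain string vs. a hash that also carries IPv6) should be copied from the existing eqiad entry rather than from this sketch.

    # Hypothetical fragment of $lvs_service_ips['production'] in manifests/lvs.pp:
    'bits' => {
        # ... existing pmtpa / eqiad / esams entries ...
        'ulsfo' => {
            'bitslb' => '198.35.26.225',   # bits-lb.ulsfo per the grep output above; may change
            # IPv6 omitted: no ulsfo v6 service address is allocated anywhere in this log
        },
    },
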
[15:48:29] PROBLEM - Disk space on ms-fe1003 is CRITICAL: Connection refused by host [15:48:42] heya mark [15:48:58] should there be an entry in templates/wikimedia.org for bits-lb.esams.wikimedia.org? [15:49:13] yes [15:49:14] isn't there? [15:49:17] don't see it [15:49:55] grep bits-lb wikimedia.org [15:49:55] bits-lb.ulsfo 1H IN A 198.35.26.225 [15:49:55] bits-lb.pmtpa 1H IN A 208.80.152.210 [15:49:55] bits-lb.eqiad 1H IN A 208.80.154.234 [15:49:55] bits-lb 1H IN A 91.198.174.233 [15:50:04] that last one…is management? [15:50:17] maybe not...? [15:50:39] what do you mean? [15:50:40] it's esams [15:50:58] that's the esams one? [15:51:08] just trying to grok everything... [15:51:09] yes [15:51:29] hmm yeah ip matches hm [15:51:33] yep [15:52:23] ottomata: check the $ORIGIN [15:52:48] AHHHHhhh was looking for that, was way higher than I thought, was thrown off by the mgmt stuff ok [15:52:51] hm ok [15:53:34] so we just don't have eqiad and ulsfo and pmtpa consistent in this file? [15:53:48] esams servers have .esams. as subdomain [15:53:56] eqiad/pmtpa never had [15:54:05] ahhhh [15:54:08] legacy [15:54:08] so they are explicit, got it [15:54:19] i don't know what makes sense long term [15:54:22] so for ulsfo there should be a .ulsfo. origin? [15:54:22] hm [15:55:12] mark: well, moving a bunch of those servers under .esams.wmnet will help :) [15:55:35] ? [15:56:21] what? [15:56:39] which servers under .esams.? [15:56:39] will help what? [15:58:16] having less stuff under .esams.wikimedia.org [15:58:33] oh [15:58:45] gotcha, ok but i'm not going to do that (right now) :p [15:58:58] no no [15:59:13] so i'm adding bits-lb, which i think should be bits-lb.ulsfo.wikimedia.org, right? [15:59:27] correct [15:59:29] PROBLEM - NTP on ms-fe1004 is CRITICAL: NTP CRITICAL: No response from NTP server [15:59:30] afaict, there is no ORIGIN in this file for ulsfo.wikimedia.org [15:59:51] should I add one, or should I just add an entry for bits-lb.ulsfo under wikimedia.org ORIGIN? [15:59:59] PROBLEM - NTP on ms-fe1003 is CRITICAL: NTP CRITICAL: No response from NTP server [16:01:09] no, just do the same as pmtpa/eqiad I'd say [16:01:29] RECOVERY - SSH on ms-fe1004 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [16:01:42] ok, but mark was saying that that is kind of legacy? [16:01:56] no, servers under .esams.wikimedia.org is legacy [16:01:59] RECOVERY - SSH on ms-fe1003 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [16:02:01] actually can we make the lvs service ip's down in the bottom /25 ? [16:02:02] oh, k [16:02:17] mark / ottomata [16:02:21] so, for example, bastion is hooft.esams.wikimedia.org instead of hooft.wikimedia.org [16:02:21] LeslieCarr: I reserved a /27 for it yesterday [16:02:33] in dns [16:02:44] when I had to allocate ips for the logical systems [16:02:47] yeah, but can we move that to the lower /25 - so that all of ulsfo's specific stuff is in the lower /25 ? [16:02:50] any objection ? [16:02:56] oh [16:03:04] no objection [16:03:06] cool [16:03:25] .96/27 ? [16:03:39] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Fri Sep 27 16:03:29 UTC 2013 [16:03:39] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [16:04:09] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Fri Sep 27 16:04:04 UTC 2013 [16:04:56] (03PS1) 10Lcarr: moved the LVS subnet for ulsfo [operations/dns] - 10https://gerrit.wikimedia.org/r/86266 [16:04:58] LeslieCarr: cool w me, that isn't actually set anywhere yet, right mark?
it was just your comment? [16:05:09] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [16:05:10] correct [16:05:12] k [16:05:16] also, man i love having a git repo for dns :) [16:05:27] what do you mean by ulsfo specific stuff? [16:06:20] (03PS2) 10Lcarr: moved the LVS subnet for ulsfo [operations/dns] - 10https://gerrit.wikimedia.org/r/86266 [16:06:44] i would like the ip's that are ulsfo specific to be in the lower /25 of that range (like my change) [16:07:08] i consider the infrastructure block to not necessarily be ulsfo specific, just west coast region specific [16:07:30] well and possibly not even that, with backbone links [16:07:58] ok, i think i've got everything for the dns change, the only thing i really don't understand is the reverse ip6.arpa file (asking this with the risk of looking dumb…:) ) [16:08:32] why are the entires there just incrementing? [16:08:38] hmmmmmmMmmm masked or something? [16:08:42] what do you mean? [16:09:14] paravoid: was the what do you mean to me or otto ? [16:09:15] ohoh, are the full addies inferred from the filename [16:09:20] oh, to otto [16:09:23] sorry [16:09:24] right? [16:09:36] I don't understand your question ottomata [16:09:37] (03CR) 10Lcarr: [C: 032] moved the LVS subnet for ulsfo [operations/dns] - 10https://gerrit.wikimedia.org/r/86266 (owner: 10Lcarr) [16:09:40] i was wondering how the reverse stuff worked without seeing the full IP in the file [16:09:48] the entries just look like this [16:09:48] 3.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0 1H IN PTR bits-lb.ulsfo.wikimedia.org. [16:09:57] but i guess the full IP is inferred from the file name? [16:10:00] 3.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa [16:10:01] ? [16:10:05] it's standard DNS [16:10:27] it it isn't terminated by dot, the origin is appended [16:10:38] so, it's also inferred from the origin line - $ORIGIN 1.0.0.0.{{ zonename }}. [16:10:39] so you say brewster IN A, and .wikimedia.org is appended [16:11:15] (03PS2) 10Dzahn: add salt grains for applicationserver roles [operations/puppet] - 10https://gerrit.wikimedia.org/r/83768 [16:11:29] RECOVERY - Puppet freshness on titanium is OK: puppet ran at Fri Sep 27 16:11:25 UTC 2013 [16:11:29] RECOVERY - Puppet freshness on praseodymium is OK: puppet ran at Fri Sep 27 16:11:26 UTC 2013 [16:11:49] PROBLEM - Puppet freshness on titanium is CRITICAL: No successful Puppet run in the last 10 hours [16:12:04] csteipp: out of curiosity, what is #p-personal? [16:12:29] PROBLEM - Puppet freshness on praseodymium is CRITICAL: No successful Puppet run in the last 10 hours [16:12:54] (03CR) 10Dzahn: "Ryan, you said it would be better to use a class instead of the definition right away as in PS1 vs. PS2. Is that what you had in mind?" [operations/puppet] - 10https://gerrit.wikimedia.org/r/83768 (owner: 10Dzahn) [16:12:59] The name of the div that holds your toolbar in mediawiki [16:13:25] That we slide in if we centrally log you in, even though you hit the sight anonymously. [16:13:41] hm, ok LeslieCarr where does zonename get set? 
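To make the $ORIGIN mechanics above concrete: the nibble-format name in the reverse zone is relative, so the zone's origin is appended to it, exactly like the brewster example. A hedged illustration using the record and origin quoted above (the address that falls out is simply what those nibbles expand to, not necessarily the final allocation):

    # relative entry plus origin...
    #   $ORIGIN 1.0.0.0.3.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa.
    #   3.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0  1H IN PTR  bits-lb.ulsfo.wikimedia.org.
    # ...expands to the fully qualified name
    #   3.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.1.0.0.0.3.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa.
    # which is the reverse entry for 2620:0:863:1::3; once live it can be checked with
    dig -x 2620:0:863:1::3 +short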
[16:14:02] that's the filename [16:14:09] ahhhhhhh ok got it [16:14:12] makes sense [16:14:13] thanks [16:14:15] paravoid is a better leslie than me :) [16:14:20] authdns-gen-zones does this [16:14:21] i'll get him a pink wig [16:14:48] context['zonename'] = filename [16:14:52] (03PS1) 10Mark Bergsma: Normalize and Vary on the forceHTTPS cookie [operations/puppet] - 10https://gerrit.wikimedia.org/r/86268 [16:15:30] hehe [16:15:30] hahaha [16:15:33] paravoid in a pink wig [16:15:34] aahahhhahah [16:15:46] lol [16:15:57] here is another dumb question: do you have a halloweenish celebration in greece? [16:16:09] and if so, are you dressing paravoid? and if so? what? [16:16:22] because if you don't have an idea, it sounds like you were just given a really good one: LeslieCarr [16:16:24] yes ... in february/march (its a movable one) [16:16:30] hehe [16:16:46] but no trick or treat and mostly no witches/skeletons etc [16:16:55] just costumes ? [16:16:57] it is dress as you like [16:17:06] yeah [16:17:09] great, dns grokking i think complete! [16:17:18] it's the carnival [16:17:37] trick or treat is really just a conspiracy of the liberal dentist + insulin manufacturers cabal to ruin children's teeth and make them diabetic [16:18:11] (03CR) 10Dzahn: [C: 031] "ack, the benefit seems worth a bit more than the downside." [operations/puppet] - 10https://gerrit.wikimedia.org/r/86250 (owner: 10Krinkle) [16:18:15] https://en.wikipedia.org/wiki/Carnival#Greece fwiw :) [16:18:22] and santa claues in green not red :P [16:18:23] akosiaris: how did you get the installations to happen? last night bast4001 started sending the installation images but would just time out, going very very slowly ? what magic did you do ? [16:18:32] greek magic [16:18:33] :P [16:18:35] lol [16:18:45] a sacrifice to the gods and such [16:18:55] i wore a ancient greek's armour first [16:19:03] then the helmet and the spear [16:19:11] and started dancing around the fire [16:19:18] rofl [16:19:22] anyway [16:19:32] its seems you guys had everything correct [16:19:55] but tftpd was configured to a different directory than /srv/tftpboot [16:20:02] oh [16:20:02] ahh [16:20:03] hahaha [16:20:06] nice [16:20:19] hehe, find docs in https://en.wikipedia.org/wiki/Greek_Magical_Papyri [16:20:26] hm, LeslieCarr should the v6 service IP just start from the next IP? [16:20:27] ::3? [16:20:36] I fixed it manually (hating myself in the process), and i will puppetize it later [16:20:38] for bits-lb? [16:20:57] guys? leslie? [16:21:08] nope, the f:f:'s are loopback only [16:21:09] I know we keep saying guys but this time you went too far :P [16:21:17] haha [16:21:31] ohohhh [16:21:33] ottomata: i always cheat and check out an existing zone to steal their config [16:21:59] like 1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa [16:22:08] did I ? i was implicitly referring to ottomata, LeslieCarr and RobH ... [16:22:25] so ... guys and lady ?? [16:22:25] yeah i'm looking, grokking allmoooost complete. [16:22:35] guys and girl ? [16:22:46] haha [16:22:54] akosiaris: hrmm, it shouldnt have been, i wonder why it didnt copy down cofig... 
oh well [16:22:58] LeslieCarr: actually i was curious about that, sense Ken mentioned several times, i like using guys as gender neutral, but I've heard that some folks don't like that [16:23:00] guys + awesomest person ever [16:23:06] nice find, i just found the error and wasnt willing to deal with it more at 6pm, heh [16:23:13] for the most part i don't care and don't even notice [16:23:22] akosiaris: fyi, if puppet would run on neon, etherpad would have process monitoring [16:23:24] i say folks a lot. [16:23:29] its a cheat, gender neutral [16:23:30] i think i've gotten used to it ;) [16:23:32] yep [16:23:35] notice I used folks in that last statement :p [16:23:38] hehe [16:23:39] I'm trying to use THON more [16:23:44] THON ? [16:23:50] that sounds like's thor's little brother ? [16:23:53] mutante: yeah i noticed. Good job. [16:24:00] http://www.qwantz.com/index.php?comic=2079 [16:24:06] that and the next 2 or 3 comics [16:24:13] altough i am not sure if it best to monitor safeRun.sh of the node process itself... [16:24:17] or* [16:25:28] ok, yea, we can adjust the regex easily, but this works currently [16:25:54] (03CR) 10CSteipp: [C: 031] "This looks like it will do what we need. If the cookie is present, we want the user to get the 302 redirect that MediaWiki will send. Othe" [operations/puppet] - 10https://gerrit.wikimedia.org/r/86268 (owner: 10Mark Bergsma) [16:26:26] mutante: #1247... merge or reject/resolve/mark as irrelevant [16:26:32] ok, after reading this Thon sounds awesome [16:26:33] whatever you feel like :-) [16:27:07] okk:) [16:27:56] ok, from now on, no more "guys" just "thons" or "bitches" [16:28:30] merged [16:28:36] heheh yeah! how's it going thons?! [16:29:00] ok um LeslieCarr, having one more grokking problem [16:29:24] :) [16:29:26] ok ? [16:29:46] ok i was going to make bits-lb.ulsfo [16:29:50] 198.35.26.97 [16:29:56] in wikimedia.org then: [16:30:00] bits-lb.ulsfo 1H IN A 198.35.26.97 [16:30:00] 1H IN AAAA 2620:0:863:1:198.35.26.97 [16:30:02] is that correct? [16:30:03] hm [16:30:04] for IPv6? [16:30:09] oh wait no [16:30:12] so we're going to have to shuffle all our lvs service IPs around soon for zero [16:30:31] that's a bit unfortunate now with ulsfo [16:30:35] but we need to think/discuss about that first [16:30:35] oh? [16:30:44] and no, that's not right for ipv6 [16:30:53] ipv6 is all the 2620:0:86x:ed1a:: addresses [16:30:59] ok, didn't look right, but was copying other entries there [16:31:11] which you can copy exactly, but use 2620:0:863 instead of 2620:0:861 [16:31:29] anyway, i'm feeling sick, i'm going offline for a bit [16:31:38] ok, thanks for the help mark, feel bettaaahh [16:31:51] yeah, mark's advice of just copying and s//'ing is best [16:31:57] oh, real quick, should I hold off with this ip allocation stuff then, if we ahve to discuss zero stuff? [16:32:14] ok cool [16:33:00] we can discuss zero on email ? shouldn't be too hard to move around some ip's before they are in production [16:33:06] ok [16:33:29] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Fri Sep 27 16:33:25 UTC 2013 [16:33:39] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [16:34:09] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Fri Sep 27 16:34:05 UTC 2013 [16:34:50] ottomata, LeslieCarr, mark: my latest message to the ops list with subject line containing 'Request for Confirmation on LBs' contains my thoughts on IPs. 
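Pulling together what mark said before signing off, the forward records ottomata is after would look roughly like the eqiad ones with the ulsfo prefix substituted. The A address is the one proposed above and the ed1a subnet is what mark describes; the AAAA host part below is a placeholder, since the actual allocation is not confirmed in this log:

    # illustrative record shape only, not the final allocation
    #   bits-lb.ulsfo  1H  IN A     198.35.26.97
    #   bits-lb.ulsfo  1H  IN AAAA  2620:0:863:ed1a::1
    # once merged and deployed, confirm with
    dig +short bits-lb.ulsfo.wikimedia.org A
    dig +short bits-lb.ulsfo.wikimedia.org AAAA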
[16:35:09] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [16:35:14] ottomata, LeslieCarr, mark: i understand you may have already seen it, just wanted to note it in case it was muted or buried [16:35:37] ottomata, LeslieCarr, mark: and i understand it may take some time to, um, digest. [16:36:06] It's _hir_ network gear. https://en.wiktionary.org/wiki/hir#English [16:38:01] Can somebody take a look at the squid/varnish stats for the servers in front of dewiki and see if they look sick? See https://bugzilla.wikimedia.org/show_bug.cgi?id=54647 [16:38:29] !log mw1072 replacing hard drive [16:38:40] Logged the message, Master [16:40:19] (03PS1) 10Ottomata: Adding entry for bits-lb.ulsfo [operations/dns] - 10https://gerrit.wikimedia.org/r/86272 [16:40:26] (03CR) 10jenkins-bot: [V: 04-1] Adding entry for bits-lb.ulsfo [operations/dns] - 10https://gerrit.wikimedia.org/r/86272 (owner: 10Ottomata) [16:41:34] heya paravoid, should I modify config-geo for this too? [16:41:51] eventually [16:41:52] but not yet [16:41:53] ok [16:42:06] (03PS2) 10Ottomata: Adding entry for bits-lb.ulsfo [operations/dns] - 10https://gerrit.wikimedia.org/r/86272 [16:42:23] yes, in the end we should add ulsfo in geodns and start throwing traffic at it [16:42:30] but let's set it up first :) [16:42:35] cool, ok [16:42:58] can't you add all the -lbs? [16:43:01] since you're doing this [16:43:36] ha suppose so! but i was just doing one at a time atm, to make sure i knew what I was doing [16:43:47] trying to just do bits stuff atm [16:43:50] with varnishes as well [16:43:51] is it possible to go to city level and just point, say San Francisco at ulsfo? just curious [16:43:51] PROBLEM - Host mw1074 is DOWN: PING CRITICAL - Packet loss = 100% [16:43:51] also wrong place in wikimedia.org [16:44:03] you added it on the servers section [16:44:08] there's a "round-robin" section below [16:44:13] which has first pmtpa, then eqiad [16:44:19] just add a ulsfo section below [16:44:25] and add all the allocations imho [16:44:39] mutante: yes, although we need to switch to the paid-for geoip database [16:44:47] but yes, gdnsd supports that, and even supports hierarchies [16:44:50] regions etc. [16:44:56] ah, i see, yea, you pay for more details of course [16:45:01] PROBLEM - Host mw1072 is DOWN: PING CRITICAL - Packet loss = 100% [16:45:05] so we can say us eqiad, california ulsfo, san francisco pmtpa [16:45:11] that's stupid, but it's possible :) [16:45:28] use Oakland to test it :p [16:45:31] RECOVERY - Host mw1074 is UP: PING OK - Packet loss = 0%, RTA = 0.23 ms [16:46:51] oook paravoid, will do [16:47:20] paravoid, here? [16:47:20] https://gerrit.wikimedia.org/r/#/c/86272/3/templates/wikimedia.org [16:47:54] yep [16:47:56] that's it [16:48:06] feel free to add ;pmtpa ;eqiad comments too if it helps you [16:51:53] k that will help, will do [16:54:10] paravoid: how does this do round robin, it looks like there is only one ip per name? [16:54:50] lvs [16:54:54] this is the lvs service IP [16:55:13] ah ok [16:55:14] so it goes from the router to the lvs boxes and then lvs rounds-robin to the varnish servers [16:55:36] ahh k, so not round robin dns [16:55:40] no [16:56:10] ahh ok i think that's why i was unsure, can I change comment to Round Robin LVS Services? [16:56:24] Round Robin LVS Service records [16:56:24] ?
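The answer to the round-robin question above is that the zone file only carries the single LVS service IP; the fan-out to the individual varnish frontends happens in LVS/pybal behind that address. A rough illustration of what that looks like on an LVS host, where the backend addresses, weights, and scheduler shown are made up and only the shape matters:

    ipvsadm -L -n
    #   TCP  208.80.154.234:80 wrr        <- the single bits-lb.eqiad service IP from the zone file
    #     -> 10.64.0.101:80  Route  10    <- pybal-managed varnish frontends behind it
    #     -> 10.64.0.102:80  Route  10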
[16:56:26] it's the pybal config [17:01:37] (03PS4) 10Ottomata: Adding entries for bits-lb.ulsfo, mobile-lb.ulsfo, and upload-lb.ulsfo [operations/dns] - 10https://gerrit.wikimedia.org/r/86272 [17:02:33] (03CR) 10Dzahn: [C: 032] since we just touched statistics.pp anyways, sneak in the retabbing as well [operations/puppet] - 10https://gerrit.wikimedia.org/r/86110 (owner: 10Dzahn) [17:04:07] (03CR) 10Dzahn: [C: 032] statistics.pp - fix unquoted file modes and resource titles. puppet-lint [operations/puppet] - 10https://gerrit.wikimedia.org/r/86112 (owner: 10Dzahn) [17:06:11] RECOVERY - Host mw1072 is UP: PING OK - Packet loss = 16%, RTA = 0.31 ms [17:08:31] PROBLEM - RAID on mw1072 is CRITICAL: Timeout while attempting connection [17:08:31] PROBLEM - DPKG on mw1072 is CRITICAL: Timeout while attempting connection [17:08:31] PROBLEM - SSH on mw1072 is CRITICAL: Connection timed out [17:08:41] PROBLEM - twemproxy process on mw1072 is CRITICAL: Connection refused by host [17:08:51] PROBLEM - Apache HTTP on mw1072 is CRITICAL: Connection refused [17:08:51] PROBLEM - Disk space on mw1072 is CRITICAL: Connection refused by host [17:12:49] paravoid, i replied to your comments, hope it answered your concern. More work on carrier tagging is coming :) [17:13:15] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [17:13:25] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [17:13:44] (03CR) 10Dzahn: [C: 032] statistics.pp, puppet-lint, fix WARNINGs: string containing only a variable [operations/puppet] - 10https://gerrit.wikimedia.org/r/86113 (owner: 10Dzahn) [17:13:50] Anyone about? Need a file touching as udp2log:udp2log on fluorine please [17:13:55] PROBLEM - Puppet freshness on praseodymium is CRITICAL: No successful Puppet run in the last 10 hours [17:13:55] PROBLEM - Puppet freshness on titanium is CRITICAL: No successful Puppet run in the last 10 hours [17:14:56] Reedy: yo, where [17:14:58] mark, around? I am not very sure I understand your patch [17:15:15] ottomata: neon finished catalog run, heh [17:15:16] /a/mw-log/temp-debug.log [17:15:56] woot! [17:16:07] yurik: mark signed off, wasn't feeling well [17:16:10] Reedy: 0 -rw-r--r-- 1 udp2log udp2log 0 Sep 27 17:15 temp-debug.log [17:16:18] thanks [17:16:21] yw [17:16:24] that ESI patch really did him in :( [17:16:32] thx ottomata [17:17:17] heh [17:17:21] ottomata: i merged the puppet lint fixes on statistics.pp, watching one more run to make sure nothing went wrong.. [17:17:40] k danke [17:19:06] (03CR) 10Yurik: "Looks ok, but I am not sure why ESI should be enabled on both front and back?" [operations/puppet] - 10https://gerrit.wikimedia.org/r/86258 (owner: 10Mark Bergsma) [17:20:15] PROBLEM - NTP on mw1072 is CRITICAL: NTP CRITICAL: No response from NTP server [17:20:20] Ryan_Lane: 'fatal: Unknown commit none/master', I forgot how we resolved this last time -- doing a --force sync now [17:21:00] did you do an initial commit? 
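The 'fatal: Unknown commit none/master' error here just means the deployment repo had no commits yet, so there was no master to diff against. A minimal sketch of the workaround being discussed; the forced sync itself is whatever ori ran above, and its exact invocation is not shown in this log:

    # give the repo something for the deploy tooling to reference
    git commit --allow-empty -m 'initial commit'
    # ...then re-run the --force sync mentioned above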
[17:21:37] bootstrapping is surely the biggest pain in the ass [17:22:22] i didn't, but it worked now [17:22:46] * Ryan_Lane nods [17:22:50] so maybe instead of requiring an initial commit, 'fatal: Unknown commit none/master' should be swallowed on --force [17:22:57] since it's not fatal anyway [17:23:00] * Ryan_Lane nods [17:23:18] I need to switch the frontend out with sartoris [17:23:24] so that we can have more control [17:23:51] I guess I can just modify the perl some more but, bleh [17:27:09] !log powering down analytics1021 replacing disk1 (sdb) [17:27:20] Logged the message, Master [17:29:53] lunchtime, back in a bit [17:30:15] PROBLEM - Host analytics1021 is DOWN: PING CRITICAL - Packet loss = 100% [17:31:15] !log reedy synchronized php-1.22wmf19/includes/GlobalFunctions.php 'all of the debugging' [17:31:26] Logged the message, Master [17:33:05] RECOVERY - Host analytics1021 is UP: PING OK - Packet loss = 0%, RTA = 0.27 ms [17:33:26] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Fri Sep 27 17:33:24 UTC 2013 [17:34:05] RECOVERY - Puppet freshness on analytics1021 is OK: puppet ran at Fri Sep 27 17:33:59 UTC 2013 [17:34:25] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [17:34:35] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Fri Sep 27 17:34:29 UTC 2013 [17:35:15] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [17:36:26] RECOVERY - SSH on mw1072 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [17:39:36] !log reedy synchronized php-1.22wmf19/includes/GlobalFunctions.php 'all of the debugging' [17:39:48] Logged the message, Master [17:41:35] RECOVERY - Puppet freshness on titanium is OK: puppet ran at Fri Sep 27 17:41:32 UTC 2013 [17:41:35] RECOVERY - Puppet freshness on praseodymium is OK: puppet ran at Fri Sep 27 17:41:32 UTC 2013 [17:41:55] PROBLEM - Puppet freshness on praseodymium is CRITICAL: No successful Puppet run in the last 10 hours [17:41:55] PROBLEM - Puppet freshness on titanium is CRITICAL: No successful Puppet run in the last 10 hours [17:42:34] !log reedy synchronized php-1.22wmf19/includes/GlobalFunctions.php 'all of the debugging' [17:42:47] Logged the message, Master [17:43:12] (03CR) 10Yurik: "(1 comment)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/86258 (owner: 10Mark Bergsma) [17:44:35] (03CR) 10Chad: [C: 031] gerrit: Don't include '.' in the match for adjecent separator [operations/puppet] - 10https://gerrit.wikimedia.org/r/86250 (owner: 10Krinkle) [17:48:37] mark; if you're there and have a few moments; I'd love to chat about your comments on my RfC [17:51:04] mwalker: he wasn't feeling well and signed off [17:51:22] fair enough; I'll respond in the sloooow way then :) [17:53:39] 24 Fatal error: Nesting level too deep - recursive dependency? 
in /usr/local/apache/common-local/php-1.22wmf19/includes/GlobalFunctions.php on line 147 [17:53:39] ffs [17:53:54] !log reedy synchronized php-1.22wmf19/includes/GlobalFunctions.php 'all of the debugging' [17:54:07] Logged the message, Master [18:00:43] RECOVERY - Apache HTTP on mw1072 is OK: HTTP OK: HTTP/1.1 200 OK - 454 bytes in 0.006 second response time [18:02:43] RECOVERY - Puppet freshness on mw1072 is OK: puppet ran at Fri Sep 27 18:02:40 UTC 2013 [18:04:04] !log reedy synchronized php-1.22wmf19/includes/GlobalFunctions.php 'all of the debugging' [18:04:15] Logged the message, Master [18:06:33] (03PS1) 10Reedy: Repoint php at 1.22wmf18 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/86280 [18:07:15] (03CR) 10Reedy: [C: 032] Repoint php at 1.22wmf18 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/86280 (owner: 10Reedy) [18:07:47] (03Merged) 10jenkins-bot: Repoint php at 1.22wmf18 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/86280 (owner: 10Reedy) [18:09:23] RECOVERY - RAID on mw1072 is OK: OK: no RAID installed [18:09:23] RECOVERY - DPKG on mw1072 is OK: All packages OK [18:09:53] RECOVERY - Disk space on mw1072 is OK: DISK OK [18:10:14] RECOVERY - NTP on mw1072 is OK: NTP OK: Offset -0.0657916069 secs [18:11:43] PROBLEM - Apache HTTP on mw1072 is CRITICAL: Connection refused [18:12:32] PROBLEM - Puppet freshness on titanium is CRITICAL: No successful Puppet run in the last 10 hours [18:12:42] PROBLEM - Puppet freshness on praseodymium is CRITICAL: No successful Puppet run in the last 10 hours [18:13:11] cmjohnson1: can you wipe the decom'ed servers so they don't keep trying to check into puppet ? [18:13:12] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [18:13:12] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [18:13:15] actually decom? :) [18:14:48] lesliecarr: they're in the decom.pp list...that should of stopped the puppet checks [18:15:54] lesliecarr: in rt5848.....the bonded ports were id'd in site.pp right? and on the switch? [18:16:13] well they check into puppet and then they get removed by the decom.pp [18:16:23] so that causes the problem puppet freshness and then recovery [18:17:01] cool, that is defined in both site.pp and the switch [18:17:09] cmjohnson1: for rt5846 [18:17:22] cool..just want to make sure [18:17:43] will get them after i finish getting mw1072 back up [18:18:01] cool [18:18:19] !log reedy synchronized php-1.22wmf19/includes/GlobalFunctions.php 'Revert' [18:18:30] Logged the message, Master [18:22:42] RECOVERY - Apache HTTP on mw1072 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.077 second response time [18:24:23] (03PS1) 10Andrew Bogott: Move generic::pythonpip into the stats role. [operations/puppet] - 10https://gerrit.wikimedia.org/r/86281 [18:24:47] (03CR) 10jenkins-bot: [V: 04-1] Move generic::pythonpip into the stats role. 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/86281 (owner: 10Andrew Bogott) [18:25:42] RECOVERY - twemproxy process on mw1072 is OK: PROCS OK: 1 process with UID = 65534 (nobody), command name nutcracker [18:33:22] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Fri Sep 27 18:33:20 UTC 2013 [18:34:12] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [18:34:33] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Fri Sep 27 18:34:31 UTC 2013 [18:35:12] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [18:37:29] ottomata: any reason you didn't put the other lvs ips in dns ? [18:38:04] aren't they already in? [18:38:33] lvs400*.ulsfo.wmnet? or do you mean something else? [18:38:59] like wikimedia-lb.ulsfo.wikimedia.org [18:39:37] hmm, i guess because I didn't know we wanted those? i think we're just doing those three types of varnishes with the 20 cps [18:39:48] for now anyway, right? [18:41:12] RECOVERY - Puppet freshness on titanium is OK: puppet ran at Fri Sep 27 18:41:04 UTC 2013 [18:41:32] PROBLEM - Puppet freshness on titanium is CRITICAL: No successful Puppet run in the last 10 hours [18:41:32] RECOVERY - Puppet freshness on praseodymium is OK: puppet ran at Fri Sep 27 18:41:24 UTC 2013 [18:41:42] PROBLEM - Puppet freshness on praseodymium is CRITICAL: No successful Puppet run in the last 10 hours [18:43:22] PROBLEM - Host praseodymium is DOWN: PING CRITICAL - Packet loss = 100% [18:45:19] LeslieCarr: ? [18:45:52] oh okay, i thought we were going to do all of the frontends, not just bits, mobile, upload [18:46:00] how many of the cp's are dedicated to those 3 ? [18:46:02] PROBLEM - Host titanium is DOWN: PING CRITICAL - Packet loss = 100% [18:46:06] i think eventually, but mark was saying that 20 wasn't enough [18:46:12] so to start with bits mobile upload [18:46:25] so, i guess i could add the other service ips still [18:46:25] (03PS2) 10Andrew Bogott: Move generic::pythonpip into the stats role. [operations/puppet] - 10https://gerrit.wikimedia.org/r/86281 [18:46:27] ok [18:46:28] just don't know what they are [18:46:32] PROBLEM - Host xenon is DOWN: PING CRITICAL - Packet loss = 100% [18:46:37] bascially, what, copy esams? [18:47:01] or eqiad there? and give them all uslfo net ips? [18:49:20] (03PS1) 10Yuvipanda: Add php5-sqlite module [operations/puppet] - 10https://gerrit.wikimedia.org/r/86288 [18:50:20] (03PS1) 10Manybubbles: Fix lograte for elasticsearch. [operations/puppet] - 10https://gerrit.wikimedia.org/r/86289 [18:54:11] yeah, average, i see ./perl5/lib/perl5/local/lib.pm in your homedir [18:54:23] that needs to be globally installed, right? [18:54:27] what is lib.pm? where is it from? [18:55:20] I have Moritz here who is working on math support, and we have a question about running a separate service vs. shelling out from PHP [18:55:54] he can basically implement both, but the complexity for ops would differ, so it would be good to get your input on this [18:57:01] the web service would need to be monitored and load balanced, but on the plus side we'd avoid the need to install texlive etc on all apaches [18:58:25] parsoid page? [18:58:39] something up with parsoid? [18:58:47] I also just got that [18:59:00] cerium.wikimedia.org? [18:59:03] (03PS2) 10Andrew Bogott: Add php5-sqlite package to toollabs instances. 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/86288 (owner: 10Yuvipanda) [18:59:17] kind of weird that a hostname like that is paging for parsoid [18:59:32] gwicke: ? [18:59:32] ganglia looks fine [18:59:33] there's not any report from it in here that I can see [18:59:36] is that legit? [18:59:51] (03CR) 10Ottomata: [C: 032 V: 032] Fix lograte for elasticsearch. [operations/puppet] - 10https://gerrit.wikimedia.org/r/86289 (owner: 10Manybubbles) [18:59:54] this looks like a watchmouse page [19:00:08] cerium is one of the Varnishes [19:00:11] https://wikitech.wikimedia.org/wiki/Parsoid [19:00:50] ^^ RoanKattouw [19:00:59] manybubbles: merged [19:01:07] manual testing is fine [19:01:09] Oh, yeah [19:01:12] Sorry about this guys [19:01:18] Parsoid monitoring in Watchmouse should be killed [19:01:32] It's no longer a publicly accessible service, so Watchmouse cannot monitor it [19:02:03] Unless we decide that it should be publicly accessible in which case we should ask ops for a public IP [19:02:10] to hook up to the LVS VIP [19:02:30] Also I should fix that documentation page [19:02:38] gwicke: i was given cerium to decom...is that not true? [19:02:44] cerium and titanium haven't been serving Parsoid for a while [19:02:47] They're being decommissioned [19:03:02] cmjohnson1: No you're right. And gwicke read the docs correctly, it's just that the docs are wrong [19:03:18] ah..okay..cool thx [19:03:39] * RoanKattouw fixes wikitech page [19:03:59] ottomata: thanks! I'll go clean up the files. [19:04:21] (03CR) 10Andrew Bogott: [C: 032] Add php5-sqlite package to toollabs instances. [operations/puppet] - 10https://gerrit.wikimedia.org/r/86288 (owner: 10Yuvipanda) [19:04:52] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Fri Sep 27 19:04:49 UTC 2013 [19:05:02] yup! [19:05:12] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [19:08:20] haha [19:08:25] folks getting watchmouse for cerium [19:08:26] its ok [19:08:29] its a bad monitor! [19:08:35] RoanKattouw: Did you set that monitor or did someone in ops? [19:08:54] mutante did [19:08:56] It should die [19:09:14] I'm sorry I overlooked this when cleaning up our Parsoid setup months ago [19:09:26] We only noticed it now because the boxes I freed up then are only being decommissioned now [19:10:03] no worries, i can fix it [19:10:06] yep! [19:10:09] this is my fault =] [19:10:21] or maybe cmjohnson1 [19:10:24] If we make Parsoid a public service, then we should set up a public IP that ends up going to the Parsoid LVS VIP, and reinstate that monitor pointing to that public IP [19:10:50] But until such time, we should not monitor it in Watchmouse as it's not a public service and so Watchmouse cannot possibly monitor it [19:12:42] deleting it now [19:12:46] So take a blowtorch or a soldering iron or whatever your tool-turned-weapon of choice is and make that monitor die in a fire [19:12:47] Thanks [19:13:29] deleted =] [19:13:52] gonna email ops list just for fyi [19:13:55] !log reedy synchronized php-1.22wmf19/extensions/TorBlock/ [19:14:08] Logged the message, Master [19:18:20] !log restarting apache on professor (graphite) [19:18:32] Logged the message, Master [19:20:42] LeslieCarr: do I need to add $ORIGIN svc.ulsfo.wmnet. at the bottom of wmnet [19:20:43] ? [19:20:54] 10.2.4.x? not sure here... 
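If a ulsfo stanza were added at the bottom of the wmnet template, the shape would presumably mirror the existing sites, but note that LeslieCarr says later in this log that ulsfo may not need internal service VIPs at all since they will keep going to eqiad. The name and the 10.2.4.x prefix below are placeholders, not an allocation:

    #   $ORIGIN svc.ulsfo.wmnet.
    #   bits  1H  IN A  10.2.4.1    ; hypothetical -- only needed if ulsfo ever fronts internal services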
[19:24:08] (03PS1) 10ArielGlenn: ukwikimedia is moving, set all namespaces to read-only except user talk [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/86303 [19:28:00] (03PS2) 10ArielGlenn: ukwikimedia is moving, set all namespaces to read-only except user talk [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/86303 [19:29:35] (03CR) 10ArielGlenn: "second patchset fixes some indentation issues on first patchset" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/86303 (owner: 10ArielGlenn) [19:30:39] (03PS1) 10Andrew Bogott: Abolish generic::packages::locales. [operations/puppet] - 10https://gerrit.wikimedia.org/r/86306 [19:31:43] paravoid, more dns questions if you are around [19:32:29] https://wikitech.wikimedia.org/w/index.php?title=Parsoid&diff=84625&oldid=75866 fixes mentions of cerium/titanium [19:33:03] (03CR) 10Andrew Bogott: [C: 032] Abolish generic::packages::locales. [operations/puppet] - 10https://gerrit.wikimedia.org/r/86306 (owner: 10Andrew Bogott) [19:33:52] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Fri Sep 27 19:33:48 UTC 2013 [19:34:12] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [19:36:40] manybubbles: hey. let's talk in here, rather than email [19:36:54] manybubbles: since I'm still not totally getting what you're trying to do [19:37:14] Ryan_Lane: sure! [19:37:17] you have a set of systems: sys1, sys2, sys3, sys4, etc.. [19:37:31] not really [19:37:37] you want to do: sys1: forceSearchIndex.php --wiki test2wiki --fromId 0 --toId 1091; sys2: forceSearchIndex.php --wiki test2wiki --fromId 1091 --toId 2182 [19:37:39] I have a set of jobs [19:37:39] etc. etc.? [19:37:54] yeah, that [19:38:13] well, except your sys* aren't really predefined, right? [19:38:26] how would parallel split these up across the systems? [19:38:26] you've got a set of jobs that can pretty much run on any mediawiki apache, right? [19:38:31] in which way would it target then? [19:38:34] *them [19:38:44] wait, this is going to run on the apaches? [19:38:53] i may have no idea what i'm talking about :/ [19:38:55] heh. why not just inject jobs into the job queue? [19:39:03] ottomata: I'd prefer to just list some systems or a pool name or something and let the runner just deal them randomly [19:39:14] yeah, this is the kind of thing the job queue is for [19:39:32] right, but the basic idea is, that you need to run jobs in parallel on mediawiki servers, rigth? [19:39:33] ideally with a less shitty job queue system in the future, like gearman or something else [19:39:37] doesn't matter which jobs run where? [19:39:42] they all need to happen at the same exact time? [19:40:07] I'd prefer they run with a lower priority, but yeah [19:40:33] well, a separate job queue is still doable [19:40:33] the problem with the job queue is that we have something that is already a command line script and it seems silly to wrap it in a job queue [19:40:36] well, not the problem [19:40:50] but that is why I've been resisting it [19:41:14] where is this going to run? the app servers? [19:41:21] is it going to run manually? [19:41:35] Ryan_Lane: I haven't a clue, really. Api, app servers, terbium like machines [19:41:41] kicked off manually, yes [19:41:54] it's possible to do via salt, assuming we made a runner [19:42:25] where the runner would note the number of systems and split the jobs up accordingly, then sent them to the minions [19:42:38] salt can deal the jobs out, then? 
I was just trying to find the path of least resistance [19:43:20] it can... but I still think the job queue is better for this [19:43:49] you're attempting to do this in a push way, and it doesn't take some things into consideration [19:44:05] for instance. the hosts could already be running relatively intense jobs [19:44:20] or, the host may not actually respond [19:44:39] and if it doesn't, then you need to re-schedule that chunk of work and only that chunk of work [19:46:22] so to do this in the job queue, do I have to create a separate queue and tell some nodes to read from it and other not? I'm sure that is all on the documentation page. [19:46:47] If I do this push style I'll just jam a bunch of jobs on some queue and let them drain in whatever amount of time it takes [19:46:54] sorry, pull style [19:48:33] you'd push them into a separate queue [19:48:41] and then have a cron that runs specific for that queue [19:48:45] at least I believe that's how it works [19:48:47] AaronSchulz: ^^ ? [19:51:07] manybubbles: well, if you push you have the possibility of portions of it not running [19:51:22] then you need to put in retries, or you need to be able to re-run those portions manually [19:51:38] and you may hit systems that are already overloaded, causing them to OOM [19:53:54] so gnu parallel handles stuff like retries and central job queueing - but it would deal the jobs out indescriminantly, potentially overwhelming machines [19:55:41] well, it also adds another mechanism for doing the same things we're already doing, too ;) [19:56:51] salt will not do things like retry [19:57:03] the job queue picks up jobs as they are fed into the queue [19:57:27] one thing you could do, is to feed them into the queue [19:57:40] then have salt tell the systems to pick up jobs from the queue [19:58:09] it still has the problem of possibly overwhelming the servers [19:58:10] Ryan_Lane: that is certainly possible, though it doesn't solve the overwhelming machines problem. [19:58:49] though that seems like a better approach than introducing a queue/remote execution framework [19:59:32] <^d> Does salt at least report failures? [19:59:41] <^d> Auto retrying isn't so bad, as long as we know which ones exploded. [19:59:55] yes, it can [20:00:06] <^d> Plus this is only for disaster recovery and initial buildout of indexes, not a weekly thing or anything like that. [20:00:27] the hard thing with salt would be splitting the jobs across the systems [20:00:30] it'll feel weekly for a while [20:00:40] when we are adding new wikis quickly [20:00:43] which is why I think job queue is best [20:01:11] ^d: is it possible to prioritize our jobs or something? [20:01:13] then rather than having a cron pick up from the queue, have salt call it [20:01:29] <^d> manybubbles: Sort of. You can kick off runners for specific job types. [20:01:47] <^d> But you can't say "This job is more important than that one" afaik. [20:01:50] ah [20:02:05] right, so don't have anything pick up the jobs [20:02:16] our jobs wouldn't really the same from a performance standpoint as most jobs [20:02:25] but have salt tell the system "pick up this job type now" [20:02:30] unless we just made a job per page [20:02:59] Ryan_Lane: now that is an idea, we could swing some number of servers over from consuming normal jobs to consuming our jobs [20:03:07] then swing them back [20:03:19] <^d> Or just kick off additional runners on those machines. 
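A hedged sketch of the hybrid being floated here: put the work on the job queue, then use salt to tell a bounded set of servers to drain it at low priority, rather than pushing the script at hosts directly. The grain used for targeting and the runJobs.php arguments are illustrative, not commands anyone actually ran in this log:

    salt -G 'applicationserver:jobrunner' cmd.run \
        'nice -n 19 mwscript runJobs.php --wiki=test2wiki --maxjobs 1000'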
[20:03:30] ^d: the worry is overwhelming them [20:03:35] yeah [20:04:30] <^d> I'm afraid of the queue getting backed up if we pull runners out of rotation though. [20:04:48] yeah [20:04:53] its pretty moot if we don't have machines to throw at the problem [20:05:00] well, we can also nice the processes [20:05:17] (03PS3) 10ArielGlenn: ukwikimedia is moving, set all namespaces to read-only except user talk [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/86303 [20:05:22] assuming they don't eat up shitloads of ram it should be fine [20:05:39] we mostly eat cpu [20:05:46] <^d> Yeah ram shouldn't be a big problem [20:05:49] then it should be fine just nicing them [20:06:10] <^d> Only time it becomes an issue is if we kept parsing again and again since the parser slowly leaks memory. [20:06:20] at that point we come back to just dealing them out again, just niced. [20:06:21] <^d> So --fromId 0 --toId 1000000 has a possibility of OOM :) [20:06:28] yeah [20:06:29] we have the ability to check jobs on the queue, right/ [20:06:34] if so, we don't need to worry about reporting [20:06:38] just execution [20:06:42] which makes it really simple [20:06:44] (03PS1) 10Ottomata: Ensuring package pigz is installed for wikistats parallel gzip processing. [operations/puppet] - 10https://gerrit.wikimedia.org/r/86310 [20:07:09] we can monitor the length of the queues, I believe. [20:07:14] <^d> Yes [20:07:23] (03CR) 10Ottomata: [C: 032 V: 032] Ensuring package pigz is installed for wikistats parallel gzip processing. [operations/puppet] - 10https://gerrit.wikimedia.org/r/86310 (owner: 10Ottomata) [20:07:26] and we can assume if a job is popped from the queue that it ran [20:07:45] if not, then we need to fix our job queue system :) [20:08:13] <^d> pop() just marks it as claimed, not that it's been run or successful yet :) [20:08:16] Ryan_Lane: it'll log output and that output'll be shunted back home [20:08:21] <^d> There's other black magic that I should know. [20:08:25] <^d> I did review Aaron's work on this. [20:08:34] manybubbles: shunted back where, though? [20:08:52] we should really switch to something like gearman :) [20:08:55] Ryan_Lane: if we log it like we log everything else then to fluorine [20:09:00] ah. right [20:09:12] unless jobs are different [20:09:13] and we need logstash, so that we can tag logs like this [20:09:28] I've heard we're looking at it [20:09:31] yep [20:09:36] it's elasticsearch backed [20:09:48] well, by default it's ES backed. [20:09:58] Reedy: Do we have any way to purge varnish/squids in front of dewiki (& maybe others)? See https://bugzilla.wikimedia.org/show_bug.cgi?id=54647 and paravoid's explanation of multicast/htcp outage that has now been resolved. [20:10:11] <^d> Ryan_Lane: I considered gearman. [20:10:20] <^d> Since it's already got php bindings and everything. [20:10:22] +1 for logstash. Hoping to help on work for that. [20:10:33] <^d> But Nik like gnu parallel and other people mentioned salt so I dropped it :) [20:10:36] bd808: echo "http://en.wikipedia.org/wiki/Foobar" | mwscript purgeList.php enwiki [20:10:42] ^d: I think hashar is looking at gearman for jenkins [20:10:49] gearman is different from salt and parallel [20:10:53] logstash yeah! [20:11:01] i've done gearman stuff before [20:11:03] <^d> Ryan_Lane: Indeed. 
But it seems like what I wanted here :) [20:11:07] I just want something that works and I didn't want to get stuck waiting on some infrastructure project [20:11:16] wrote an email campaign mailer with it [20:11:19] I mean, you can turn salt into gearman, but that's development effort ;) [20:11:44] manybubbles: yeah. I think it's easily doable with salt + job queue, though. [20:11:52] Reedy: followup question - how do I/we get a list of pages that changes during outage window? [20:12:14] bd808: cirrussearch runs a sql query for that! [20:12:19] ottomata: for any of the internal vip's but i don't believe that we will have any internal vip's in ulsfo - i believe they will be going to eqiad [20:12:19] Either revision or recentchanges table I guess [20:12:21] also, parallel uses ssh and I'm actively trying to kill all of our uses of ssh ;) [20:12:50] Ryan_Lane: fair enough. it is also very very very nice for what it does. [20:12:55] * Ryan_Lane nods [20:13:06] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [20:13:16] ok, I'll send an email about the job queue [20:13:24] I'll add aaron. [20:13:24] ok, cool [20:13:38] Reedy, manybubbles: should I know how to run such queries? I haven't been shown/found out how to do that kind of stuff yet. [20:14:40] bd808: I'm not sure who has access to production mysql. but you can steal the query from forceSearchIndex.php in cirrussearch if/when you get the access [20:14:48] you could also query a labs slave [20:14:53] which you should have access too [20:15:07] accessing the labs slaves is easy [20:15:18] just make a labs account and request access to tools project [20:15:19] (03CR) 10Reedy: "(1 comment)" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/86303 (owner: 10ArielGlenn) [20:15:24] if you haven't already [20:16:11] Ryan_Lane: I have labs account; not sure about tools access [20:16:35] bd808: https://wikitech.wikimedia.org/wiki/Special:FormEdit/Tools_Access_Request [20:16:53] andrewbogott: you know, i don't think that pythonpip thing works anyway [20:17:01] aannnnd I *think* its not being used [20:17:44] yeah you know, i'd say remove it. [20:17:55] if someone complains later ( I don't think they will) I will deal with it [20:18:03] Ryan_Lane: Thanks. {{done}} [20:18:36] bd808: ah, you're already a member [20:19:06] So I just log into bastion and …? do stuff *grin* [20:19:13] (03PS4) 10Reedy: ukwikimedia is moving, set all namespaces to read-only except user talk [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/86303 (owner: 10ArielGlenn) [20:19:18] bd808: https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Help#Database_access [20:19:31] https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Help#Accessing_Tool_Labs_and_managing_your_files [20:19:41] Ryan_Lane: Thank you for teaching me to fish. 
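A sketch of the kind of query bd808 needs, run against the Tool Labs replicas Ryan points at. The replica host and database names are assumptions about the labs setup, not taken from this log; the table and column names are standard MediaWiki core, and the window and namespace match the purge run described further down:

    # host/database naming below is assumed, not confirmed here
    mysql -h enwiki.labsdb enwiki_p -e "
        SELECT DISTINCT page_namespace, page_title
        FROM revision
        JOIN page ON rev_page = page_id
        WHERE rev_timestamp BETWEEN '20130922000000' AND '20130926000000'
          AND page_namespace = 0;"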
[20:19:56] yw [20:20:07] * Ryan_Lane has never actually used the database access [20:20:36] production ftw [20:20:43] Reedy: tsk tsk [20:20:44] :) [20:21:09] (03CR) 10Reedy: [C: 032] ukwikimedia is moving, set all namespaces to read-only except user talk [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/86303 (owner: 10ArielGlenn) [20:21:17] (03Merged) 10jenkins-bot: ukwikimedia is moving, set all namespaces to read-only except user talk [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/86303 (owner: 10ArielGlenn) [20:21:28] idle pmtpa slaves are great for silly sql queries [20:23:04] ottomata, remove it and remove the references in role::statistics as well? [20:27:42] (03PS3) 10Andrew Bogott: Remove generic::pythonpip [operations/puppet] - 10https://gerrit.wikimedia.org/r/86281 [20:28:01] yup [20:28:28] (03CR) 10Ottomata: [C: 032] Remove generic::pythonpip [operations/puppet] - 10https://gerrit.wikimedia.org/r/86281 (owner: 10Andrew Bogott) [20:29:24] (03PS1) 10Ori.livneh: Add Gdash module; remove Gdash source tree from Puppet [operations/puppet] - 10https://gerrit.wikimedia.org/r/86312 [20:29:46] paravoid / Ryan_Lane: ^ dat diff: +100, -4519 [20:30:27] (03CR) 10Ori.livneh: [C: 04-1] "I'd like to be around when this is merged and I have to run right now, so -1ing." [operations/puppet] - 10https://gerrit.wikimedia.org/r/86312 (owner: 10Ori.livneh) [20:30:34] !log ariel synchronized wmf-config/InitialiseSettings.php 'ukwikimedia to read-only except for user talk' [20:30:45] Logged the message, Master [20:30:54] (03PS5) 10Ottomata: Adding entries for bits-lb.ulsfo, mobile-lb.ulsfo, and upload-lb.ulsfo [operations/dns] - 10https://gerrit.wikimedia.org/r/86272 [20:32:01] ori-l: :D [20:32:02] \o/ [20:33:18] ori-l: wtf is ordered_json($settings) ?? [20:33:36] does it take a hash and output it as ordered json? [20:33:41] if it does I'm going to shit a brick [20:34:42] yep :P [20:34:49] AaronSchulz: http://www.planetcassandra.org/Learn/FAQ#arch-16 [20:34:52] dude. I needed this a week ago [20:34:57] ori-l: can you put this somewhere generic? [20:35:01] I still need it [20:35:25] I can finish reviewing this as is, of course [20:35:28] AaronSchulz: and http://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0 [20:35:31] can be a followup or something :) [20:35:31] Ryan_Lane: Ok, gotta run right now but will do so as a separate patch later [20:35:34] yep [20:35:35] manybubbles: can you translate the db->select() from forceSearchIndex.php into "real" sql for me? [20:35:37] \o/ [20:35:39] <3 [20:35:56] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Fri Sep 27 20:35:49 UTC 2013 [20:35:59] bd808: selectSQLText() [20:36:01] <^d> bd808: Swap select() for selectSQLText() with the same params and you can get it. [20:36:06] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [20:36:22] Reedy. ^d: thx [20:36:27] ori-l: /opt? [20:36:32] don't we generally use /srv? [20:36:41] Ryan_Lane: it's where it's currently deployed [20:36:44] trying not to break things [20:36:57] don't look at the state of professor if you want to keep your lunch [20:37:03] (03PS1) 10Ottomata: Puppetizing bits ulsfo varnishes. [operations/puppet] - 10https://gerrit.wikimedia.org/r/86313 [20:37:10] i'll clean it up further, but one step at a time [20:37:21] hm. 
well, git deploy deploys to/from the same location [20:37:23] * Ryan_Lane nods [20:37:46] git deploy does that for a sense of sanity, so that you don't need to chase down things on target systems [20:38:00] (03CR) 10Ottomata: [C: 04-1] "I still feel like I only have about an 85% grasp on all the moving pieces here, so a full review of this and https://gerrit.wikimedia.org/" [operations/puppet] - 10https://gerrit.wikimedia.org/r/86313 (owner: 10Ottomata) [20:38:04] AaronSchulz: paxos seems to be quorum-based [20:38:22] anyway, yeah, doing so in steps sounds fine [20:38:27] ok, that's more in line with what I remember [20:39:26] PROBLEM - Puppet freshness on ms-be1012 is CRITICAL: No successful Puppet run in the last 10 hours [20:39:43] AaronSchulz: more detail in https://github.com/apache/cassandra/blob/cassandra-2.0.0-beta1/src/java/org/apache/cassandra/service/StorageProxy.java#L204 [20:39:59] (03CR) 10Ryan Lane: [C: 032] Add Gdash module; remove Gdash source tree from Puppet [operations/puppet] - 10https://gerrit.wikimedia.org/r/86312 (owner: 10Ori.livneh) [20:40:20] ori-l: I +2'd, but I'll wait till you are around for merge [20:41:26] PROBLEM - Puppet freshness on ms-be1006 is CRITICAL: No successful Puppet run in the last 10 hours [20:42:26] PROBLEM - Puppet freshness on ms-fe1001 is CRITICAL: No successful Puppet run in the last 10 hours [20:45:26] PROBLEM - Puppet freshness on ms-fe1002 is CRITICAL: No successful Puppet run in the last 10 hours [20:48:26] PROBLEM - Puppet freshness on ms-be1007 is CRITICAL: No successful Puppet run in the last 10 hours [20:48:26] PROBLEM - Puppet freshness on ms-be1009 is CRITICAL: No successful Puppet run in the last 10 hours [20:48:33] (03CR) 10Ottomata: "Am I correct in adding the *.svc.ulsfo.wmnet entries? I was pretty unsure about." 
[operations/dns] - 10https://gerrit.wikimedia.org/r/86272 (owner: 10Ottomata) [20:51:26] PROBLEM - Puppet freshness on ms-fe1003 is CRITICAL: No successful Puppet run in the last 10 hours [20:53:26] PROBLEM - Puppet freshness on ms-be1002 is CRITICAL: No successful Puppet run in the last 10 hours [20:53:26] PROBLEM - Puppet freshness on ms-be1004 is CRITICAL: No successful Puppet run in the last 10 hours [20:54:26] PROBLEM - Puppet freshness on ms-be1010 is CRITICAL: No successful Puppet run in the last 10 hours [20:55:26] PROBLEM - Puppet freshness on ms-fe1004 is CRITICAL: No successful Puppet run in the last 10 hours [20:55:29] (03PS1) 10ArielGlenn: wgNamespaceProtection NS_MODULE not recognized so putting in the numbers [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/86316 [20:56:38] (03CR) 10Reedy: [C: 032] wgNamespaceProtection NS_MODULE not recognized so putting in the numbers [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/86316 (owner: 10ArielGlenn) [20:56:53] (03Merged) 10jenkins-bot: wgNamespaceProtection NS_MODULE not recognized so putting in the numbers [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/86316 (owner: 10ArielGlenn) [20:57:26] PROBLEM - Puppet freshness on ms-be1003 is CRITICAL: No successful Puppet run in the last 10 hours [20:57:34] brb, migrating to coffee shop [20:57:57] !log ariel synchronized wmf-config/InitialiseSettings.php 'fix up ns refs for ukwikimedia' [20:58:08] Logged the message, Master [20:59:26] PROBLEM - Puppet freshness on ms-be1008 is CRITICAL: No successful Puppet run in the last 10 hours [21:00:26] PROBLEM - Puppet freshness on ms-be1005 is CRITICAL: No successful Puppet run in the last 10 hours [21:01:36] Reedy: Select for enwiki tells me that there are 215,968 pages to purge from 2013-09-22T00:00:00Z to 2013-09-26T00:00:00Z. Does that sound reasonable? [21:02:26] PROBLEM - Puppet freshness on ms-be1011 is CRITICAL: No successful Puppet run in the last 10 hours [21:02:26] PROBLEM - Puppet freshness on ms-be1001 is CRITICAL: No successful Puppet run in the last 10 hours [21:03:57] Ryan_Lane: around now [21:04:06] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Fri Sep 27 21:04:02 UTC 2013 [21:04:43] bd808: No idea... Make sure you only purge each page once [21:05:06] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [21:05:51] Reedy: I'm getting the list now. Will need further advice/instruction on how and where to actually run the purgeList.php script [21:08:24] (03PS1) 10Odder: (bug 54680) Set $wgCategoryCollation for the French Wikipedia [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/86320 [21:08:42] https://gerrit.wikimedia.org/r/86320 [21:08:46] Oups, sorry. [21:12:18] Reedy: so I had a look at https://bugzilla.wikimedia.org/show_bug.cgi?id=54368 [21:12:58] bd808: On terbium, run mwscript purgeList.php --wiki=enwiki --help [21:13:05] (or --wiki=commonswiki or whatever as appropriate) [21:13:54] RoanKattouw: Thanks. I've never logged into anything other than labs boxes. o_O [21:14:26] terbium is internal so you'll have to log into bast1001 first [21:14:35] Assuming you actually have shell access to the main cluster [21:14:48] RoanKattouw: I'm guessing I don't [21:15:00] If you don't know that you do, you probably don't [21:15:08] In that case get Reedy to run the script for you [21:15:26] And if you're gonna be doing tasks like these more often, get your manager (robla?) 
to sign off on a shell access request [21:15:51] RoanKattouw: good ideas all. Mostly I'm trying to help :) [21:20:17] (03CR) 10Bartosz Dziewoński: [C: 031] (bug 54680) Set $wgCategoryCollation for the French Wikipedia [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/86320 (owner: 10Odder) [21:25:37] !log reedy synchronized php-1.22wmf19/extensions/WikimediaMessages/ [21:25:49] Logged the message, Master [21:33:49] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Fri Sep 27 21:33:41 UTC 2013 [21:33:59] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours [21:34:10] (03PS1) 10Ori.livneh: Update NavigationTiming StatsD reporter to schema rev. 5832704 [operations/puppet] - 10https://gerrit.wikimedia.org/r/86322 [21:35:01] Ryan_Lane: any chance you can merge the gdash patch and this one ^^ ? I got the OK to deploy the concomitant JS change [21:35:20] oh man, you told an ops person? [21:35:28] last time I let you deploy on a Friday [21:36:35] :P [21:41:01] Reedy: I have 2 text files on tools-login that have urls of pages in enwiki and dewiki that are likely in need of purge. [21:41:11] Reedy: seeing advice on what to do next [21:41:37] run for the hills normally works [21:41:57] p858snake|l: easy. I'm in the hills already. [21:43:26] * bd808 stays put [21:43:46] (03CR) 10Hashar: [C: 04-1] "I am not sure we need a class to install maven, lets just add the maven2 package to the long array of packages? Ie next to ant (a java bu" [operations/puppet] - 10https://gerrit.wikimedia.org/r/86136 (owner: 10Andrew Bogott) [21:45:00] greg-g: who should I bother besides Sam? World clock tells me he should be sleeping. [21:45:02] I think those files can just be cat and piped to purgeList [21:45:22] bd808: Reedy is not a normal sleeper ;) [21:45:23] Yeah, works with stdin [21:45:47] basically, assume Reedy is on SF-time [21:45:52] ish [21:45:55] Reedy: I don't think I have access to anywhere that I can run that against prod [21:46:16] To my knowledge I only have labs shell access [21:46:27] Hmm [21:46:34] What's the easiest way for me to grab them.. [21:46:47] I can add you to the tool [21:47:04] Or I can pastebin [21:47:22] Probably a bit big for pastebin [21:50:15] probably yeah [21:50:15] transfering files between computers, still a hard problem [21:51:14] I just don't have anywhere outside my house to land them [21:51:14] Probably easiest to add me to the tool [21:51:14] <^d> e-mail! [21:51:21] Reedy: what's your username on toolslab? [21:51:38] reedy [21:51:39] hmmm… ' No results match "reedy"' [21:51:53] who runs an anonymous ftp server? send details to bd808 [21:51:58] lol [21:52:08] dropbox! [21:52:12] seems overkill [21:52:14] [21:52:17] err [21:52:58] Try it as Reedy [21:53:14] easiest would be to publish this file in a apache-accessible place, no? [21:53:49] Ryan_Lane: ping when you're back, plz. [21:54:55] Reedy: no joy there either. Hang on I'm pushing them to a host I control that has network access. [21:59:48] <^d> bd808, Reedy, greg-g: We'll make a new git repo with full push access, call it dropbox. [21:59:56] Reedy: Did you get URLs via PM? [21:59:57] <^d> :) [22:01:22] * greg-g commits every linus install iso he has into said repo [22:01:26] heh, linux [22:01:55] * bd808 fills it with pictures of dogs and cheeseburgers [22:02:37] * ^d rewrites history to get rid of silly commits ;-) [22:02:46] reedy@tin:/tmp$ cat dewiki-misses.txt | mwscript purgeList.php dewiki [22:02:46] Purging 34447 urls [22:02:46] Done! 
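One detail worth calling out about the purge runs above: purgeList.php purges exactly the URLs it is fed, and, as bd808 discovers at the end of this log, entries without the /wiki/ path segment will not match the canonical URLs varnish has cached. A sketch of building the file in the canonical form used in the earlier echo example; the titles file name and the bare prefixing are illustrative, and real titles would also need URL-encoding:

    sed 's|^|http://de.wikipedia.org/wiki/|' dewiki-titles.txt > dewiki-misses.txt
    cat dewiki-misses.txt | mwscript purgeList.php dewiki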
[22:02:49] gotta pull it first!
[22:03:01] w00t
[22:03:03] reedy@tin:/tmp$ cat enwiki-misses.txt | mwscript purgeList.php enwiki
[22:03:03] Purging 146760 urls
[22:03:03] Done!
[22:03:09] eek, that's a lot
[22:03:22] <^d> That's over 9000 urls!
[22:03:59] It was all edits to namespace 0 from 2013-09-22T00:00:00Z to 2013-09-26T00:00:00Z
[22:04:11] there's more out there but this should help
[22:04:24] eg other namespaces, other wikis
[22:06:35] Reedy: Thanks much
[22:08:02] ori-l: I've added a diagram to https://wikitech.wikimedia.org/wiki/Sartoris/Design hopefully it makes things a little clearer
[22:08:21] ooo, purty
[22:08:24] Ryan_Lane: nice -- what did you use to draw it?
[22:08:30] omnigraffle
[22:08:37] <3 omnigraffle
[22:08:57] +1 for omnigraffle
[22:08:59] i think this helps, yeah
[22:09:05] the only downside is that it's proprietary
[22:09:16] happy 30th birthday, GNU
[22:09:20] :D
[22:09:22] * greg-g times things well
[22:09:26] ;)
[22:10:11] Ryan_Lane: if i read it now something important will fall out of my head, but i'll read it carefully this weekend
[22:10:28] yep. no worries. today is a good day for documentation, so I thought I'd do that :)
[22:10:38] <^d> greg-g: Shouldn't that be an indication you need a weekend project writing gomnigraffle?
[22:10:43] any chance you could merge the gdash change & 86322?
[22:10:55] yep, if you'd like
[22:11:16] ^d: my weekend project is writing a human ;)
[22:11:22] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours
[22:11:26] Ryan_Lane: yes plz
[22:11:29] one down
[22:11:34] <^d> greg-g: Yes, but is the human licensed under GPLv2?
[22:11:54] ^d: AGPLv3, of course
[22:11:59] <^d> Haha
[22:12:19] (03CR) 10Ryan Lane: [C: 032] Update NavigationTiming StatsD reporter to schema rev. 5832704 [operations/puppet] - 10https://gerrit.wikimedia.org/r/86322 (owner: 10Ori.livneh)
[22:12:35] ori-l: done
[22:12:50] weee, thanks!
[22:12:57] I wish there was an open source diagram app as good as omnigraffle
[22:13:04] I'd use it in a heartbeat
[22:13:18] I've tried a number of open source ones and they make the ugliest diagrams ever
[22:13:25] LibreOffice Draw! ... oh wait... :(
[22:13:38] <^d> Ryan_Lane: I would suggest Dia, but I hate it.
[22:13:44] <^d> So I can't in good faith suggest it.
[22:14:58] I'm opening myself up to mockery here, but I think that using good closed tools beats using almost-works-most-of-the-time free tools every day of the week
[22:15:32] I'd love to be able to tweak everything but I don't always have time to do that
[22:15:32] * ^d hides from the oncoming horde who will eat bd808 alive
[22:16:33] I'm proprietary software friendly and I don't care who knows it
[22:16:40] * bd808 hides from marktraceur
[22:16:58] * marktraceur gets nerf guns and plane tickets to Boise
[22:17:17] * bd808 changes the linens in the guest room
[22:17:48] I use yed for diagrams now, since dia started to suck (my gtk setup on windows is possibly broken). and Balsamiq for mockups (not os)
[22:18:32] oh, so one of those "pragmatic" people, eh?
[22:18:46] we have a name for you kinds of people around these here parts
[22:19:19] I forget what it is and I saved it in a FLOSS note taking program, but it changed file formats recently and I lost all my history...
[22:19:47] <^d> It could be worse. We could all be using Rational Rose.
[22:20:01] <^d> Because *that's* a fun tool.
[22:22:11] * bd808 chases ^d with a stick for mentioning RUP
[22:22:42] Model driven money extraction. Fun for consultants the world over
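One plausible way to build the list bd808 describes (all namespace-0 pages edited in that window) on a labs replica; this is not necessarily the query from his own write-up linked further down, and the `sql` helper name, replica schema, and URL prefix are assumptions. Titles with special characters would still need percent-encoding.

    # List canonical URLs for NS0 pages edited between the two timestamps,
    # querying a labs replica of enwiki (helper name and schema assumed)
    sql enwiki_p <<'EOF' | grep '^https://' | sort -u > enwiki-misses.txt
    SELECT DISTINCT CONCAT('https://en.wikipedia.org/wiki/', page_title)
    FROM revision
    JOIN page ON rev_page = page_id
    WHERE page_namespace = 0
      AND rev_timestamp BETWEEN '20130922000000' AND '20130926000000';
    EOF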
[22:23:04] <^d> :)
[22:23:39] I have no idea what you're talking about, and I think I'm happier for it.
[22:23:43] <^d> bd808: I had a whole class centered on Rose and that RUP crap.
[22:23:50] <^d> It was...awful...to say the least.
[22:24:02] <^d> Only time I've ever felt bad for the *computer* for having to run software.
[22:24:16] <^d> I could feel the computer's pain. "WHY ARE YOU RUNNING THIS ON ME? IT BURNS!"
[22:24:27] I've tried to rehabilitate former RUP users, but never been successful. It ruins good minds.
[22:24:54] greg-g: https://en.wikipedia.org/wiki/IBM_Rational_Unified_Process
[22:25:29] <^d> bd808: It basically went against everything I knew to be a Good Thing. So I purposefully unlearned it.
[22:25:41] RUP is waterfall on steroids with code generation as the end result.
[22:25:54] <^d> Now it's kind of a maxim. WWRUPD? If you find yourself doing what RUP would do, do the opposite.
[22:26:13] * bd808 LOL'd for reals
[22:27:17] <^d> Waterfall on steroids is an understatement. It's freaking Niagara Falls, but frozen over so you don't actually progress.
[22:27:21] For those who are curious, here's my write up of making those files Reedy ran: https://www.mediawiki.org/wiki/User:BDavis_(WMF)/Notes/Finding_Files_To_Purge
[22:27:31] bd808: tl;dr
[22:28:35] (03CR) 10Lcarr: "honestly i'm unsure if we need the internal .svc addresses as well." [operations/dns] - 10https://gerrit.wikimedia.org/r/86272 (owner: 10Ottomata)
[22:28:42] My question on bz:54647 now is "what next"
[22:29:03] bd808: d..d... documentation?
[22:29:03] Reedy: there are updates to TorBlock and WikimediaMessages queued, should I sync them?
[22:29:11] I'm about to sync a NavigationTiming change
[22:29:13] Uhh
[22:29:17] I thought I already had...
[22:31:28] greg-g: document what? A process for recovering from a major HTCP outage?
[22:31:40] Reedy: yes; I misread git's output. Sorry.
[22:32:06] bd808: that's what you did, and I was impressed :)
[22:32:54] I can't remember things. That's what the wiki is for!
[22:34:02] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Fri Sep 27 22:33:53 UTC 2013
[22:34:22] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours
[22:35:28] !log olivneh synchronized php-1.22wmf18/extensions/NavigationTiming 'Updating NavigationTiming: schema update 5336845 -> 5832704 (1/2)'
[22:35:41] Logged the message, Master
[22:35:45] !log olivneh synchronized php-1.22wmf19/extensions/NavigationTiming 'Updating NavigationTiming: schema update 5336845 -> 5832704 (2/2)'
[22:35:54] Logged the message, Master
[22:39:09] (03PS1) 10Ori.livneh: Disable broken Varnish monitoring for professor graphite host [operations/puppet] - 10https://gerrit.wikimedia.org/r/86329
[22:46:09] PROBLEM - Puppet freshness on sockpuppet is CRITICAL: No successful Puppet run in the last 10 hours
[22:57:23] (03PS1) 10Ori.livneh: Fix-ups for Gdash module [operations/puppet] - 10https://gerrit.wikimedia.org/r/86330
[22:57:24] (03PS1) 10Ori.livneh: Re-introduce 'rendering' metric to navtiming.py [operations/puppet] - 10https://gerrit.wikimedia.org/r/86331
[22:57:50] Ryan_Lane: ^ fix-ups, already tested on target. And that's it.™
[23:05:42] Reedy: I think the data files I made had a bug. They don't have /wiki/ in the url.
[23:06:20] Reedy: so the purges wouldn't match varnish canonical urls I don't think
[23:09:17] uh oh
[23:09:53] It wouldn't hurt anything but it wouldn't fix anything either
[23:10:29] I've got new files. Pushing them to my transfer server now
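A hedged sketch of repairing the first batch of files, assuming they contained URLs of the form https://en.wikipedia.org/Page_Title rather than the canonical /wiki/ form; the sed expression and file names are illustrative, and sort -u covers Reedy's earlier "purge each page once" advice.

    # Insert the missing /wiki/ path segment and drop duplicates before
    # re-running the purge (assumes bare https://host/Title style URLs)
    sed 's#^\(https://[^/]*\)/#\1/wiki/#' enwiki-misses.txt | sort -u > enwiki-misses-fixed.txt
    cat enwiki-misses-fixed.txt | mwscript purgeList.php enwiki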
[23:10:36] * greg-g nods
[23:11:43] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours
[23:14:56] Done
[23:15:49] * bd808 needs to find a transfer server inside the network
[23:16:09] Any server I SSH into I can get the files...
[23:17:16] I don't know why I couldn't find you in the wikitech management interface
[23:33:53] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Fri Sep 27 23:33:46 UTC 2013
[23:34:43] PROBLEM - Puppet freshness on labstore4 is CRITICAL: No successful Puppet run in the last 10 hours
[23:47:17] Ryan_Lane: about?
[23:55:58] !log Manually synchronized changes 86329, 86330, 86331 to hafnium & professor & temporarily disabled puppet to prevent it from clobbering the changes. I'll re-enable once they're merged.
[23:56:10] Logged the message, Master
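The "temporarily disabled puppet" step from the final !log, written out as commands; puppet agent --disable/--enable is standard Puppet CLI, but running it via sudo on those particular hosts is an assumption.

    # On each affected host (hafnium, professor): keep the agent from
    # reverting the hand-applied change, then re-enable after the merge
    sudo puppet agent --disable
    # ... apply and verify the manual change ...
    sudo puppet agent --enable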