[00:00:15] RECOVERY - check_missing_thank_yous on db1025 is OK: OK missing_thank_yous=0 [00:01:47] springle, this is so that someone can test out sanitizing a bugzilla Db, isn't it? [00:02:01] Krenair: correct [00:02:17] I wondered if someone was planning to do the same with phabricator [00:05:56] Krenair: I don't know if there are similar phab plans [00:06:15] but if there were, we could no doubt setup a test box for it [00:13:31] PROBLEM - puppet last run on amssq51 is CRITICAL: CRITICAL: puppet fail [00:30:51] RECOVERY - puppet last run on amssq51 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [02:01:08] (PS1) Andrew Bogott: Fix the projectgid fact -- take two [puppet] - https://gerrit.wikimedia.org/r/181535 [02:10:19] PROBLEM - Host tellurium is DOWN: CRITICAL - Plugin timed out after 15 seconds [02:11:17] PROBLEM - Host backup4001 is DOWN: CRITICAL - Plugin timed out after 15 seconds [02:11:21] PROBLEM - Host rendering.svc.eqiad.wmnet is DOWN: CRITICAL - Plugin timed out after 15 seconds [02:11:59] the last time backup4001 paged us, it was intentionally being worked on [02:12:03] is that the case tonight? [02:12:10] RECOVERY - Host tellurium is UP: PING OK - Packet loss = 0%, RTA = 1.17 ms [02:13:25] i can ping it but not ssh, which iirc was also the situation last time [02:13:32] jeff_green? [02:14:16] hm he's not here [02:15:10] RECOVERY - Host backup4001 is UP: PING OK - Packet loss = 0%, RTA = 76.94 ms [02:15:41] RECOVERY - Host rendering.svc.eqiad.wmnet is UP: PING OK - Packet loss = 0%, RTA = 0.22 ms [02:17:54] tellurium and backup4001 are both frack, afaik rendering is unrelated. looks like a monitoring or network glitch, though i hate to just leave it at that.
[03:19:30] RECOVERY - puppetmaster https on virt1000 is OK: HTTP OK: Status line output matched 400 - 335 bytes in 0.025 second response time [03:25:09] !log graceful'd apache2 on virt1000 (same intermittent passenger crash as always) [03:25:19] Logged the message, Master [03:43:30] PROBLEM - HTTPS on antimony is CRITICAL: SSL_CERT CRITICAL svn.wikimedia.org: certificate will expire on Jan 31 10:53:05 2015 GMT [03:49:14] PROBLEM - HTTPS on antimony is CRITICAL: SSL_CERT CRITICAL svn.wikimedia.org: certificate will expire on Jan 31 10:53:05 2015 GMT [03:53:32] (Abandoned) Andrew Bogott: -- DRAFT -- [puppet] - https://gerrit.wikimedia.org/r/176670 (owner: Andrew Bogott) [03:53:40] (PS1) Andrew Bogott: Don't install update-notifier-common on Jessie. [puppet] - https://gerrit.wikimedia.org/r/181539 [03:57:44] PROBLEM - HTTPS on antimony is CRITICAL: SSL_CERT CRITICAL svn.wikimedia.org: certificate will expire on Jan 31 10:53:05 2015 GMT [04:00:42] (PS1) Andrew Bogott: Don't specify provider => upstart [puppet] - https://gerrit.wikimedia.org/r/181540 [04:00:44] (PS1) Andrew Bogott: Don't include base::instance-upstarts on Debian. [puppet] - https://gerrit.wikimedia.org/r/181541 [04:03:31] (CR) KartikMistry: "It is replaced by 'gnome-packagekit'. Not sure how it will be useful being gtk app here."
[puppet] - https://gerrit.wikimedia.org/r/181539 (owner: Andrew Bogott) [04:10:01] PROBLEM - HTTPS on antimony is CRITICAL: SSL_CERT CRITICAL svn.wikimedia.org: certificate will expire on Jan 31 10:53:05 2015 GMT [04:16:09] PROBLEM - HTTPS on antimony is CRITICAL: SSL_CERT CRITICAL svn.wikimedia.org: certificate will expire on Jan 31 10:53:05 2015 GMT [04:25:40] PROBLEM - HTTPS on antimony is CRITICAL: SSL_CERT CRITICAL svn.wikimedia.org: certificate will expire on Jan 31 10:53:05 2015 GMT [04:28:07] PROBLEM - HTTPS_zero.wikipedia.org on cp1067 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [04:28:28] PROBLEM - HTTPS_m.wikidata.org on cp4003 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [04:28:40] PROBLEM - HTTPS_m.wikinews.org on amssq34 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [04:28:47] PROBLEM - HTTPS_m.mediawiki.org on cp1039 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [04:28:49] PROBLEM - HTTPS_wikiquote.org on amssq45 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [04:30:44] RECOVERY - HTTPS_zero.wikipedia.org on cp1067 is OK: SSL_CERT OK - X.509 certificate for *.zero.wikipedia.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:16:05 2015 GMT (expires in 335 days) [04:30:49] RECOVERY - HTTPS_m.wikidata.org on cp4003 is OK: SSL_CERT OK - X.509 certificate for *.m.wikidata.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:26:04 2015 GMT (expires in 335 days) [04:30:52] RECOVERY - HTTPS_m.mediawiki.org on cp1039 is OK: SSL_CERT OK - X.509 certificate for *.m.mediawiki.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:51:04 2015 GMT (expires in 335 days) [04:30:56] RECOVERY - HTTPS_wikiquote.org on amssq45 is OK: SSL_CERT OK - X.509 certificate for *.wikiquote.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:36:05 2015 GMT (expires in 335 days) [04:30:58] RECOVERY - 
HTTPS_m.wikinews.org on amssq34 is OK: SSL_CERT OK - X.509 certificate for *.m.wikinews.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:36:02 2015 GMT (expires in 335 days) [04:34:42] PROBLEM - HTTPS_wikivoyage.org on cp3014 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [04:34:42] PROBLEM - HTTPS_m.wiktionary.org on cp1061 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [04:34:43] PROBLEM - HTTPS_m.wikimediafoundation.org on cp1067 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [04:34:43] PROBLEM - HTTPS_wikimedia.org on cp4001 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [04:34:43] PROBLEM - HTTPS_wikibooks.org on cp4008 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [04:34:43] PROBLEM - HTTPS_m.wiktionary.org on cp1055 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [04:34:43] PROBLEM - HTTPS_wiktionary.org on cp3016 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [04:34:44] PROBLEM - HTTPS_wikipedia.org on cp4015 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [04:34:44] PROBLEM - HTTPS_wikinews.org on cp3022 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [04:34:45] PROBLEM - HTTPS_m.wikipedia.org on cp1049 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [04:34:45] PROBLEM - HTTPS_wikidata.org on amssq33 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [04:34:46] PROBLEM - HTTPS_wikiquote.org on amssq37 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [04:34:46] PROBLEM - HTTPS on antimony is CRITICAL: SSL_CERT CRITICAL svn.wikimedia.org: certificate will expire on Jan 31 10:53:05 2015 GMT [04:37:08] RECOVERY - HTTPS_wikibooks.org on cp4008 is OK: SSL_CERT OK - X.509 certificate for *.wikibooks.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:21:03 2015 GMT (expires in 335 days) [04:37:15] RECOVERY - HTTPS_m.wiktionary.org on cp1061 is OK: SSL_CERT OK - X.509 certificate for 
*.m.wiktionary.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:46:07 2015 GMT (expires in 335 days) [04:37:31] RECOVERY - HTTPS_m.wikipedia.org on cp1049 is OK: SSL_CERT OK - X.509 certificate for *.m.wikipedia.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:16:02 2015 GMT (expires in 335 days) [04:37:34] RECOVERY - HTTPS_m.wiktionary.org on cp1055 is OK: SSL_CERT OK - X.509 certificate for *.m.wiktionary.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:46:07 2015 GMT (expires in 335 days) [04:37:40] RECOVERY - HTTPS_wiktionary.org on cp3016 is OK: SSL_CERT OK - X.509 certificate for *.wiktionary.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:46:05 2015 GMT (expires in 335 days) [04:37:53] RECOVERY - HTTPS_m.wikimediafoundation.org on cp1067 is OK: SSL_CERT OK - X.509 certificate for *.m.wikimediafoundation.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:31:07 2015 GMT (expires in 335 days) [04:38:00] RECOVERY - HTTPS_wikinews.org on cp3022 is OK: SSL_CERT OK - X.509 certificate for *.wikinews.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:31:09 2015 GMT (expires in 335 days) [04:38:00] RECOVERY - HTTPS_wikipedia.org on cp4015 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:06:02 2015 GMT (expires in 335 days) [04:38:00] RECOVERY - HTTPS_wikimedia.org on cp4001 is OK: SSL_CERT OK - X.509 certificate for *.wikimedia.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 4 21:06:06 2015 GMT (expires in 317 days) [04:38:00] RECOVERY - HTTPS_wikivoyage.org on cp3014 is OK: SSL_CERT OK - X.509 certificate for *.wikivoyage.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:41:09 2015 GMT (expires in 335 days) [04:40:42] 
RECOVERY - HTTPS_wikiquote.org on amssq37 is OK: SSL_CERT OK - X.509 certificate for *.wikiquote.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:36:05 2015 GMT (expires in 335 days) [04:40:52] RECOVERY - HTTPS_wikidata.org on amssq33 is OK: SSL_CERT OK - X.509 certificate for *.wikidata.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:26:02 2015 GMT (expires in 335 days) [04:44:42] PROBLEM - HTTPS on antimony is CRITICAL: SSL_CERT CRITICAL svn.wikimedia.org: certificate will expire on Jan 31 10:53:05 2015 GMT [04:50:50] PROBLEM - HTTPS on antimony is CRITICAL: SSL_CERT CRITICAL svn.wikimedia.org: certificate will expire on Jan 31 10:53:05 2015 GMT [04:59:14] PROBLEM - HTTPS on antimony is CRITICAL: SSL_CERT CRITICAL svn.wikimedia.org: certificate will expire on Jan 31 10:53:05 2015 GMT [05:33:14] PROBLEM - HTTPS on antimony is CRITICAL: SSL_CERT CRITICAL svn.wikimedia.org: certificate will expire on Jan 31 10:53:05 2015 GMT [05:39:10] PROBLEM - HTTPS_wiktionary.org on cp3022 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [05:39:11] PROBLEM - HTTPS_m.wikipedia.org on cp4009 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [05:39:11] PROBLEM - HTTPS_m.wikibooks.org on amssq34 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [05:39:11] PROBLEM - HTTPS_wikivoyage.org on cp3016 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [05:39:11] PROBLEM - HTTPS_zero.wikipedia.org on amssq36 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [05:39:11] PROBLEM - HTTPS_wikimedia.org on cp4017 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [05:39:11] PROBLEM - HTTPS_m.wikipedia.org on cp4013 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [05:39:21] PROBLEM - HTTPS_m.wikimediafoundation.org on cp4012 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [05:39:21] PROBLEM - HTTPS_m.wikipedia.org on amssq41 is CRITICAL: SSL_CERT CRITICAL: Error: verify 
depth is 6 [05:39:21] PROBLEM - HTTPS_unified on cp3009 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [05:39:21] PROBLEM - HTTPS_m.mediawiki.org on amssq47 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [05:39:21] PROBLEM - HTTPS_m.mediawiki.org on cp3019 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [05:42:06] PROBLEM - HTTPS_m.wikibooks.org on cp4011 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [05:42:22] RECOVERY - HTTPS_zero.wikipedia.org on amssq36 is OK: SSL_CERT OK - X.509 certificate for *.zero.wikipedia.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:16:05 2015 GMT (expires in 335 days) [05:42:28] RECOVERY - HTTPS_m.wikibooks.org on amssq34 is OK: SSL_CERT OK - X.509 certificate for *.m.wikibooks.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:21:05 2015 GMT (expires in 335 days) [05:42:29] RECOVERY - HTTPS_m.wikipedia.org on cp4009 is OK: SSL_CERT OK - X.509 certificate for *.m.wikipedia.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:16:02 2015 GMT (expires in 335 days) [05:42:29] PROBLEM - HTTPS_m.wikipedia.org on cp3018 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [05:42:30] PROBLEM - HTTPS on sodium is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [05:44:24] RECOVERY - HTTPS_m.wikibooks.org on cp4011 is OK: SSL_CERT OK - X.509 certificate for *.m.wikibooks.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:21:05 2015 GMT (expires in 335 days) [05:44:47] PROBLEM - HTTPS_m.wikipedia.org on cp1054 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [05:45:13] RECOVERY - HTTPS_wikivoyage.org on cp3016 is OK: SSL_CERT OK - X.509 certificate for *.wikivoyage.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:41:09 2015 GMT (expires in 335 days) [05:45:20] RECOVERY - HTTPS_m.wikipedia.org on cp3018 is OK: SSL_CERT OK - X.509 
certificate for *.m.wikipedia.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:16:02 2015 GMT (expires in 335 days) [05:45:20] RECOVERY - HTTPS_wikimedia.org on cp4017 is OK: SSL_CERT OK - X.509 certificate for *.wikimedia.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 4 21:06:06 2015 GMT (expires in 317 days) [05:45:20] PROBLEM - HTTPS_m.wikivoyage.org on cp4019 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [05:45:21] PROBLEM - HTTPS_m.wikimedia.org on amssq31 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [05:45:21] PROBLEM - HTTPS_wikibooks.org on cp3020 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [05:48:17] PROBLEM - HTTPS on sodium is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [05:49:50] RECOVERY - HTTPS on sodium is OK: SSL_CERT OK - X.509 certificate for lists.wikimedia.org from RapidSSL CA valid until Jan 31 02:58:36 2016 GMT (expires in 404 days) [05:50:21] RECOVERY - HTTPS_m.wikipedia.org on cp1054 is OK: SSL_CERT OK - X.509 certificate for *.m.wikipedia.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:16:02 2015 GMT (expires in 335 days) [05:50:44] RECOVERY - HTTPS_wikibooks.org on cp3020 is OK: SSL_CERT OK - X.509 certificate for *.wikibooks.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:21:03 2015 GMT (expires in 335 days) [05:50:44] RECOVERY - HTTPS_m.wikibooks.org on cp1048 is OK: SSL_CERT OK - X.509 certificate for *.m.wikibooks.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:21:05 2015 GMT (expires in 335 days) [05:50:44] RECOVERY - HTTPS_m.wikinews.org on amssq42 is OK: SSL_CERT OK - X.509 certificate for *.m.wikinews.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:36:02 2015 GMT (expires in 335 days) [05:50:44] RECOVERY - HTTPS_wikisource.org on cp3017 is OK: SSL_CERT OK - X.509 certificate for 
*.wikisource.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:36:09 2015 GMT (expires in 335 days) [05:50:44] RECOVERY - HTTPS_m.wikipedia.org on amssq41 is OK: SSL_CERT OK - X.509 certificate for *.m.wikipedia.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:16:02 2015 GMT (expires in 335 days) [05:57:02] PROBLEM - HTTPS_wikiquote.org on cp3005 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [05:57:02] PROBLEM - HTTPS_m.wiktionary.org on cp3013 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [05:57:02] PROBLEM - HTTPS_m.wikisource.org on cp4004 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [05:57:06] PROBLEM - HTTPS on antimony is CRITICAL: SSL_CERT CRITICAL svn.wikimedia.org: certificate will expire on Jan 31 10:53:05 2015 GMT [06:00:15] RECOVERY - HTTPS_m.wikisource.org on cp4004 is OK: SSL_CERT OK - X.509 certificate for *.m.wikisource.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:41:02 2015 GMT (expires in 335 days) [06:00:18] RECOVERY - HTTPS_m.wiktionary.org on cp3013 is OK: SSL_CERT OK - X.509 certificate for *.m.wiktionary.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:46:07 2015 GMT (expires in 335 days) [06:00:18] RECOVERY - HTTPS_wikiquote.org on cp3005 is OK: SSL_CERT OK - X.509 certificate for *.wikiquote.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:36:05 2015 GMT (expires in 335 days) [06:18:07] PROBLEM - HTTPS_wikiquote.org on cp4013 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [06:18:08] PROBLEM - HTTPS_m.wikivoyage.org on cp1047 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [06:18:08] PROBLEM - HTTPS_wikidata.org on cp1064 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [06:18:08] PROBLEM - HTTPS_wikimedia.org on cp1068 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [06:18:08] PROBLEM - 
HTTPS_m.wikipedia.org on cp1064 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [06:18:08] PROBLEM - HTTPS_m.wikivoyage.org on amssq44 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [06:20:03] RECOVERY - HTTPS_m.wikimedia.org on amssq38 is OK: SSL_CERT OK - X.509 certificate for *.m.wikimedia.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:26:07 2015 GMT (expires in 335 days) [06:20:08] RECOVERY - HTTPS_m.mediawiki.org on amssq47 is OK: SSL_CERT OK - X.509 certificate for *.m.mediawiki.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:51:04 2015 GMT (expires in 335 days) [06:20:14] RECOVERY - HTTPS_m.wikivoyage.org on cp1047 is OK: SSL_CERT OK - X.509 certificate for *.m.wikivoyage.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:46:02 2015 GMT (expires in 335 days) [06:20:25] RECOVERY - HTTPS_m.wikimedia.org on cp1052 is OK: SSL_CERT OK - X.509 certificate for *.m.wikimedia.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:26:07 2015 GMT (expires in 335 days) [06:20:25] RECOVERY - HTTPS_wikivoyage.org on cp3015 is OK: SSL_CERT OK - X.509 certificate for *.wikivoyage.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:41:09 2015 GMT (expires in 335 days) [06:20:47] RECOVERY - HTTPS_m.wikibooks.org on cp1039 is OK: SSL_CERT OK - X.509 certificate for *.m.wikibooks.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:21:05 2015 GMT (expires in 335 days) [06:20:48] RECOVERY - HTTPS_wikidata.org on cp1064 is OK: SSL_CERT OK - X.509 certificate for *.wikidata.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:26:02 2015 GMT (expires in 335 days) [06:20:50] PROBLEM - HTTPS on antimony is CRITICAL: SSL_CERT CRITICAL svn.wikimedia.org: certificate will expire on Jan 31 10:53:05 2015 GMT [06:20:56] RECOVERY - HTTPS_wikipedia.org on 
cp1053 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:06:02 2015 GMT (expires in 334 days) [06:20:57] RECOVERY - HTTPS_m.wikinews.org on amssq59 is OK: SSL_CERT OK - X.509 certificate for *.m.wikinews.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:36:02 2015 GMT (expires in 335 days) [06:21:23] RECOVERY - HTTPS_wikiquote.org on cp4013 is OK: SSL_CERT OK - X.509 certificate for *.wikiquote.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:36:05 2015 GMT (expires in 335 days) [06:21:24] RECOVERY - HTTPS_m.wikivoyage.org on cp1057 is OK: SSL_CERT OK - X.509 certificate for *.m.wikivoyage.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:46:02 2015 GMT (expires in 335 days) [06:21:24] RECOVERY - HTTPS_m.wikimediafoundation.org on cp3022 is OK: SSL_CERT OK - X.509 certificate for *.m.wikimediafoundation.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:31:07 2015 GMT (expires in 335 days) [06:21:24] RECOVERY - HTTPS_m.wiktionary.org on cp3012 is OK: SSL_CERT OK - X.509 certificate for *.m.wiktionary.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:46:07 2015 GMT (expires in 335 days) [06:21:24] RECOVERY - HTTPS_m.wikibooks.org on cp4012 is OK: SSL_CERT OK - X.509 certificate for *.m.wikibooks.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:21:05 2015 GMT (expires in 335 days) [06:21:24] RECOVERY - HTTPS_wikivoyage.org on amssq44 is OK: SSL_CERT OK - X.509 certificate for *.wikivoyage.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:41:09 2015 GMT (expires in 335 days) [06:21:25] RECOVERY - HTTPS_m.wikibooks.org on cp3012 is OK: SSL_CERT OK - X.509 certificate for *.m.wikibooks.org from GlobalSign Organization Validation CA - SHA256 - G2 valid 
until Nov 22 18:21:05 2015 GMT (expires in 334 days) [06:21:25] RECOVERY - HTTPS_wikimedia.org on cp1068 is OK: SSL_CERT OK - X.509 certificate for *.wikimedia.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 4 21:06:06 2015 GMT (expires in 317 days) [06:21:25] RECOVERY - HTTPS_unified on cp3016 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:06:02 2015 GMT (expires in 334 days) [06:21:42] RECOVERY - HTTPS_m.wikibooks.org on cp3017 is OK: SSL_CERT OK - X.509 certificate for *.m.wikibooks.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:21:05 2015 GMT (expires in 334 days) [06:21:42] RECOVERY - HTTPS_m.wikisource.org on amssq41 is OK: SSL_CERT OK - X.509 certificate for *.m.wikisource.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:41:02 2015 GMT (expires in 335 days) [06:21:54] RECOVERY - HTTPS_wikipedia.org on amssq47 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:06:02 2015 GMT (expires in 334 days) [06:21:54] RECOVERY - HTTPS_m.wikipedia.org on cp1064 is OK: SSL_CERT OK - X.509 certificate for *.m.wikipedia.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:16:02 2015 GMT (expires in 334 days) [06:21:54] RECOVERY - HTTPS_m.wikiversity.org on cp1055 is OK: SSL_CERT OK - X.509 certificate for *.m.wikiversity.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:41:07 2015 GMT (expires in 335 days) [06:21:54] RECOVERY - HTTPS_m.wikiversity.org on cp1038 is OK: SSL_CERT OK - X.509 certificate for *.m.wikiversity.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:41:07 2015 GMT (expires in 335 days) [06:21:55] RECOVERY - HTTPS_wikimedia.org on cp4001 is OK: SSL_CERT OK - X.509 certificate for 
*.wikimedia.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 4 21:06:06 2015 GMT (expires in 317 days) [06:21:55] RECOVERY - HTTPS_m.wikibooks.org on cp1052 is OK: SSL_CERT OK - X.509 certificate for *.m.wikibooks.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:21:05 2015 GMT (expires in 334 days) [06:21:55] RECOVERY - HTTPS_m.wikipedia.org on cp1069 is OK: SSL_CERT OK - X.509 certificate for *.m.wikipedia.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:16:02 2015 GMT (expires in 334 days) [06:21:56] RECOVERY - HTTPS_wikiquote.org on cp4012 is OK: SSL_CERT OK - X.509 certificate for *.wikiquote.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:36:05 2015 GMT (expires in 335 days) [06:21:58] RECOVERY - HTTPS_m.wikivoyage.org on amssq44 is OK: SSL_CERT OK - X.509 certificate for *.m.wikivoyage.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:46:02 2015 GMT (expires in 335 days) [06:21:58] RECOVERY - HTTPS_m.wikimediafoundation.org on cp3004 is OK: SSL_CERT OK - X.509 certificate for *.m.wikimediafoundation.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:31:07 2015 GMT (expires in 335 days) [06:21:58] RECOVERY - HTTPS_mediawiki.org on cp3017 is OK: SSL_CERT OK - X.509 certificate for *.mediawiki.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:51:02 2015 GMT (expires in 335 days) [06:24:32] RECOVERY - HTTPS_m.wikibooks.org on amssq41 is OK: SSL_CERT OK - X.509 certificate for *.m.wikibooks.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:21:05 2015 GMT (expires in 334 days) [06:24:34] RECOVERY - HTTPS_wiktionary.org on cp4008 is OK: SSL_CERT OK - X.509 certificate for *.wiktionary.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:46:05 2015 GMT (expires in 335 days) 
[06:33:36] PROBLEM - puppet last run on lvs2004 is CRITICAL: CRITICAL: Puppet has 1 failures [06:35:49] PROBLEM - puppet last run on mw1144 is CRITICAL: CRITICAL: Puppet has 1 failures [06:36:25] PROBLEM - puppet last run on mw1123 is CRITICAL: CRITICAL: Puppet has 1 failures [06:36:36] PROBLEM - puppet last run on mw1052 is CRITICAL: CRITICAL: Puppet has 1 failures [06:36:36] PROBLEM - puppet last run on cp3014 is CRITICAL: CRITICAL: Puppet has 1 failures [06:37:57] PROBLEM - HTTPS on antimony is CRITICAL: SSL_CERT CRITICAL svn.wikimedia.org: certificate will expire on Jan 31 10:53:05 2015 GMT [06:46:09] RECOVERY - puppet last run on mw1123 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [06:46:20] RECOVERY - puppet last run on mw1052 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [06:46:29] RECOVERY - puppet last run on lvs2004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:48:42] RECOVERY - puppet last run on mw1144 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:49:36] RECOVERY - puppet last run on cp3014 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [07:15:23] PROBLEM - HTTPS_m.wiktionary.org on amssq61 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [07:15:23] PROBLEM - HTTPS_wikiversity.org on cp1038 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [07:15:23] PROBLEM - HTTPS_mediawiki.org on amssq36 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [07:15:23] PROBLEM - HTTPS_m.wikimediafoundation.org on amssq32 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [07:15:31] PROBLEM - HTTPS_wikiquote.org on cp3015 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [07:15:31] PROBLEM - HTTPS_zero.wikipedia.org on cp1061 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [07:15:31] PROBLEM - HTTPS_wikisource.org on cp1039 is CRITICAL: SSL_CERT CRITICAL: 
Error: verify depth is 6 [07:15:31] PROBLEM - HTTPS_wikisource.org on cp4010 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [07:15:31] PROBLEM - HTTPS_wikiversity.org on cp3014 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [07:15:32] PROBLEM - HTTPS_wikiquote.org on amssq44 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [07:15:32] PROBLEM - HTTPS_m.wikipedia.org on cp1051 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [07:15:32] PROBLEM - HTTPS_m.wikidata.org on cp1050 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [07:17:58] RECOVERY - HTTPS_m.wiktionary.org on amssq61 is OK: SSL_CERT OK - X.509 certificate for *.m.wiktionary.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:46:07 2015 GMT (expires in 334 days) [07:17:58] RECOVERY - HTTPS_wikiversity.org on cp1038 is OK: SSL_CERT OK - X.509 certificate for *.wikiversity.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:41:04 2015 GMT (expires in 334 days) [07:17:58] RECOVERY - HTTPS_mediawiki.org on amssq36 is OK: SSL_CERT OK - X.509 certificate for *.mediawiki.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:51:02 2015 GMT (expires in 334 days) [07:17:58] RECOVERY - HTTPS_m.wikimediafoundation.org on amssq32 is OK: SSL_CERT OK - X.509 certificate for *.m.wikimediafoundation.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:31:07 2015 GMT (expires in 334 days) [07:18:12] RECOVERY - HTTPS_zero.wikipedia.org on cp1061 is OK: SSL_CERT OK - X.509 certificate for *.zero.wikipedia.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:16:05 2015 GMT (expires in 334 days) [07:18:12] RECOVERY - HTTPS_wikiquote.org on cp3015 is OK: SSL_CERT OK - X.509 certificate for *.wikiquote.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:36:05 2015 GMT (expires in 334 days) [07:18:13] 
RECOVERY - HTTPS_wikisource.org on cp1039 is OK: SSL_CERT OK - X.509 certificate for *.wikisource.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:36:09 2015 GMT (expires in 334 days) [07:18:22] RECOVERY - HTTPS_m.wikipedia.org on cp1051 is OK: SSL_CERT OK - X.509 certificate for *.m.wikipedia.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:16:02 2015 GMT (expires in 334 days) [07:18:27] RECOVERY - HTTPS_wikisource.org on cp4010 is OK: SSL_CERT OK - X.509 certificate for *.wikisource.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:36:09 2015 GMT (expires in 334 days) [07:18:28] RECOVERY - HTTPS_wikiquote.org on amssq44 is OK: SSL_CERT OK - X.509 certificate for *.wikiquote.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:36:05 2015 GMT (expires in 334 days) [07:18:29] RECOVERY - HTTPS_m.wikidata.org on cp1050 is OK: SSL_CERT OK - X.509 certificate for *.m.wikidata.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:26:04 2015 GMT (expires in 334 days) [07:18:30] RECOVERY - HTTPS_wikiversity.org on cp3014 is OK: SSL_CERT OK - X.509 certificate for *.wikiversity.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:41:04 2015 GMT (expires in 334 days) [07:18:31] RECOVERY - HTTPS_m.wikisource.org on amssq41 is OK: SSL_CERT OK - X.509 certificate for *.m.wikisource.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:41:02 2015 GMT (expires in 334 days) [07:18:42] RECOVERY - HTTPS_wikiquote.org on amssq51 is OK: SSL_CERT OK - X.509 certificate for *.wikiquote.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:36:05 2015 GMT (expires in 334 days) [07:18:45] RECOVERY - HTTPS_m.wikisource.org on amssq57 is OK: SSL_CERT OK - X.509 certificate for *.m.wikisource.org from GlobalSign Organization Validation CA - SHA256 - 
G2 valid until Nov 22 18:41:02 2015 GMT (expires in 334 days) [07:18:45] RECOVERY - HTTPS_m.wikisource.org on cp3005 is OK: SSL_CERT OK - X.509 certificate for *.m.wikisource.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:41:02 2015 GMT (expires in 334 days) [07:19:06] RECOVERY - HTTPS_m.wikimedia.org on amssq59 is OK: SSL_CERT OK - X.509 certificate for *.m.wikimedia.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:26:07 2015 GMT (expires in 334 days) [07:19:06] RECOVERY - HTTPS_zero.wikipedia.org on cp4020 is OK: SSL_CERT OK - X.509 certificate for *.zero.wikipedia.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:16:05 2015 GMT (expires in 334 days) [07:19:06] RECOVERY - HTTPS_wikimedia.org on cp1060 is OK: SSL_CERT OK - X.509 certificate for *.wikimedia.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 4 21:06:06 2015 GMT (expires in 317 days) [07:19:06] RECOVERY - HTTPS_m.wikibooks.org on amssq44 is OK: SSL_CERT OK - X.509 certificate for *.m.wikibooks.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:21:05 2015 GMT (expires in 334 days) [07:19:06] RECOVERY - HTTPS_wikidata.org on amssq51 is OK: SSL_CERT OK - X.509 certificate for *.wikidata.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:26:02 2015 GMT (expires in 334 days) [07:19:07] RECOVERY - HTTPS_m.wikidata.org on amssq48 is OK: SSL_CERT OK - X.509 certificate for *.m.wikidata.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:26:04 2015 GMT (expires in 334 days) [07:19:07] RECOVERY - HTTPS_wikivoyage.org on cp3021 is OK: SSL_CERT OK - X.509 certificate for *.wikivoyage.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:41:09 2015 GMT (expires in 334 days) [07:19:08] RECOVERY - HTTPS_wikiquote.org on cp4007 is OK: SSL_CERT OK - X.509 certificate 
for *.wikiquote.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:36:05 2015 GMT (expires in 334 days) [07:19:08] RECOVERY - HTTPS_wiktionary.org on cp4004 is OK: SSL_CERT OK - X.509 certificate for *.wiktionary.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:46:05 2015 GMT (expires in 334 days) [07:19:09] RECOVERY - HTTPS_mediawiki.org on amssq62 is OK: SSL_CERT OK - X.509 certificate for *.mediawiki.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:51:02 2015 GMT (expires in 334 days) [07:19:09] RECOVERY - HTTPS_m.wikimediafoundation.org on amssq45 is OK: SSL_CERT OK - X.509 certificate for *.m.wikimediafoundation.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:31:07 2015 GMT (expires in 334 days) [07:19:10] RECOVERY - HTTPS_m.wikisource.org on cp4001 is OK: SSL_CERT OK - X.509 certificate for *.m.wikisource.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:41:02 2015 GMT (expires in 334 days) [07:19:10] RECOVERY - HTTPS_m.wikibooks.org on cp1051 is OK: SSL_CERT OK - X.509 certificate for *.m.wikibooks.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:21:05 2015 GMT (expires in 334 days) [07:22:01] RECOVERY - HTTPS_wikimedia.org on amssq51 is OK: SSL_CERT OK - X.509 certificate for *.wikimedia.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 4 21:06:06 2015 GMT (expires in 317 days) [07:22:19] RECOVERY - HTTPS_m.wikimediafoundation.org on amssq61 is OK: SSL_CERT OK - X.509 certificate for *.m.wikimediafoundation.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:31:07 2015 GMT (expires in 334 days) [07:22:23] RECOVERY - HTTPS_m.wikidata.org on amssq55 is OK: SSL_CERT OK - X.509 certificate for *.m.wikidata.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:26:04 2015 GMT 
(expires in 334 days) [07:22:23] RECOVERY - HTTPS_wikiquote.org on amssq56 is OK: SSL_CERT OK - X.509 certificate for *.wikiquote.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:36:05 2015 GMT (expires in 334 days) [07:22:23] RECOVERY - HTTPS_m.wikinews.org on cp4002 is OK: SSL_CERT OK - X.509 certificate for *.m.wikinews.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:36:02 2015 GMT (expires in 334 days) [07:22:24] RECOVERY - HTTPS_m.wikimedia.org on cp3005 is OK: SSL_CERT OK - X.509 certificate for *.m.wikimedia.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:26:07 2015 GMT (expires in 334 days) [07:22:24] RECOVERY - HTTPS_m.wikipedia.org on amssq51 is OK: SSL_CERT OK - X.509 certificate for *.m.wikipedia.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:16:02 2015 GMT (expires in 334 days) [07:22:24] RECOVERY - HTTPS_wiktionary.org on cp3004 is OK: SSL_CERT OK - X.509 certificate for *.wiktionary.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:46:05 2015 GMT (expires in 334 days) [07:22:24] RECOVERY - HTTPS_m.wikimedia.org on cp4018 is OK: SSL_CERT OK - X.509 certificate for *.m.wikimedia.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:26:07 2015 GMT (expires in 334 days) [07:25:19] RECOVERY - HTTPS_m.wikinews.org on cp1037 is OK: SSL_CERT OK - X.509 certificate for *.m.wikinews.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:36:02 2015 GMT (expires in 334 days) [07:25:19] RECOVERY - HTTPS_mediawiki.org on cp3006 is OK: SSL_CERT OK - X.509 certificate for *.mediawiki.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:51:02 2015 GMT (expires in 334 days) [07:25:19] RECOVERY - HTTPS_m.wikivoyage.org on cp4008 is OK: SSL_CERT OK - X.509 certificate for *.m.wikivoyage.org from GlobalSign 
Organization Validation CA - SHA256 - G2 valid until Nov 22 18:46:02 2015 GMT (expires in 334 days) [07:25:19] RECOVERY - HTTPS_wmfusercontent.org on cp1044 is OK: SSL_CERT OK - X.509 certificate for *.wmfusercontent.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Sep 12 13:41:12 2015 GMT (expires in 263 days) [07:25:19] RECOVERY - HTTPS_wikinews.org on cp4005 is OK: SSL_CERT OK - X.509 certificate for *.wikinews.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:31:09 2015 GMT (expires in 334 days) [07:25:19] RECOVERY - HTTPS_zero.wikipedia.org on cp1062 is OK: SSL_CERT OK - X.509 certificate for *.zero.wikipedia.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:16:05 2015 GMT (expires in 334 days) [07:25:20] RECOVERY - HTTPS_wikiversity.org on amssq36 is OK: SSL_CERT OK - X.509 certificate for *.wikiversity.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:41:04 2015 GMT (expires in 334 days) [07:25:20] RECOVERY - HTTPS_mediawiki.org on cp1052 is OK: SSL_CERT OK - X.509 certificate for *.mediawiki.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:51:02 2015 GMT (expires in 334 days) [07:25:20] RECOVERY - HTTPS_wikiversity.org on cp3018 is OK: SSL_CERT OK - X.509 certificate for *.wikiversity.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:41:04 2015 GMT (expires in 334 days) [07:25:21] RECOVERY - HTTPS_unified on cp4006 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:06:02 2015 GMT (expires in 334 days) [07:25:21] RECOVERY - HTTPS_m.wikipedia.org on cp1060 is OK: SSL_CERT OK - X.509 certificate for *.m.wikipedia.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:16:02 2015 GMT (expires in 334 days) [07:25:22] RECOVERY - HTTPS_mediawiki.org on amssq57 is 
OK: SSL_CERT OK - X.509 certificate for *.mediawiki.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:51:02 2015 GMT (expires in 334 days) [07:25:22] RECOVERY - HTTPS_wikidata.org on cp1067 is OK: SSL_CERT OK - X.509 certificate for *.wikidata.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:26:02 2015 GMT (expires in 334 days) [07:25:23] RECOVERY - HTTPS_m.wikivoyage.org on cp1054 is OK: SSL_CERT OK - X.509 certificate for *.m.wikivoyage.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:46:02 2015 GMT (expires in 334 days) [07:25:23] RECOVERY - HTTPS_m.wikinews.org on cp4003 is OK: SSL_CERT OK - X.509 certificate for *.m.wikinews.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:36:02 2015 GMT (expires in 334 days) [07:25:24] RECOVERY - HTTPS_wikinews.org on cp3003 is OK: SSL_CERT OK - X.509 certificate for *.wikinews.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:31:09 2015 GMT (expires in 334 days) [07:29:09] (PS1) KartikMistry: WIP: Content Translation configuration for Production [mediawiki-config] - https://gerrit.wikimedia.org/r/181546 [07:31:33] PROBLEM - HTTPS_wikimediafoundation.org on amssq48 is CRITICAL: SSL_CERT CRITICAL: Error: [07:31:50] RECOVERY - HTTPS_m.wiktionary.org on cp1040 is OK: SSL_CERT OK - X.509 certificate for *.m.wiktionary.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:46:07 2015 GMT (expires in 334 days) [07:33:23] (PS2) KartikMistry: Don't install update-notifier-common on Jessie [puppet] - https://gerrit.wikimedia.org/r/181539 (owner: Andrew Bogott) [07:37:05] RECOVERY - HTTPS_wikimediafoundation.org on amssq48 is OK: SSL_CERT OK - X.509 certificate for *.wikimediafoundation.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:31:02 2015 GMT (expires in 334 days) [07:48:04] PROBLEM - 
HTTPS on antimony is CRITICAL: SSL_CERT CRITICAL svn.wikimedia.org: certificate will expire on Jan 31 10:53:05 2015 GMT [07:51:49] (PS1) Yuvipanda: toollabs: Add support for running uwsgi web services [puppet] - https://gerrit.wikimedia.org/r/181547 [07:51:58] (CR) jenkins-bot: [V: -1] toollabs: Add support for running uwsgi web services [puppet] - https://gerrit.wikimedia.org/r/181547 (owner: Yuvipanda) [07:52:23] (PS2) Yuvipanda: toollabs: Add support for running uwsgi web services [puppet] - https://gerrit.wikimedia.org/r/181547 [07:55:14] PROBLEM - Router interfaces on mr1-esams is CRITICAL: CRITICAL: host 91.198.174.247, interfaces up: 36, down: 1, dormant: 0, excluded: 1, unused: 0 ge-0/0/0: down - Core: msw-oe12-esams [07:58:41] PROBLEM - HTTPS_m.wikibooks.org on amssq50 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [07:58:53] PROBLEM - HTTPS_m.wikimediafoundation.org on amssq31 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [08:01:35] RECOVERY - HTTPS_m.wikibooks.org on amssq50 is OK: SSL_CERT OK - X.509 certificate for *.m.wikibooks.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:21:05 2015 GMT (expires in 334 days) [08:02:35] PROBLEM - HTTPS_wikibooks.org on cp4004 is CRITICAL: SSL_CERT CRITICAL: Error: [08:02:38] PROBLEM - HTTPS_m.wiktionary.org on cp4020 is CRITICAL: SSL_CERT CRITICAL: Error: [08:02:38] RECOVERY - HTTPS_m.wikimediafoundation.org on amssq31 is OK: SSL_CERT OK - X.509 certificate for *.m.wikimediafoundation.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:31:07 2015 GMT (expires in 334 days) [08:08:40] PROBLEM - HTTPS_m.wiktionary.org on cp4020 is CRITICAL: SSL_CERT CRITICAL: Error: [08:08:40] PROBLEM - HTTPS_wikipedia.org on amssq40 is CRITICAL: SSL_CERT CRITICAL: Error: [08:13:30] RECOVERY - HTTPS_wikibooks.org on cp4004 is OK: SSL_CERT OK - X.509 certificate for *.wikibooks.org from GlobalSign Organization Validation CA - 
SHA256 - G2 valid until Nov 22 18:21:03 2015 GMT (expires in 334 days) [08:13:30] PROBLEM - HTTPS on antimony is CRITICAL: SSL_CERT CRITICAL svn.wikimedia.org: certificate will expire on Jan 31 10:53:05 2015 GMT [08:13:31] RECOVERY - HTTPS_m.wiktionary.org on cp4020 is OK: SSL_CERT OK - X.509 certificate for *.m.wiktionary.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:46:07 2015 GMT (expires in 334 days) [08:13:31] RECOVERY - HTTPS_wikipedia.org on amssq40 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:06:02 2015 GMT (expires in 334 days) [08:20:05] PROBLEM - HTTPS on antimony is CRITICAL: SSL_CERT CRITICAL svn.wikimedia.org: certificate will expire on Jan 31 10:53:05 2015 GMT [08:22:57] <_joe_> whoa what's this list of alarms? [08:26:01] FYI, git.wikimedia.org has been down for the last few hours [08:26:21] Don't think that is linked to the icinga alarms :) [08:26:32] <_joe_> tto2: well it has its own icinga alarm [08:26:44] <_joe_> but gitblit is let's say "shaky" [08:26:59] indeed. Does it usually come back of its own accord? [08:27:09] <_joe_> sometimes yes [08:27:13] * _joe_ taking a look [08:30:24] (PS1) Yuvipanda: tools: Add jdk-8 to trusty nodes [puppet] - https://gerrit.wikimedia.org/r/181548 [08:30:25] _joe_: wanna +1? ^ [08:30:32] <_joe_> !log restarting gitblit, stuck at 100% cpu on a thread [08:30:36] (PS1) KartikMistry: Fix trailing spaces [puppet] - https://gerrit.wikimedia.org/r/181549 [08:30:39] <_joe_> YuviPanda: I actually want to -1 that [08:30:39] Logged the message, Master [08:30:56] _joe_: package not considered stable enough? [08:31:02] <_joe_> ph for toollabs [08:31:15] ph? 
[08:31:18] <_joe_> YuviPanda: it's a backport I made from vivid, so yes [08:31:22] <_joe_> "oh" [08:31:25] oh [08:31:51] _joe_: well, it’s better than not having it, plus toollabs has ensure => latest so any changes will be reflected pretty quickly. [08:32:17] (CR) Giuseppe Lavagetto: [C: 1] "As long as it's clear to our users that the java 8 support is experimental and not supported right now, it should be ok." [puppet] - https://gerrit.wikimedia.org/r/181548 (owner: Yuvipanda) [08:32:25] _joe_: ty [08:32:27] <_joe_> YuviPanda: my point is, it's unmaintained [08:32:39] <_joe_> and I won't work on bugs if not strictly necessary [08:32:45] _joe_: hmm, but if we do end up using Titan, it will be maintained. [08:32:51] <_joe_> so present it as an experimental, alpha feature [08:32:51] _joe_: maybe I should wait until *that* determination is made. [08:32:55] <_joe_> *if* [08:32:57] <_joe_> yeah [08:33:02] hmm, I can do that [08:33:07] <_joe_> I'm honestly not sure about that [08:33:20] about what? whether we’ll use titan? [08:33:41] or if the java 8 packages will be maintained if we do use it? [08:33:42] <_joe_> I'm heavily underwhelmed by the quality of all the graph databases I've seen [08:33:51] <_joe_> that we'll use titan [08:34:00] heh, not surprised (re: quality) [08:34:12] PROBLEM - HTTPS on antimony is CRITICAL: SSL_CERT CRITICAL svn.wikimedia.org: certificate will expire on Jan 31 10:53:05 2015 GMT [08:34:14] <_joe_> I mean, it's probably the least lame candidate, but I don't like it either [08:34:54] <_joe_> tto2: why did we move gitblit behind misc-web? 
[08:34:59] <_joe_> oh sorry [08:35:04] <_joe_> that wasn't for you [08:35:05] <_joe_> :) [08:35:34] <_joe_> tto2: I'm working on it, but gitblit is in a bad state - this happens when repositories get pruned of old branches [08:35:56] _joe_: Thanks :) [08:36:02] (CR) Yuvipanda: [C: -2] "We have a trusty package thanks to T78267, but it might or might not be maintained depending on whether we end up using Titan or not. I sha" [puppet] - https://gerrit.wikimedia.org/r/181548 (owner: Yuvipanda) [08:36:49] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 57901 bytes in 0.115 second response time [08:49:27] PROBLEM - HTTPS on antimony is CRITICAL: SSL_CERT CRITICAL svn.wikimedia.org: certificate will expire on Jan 31 10:53:05 2015 GMT [08:57:27] PROBLEM - git.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:57:32] (CR) Yuvipanda: [C: 2] toollabs: Add support for running uwsgi web services [puppet] - https://gerrit.wikimedia.org/r/181547 (owner: Yuvipanda) [08:59:16] PROBLEM - HTTPS on antimony is CRITICAL: SSL_CERT CRITICAL svn.wikimedia.org: certificate will expire on Jan 31 10:53:05 2015 GMT [09:05:47] <_joe_> sigh, gitblit down again [09:06:34] greetings [09:06:39] <_joe_> hey [09:06:41] <_joe_> :P [09:06:55] <_joe_> this wasn't actually coordinated in any way [09:12:12] (PS1) Giuseppe Lavagetto: hiera: make puppet fail if the mwyaml backend fails to lookup [puppet] - https://gerrit.wikimedia.org/r/181550 [09:13:25] <_joe_> YuviPanda: this ^^ should DTRT [09:13:35] <_joe_> but lemme test it first :) [09:14:32] :) [09:17:22] _joe_: haha thanks [09:17:46] PROBLEM - HTTPS on antimony is CRITICAL: SSL_CERT CRITICAL svn.wikimedia.org: certificate will expire on Jan 31 10:53:05 2015 GMT [09:22:51] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 57901 bytes in 0.088 second response time [09:23:07] <_joe_> oh someone made one of our facts run choke when ran 
outside of our env, how nice [09:23:33] <_joe_> so I can't test our code locally on my machine [09:23:37] <_joe_> nice nice nice [09:37:02] PROBLEM - HTTPS_wikidata.org on cp4001 is CRITICAL: SSL_CERT CRITICAL: Error: [09:37:03] PROBLEM - HTTPS_m.wikimedia.org on cp3006 is CRITICAL: SSL_CERT CRITICAL: Error: [09:37:03] PROBLEM - HTTPS_wikinews.org on cp3015 is CRITICAL: SSL_CERT CRITICAL: Error: [09:37:03] PROBLEM - HTTPS_m.wikivoyage.org on amssq59 is CRITICAL: SSL_CERT CRITICAL: Error: [09:37:03] PROBLEM - HTTPS_zero.wikipedia.org on amssq35 is CRITICAL: SSL_CERT CRITICAL: Error: [09:37:03] PROBLEM - HTTPS_wikidata.org on cp3017 is CRITICAL: SSL_CERT CRITICAL: Error: [09:37:03] PROBLEM - HTTPS_m.mediawiki.org on amssq32 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [09:37:04] PROBLEM - HTTPS_m.wikimediafoundation.org on cp4003 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [09:37:04] PROBLEM - HTTPS_wikiversity.org on cp3022 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [09:37:05] PROBLEM - HTTPS_wikivoyage.org on cp1070 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [09:37:05] PROBLEM - HTTPS_mediawiki.org on cp3018 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [09:37:06] PROBLEM - HTTPS_wikipedia.org on cp1047 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [09:40:08] RECOVERY - HTTPS_wikidata.org on cp4001 is OK: SSL_CERT OK - X.509 certificate for *.wikidata.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:26:02 2015 GMT (expires in 334 days) [09:40:08] RECOVERY - HTTPS_m.wikivoyage.org on amssq59 is OK: SSL_CERT OK - X.509 certificate for *.m.wikivoyage.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:46:02 2015 GMT (expires in 334 days) [09:40:08] RECOVERY - HTTPS_m.wikimedia.org on cp3006 is OK: SSL_CERT OK - X.509 certificate for *.m.wikimedia.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 
18:26:07 2015 GMT (expires in 334 days) [09:40:08] RECOVERY - HTTPS_zero.wikipedia.org on amssq35 is OK: SSL_CERT OK - X.509 certificate for *.zero.wikipedia.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:16:05 2015 GMT (expires in 334 days) [09:40:08] RECOVERY - HTTPS_wikinews.org on cp3015 is OK: SSL_CERT OK - X.509 certificate for *.wikinews.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:31:09 2015 GMT (expires in 334 days) [09:40:08] RECOVERY - HTTPS_wikidata.org on cp3017 is OK: SSL_CERT OK - X.509 certificate for *.wikidata.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:26:02 2015 GMT (expires in 334 days) [09:40:08] PROBLEM - HTTPS_unified on amssq55 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [09:40:09] PROBLEM - HTTPS_m.wikipedia.org on cp3007 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [09:40:09] PROBLEM - HTTPS_wikiversity.org on amssq43 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [09:40:10] PROBLEM - HTTPS_wikivoyage.org on cp3019 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [09:40:10] PROBLEM - HTTPS_m.wikinews.org on cp1037 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [09:40:11] PROBLEM - HTTPS_m.wikisource.org on amssq50 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [09:40:11] PROBLEM - HTTPS_wikibooks.org on cp1059 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [09:40:12] PROBLEM - HTTPS_m.wiktionary.org on amssq43 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [09:40:12] PROBLEM - HTTPS_wikiversity.org on amssq34 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [09:40:13] PROBLEM - HTTPS_wikibooks.org on amssq59 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [09:40:26] mmh wut? [09:40:51] 'verify depth'? 
[09:42:40] RECOVERY - HTTPS_m.mediawiki.org on amssq32 is OK: SSL_CERT OK - X.509 certificate for *.m.mediawiki.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:51:04 2015 GMT (expires in 334 days) [09:42:42] RECOVERY - HTTPS_wikivoyage.org on cp1070 is OK: SSL_CERT OK - X.509 certificate for *.wikivoyage.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:41:09 2015 GMT (expires in 334 days) [09:42:47] RECOVERY - HTTPS_m.wikipedia.org on cp3007 is OK: SSL_CERT OK - X.509 certificate for *.m.wikipedia.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:16:02 2015 GMT (expires in 334 days) [09:42:51] RECOVERY - HTTPS_m.wikimediafoundation.org on cp4003 is OK: SSL_CERT OK - X.509 certificate for *.m.wikimediafoundation.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:31:07 2015 GMT (expires in 334 days) [09:42:51] RECOVERY - HTTPS_wikiversity.org on cp3022 is OK: SSL_CERT OK - X.509 certificate for *.wikiversity.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:41:04 2015 GMT (expires in 334 days) [09:42:51] RECOVERY - HTTPS_wikipedia.org on cp1047 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:06:02 2015 GMT (expires in 334 days) [09:42:55] RECOVERY - HTTPS_unified on amssq55 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:06:02 2015 GMT (expires in 334 days) [09:42:59] RECOVERY - HTTPS_mediawiki.org on cp3018 is OK: SSL_CERT OK - X.509 certificate for *.mediawiki.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:51:02 2015 GMT (expires in 334 days) [09:42:59] RECOVERY - HTTPS_m.wikisource.org on amssq50 is OK: SSL_CERT OK - X.509 certificate for *.m.wikisource.org from GlobalSign Organization 
Validation CA - SHA256 - G2 valid until Nov 22 18:41:02 2015 GMT (expires in 334 days) [09:43:00] RECOVERY - HTTPS_wikiversity.org on amssq34 is OK: SSL_CERT OK - X.509 certificate for *.wikiversity.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:41:04 2015 GMT (expires in 334 days) [09:43:00] RECOVERY - HTTPS_wikibooks.org on amssq59 is OK: SSL_CERT OK - X.509 certificate for *.wikibooks.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:21:03 2015 GMT (expires in 334 days) [09:43:00] RECOVERY - HTTPS_m.wiktionary.org on amssq43 is OK: SSL_CERT OK - X.509 certificate for *.m.wiktionary.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:46:07 2015 GMT (expires in 334 days) [09:43:00] RECOVERY - HTTPS_wikivoyage.org on cp3019 is OK: SSL_CERT OK - X.509 certificate for *.wikivoyage.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:41:09 2015 GMT (expires in 334 days) [09:43:00] RECOVERY - HTTPS_wikiversity.org on amssq43 is OK: SSL_CERT OK - X.509 certificate for *.wikiversity.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:41:04 2015 GMT (expires in 334 days) [09:43:00] RECOVERY - HTTPS_m.wikinews.org on cp1037 is OK: SSL_CERT OK - X.509 certificate for *.m.wikinews.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:36:02 2015 GMT (expires in 334 days) [09:43:01] RECOVERY - HTTPS_wikibooks.org on cp1059 is OK: SSL_CERT OK - X.509 certificate for *.wikibooks.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:21:03 2015 GMT (expires in 334 days) [09:44:05] what the hell [09:49:40] That doesn't seem like the kind of thing that should break and then fix itself spontaneously [09:50:56] wild guess, it is the check barfing on the unified cert? 
[09:51:18] no [09:51:27] these errors usually mean connectivity is broken [09:52:13] seems widespread across though, including eqiad [09:53:05] godog: or just neon [09:53:06] (PS2) Giuseppe Lavagetto: hiera: make puppet fail if the mwyaml backend fails to lookup [puppet] - https://gerrit.wikimedia.org/r/181550 [09:53:17] <_joe_> I think that's neon actually [09:53:47] yeah most probably [09:53:50] but why is the real question [09:54:33] 09:54:27 up 16:07, 1 user, load average: 499.38, 673.85, 688.95 [09:54:36] that would do it wouldn't it [09:54:37] <_joe_> paravoid: neon being horribly overloaded and the check timing out because it actually needs some cpu cycles to do the ssl check? [09:54:53] <_joe_> paravoid: I was about to link the load average ganglia graph :/ [09:55:54] faidon@neon:~$ ps aux |grep -c ssl [09:55:54] 4132 [09:56:11] http://ganglia.wikimedia.org/latest/graph_all_periods.php?h=neon.wikimedia.org&m=cpu_report&r=week&s=by%20name&hc=4&mc=2&st=1419328446&g=load_report&z=large&c=Miscellaneous%20eqiad [09:56:15] http://ganglia.wikimedia.org/latest/graph_all_periods.php?h=neon.wikimedia.org&m=cpu_report&r=week&s=by%20name&hc=4&mc=2&st=1419328446&g=mem_report&z=large&c=Miscellaneous%20eqiad [09:56:48] <_joe_> some check we added lately? [09:57:01] well, brandon added all those SSL checks lately [09:57:19] but if you see the yearly load graph [09:57:20] <_joe_> maybe we don't need to perform those on every host? [09:57:36] the first spike starts at ~11/9 [09:57:57] then there's a drop and then a bigger spike at 28/11 [09:58:04] then getting progressively worse [09:59:01] <_joe_> can't we make those ssl checks be done via NRPE? [09:59:11] it can't be just the ssl checks [09:59:16] (and nrpe wouldn't help) [09:59:23] <_joe_> why not? 
[09:59:43] unless you wrote a new check that did multiple checks at once [09:59:47] a la check_mk or check_multi [09:59:58] <_joe_> (nrpe would move the processing on the client, that was my idea) [10:00:25] hmm, I see multiple sep 11 icinga commits [10:00:36] <_joe_> that is when neon died [10:00:37] labmon for starters [10:00:54] <_joe_> of a hardware failure [10:00:56] check_graphite + labmon [10:01:04] it did? [10:01:08] remember, I wasn't around back then :) [10:01:08] <_joe_> yes [10:01:25] <_joe_> see the memory graph, we also upgraded the RAM [10:01:44] <_joe_> then, HHVM has added 2 check_graphite checks/mw server [10:02:01] <_joe_> but check_graphite should not use a lot of ram or cpu [10:02:43] so we upgraded the box, then the load started being in the hundreds all suddenly? that doesn't compute [10:02:53] http://ganglia.wikimedia.org/latest/graph.php?r=year&z=xlarge&h=neon.wikimedia.org&m=cpu_report&s=by+name&mc=2&g=load_report&c=Miscellaneous+eqiad [10:03:23] <_joe_> we had a failed disk, so we had to rebuild neon from scratch, and I think there were quite a few unpuppetized things. 
Brandon surely has more info [10:04:07] <_joe_> mh actually the memory profile right now is quite strange [10:05:58] <_joe_> we're swapping but we're well below the server's memory limit [10:06:42] <_joe_> and check_ganglia is what's killing icinga as usual [10:07:05] <_joe_> every check_ganglia check eats up 500 mb of memory [10:07:12] <_joe_> at least [10:07:57] | |-icinga(420)---check_ssl_cert(425)---perl(25056) [10:08:00] | |-icinga(424)---check_ssl_cert(443)---perl(24940) [10:08:03] | |-icinga(429)---check_ssl_cert(444)---mktemp(21485) [10:08:06] | |-icinga(433)---check_ssl_cert(446)---mktemp(21369) [10:08:09] that ssl cert check is horribly inefficient [10:08:32] pretty sure that's what's killing the box right now [10:08:53] although I'm sure all those graphite checks against an overloaded graphite server aren't helping much [10:08:59] <_joe_> yeah apart from that, I was looking at the memory profile now, I should have specified that [10:09:26] CERT=$( mktemp -t "${0##*/}XXXXXX" 2> /dev/null ) [10:09:37] <_joe_> meh [10:09:42] if ! ${PERL} -e "use Date::Parse;" > /dev/null 2>&1 ; then [10:09:42] if [ -n "${VERBOSE}" ] ; then [10:09:42] echo "Perl module Date::Parse not installed: disabling date computations" [10:09:45] fi [10:09:47] PERL="" [10:09:50] fi [10:09:53] let's call perl -e "use Date::Parse" every time! [10:10:01] <_joe_> ... [10:10:08] just to check if the module is installed! [10:10:47] and we run this 13 times for every varnish server I think [10:11:08] <_joe_> I guess so [10:12:17] * _joe_ is fighting with his own lame ruby skills right now [10:16:15] PROBLEM - HTTPS on antimony is CRITICAL: SSL_CERT CRITICAL svn.wikimedia.org: certificate will expire on Jan 31 10:53:05 2015 GMT [10:18:16] (CR) Giuseppe Lavagetto: [C: 1] "My own testing (made by setting :host: http://localhost in the configuration) showed this behaves correctly now." 
[puppet] - https://gerrit.wikimedia.org/r/181550 (owner: Giuseppe Lavagetto) [10:19:00] <_joe_> paravoid: are you taking a look at this, or should I? (the ssl check in icinga) [10:36:29] (CR) Dzahn: [C: 2] [English Planet] Add Geni, en.wiki/Commons sysop etc. [puppet] - https://gerrit.wikimedia.org/r/181104 (owner: Nemo bis) [10:42:09] (CR) Dzahn: [C: 1] Give parsoid admins the ability to update/restart the RT testing service. [puppet] - https://gerrit.wikimedia.org/r/180221 (owner: Cscott) [10:48:35] (CR) Dzahn: "cscott: re: scoping extra permissions to just ruthenium. we could make a new admin group like "parsoid-testers" or similar, give it all the" [puppet] - https://gerrit.wikimedia.org/r/180221 (owner: Cscott) [10:49:32] paravoid: feel like taking a look at that one? https://gerrit.wikimedia.org/r/#/c/180221/1/modules/admin/data/data.yaml [10:49:43] access request thing [10:50:04] PROBLEM - puppet last run on db2039 is CRITICAL: CRITICAL: puppet fail [10:50:07] <_joe_> /home/parsoid-rt/update-code.sh [10:50:13] <_joe_> how is that file provisioned? [10:50:24] it's this: https://www.mediawiki.org/wiki/Parsoid/Round-trip_testing#Updating_the_code_to_test_.28and_being_run_by_the_clients.29 [10:50:30] <_joe_> I am already delighted [10:50:51] yea, dunno about "Normally, this is not needed since the code is updated every midnight (PST)" [10:51:41] <_joe_> well, technically the privs are correct. Didn't that become "privileges" btw? [10:52:03] ah, good point, i think it did [10:52:56] <_joe_> mutante: is update-code.sh in puppet? [10:54:23] <_joe_> or in some other repo and deployed somehow? Also, I won't comment further on how bad it is to have a shell script that is writable by some user that is probably the same one some web service runs as [10:54:24] _joe_: i can't find it. looks like not :/ [10:56:32] hmmm "The instructions to set up a private instance of the round-trip test server are in Parsoid/tests/README." 
[10:57:16] <_joe_> round-trip test server? sounds interesting [10:57:20] <_joe_> I'll take a look [10:57:25] aka. "the other RT" :p [10:58:18] i guess update-code.sh must be in a parsoid repo..looking [11:00:46] /usr/lib/parsoid-clone on ruthenium has "mediawiki/services/parsoid" but don't see that script either [11:01:50] <_joe_> mh a few things baffle me, but I won't get involved in an architectural review of parsoid for this [11:02:04] <_joe_> I'll ask gabriel when he's around :) [11:02:12] heh, ok thanks [11:05:20] RECOVERY - puppet last run on db2039 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [11:05:54] update-code.sh is mostly cd /usr/lib/parsoid ; git pull [11:13:37] PROBLEM - HTTPS on antimony is CRITICAL: SSL_CERT CRITICAL svn.wikimedia.org: certificate will expire on Jan 31 10:53:05 2015 GMT [11:28:14] (CR) Dzahn: "https://phabricator.wikimedia.org/T78076 says this left the UID of Apache unspecified for all servers imaged/reimaged since 2014-05-29 and" [puppet] - https://gerrit.wikimedia.org/r/136151 (owner: Ori.livneh) [11:30:31] https://phabricator.wikimedia.org/T78076 sigh @ Apache UID [11:31:08] Apache UID on all appservers doesn't match Wikitech docs and prod vs labs [11:33:43] (PS1) Filippo Giunchedi: admin: awight stats/hive access [puppet] - https://gerrit.wikimedia.org/r/181556 [11:38:15] mutante: isn't it 48? [11:38:56] matanya: uid=996(apache) gid=48(apache) [11:39:04] but should be 48:48 [11:39:17] ah, that is bad [11:40:57] _joe_: re uwsgi being nice, I discovered yesterday that its default SIGTERM behavior is a restart, rather than a terminate. You can enable the ‘classical’ behavior with a --die-on-term flag. 
[11:43:31] PROBLEM - HTTPS on antimony is CRITICAL: SSL_CERT CRITICAL svn.wikimedia.org: certificate will expire on Jan 31 10:53:05 2015 GMT [11:44:40] (PS1) Giuseppe Lavagetto: puppet: enable the role-based backend in production [puppet] - https://gerrit.wikimedia.org/r/181557 [11:45:37] (PS2) Giuseppe Lavagetto: puppet: enable the role-based backend in production [puppet] - https://gerrit.wikimedia.org/r/181557 [11:49:02] PROBLEM - HTTPS on antimony is CRITICAL: SSL_CERT CRITICAL svn.wikimedia.org: certificate will expire on Jan 31 10:53:05 2015 GMT [11:49:36] (PS3) Giuseppe Lavagetto: puppet: enable the role-based backend in production [puppet] - https://gerrit.wikimedia.org/r/181557 [11:51:29] (CR) Dzahn: "the request just talks about access to stat1002, so that would sound like the regular "users" class above should be enough. but for some r" [puppet] - https://gerrit.wikimedia.org/r/181556 (owner: Filippo Giunchedi) [11:53:32] godog: I wouldn’t exactly say ‘close to resolution’, more like ‘we have a working band aid!’. The solution perhaps is just to disable coredumps entirely... [11:53:58] PROBLEM - git.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:54:44] YuviPanda: hehehe "stop the bleeding", yeah what I meant is that being able to disable coredumps for hhvm via hiera feels like resolution, judging by the ticket [11:55:19] godog: yeah, that’s true - it is no longer filling up /var if that is enabled. I’ll let one of the betalabs folks close it [11:57:07] YuviPanda: ack, thanks! 
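A note on the 09:59 to 10:10 exchange above, where the suggestion was a single check that does multiple checks at once instead of forking mktemp and perl for every certificate. The following is only a minimal sketch of that idea in Python, not the check_ssl_cert plugin and not anything that was deployed; the function names days_until_expiry and check_sni_names are hypothetical, and the floor-to-whole-days rule is an assumption that may differ from how the real plugin counts days.

```python
import calendar
import socket
import ssl


def days_until_expiry(not_after, now):
    """Whole days (floored) between `now` (epoch seconds, UTC) and a notAfter string."""
    # ssl.cert_time_to_seconds parses strings such as "Nov 22 18:36:09 2015 GMT".
    expires = ssl.cert_time_to_seconds(not_after)
    return int((expires - now) // 86400)


def check_sni_names(host, sni_names, port=443, timeout=10.0):
    """Return {sni_name: notAfter string} for one host, all from a single process.

    Hypothetical helper: one TLS handshake per SNI name, no temp files and
    no external interpreter started per certificate.
    """
    context = ssl.create_default_context()
    results = {}
    for name in sni_names:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=name) as tls:
                results[name] = tls.getpeercert()["notAfter"]
    return results


if __name__ == "__main__":
    # Offline example: a log timestamp (2014-12-23 07:18:22 UTC) and an
    # expiry date quoted in the alerts above.
    now = calendar.timegm((2014, 12, 23, 7, 18, 22, 0, 0, 0))
    print(days_until_expiry("Nov 22 18:36:09 2015 GMT", now))  # prints 334
```

Whether this would actually lower the load on neon is untested; the point is only that the date arithmetic and the per-name handshakes can share one process.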
[12:07:37] PROBLEM - HTTPS on antimony is CRITICAL: SSL_CERT CRITICAL svn.wikimedia.org: certificate will expire on Jan 31 10:53:05 2015 GMT [12:08:43] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 57889 bytes in 0.046 second response time [12:11:21] PROBLEM - HTTPS_zero.wikipedia.org on cp1068 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [12:11:23] PROBLEM - HTTPS_wikibooks.org on cp4016 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [12:11:24] PROBLEM - HTTPS_m.wikibooks.org on amssq53 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [12:11:25] PROBLEM - HTTPS_m.wikibooks.org on cp1070 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [12:11:29] PROBLEM - HTTPS_wikivoyage.org on amssq56 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [12:11:32] PROBLEM - HTTPS_m.wikipedia.org on amssq52 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [12:11:33] PROBLEM - HTTPS_wikinews.org on cp3015 is CRITICAL: SSL_CERT CRITICAL: Error: verify depth is 6 [12:13:24] RECOVERY - HTTPS_zero.wikipedia.org on cp1068 is OK: SSL_CERT OK - X.509 certificate for *.zero.wikipedia.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:16:05 2015 GMT (expires in 334 days) [12:13:24] RECOVERY - HTTPS_m.wikibooks.org on cp1070 is OK: SSL_CERT OK - X.509 certificate for *.m.wikibooks.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:21:05 2015 GMT (expires in 334 days) [12:13:24] RECOVERY - HTTPS_m.wikibooks.org on amssq53 is OK: SSL_CERT OK - X.509 certificate for *.m.wikibooks.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:21:05 2015 GMT (expires in 334 days) [12:13:24] RECOVERY - HTTPS_wikibooks.org on cp4016 is OK: SSL_CERT OK - X.509 certificate for *.wikibooks.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:21:03 2015 GMT (expires in 334 days) [12:13:24] RECOVERY - 
HTTPS_wikivoyage.org on amssq56 is OK: SSL_CERT OK - X.509 certificate for *.wikivoyage.org from GlobalSign Organization Validation CA - SHA256 - G2 valid until Nov 22 18:41:09 2015 GMT (expires in 334 days) [12:31:11] There we go [12:35:54] that seems to work [12:35:58] haha, nice Bugzilla script [12:36:16] i run something with --dry-run and it confirms "** dry run : no changes to the database will be made **" [12:36:30] only to then complain that " The MariaDB server is running with the --read-only option so it cannot execute this statement [for Statement "INSERT INTO..." [12:36:41] (CR) Yuvipanda: [C: 2] toollabs: Use a venv for uwsgi only if it exists [puppet] - https://gerrit.wikimedia.org/r/181559 (owner: Yuvipanda) [12:40:30] PROBLEM - puppet last run on mw1066 is CRITICAL: CRITICAL: puppet fail [12:40:31] PROBLEM - puppet last run on baham is CRITICAL: CRITICAL: puppet fail [12:40:40] PROBLEM - puppet last run on cp1040 is CRITICAL: CRITICAL: puppet fail [12:40:41] PROBLEM - puppet last run on mw1204 is CRITICAL: CRITICAL: puppet fail [12:40:51] PROBLEM - puppet last run on mw1253 is CRITICAL: CRITICAL: puppet fail [12:41:03] PROBLEM - puppet last run on mw1155 is CRITICAL: CRITICAL: puppet fail [12:41:03] PROBLEM - puppet last run on dysprosium is CRITICAL: CRITICAL: puppet fail [12:41:16] PROBLEM - puppet last run on sca1001 is CRITICAL: CRITICAL: puppet fail [12:41:45] PROBLEM - puppet last run on db1037 is CRITICAL: CRITICAL: puppet fail [12:42:07] <_joe_> mh [12:42:23] <_joe_> looking [12:42:31] PROBLEM - puppet last run on wtp1001 is CRITICAL: CRITICAL: puppet fail [12:42:33] mod_passenger again [12:42:38] on palladium [12:42:46] PROBLEM - puppet last run on mw1158 is CRITICAL: CRITICAL: puppet fail [12:42:48] <_joe_> mutante: no [12:42:48] but it looks like it already recovered [12:42:51] <_joe_> this is me [12:42:58] PROBLEM - puppet last run on lvs4004 is CRITICAL: CRITICAL: puppet fail [12:43:00] <_joe_> I screwed something up [12:43:02] 
oh, ok [12:43:12] PROBLEM - puppet last run on amssq37 is CRITICAL: CRITICAL: puppet fail [12:43:12] PROBLEM - puppet last run on cp3011 is CRITICAL: CRITICAL: puppet fail [12:43:12] PROBLEM - puppet last run on mw1021 is CRITICAL: CRITICAL: puppet fail [12:43:12] PROBLEM - puppet last run on mw1135 is CRITICAL: CRITICAL: puppet fail [12:43:14] PROBLEM - puppet last run on mw1102 is CRITICAL: CRITICAL: puppet fail [12:43:14] PROBLEM - puppet last run on analytics1021 is CRITICAL: CRITICAL: puppet fail [12:43:21] <_joe_> can you kill icinga-wm ? [12:43:23] PROBLEM - puppet last run on cp3021 is CRITICAL: CRITICAL: puppet fail [12:43:24] i saw it finish a run on mw1066 [12:43:28] PROBLEM - puppet last run on cp3019 is CRITICAL: CRITICAL: puppet fail [12:43:29] yes [12:43:32] PROBLEM - puppet last run on mw1137 is CRITICAL: CRITICAL: puppet fail [12:43:33] PROBLEM - puppet last run on db1068 is CRITICAL: CRITICAL: puppet fail [12:44:03] (CR) Giuseppe Lavagetto: [C: 2] Revert "puppet: enable the role-based backend in production" [puppet] - https://gerrit.wikimedia.org/r/181561 (owner: Giuseppe Lavagetto) [12:44:20] <_joe_> this will let me see what exactly was wrong with it [12:49:44] <_joe_> why on earth is this not fixing itself [12:49:50] <_joe_> ah got it, I am dumb indeed [12:51:27] <_joe_> mutante: it's fixed now, but I'll wait before restarting icinga-wm [12:51:50] _joe_: ok, cool. i also stopped the puppet agent on neon [12:51:57] because otherwise it restarts icinga-wm [12:52:29] <_joe_> well [12:53:55] <_joe_> when neon recovers its puppet runs, it will mean it's ok [12:54:37] ok :) i see the change being applied [12:55:21] <_joe_> what was ok in my test environ and wasn't in production? [12:57:14] <_joe_> aha! gotcha. 
[12:57:59] <_joe_> damn ruby 1.8 [12:59:55] RECOVERY - puppet last run on mw1179 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [12:59:57] RECOVERY - puppet last run on mw1232 is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures [12:59:57] PROBLEM - puppet last run on mw1171 is CRITICAL: CRITICAL: puppet fail [12:59:57] RECOVERY - puppet last run on amssq37 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [12:59:58] RECOVERY - puppet last run on mw1021 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [13:00:03] RECOVERY - puppet last run on mw1135 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [13:00:03] PROBLEM - puppet last run on elastic1014 is CRITICAL: CRITICAL: puppet fail [13:00:03] PROBLEM - puppet last run on mw1023 is CRITICAL: CRITICAL: puppet fail [13:00:03] RECOVERY - puppet last run on cp3011 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [13:00:03] RECOVERY - puppet last run on analytics1021 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [13:00:04] RECOVERY - puppet last run on mw1102 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:00:04] RECOVERY - puppet last run on lvs1006 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:00:06] PROBLEM - puppet last run on cp1038 is CRITICAL: CRITICAL: puppet fail [13:00:06] PROBLEM - puppet last run on mw1029 is CRITICAL: CRITICAL: puppet fail [13:00:16] RECOVERY - puppet last run on cp3021 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:00:16] PROBLEM - puppet last run on es1002 is CRITICAL: CRITICAL: puppet fail [13:00:16] PROBLEM - puppet last run on ms-be1007 is CRITICAL: CRITICAL: puppet fail [13:00:16] PROBLEM - puppet last run on analytics1014 is CRITICAL: CRITICAL: puppet fail [13:00:16] 
RECOVERY - puppet last run on cp3019 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:00:16] PROBLEM - puppet last run on db1001 is CRITICAL: CRITICAL: puppet fail [13:00:16] RECOVERY - puppet last run on mw1137 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [13:00:16] PROBLEM - puppet last run on amssq62 is CRITICAL: CRITICAL: puppet fail [13:00:16] RECOVERY - puppet last run on virt1012 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [13:00:24] RECOVERY - puppet last run on elastic1016 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:00:24] PROBLEM - puppet last run on labsdb1004 is CRITICAL: CRITICAL: puppet fail [13:00:24] RECOVERY - puppet last run on analytics1019 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [13:00:24] RECOVERY - puppet last run on mw1047 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [13:00:24] PROBLEM - puppet last run on mw1210 is CRITICAL: CRITICAL: puppet fail [13:00:24] RECOVERY - puppet last run on ms-be1005 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:00:25] RECOVERY - puppet last run on ms-be1014 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [13:00:25] PROBLEM - puppet last run on db1057 is CRITICAL: CRITICAL: puppet fail [13:00:26] RECOVERY - puppet last run on cp1065 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [13:00:26] PROBLEM - puppet last run on search1015 is CRITICAL: CRITICAL: puppet fail [13:00:27] RECOVERY - puppet last run on search1021 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:00:27] RECOVERY - puppet last run on tmh1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:00:28] RECOVERY - puppet last run on ms-be2009 is OK: OK: Puppet is currently 
enabled, last run 3 minutes ago with 0 failures [13:00:33] RECOVERY - puppet last run on mw1015 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:00:34] RECOVERY - puppet last run on cp4010 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:00:34] RECOVERY - puppet last run on virt1002 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [13:00:34] PROBLEM - puppet last run on titanium is CRITICAL: CRITICAL: puppet fail [13:00:34] RECOVERY - puppet last run on cp4013 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:00:34] RECOVERY - puppet last run on mw1127 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:00:35] RECOVERY - puppet last run on db1009 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [13:00:35] RECOVERY - puppet last run on erbium is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:00:35] RECOVERY - puppet last run on es1006 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [13:00:36] RECOVERY - puppet last run on es1005 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:00:44] RECOVERY - puppet last run on db1041 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:00:44] RECOVERY - puppet last run on mw1017 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [13:00:44] RECOVERY - puppet last run on cp1043 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [13:00:44] RECOVERY - puppet last run on mw1078 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [13:00:44] RECOVERY - puppet last run on mw1161 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [13:00:44] RECOVERY - puppet last run on ms-be2015 is OK: OK: Puppet is currently 
enabled, last run 2 minutes ago with 0 failures [13:00:45] RECOVERY - puppet last run on wtp1024 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [13:00:48] <_joe_> meh [13:00:53] RECOVERY - puppet last run on mw1191 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:00:54] RECOVERY - puppet last run on search1014 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:00:54] RECOVERY - puppet last run on labsdb1007 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [13:00:54] RECOVERY - puppet last run on mw1244 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [13:00:55] RECOVERY - puppet last run on praseodymium is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:00:55] RECOVERY - puppet last run on mw1221 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [13:01:02] <_joe_> it sent us some queued alarms as well [13:01:03] RECOVERY - puppet last run on wtp1021 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [13:01:03] RECOVERY - puppet last run on mw1019 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:01:03] RECOVERY - puppet last run on analytics1034 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:01:03] RECOVERY - puppet last run on db1058 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [13:01:04] RECOVERY - puppet last run on mw1036 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:01:21] RECOVERY - puppet last run on mw1096 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [13:01:21] RECOVERY - puppet last run on mw1240 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [13:01:21] RECOVERY - puppet last run on amslvs4 is OK: OK: Puppet 
is currently enabled, last run 1 minute ago with 0 failures [13:02:10] RECOVERY - puppet last run on pc1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:02:19] RECOVERY - puppet last run on mw1256 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [13:02:21] RECOVERY - puppet last run on mw1005 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [13:02:44] cool, grrrit-wm is now on trusty! [13:02:49] RECOVERY - puppet last run on mw1234 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:03:07] RECOVERY - puppet last run on analytics1017 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [13:03:11] RECOVERY - puppet last run on vanadium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:03:12] sweet [13:03:14] RECOVERY - puppet last run on mw1178 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:03:15] RECOVERY - puppet last run on helium is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [13:03:30] RECOVERY - puppet last run on potassium is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [13:03:30] RECOVERY - puppet last run on mw1012 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:03:30] RECOVERY - puppet last run on mw1029 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:03:30] RECOVERY - puppet last run on wtp1019 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:03:30] RECOVERY - puppet last run on search1020 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:03:31] RECOVERY - puppet last run on mw1045 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [13:03:31] RECOVERY - puppet last run on dbstore1001 is OK: OK: Puppet is 
currently enabled, last run 2 minutes ago with 0 failures [13:03:32] RECOVERY - puppet last run on mw1080 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:03:32] RECOVERY - puppet last run on search1016 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [13:03:33] RECOVERY - puppet last run on amssq43 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [13:03:33] RECOVERY - puppet last run on mw1040 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:03:34] RECOVERY - puppet last run on netmon1001 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [13:03:34] RECOVERY - puppet last run on amssq58 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:03:35] RECOVERY - puppet last run on amssq57 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:03:35] RECOVERY - puppet last run on amslvs2 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [13:03:36] RECOVERY - puppet last run on es1009 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:03:36] RECOVERY - puppet last run on mw1252 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:03:37] RECOVERY - puppet last run on mw1233 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:03:37] RECOVERY - puppet last run on gold is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [13:03:38] RECOVERY - puppet last run on elastic1001 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [13:03:38] RECOVERY - puppet last run on elastic1031 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:03:39] RECOVERY - puppet last run on ocg1001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:03:39] 
RECOVERY - puppet last run on cp1059 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:03:40] RECOVERY - puppet last run on rdb1004 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:03:46] RECOVERY - puppet last run on ms-be2003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:03:46] RECOVERY - puppet last run on mw1059 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [13:03:47] RECOVERY - puppet last run on analytics1020 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [13:03:47] RECOVERY - puppet last run on db2039 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [13:03:47] RECOVERY - puppet last run on dbproxy1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:03:47] RECOVERY - puppet last run on wtp1020 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [13:03:47] RECOVERY - puppet last run on neptunium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:03:48] RECOVERY - puppet last run on mw1006 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [13:03:48] RECOVERY - puppet last run on ms-be1003 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [13:03:49] RECOVERY - puppet last run on es2008 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [13:03:49] RECOVERY - puppet last run on mw1109 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:03:50] RECOVERY - puppet last run on lvs4002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:03:50] RECOVERY - puppet last run on mw1257 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:03:51] RECOVERY - puppet last run on mc1016 is OK: OK: Puppet is currently 
enabled, last run 1 minute ago with 0 failures [13:03:51] RECOVERY - puppet last run on mw1216 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [13:03:52] RECOVERY - puppet last run on mw1124 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [13:03:52] RECOVERY - puppet last run on mw1062 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [13:03:55] RECOVERY - puppet last run on mw1174 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [13:04:01] RECOVERY - puppet last run on cp3022 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:04:01] RECOVERY - puppet last run on cp3008 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [13:04:01] RECOVERY - puppet last run on cp3015 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:04:01] RECOVERY - puppet last run on mw1048 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:04:02] RECOVERY - puppet last run on logstash1003 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:04:02] RECOVERY - puppet last run on amssq54 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [13:04:02] RECOVERY - puppet last run on mw1218 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [13:04:04] valhallasw`cloud: let’s move logmsgbot to trusty! 
[13:04:06] RECOVERY - puppet last run on lvs1005 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [13:04:10] RECOVERY - puppet last run on wtp1017 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:04:10] RECOVERY - puppet last run on magnesium is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:04:10] RECOVERY - puppet last run on db1031 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [13:04:19] RECOVERY - puppet last run on wtp1009 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:04:19] RECOVERY - puppet last run on db1022 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [13:04:19] RECOVERY - puppet last run on mw1130 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [13:04:19] RECOVERY - puppet last run on sca1002 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [13:04:19] RECOVERY - puppet last run on mw1187 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [13:04:20] RECOVERY - puppet last run on uranium is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:04:28] RECOVERY - puppet last run on mw1160 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [13:04:29] RECOVERY - puppet last run on mw1106 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:04:31] YuviPanda: first +2 my suggested changes :> [13:04:39] valhallasw`cloud: oh? where? 
[13:04:48] RECOVERY - puppet last run on mw1141 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:04:50] https://gerrit.wikimedia.org/r/#/q/status:open+project:operations/debs/adminbot,n,z [13:04:58] RECOVERY - puppet last run on db1029 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:04:59] or is that not what logmsgbot is [13:05:03] * valhallasw`cloud looks confused [13:05:03] valhallasw`cloud: why exactly is it a deb? [13:05:06] RECOVERY - puppet last run on mw1072 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [13:05:09] RECOVERY - puppet last run on mw1007 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:05:10] because no friggin' clue [13:05:17] !info [13:05:18] RECOVERY - puppet last run on platinum is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:05:18] RECOVERY - puppet last run on mw1031 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:05:18] RECOVERY - puppet last run on ms-fe2004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:05:18] RECOVERY - puppet last run on amssq61 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [13:05:18] RECOVERY - puppet last run on amssq49 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:05:18] RECOVERY - puppet last run on ms-be1006 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:05:19] RECOVERY - puppet last run on analytics1040 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:05:19] RECOVERY - puppet last run on virt1006 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [13:05:30] RECOVERY - puppet last run on mw1117 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [13:05:34] RECOVERY - puppet last run on 
heze is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:05:42] RECOVERY - puppet last run on mw1254 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:05:47] !log help [13:05:47] I am a logbot running on tools-exec-14. [13:05:47] Messages are logged to wikitech.wikimedia.org/wiki/Server_Admin_Log. [13:05:47] To log a message, type !log . [13:05:55] ah, that's morebots and not logmsgbot [13:06:00] then what is logmsgbot :| [13:06:16] RECOVERY - puppet last run on analytics1041 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:06:16] RECOVERY - puppet last run on ms-be2006 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:06:16] RECOVERY - puppet last run on mw1170 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [13:06:27] RECOVERY - puppet last run on lead is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:06:30] RECOVERY - puppet last run on ms-fe1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:06:34] RECOVERY - puppet last run on mw1224 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [13:06:34] RECOVERY - puppet last run on mw1120 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:06:34] RECOVERY - puppet last run on elastic1012 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:06:34] RECOVERY - puppet last run on mw1119 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:06:34] RECOVERY - puppet last run on cp3014 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [13:06:34] RECOVERY - puppet last run on mw1235 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [13:06:39] RECOVERY - puppet last run on labstore1003 is OK: OK: Puppet is currently enabled, last 
run 2 minutes ago with 0 failures [13:06:40] "logmsgbot is udprec piped into ircecho. It outputs !log messages to IRC. These and other (manual) !log messages are read by morebots and saved to Server admin log." [13:06:43] I what [13:06:44] RECOVERY - puppet last run on mw1082 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:06:53] RECOVERY - puppet last run on db1034 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [13:06:53] RECOVERY - puppet last run on ruthenium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:06:53] RECOVERY - puppet last run on mw1123 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [13:06:53] RECOVERY - puppet last run on snapshot1003 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [13:06:53] RECOVERY - puppet last run on mw1042 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [13:06:54] RECOVERY - puppet last run on xenon is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [13:06:54] RECOVERY - puppet last run on cp3003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:06:55] RECOVERY - puppet last run on mw1251 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [13:06:55] RECOVERY - puppet last run on amssq32 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:06:56] RECOVERY - puppet last run on cp3020 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [13:06:56] RECOVERY - puppet last run on mw1003 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:06:57] RECOVERY - puppet last run on mw1164 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:06:57] RECOVERY - puppet last run on mw1222 is OK: OK: Puppet is currently enabled, last run 2 minutes 
ago with 0 failures [13:06:58] RECOVERY - puppet last run on analytics1010 is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures [13:06:58] RECOVERY - puppet last run on mw1100 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:07:03] RECOVERY - puppet last run on mw1065 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:07:05] RECOVERY - puppet last run on mw1166 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:07:05] RECOVERY - puppet last run on mw1052 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:07:06] RECOVERY - puppet last run on mw1025 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:07:11] RECOVERY - puppet last run on mw1129 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [13:07:11] RECOVERY - puppet last run on analytics1038 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [13:07:11] RECOVERY - puppet last run on mw1060 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:07:13] RECOVERY - puppet last run on db2036 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [13:07:13] RECOVERY - puppet last run on db1003 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [13:07:13] RECOVERY - puppet last run on db2018 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:07:13] RECOVERY - puppet last run on mw1176 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:07:13] RECOVERY - puppet last run on dbproxy1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:07:14] RECOVERY - puppet last run on db1002 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:07:14] RECOVERY - puppet last run on mw1026 is OK: OK: 
Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:07:15] RECOVERY - puppet last run on mw1175 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:07:15] RECOVERY - puppet last run on lvs1002 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [13:07:16] RECOVERY - puppet last run on db1067 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:07:16] RECOVERY - puppet last run on ms-be2004 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:07:23] RECOVERY - puppet last run on mw1217 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:07:26] RECOVERY - puppet last run on mw1118 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:07:26] RECOVERY - puppet last run on cp3016 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:07:26] RECOVERY - puppet last run on db1021 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:07:26] RECOVERY - puppet last run on cp4008 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [13:07:27] RECOVERY - puppet last run on elastic1018 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:07:27] RECOVERY - puppet last run on search1010 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [13:07:27] RECOVERY - puppet last run on mw1069 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:07:28] RECOVERY - puppet last run on mw1173 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:07:28] RECOVERY - puppet last run on wtp1016 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [13:07:29] RECOVERY - puppet last run on elastic1022 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:07:29] 
RECOVERY - puppet last run on virt1004 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [13:07:30] RECOVERY - puppet last run on mw1099 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [13:07:30] RECOVERY - puppet last run on cp4003 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [13:07:31] RECOVERY - puppet last run on cp4004 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [13:07:31] RECOVERY - puppet last run on mw1126 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [13:07:37] RECOVERY - puppet last run on cp1058 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [13:07:38] RECOVERY - puppet last run on mw1153 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:07:38] RECOVERY - puppet last run on mw1076 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [13:07:38] RECOVERY - puppet last run on db1059 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:07:38] RECOVERY - puppet last run on mw1162 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [13:07:39] RECOVERY - puppet last run on mw1211 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [13:07:39] RECOVERY - puppet last run on elastic1008 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [13:07:48] RECOVERY - puppet last run on mc1012 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [13:07:48] RECOVERY - puppet last run on mw1189 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:07:48] RECOVERY - puppet last run on mc1003 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [13:07:49] RECOVERY - puppet last run on ms-fe1004 is OK: OK: Puppet is currently enabled, 
last run 2 minutes ago with 0 failures [13:07:49] RECOVERY - puppet last run on db1052 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [13:07:49] RECOVERY - puppet last run on antimony is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:07:49] RECOVERY - puppet last run on search1018 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:07:50] RECOVERY - puppet last run on db1040 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:07:50] RECOVERY - puppet last run on db1043 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [13:07:51] RECOVERY - puppet last run on labsdb1003 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:07:53] RECOVERY - puppet last run on mw1008 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [13:07:54] RECOVERY - puppet last run on mw1011 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:07:54] RECOVERY - puppet last run on mw1009 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:07:54] RECOVERY - puppet last run on stat1003 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [13:07:54] RECOVERY - puppet last run on amssq60 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [13:08:03] RECOVERY - puppet last run on db1028 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:08:03] RECOVERY - puppet last run on mw1046 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [13:08:03] RECOVERY - puppet last run on lvs2004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:08:04] RECOVERY - puppet last run on mw1206 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [13:08:04] RECOVERY - puppet last run 
on mw1150 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [13:08:04] RECOVERY - puppet last run on virt1001 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [13:08:07] RECOVERY - puppet last run on mw1149 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [13:08:07] RECOVERY - puppet last run on mw1039 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:08:08] RECOVERY - puppet last run on elastic1027 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:08:08] RECOVERY - puppet last run on db1015 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [13:08:12] RECOVERY - puppet last run on amssq55 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [13:08:16] RECOVERY - puppet last run on mw1195 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [13:08:19] RECOVERY - puppet last run on mw1144 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:08:22] RECOVERY - puppet last run on db1016 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:08:22] RECOVERY - puppet last run on ms-fe2001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:08:22] RECOVERY - puppet last run on mw1044 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [13:08:22] RECOVERY - puppet last run on labstore1001 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [13:08:22] RECOVERY - puppet last run on mc1005 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [13:08:23] RECOVERY - puppet last run on ms-be2011 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [13:08:24] RECOVERY - puppet last run on db1046 is OK: OK: Puppet is currently enabled, last run 3 minutes 
ago with 0 failures [13:08:24] RECOVERY - puppet last run on lvs2001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:08:25] RECOVERY - puppet last run on search1007 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:08:25] RECOVERY - puppet last run on labnet1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:08:25] RECOVERY - puppet last run on lithium is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [13:08:43] RECOVERY - puppet last run on sodium is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [13:08:52] RECOVERY - puppet last run on db1018 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [13:08:57] RECOVERY - puppet last run on wtp1012 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:08:58] <_joe_> brb [13:09:05] RECOVERY - puppet last run on polonium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:09:11] RECOVERY - puppet last run on wtp1005 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:09:27] RECOVERY - puppet last run on cp1056 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:09:27] RECOVERY - puppet last run on analytics1026 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [13:09:27] RECOVERY - puppet last run on db2007 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:09:27] RECOVERY - puppet last run on plutonium is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [13:09:27] RECOVERY - puppet last run on db1026 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:09:28] RECOVERY - puppet last run on mw1055 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:09:28] RECOVERY - puppet 
last run on mw1237 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:09:29] RECOVERY - puppet last run on mw1092 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:09:29] RECOVERY - puppet last run on mw1213 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:09:30] RECOVERY - puppet last run on mw1061 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [13:09:30] RECOVERY - puppet last run on ms-fe2003 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:09:31] RECOVERY - puppet last run on labcontrol2001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:09:40] RECOVERY - puppet last run on tin is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:09:44] RECOVERY - puppet last run on pc1002 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:09:45] RECOVERY - puppet last run on mw1183 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [13:09:46] RECOVERY - puppet last run on mw1180 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:10:02] RECOVERY - puppet last run on amslvs1 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:10:02] RECOVERY - puppet last run on amssq34 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:10:02] RECOVERY - puppet last run on es1007 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:10:02] RECOVERY - puppet last run on mw1034 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [13:10:03] RECOVERY - puppet last run on tungsten is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [13:10:03] RECOVERY - puppet last run on mw1202 is OK: OK: Puppet is currently enabled, last run 1 minute ago 
with 0 failures [13:10:03] RECOVERY - puppet last run on nescio is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:10:04] RECOVERY - puppet last run on elastic1024 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:10:11] RECOVERY - puppet last run on rhenium is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [13:10:12] RECOVERY - puppet last run on mw1249 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:10:12] RECOVERY - puppet last run on mw1054 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [13:10:12] RECOVERY - puppet last run on mw1002 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [13:10:12] RECOVERY - puppet last run on amssq51 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:10:12] RECOVERY - puppet last run on analytics1016 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:10:13] RECOVERY - puppet last run on mw1114 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [13:10:13] RECOVERY - puppet last run on mw1049 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:10:14] RECOVERY - puppet last run on mw1168 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:10:14] RECOVERY - puppet last run on ms-be1007 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [13:10:15] RECOVERY - puppet last run on virt1010 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:10:21] RECOVERY - puppet last run on mw1050 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [13:10:26] RECOVERY - puppet last run on mw1156 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [13:10:26] RECOVERY - puppet last run on ms-be2012 is OK: OK: 
Puppet is currently enabled, last run 35 seconds ago with 0 failures [13:10:26] RECOVERY - puppet last run on amssq36 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:10:26] RECOVERY - puppet last run on amssq47 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:10:26] RECOVERY - puppet last run on mw1177 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [13:10:27] RECOVERY - puppet last run on amssq56 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [13:10:27] RECOVERY - puppet last run on ms-fe3002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:10:28] RECOVERY - puppet last run on mw1165 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:10:28] RECOVERY - puppet last run on amssq48 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [13:10:29] RECOVERY - puppet last run on mw1227 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:10:29] RECOVERY - puppet last run on snapshot1001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [13:10:30] RECOVERY - puppet last run on rubidium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:10:30] RECOVERY - puppet last run on db1060 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:10:31] RECOVERY - puppet last run on mw1208 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:10:31] RECOVERY - puppet last run on elastic1015 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:10:32] RECOVERY - puppet last run on mw1051 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:10:32] RECOVERY - puppet last run on elastic1006 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures 
[13:10:33] RECOVERY - puppet last run on ms-be1012 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:10:33] RECOVERY - puppet last run on osmium is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [13:10:34] RECOVERY - puppet last run on mw1198 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [13:10:34] RECOVERY - puppet last run on db2037 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:10:35] RECOVERY - puppet last run on ms-be2001 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [13:10:35] RECOVERY - puppet last run on cp4014 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:10:36] RECOVERY - puppet last run on wtp1018 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:10:36] RECOVERY - puppet last run on analytics1037 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [13:10:37] RECOVERY - puppet last run on mw1125 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:10:37] RECOVERY - puppet last run on cp1046 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:10:38] RECOVERY - puppet last run on install2001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:10:38] RECOVERY - puppet last run on db1055 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [13:10:41] RECOVERY - puppet last run on mw1258 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:10:41] RECOVERY - puppet last run on wtp1004 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [13:10:41] RECOVERY - puppet last run on rdb1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:10:41] RECOVERY - puppet last run on ms-be3002 is OK: OK: Puppet is 
currently enabled, last run 3 minutes ago with 0 failures [13:10:41] RECOVERY - puppet last run on analytics1022 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:10:42] RECOVERY - puppet last run on analytics1032 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [13:10:42] RECOVERY - puppet last run on mw1081 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:10:43] RECOVERY - puppet last run on dataset1001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [13:10:43] RECOVERY - puppet last run on virt1003 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [13:10:44] RECOVERY - puppet last run on labmon1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:10:44] RECOVERY - puppet last run on snapshot1004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:10:45] RECOVERY - puppet last run on db2029 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:10:45] RECOVERY - puppet last run on db2016 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:10:46] RECOVERY - puppet last run on mw1159 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [13:10:46] RECOVERY - puppet last run on db2004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:10:47] RECOVERY - puppet last run on cp4018 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [13:10:47] RECOVERY - puppet last run on cp4001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:10:48] RECOVERY - puppet last run on mw1079 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:10:48] RECOVERY - puppet last run on search1005 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures 
[13:10:49] RECOVERY - puppet last run on mw1151 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:10:49] RECOVERY - puppet last run on db1071 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:10:50] RECOVERY - puppet last run on wtp1023 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [13:10:50] RECOVERY - puppet last run on db1036 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:10:51] RECOVERY - puppet last run on mw1163 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [13:10:51] RECOVERY - puppet last run on db2038 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [13:10:52] RECOVERY - puppet last run on elastic1011 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [13:10:52] RECOVERY - puppet last run on gadolinium is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:10:53] RECOVERY - puppet last run on mw1247 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:10:53] RECOVERY - puppet last run on search1023 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [13:10:54] RECOVERY - puppet last run on mw1074 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [13:10:55] RECOVERY - puppet last run on cp1050 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:10:55] RECOVERY - puppet last run on mw1004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:10:55] RECOVERY - puppet last run on thallium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:10:56] RECOVERY - puppet last run on db1069 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:10:56] RECOVERY - puppet last run on db1039 is OK: OK: Puppet is currently 
enabled, last run 2 minutes ago with 0 failures [13:10:57] RECOVERY - puppet last run on analytics1013 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:10:57] RECOVERY - puppet last run on wtp1002 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [13:10:58] RECOVERY - puppet last run on mw1098 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:10:58] RECOVERY - puppet last run on labsdb1006 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:10:59] RECOVERY - puppet last run on db1048 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:10:59] RECOVERY - puppet last run on hafnium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:11:00] RECOVERY - puppet last run on mw1111 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:11:00] RECOVERY - puppet last run on mw1116 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [13:11:01] RECOVERY - puppet last run on haedus is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [13:11:09] RECOVERY - puppet last run on lvs4003 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:11:09] RECOVERY - puppet last run on db1020 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:11:09] RECOVERY - puppet last run on search1017 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:11:09] RECOVERY - puppet last run on mw1014 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:11:09] RECOVERY - puppet last run on mw1056 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:11:09] RECOVERY - puppet last run on db2023 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:11:10] RECOVERY - puppet 
last run on virt1007 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:11:10] RECOVERY - puppet last run on amssq42 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [13:11:10] RECOVERY - puppet last run on mw1084 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:11:12] RECOVERY - puppet last run on amssq41 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [13:11:13] RECOVERY - puppet last run on mc1014 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:11:24] RECOVERY - puppet last run on snapshot1002 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:11:27] RECOVERY - puppet last run on elastic1019 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:11:28] RECOVERY - puppet last run on wtp1011 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [13:11:35] RECOVERY - puppet last run on lvs2006 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:11:35] RECOVERY - puppet last run on db1004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:11:35] RECOVERY - puppet last run on stat1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:11:36] RECOVERY - puppet last run on mw1133 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [13:11:36] RECOVERY - puppet last run on oxygen is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:11:36] RECOVERY - puppet last run on lvs3004 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:11:36] RECOVERY - puppet last run on db2001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:11:37] RECOVERY - puppet last run on bast4001 is OK: OK: Puppet is currently enabled, last run 2 
minutes ago with 0 failures [13:11:39] RECOVERY - puppet last run on cp1063 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:11:39] RECOVERY - puppet last run on cp4005 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:11:39] RECOVERY - puppet last run on ms-be3001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:11:39] RECOVERY - puppet last run on cp3010 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:11:39] RECOVERY - puppet last run on cp1048 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:11:40] RECOVERY - puppet last run on search1002 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [13:11:40] RECOVERY - puppet last run on ms-be2008 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:11:56] RECOVERY - puppet last run on mc1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:12:04] RECOVERY - puppet last run on mw1030 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:12:11] RECOVERY - puppet last run on db1062 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:12:11] RECOVERY - puppet last run on mw1057 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:12:14] RECOVERY - puppet last run on mw1097 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [13:12:14] RECOVERY - puppet last run on mw1146 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:12:14] RECOVERY - puppet last run on mw1190 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [13:12:14] RECOVERY - puppet last run on mw1238 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:12:14] RECOVERY - puppet last run on mw1212 is OK: OK: 
Puppet is currently enabled, last run 1 minute ago with 0 failures [13:12:32] RECOVERY - puppet last run on mw1181 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:12:34] RECOVERY - puppet last run on mw1087 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:12:34] RECOVERY - puppet last run on acamar is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:12:34] RECOVERY - puppet last run on analytics1023 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:12:34] RECOVERY - puppet last run on cp1062 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [13:12:34] RECOVERY - puppet last run on ms-be1008 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [13:12:35] RECOVERY - puppet last run on lvs1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:12:35] RECOVERY - puppet last run on ms-be2005 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:12:36] RECOVERY - puppet last run on hooft is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:12:36] RECOVERY - puppet last run on amssq40 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [13:12:37] RECOVERY - puppet last run on search1024 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:12:37] RECOVERY - puppet last run on wtp1022 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:12:38] RECOVERY - puppet last run on argon is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [13:12:38] RECOVERY - puppet last run on cp4019 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [13:13:18] RECOVERY - puppet last run on mw1171 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:13:25] 
RECOVERY - puppet last run on elastic1014 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:13:27] RECOVERY - puppet last run on mw1023 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:13:28] RECOVERY - puppet last run on cp1038 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:13:28] RECOVERY - puppet last run on es1002 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:13:28] RECOVERY - puppet last run on analytics1014 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:13:28] RECOVERY - puppet last run on db1001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:13:33] * valhallasw`cloud shoots icinga-wm [13:13:40] RECOVERY - puppet last run on amssq62 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:13:40] RECOVERY - puppet last run on labsdb1004 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:13:41] RECOVERY - puppet last run on mw1210 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:13:41] RECOVERY - puppet last run on db1057 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:13:48] RECOVERY - puppet last run on search1015 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:13:49] RECOVERY - puppet last run on ms-be1009 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [13:13:59] RECOVERY - puppet last run on mw1053 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [13:14:00] RECOVERY - puppet last run on mw1248 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:14:00] RECOVERY - puppet last run on mw1188 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [13:14:00] RECOVERY - puppet last 
run on titanium is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:14:00] RECOVERY - puppet last run on ms-be2007 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [13:14:01] RECOVERY - puppet last run on mw1148 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:14:10] shut up icinga [13:14:19] that worked [13:14:28] * YuviPanda flexes muscles [13:21:27] (CR) Yuvipanda: "I've always been partial to max line length of 120.." [debs/adminbot] - https://gerrit.wikimedia.org/r/181054 (owner: Merlijn van Deen) [13:24:27] (CR) Merlijn van Deen: "Because log messages are human messages, so it makes sense to show them in recent changes." [debs/adminbot] - https://gerrit.wikimedia.org/r/180889 (owner: Merlijn van Deen) [13:25:53] (PS1) QChris: Set hive.stats.autogather to false, if Hive has stats disabled [puppet/cdh] - https://gerrit.wikimedia.org/r/181563 [13:26:29] (CR) Merlijn van Deen: "Wmclient does not allow getting those values while getting the page (without basically rewriting all of wmclient, anyway). Besides, this i" [debs/adminbot] - https://gerrit.wikimedia.org/r/180890 (owner: Merlijn van Deen) [13:28:44] (PS5) Merlijn van Deen: Flake8-ify everything [debs/adminbot] - https://gerrit.wikimedia.org/r/181054 [13:28:55] (CR) Merlijn van Deen: "Then you get 120." 
[debs/adminbot] - https://gerrit.wikimedia.org/r/181054 (owner: Merlijn van Deen) [13:34:17] PROBLEM - git.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:34:26] (CR) Filippo Giunchedi: [C: 1] cache: install the planet SSL cert on misc-web [puppet] - https://gerrit.wikimedia.org/r/181415 (owner: Dzahn) [13:34:59] (PS1) Dzahn: phab metrics: add number of accounts created [puppet] - https://gerrit.wikimedia.org/r/181564 [13:35:58] (CR) Filippo Giunchedi: [C: 1] etherpad: add Varnish misc config [puppet] - https://gerrit.wikimedia.org/r/181412 (owner: John F. Lewis) [13:36:04] (CR) Filippo Giunchedi: [C: 1] etherpad->misc-web-lb.eqiad [dns] - https://gerrit.wikimedia.org/r/181269 (owner: John F. Lewis) [13:36:51] (CR) Filippo Giunchedi: "good point, I'll let Andrew comment here" [puppet] - https://gerrit.wikimedia.org/r/181556 (owner: Filippo Giunchedi) [13:37:25] (CR) Dzahn: [C: 2] "Number of accounts created in (2014-11): 299" [puppet] - https://gerrit.wikimedia.org/r/181564 (owner: Dzahn) [13:47:51] (PS3) Filippo Giunchedi: admin: grant twentyafterfour gallium [puppet] - https://gerrit.wikimedia.org/r/181211 (owner: John F. Lewis) [13:48:02] (CR) Filippo Giunchedi: [C: 2 V: 2] admin: grant twentyafterfour gallium [puppet] - https://gerrit.wikimedia.org/r/181211 (owner: John F. Lewis) [13:50:23] thanks godog :p [13:50:41] JohnLewis: yw! 
[13:51:18] PROBLEM - HTTPS on antimony is CRITICAL: SSL_CERT CRITICAL svn.wikimedia.org: certificate will expire on Jan 31 10:53:05 2015 GMT [13:54:02] (03PS1) 10RobH: setting install params for nembus [puppet] - 10https://gerrit.wikimedia.org/r/181565 [13:57:00] (03CR) 10RobH: [C: 032] setting install params for nembus [puppet] - 10https://gerrit.wikimedia.org/r/181565 (owner: 10RobH) [14:09:01] was taking another look at icinga on neon, it is leaving behind its own forks with zombies attached thus the swapping I think [14:10:10] <_joe_> godog: sigh [14:10:32] objections to a restart? it isn't very healthy now anyway, would be nice to understand why it did that tho [14:12:05] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 57889 bytes in 0.076 second response time [14:12:35] <_joe_> restarting what? [14:12:41] <_joe_> icinga itself? [14:12:53] yep [14:13:27] <_joe_> yes, that may temporarily help [14:16:45] I'll coredump one of the icinga processes that's not reaping its zombie first and restart [14:17:45] !log restart icinga on neon [14:17:53] Logged the message, Master [14:21:58] <_joe_> godog: icinga is not working right now [14:22:03] <_joe_> just FYI [14:22:54] _joe_: yep just started back up [14:23:13] <_joe_> ok sorry [14:25:32] np, I'll followup in phab [14:30:43] (03CR) 10Ottomata: "Hive is just a Hadoop client. Whomever has Hadoop access also has Hive access."
[puppet] - 10https://gerrit.wikimedia.org/r/181556 (owner: 10Filippo Giunchedi) [14:31:12] (03CR) 10Ottomata: [C: 031] admin: awight stats/hive access [puppet] - 10https://gerrit.wikimedia.org/r/181556 (owner: 10Filippo Giunchedi) [14:31:58] (03PS1) 10RobH: setting mgmt info for codfw osm servers [dns] - 10https://gerrit.wikimedia.org/r/181570 [14:32:22] (03CR) 10Ottomata: [C: 032] Set hive.stats.autogather to false, if Hive has stats disabled [puppet/cdh] - 10https://gerrit.wikimedia.org/r/181563 (owner: 10QChris) [14:35:06] (03PS1) 10Giuseppe Lavagetto: hiera: fix for the role backend under precise [puppet] - 10https://gerrit.wikimedia.org/r/181571 [14:35:08] (03PS1) 10Giuseppe Lavagetto: hiera: activate role-base backend [puppet] - 10https://gerrit.wikimedia.org/r/181572 [14:35:10] (03PS1) 10Giuseppe Lavagetto: hiera: make mw1017 use the role backend [puppet] - 10https://gerrit.wikimedia.org/r/181573 [14:36:08] (03PS2) 10Giuseppe Lavagetto: hiera: fix for the role backend under precise [puppet] - 10https://gerrit.wikimedia.org/r/181571 [14:36:22] (03CR) 10Giuseppe Lavagetto: [C: 032] hiera: fix for the role backend under precise [puppet] - 10https://gerrit.wikimedia.org/r/181571 (owner: 10Giuseppe Lavagetto) [14:41:01] (03PS2) 10Giuseppe Lavagetto: hiera: activate role-base backend [puppet] - 10https://gerrit.wikimedia.org/r/181572 [14:41:06] T85222 -> icinga [14:41:35] (03CR) 10Giuseppe Lavagetto: [C: 032] hiera: activate role-base backend [puppet] - 10https://gerrit.wikimedia.org/r/181572 (owner: 10Giuseppe Lavagetto) [14:49:40] PROBLEM - puppet last run on sca1002 is CRITICAL: CRITICAL: puppet fail [14:50:07] <_joe_> this may be me restarting apache on strontium [14:50:27] <_joe_> !log restarted apache on strontium to verify hiera is working [14:50:32] Logged the message, Master [14:51:56] PROBLEM - puppet last run on mw1164 is CRITICAL: CRITICAL: Puppet has 1 failures [14:51:56] PROBLEM - puppet last run on mw1176 is CRITICAL: CRITICAL: Puppet has 1 failures 
[14:52:19] PROBLEM - puppet last run on amssq49 is CRITICAL: CRITICAL: Puppet has 1 failures [14:52:38] PROBLEM - puppet last run on db1050 is CRITICAL: CRITICAL: Puppet has 2 failures [14:53:24] (03CR) 10RobH: [C: 032] setting mgmt info for codfw osm servers [dns] - 10https://gerrit.wikimedia.org/r/181570 (owner: 10RobH) [14:53:49] PROBLEM - puppet last run on gallium is CRITICAL: CRITICAL: Puppet has 1 failures [14:54:04] PROBLEM - puppet last run on cp4004 is CRITICAL: CRITICAL: puppet fail [14:54:50] PROBLEM - puppet last run on mw1177 is CRITICAL: CRITICAL: Puppet has 1 failures [14:54:53] PROBLEM - puppet last run on mw1251 is CRITICAL: CRITICAL: Puppet has 1 failures [15:03:40] (03PS2) 10Giuseppe Lavagetto: hiera: make mw1017 use the role backend [puppet] - 10https://gerrit.wikimedia.org/r/181573 [15:04:06] (03CR) 10Giuseppe Lavagetto: [C: 032] hiera: make mw1017 use the role backend [puppet] - 10https://gerrit.wikimedia.org/r/181573 (owner: 10Giuseppe Lavagetto) [15:04:16] RECOVERY - puppet last run on amssq49 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:04:47] RECOVERY - puppet last run on db1050 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [15:04:47] RECOVERY - puppet last run on sca1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:06:06] <_joe_> !log gracefully reloading apache on palladium to clean up old puppet master instances [15:06:08] Logged the message, Master [15:06:57] RECOVERY - puppet last run on mw1176 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:06:57] RECOVERY - puppet last run on mw1164 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:08:41] RECOVERY - puppet last run on gallium is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [15:09:47] RECOVERY - puppet last run on mw1251 is OK: OK: Puppet is currently enabled, last run 2 minutes 
ago with 0 failures [15:09:48] RECOVERY - puppet last run on mw1177 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [15:11:58] RECOVERY - puppet last run on cp4004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:14:11] PROBLEM - puppet last run on cp4003 is CRITICAL: CRITICAL: Puppet has 1 failures [15:15:42] <_joe_> and of course 'cluster' doesn't work with the role-based backend, meh. [15:19:22] (03PS1) 10RobH: setting asset tag mgmt entries for dbstore2001-2002 [dns] - 10https://gerrit.wikimedia.org/r/181575 [15:19:59] (03CR) 10RobH: [C: 032] setting asset tag mgmt entries for dbstore2001-2002 [dns] - 10https://gerrit.wikimedia.org/r/181575 (owner: 10RobH) [15:26:51] RECOVERY - puppet last run on cp4003 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [15:34:08] (03PS1) 10Giuseppe Lavagetto: puppet: re-enter $cluster at the node level [puppet] - 10https://gerrit.wikimedia.org/r/181577 [15:35:56] (03CR) 10Giuseppe Lavagetto: [C: 032] puppet: re-enter $cluster at the node level [puppet] - 10https://gerrit.wikimedia.org/r/181577 (owner: 10Giuseppe Lavagetto) [15:56:55] (03PS1) 10RobH: setting the install lease file info for es2004/2007/2010 [puppet] - 10https://gerrit.wikimedia.org/r/181579 [15:57:41] (03CR) 10RobH: [C: 032] setting the install lease file info for es2004/2007/2010 [puppet] - 10https://gerrit.wikimedia.org/r/181579 (owner: 10RobH) [16:01:05] (03PS1) 10Giuseppe Lavagetto: mediawiki: remove one stanza from regex.yaml [puppet] - 10https://gerrit.wikimedia.org/r/181581 [16:02:34] (03CR) 10Giuseppe Lavagetto: [V: 032] mediawiki: remove one stanza from regex.yaml [puppet] - 10https://gerrit.wikimedia.org/r/181581 (owner: 10Giuseppe Lavagetto) [16:02:46] (03CR) 10Giuseppe Lavagetto: [C: 032] mediawiki: remove one stanza from regex.yaml [puppet] - 10https://gerrit.wikimedia.org/r/181581 (owner: 10Giuseppe Lavagetto) [16:13:02] godog: are you 100% sure that 
JohnFLewis's plan to dump bugzilla at one bug/second won't DOS us? (I can't think why it would, just double-checking). [16:14:10] andrewbogott: mostly playing it by ear didn't check bugzilla proper but I'd be surprised if it couldn't handle one page view per second [16:14:23] yeah, that's what I think as well. [16:14:36] So I think he should just go ahead and run it on toollabs, barring any complaints. [16:14:48] bblack: here? [16:14:52] If it develops to an issue - it can be killed as soon as they come to me anyway [16:15:12] (03PS1) 10Faidon Liambotis: nagios: kill currently unused check_cert check [puppet] - 10https://gerrit.wikimedia.org/r/181585 [16:15:14] (03PS1) 10Faidon Liambotis: nagios: add new custom-written check "check_ssl" [puppet] - 10https://gerrit.wikimedia.org/r/181586 [16:15:16] (03PS1) 10Faidon Liambotis: nagios: use check_ssl for the check_ssl_ldap check [puppet] - 10https://gerrit.wikimedia.org/r/181587 [16:15:18] (03PS1) 10Faidon Liambotis: nagios: use check_ssl for checking lists/otrs/svn [puppet] - 10https://gerrit.wikimedia.org/r/181588 [16:15:20] (03PS1) 10Faidon Liambotis: nagios: use check_ssl for checking localssl caches [puppet] - 10https://gerrit.wikimedia.org/r/181589 [16:15:22] (03PS1) 10Faidon Liambotis: nagios: kill check_ssl_cert, now unused [puppet] - 10https://gerrit.wikimedia.org/r/181590 [16:15:53] any perl coders that would like to review this? [16:17:45] <_joe_> paravoid: I'll take a look [16:18:09] I also figured out why check_ssl_cert is so slow [16:18:14] and we can do several things to speed it up [16:18:25] but I don't think we should [16:18:47] but I can tell you, if you want to cry [16:19:02] <_joe_> 221 lines in gerrit? no way [16:19:06] yeah full disclosure!
[16:19:11] like seriously, you'll probably shed a tear [16:19:17] <_joe_> lol [16:19:22] <_joe_> please share the pain [16:19:27] ok, for starters [16:19:38] check_ssl_cert is ~890 lines in bash [16:19:42] we call it ~2600 times [16:20:01] <_joe_> (which is a little bit too much, too) [16:20:17] it calls a bunch of "which timeout", "which openssl", "which perl" etc. [16:20:29] as well as a perl -e "use Date::Parse" to check if the module is there [16:20:47] <_joe_> ahah that is the "safe bash for script kiddies" - I've seen a lot of that crazy [16:20:59] so it calls a bunch of executables all over the place just for feature checks as well the regular openssl calls (fork/exec) [16:21:05] but the most funny part of it all is [16:21:22] that because it can't really do multiple checks while connecting [16:21:42] it calls openssl, tells it to dump the certificate to $(mktemp -d) [16:21:50] and then runs openssl all over it [16:22:03] and its default for that is /tmp [16:22:04] <_joe_> wonderful [16:22:14] ...which isn't a tmpfs [16:22:31] I have an ls -la /tmp running [16:22:46] paravoid: yeah [16:22:51] for about 10 minutes now [16:23:01] you literally cannot ls /tmp [16:23:14] <_joe_> ahah [16:23:18] ah wait, it works without all the stat calls (-la) [16:23:37] neon's disks are thrashing [16:23:44] faidon@neon:~$ ls /tmp | grep -c check_ssl [16:23:44] 184106 [16:23:48] so, yeah... :P [16:23:57] check_ssl_cert is awesome, yeah [16:24:04] bblack: see commits above, I rewrote it in perl [16:25:09] nice! 
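A minimal sketch of the alternative to the behaviour described above (one process, no temporary files, no repeated openssl invocations). It is written in Python rather than the Perl of the actual rewrite, and it is not the check_ssl plugin itself: the function names, the thresholds, and the assumption of a modern Python 3 `ssl` module with SNI support are all illustrative.

```python
import socket
import ssl

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def fetch_not_after(host, port=443, timeout=10):
    """Return the peer certificate's notAfter string over a single connection.

    Hypothetical helper. With the default context, a certificate that fails
    verification (including an already expired one) raises ssl.SSLError
    during the handshake, which a plugin would map to CRITICAL.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def classify_expiry(not_after, now, warn_days=60, crit_days=30):
    """Map a notAfter string such as 'Jan 31 10:53:05 2015 GMT' to a status.

    `now` is seconds since the epoch; the thresholds are illustrative.
    """
    expires = ssl.cert_time_to_seconds(not_after)
    days_left = (expires - now) / 86400.0
    if days_left < 0:
        return CRITICAL, "certificate expired on %s" % not_after
    if days_left < crit_days:
        return CRITICAL, "certificate will expire on %s" % not_after
    if days_left < warn_days:
        return WARNING, "certificate will expire on %s" % not_after
    return OK, "certificate valid until %s" % not_after
```

Everything here happens in memory, so nothing accumulates under /tmp no matter how many times the check runs.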
[16:25:14] https://gerrit.wikimedia.org/r/#/q/project:operations/puppet+topic:check_ssl,n,z [16:25:29] sigh [16:25:32] it will take me a few to review thoroughly, but I love the idea [16:25:41] I was pinging you to review since I know you're proficient in perl, but _joe_ also offered [16:25:46] either (or both) works for me [16:26:01] my changes deploy it in stages anyway though, leaving cache.pp last [16:26:03] I assume perl because it works better with nagios having that module, etc? [16:26:16] that and also because python's ssl module is very wtf [16:26:24] they added SNI... in 3.4 [16:26:33] and some of the things I needed in 3.4.3 which isn't even in trusty [16:26:36] <_joe_> yeah I use pyopenssl almost always [16:26:51] I thought of that, as well as another module I found [16:26:57] but meh [16:28:59] <_joe_> paravoid: no I agree with using perl for this [16:30:13] can it be used to actually check unified's as well? one of the issues with all the manual tools I use is that they usually don't have a way to convince to not send the host info, but still check that the resulting cert matches, because they're all SNI-capable [16:30:18] paravoid: it didn't support cert checking at all, if I remember correctly :-P [16:30:19] <_joe_> Net::SSLeay::P_ASN1_UTCTIME_put2string is a nice function name [16:30:28] bblack: it can't :( [16:30:28] luckily wget on precise is outdated enough that it serves for that check manually, currently. 
[16:30:35] I found that out at the end :( [16:30:48] well, regardless, we didn't have that before either :) [16:30:58] IO::Socket::SSL hardcodes a Net::SSLeay::ctrl call in the connect function [16:31:02] that enables SNI [16:31:16] I can't even monkey-patch that :/ [16:31:19] yeah [16:31:39] it's a minor point anyways, and hopefully over the next couple of years, one of decreasing relevance [16:37:03] 17:24:04 bblack: see commits above, I rewrote it in perl [16:37:09] what's wrong with this sentence :P [16:37:14] :P [16:37:40] why didn't you use Go [16:39:24] hmmm [16:39:36] with a newer IO::Socket::SSL we could monkeypatch it [16:40:15] not with precise's, though [16:40:32] precise's IO::Socket::SSL has [16:40:32] if (Net::SSLeay::OPENSSL_VERSION_NUMBER() >= 0x009080ef) { [16:40:36] my $host; [16:40:37] [16:40:40] } [16:40:50] but http://cpansearch.perl.org/src/SULLR/IO-Socket-SSL-2.008/lib/IO/Socket/SSL.pm has [16:40:59] BEGIN { $can_client_sni = Net::SSLeay::OPENSSL_VERSION_NUMBER() >= 0x01000000; [16:41:05] and then [16:41:08] if ( $can_client_sni ) { [16:41:28] so we could override $can_client_sni [16:41:28] IIRC there are some advantages to perl with nagios in particular, because it can (if so configured) not launch a new perl interpreter for each check. [16:41:44] (which is kind of a poorly designed feature that has bit us in a bad way before, but it's nice when it works) [16:41:51] (03CR) 10Giuseppe Lavagetto: [C: 04-1] "Minor omission." (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/181586 (owner: 10Faidon Liambotis) [16:42:59] (03CR) 10Faidon Liambotis: nagios: add new custom-written check "check_ssl" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/181586 (owner: 10Faidon Liambotis) [16:43:15] _joe_: ^ [16:44:25] <_joe_> paravoid: I didn't remember it also gave you timeout [16:44:30] it does [16:45:25] did anyone notice how I'm using an internal IO::Socket::SSL function?
:) [16:45:43] that is commented as [16:45:44] # _get_ssl_object is for internal use ONLY! [16:45:53] in IO::Socket::SSL's source :) [16:46:10] <_joe_> I noticed the _, but that doesn't mean much in perl :P [16:46:16] (or python) [16:46:31] <_joe_> I mean you /can/ make methods really private, but it's a weird syntax [16:46:36] in Perl if you're not using private methods and monkey-patching module internals, you're not trying hard enough :) [16:46:59] <_joe_> uh, "you can but it's a weird syntax" applies to most of perl [16:47:14] @sans = @sans[ grep { $_ & 1 } 1..$#sans ]; [16:47:18] (03CR) 10Giuseppe Lavagetto: [C: 031] nagios: add new custom-written check "check_ssl" [puppet] - 10https://gerrit.wikimedia.org/r/181586 (owner: 10Faidon Liambotis) [16:47:21] is my response to this :P [16:47:41] <_joe_> ahah, nice [16:48:16] <_joe_> (still figuring it out btw, my perl is rusty) [16:49:17] <_joe_> lol [16:49:21] <_joe_> wow that is cool [16:49:39] everyone's perl is rusty [16:50:01] mine is as well, yes [16:50:10] Perl is rusty [16:50:13] rusty is perl's middle name [16:50:42] so we replaced RT & Bugzilla, both written in Perl, with Phabricator [16:50:47] ...written in PHP [16:50:49] we're so last decade [16:51:16] I still think Perl's a pretty awesome language fwiw. It has evolved well, and you can write very clean and modern code in Perl these days with advanced, clean techniques that most competing languages can't even offer. [16:51:53] it's just that it's one of those things where the expressive power and freedom is so great, it's easy, and perhaps even the default, to shoot yourself in the foot. [16:53:06] if Python is a set of Legos, Perl is a plastics manufacturing and mold-making plant :) [16:54:03] (and a vast collection of predefined pieces of plastic, many of which were designed by crazy people) [16:55:25] plastic and razorblades [16:55:39] :) [16:59:00] dunno, Perl has always been a good choice for network programming as I understand it. 
other uses maybe not so much. [17:00:01] * chrismcmahon worked through this book some years ago: http://www.amazon.com/Network-Programming-Perl-Lincoln-Stein/dp/0201615711 [17:05:59] it's really hard to abstract low-level network programming in a sufficiently-generic way [17:06:22] I mean, I guess abstract and sufficiently-generic are always hard to put together, but particularly in this case [17:06:52] bblack: are you planning to review this or should I take _joe_'s +1 as a go-ahead? :) [17:07:07] it's a day of much distraction! [17:07:15] give me like 5-10 more minutes :) [17:07:26] sure, no pressure, I was just wondering :) [17:07:34] * mark hovers over the -2 button [17:07:52] you can stay with an almost broken neon for the holidays then :) [17:08:10] without paging it'll be quiet [17:08:14] ;) [17:08:43] bblack: besides these improvements, I really wonder if there's much point in running all these checks for all of our hosts [17:09:06] (PS1) Mark Bergsma: nagios is slow and annoying, so let's kill neon. [17:09:42] it only ever gives me bad news [17:10:06] root@neon:/etc/icinga# grep -c check_ssl_cert puppet_services.cfg [17:10:06] 2633 [17:10:22] out of 14k [17:10:35] i've said a few times that it may make sense to have a consolidated general health check script on each host for the common stuff [17:10:35] well [17:10:39] <_joe_> it's anyway too much [17:10:40] but no one likes my idea :( [17:11:26] there may be better ways to structure things, but imho it's important to check certs on the hosts in the current environment. we have little other defense against 1/110 caches missing a cert or having a wrong cert, due to some mishap. [17:11:38] right [17:11:48] especially given they're puppet-managed, yet puppet doesn't reload nginx to make any changes take effect, either.
[17:11:51] I thought there's a good reason for that which is why I went and rewrote check_ssl :) [17:12:28] but maybe we need a higher-level check that's check_ssl_inconsistent or something that goes through all of our SNI certs [17:12:45] so it's just one per server [17:12:45] could we let puppet check when it runs instead? [17:12:56] and that doesn't shell out to check_ssl 24 times that is [17:13:59] s/24/26/ [17:15:17] well puppet not only checks but enforces correct certs + config, as it generates them [17:15:28] but it doesn't enforce that nginx is running the config it generated [17:15:52] (or that the results of that config seem independently to be as expected, which could be a check on not-all-hosts) [17:16:54] paravoid: subject and issue have the same short-option "s" [17:17:01] s/issuer/issue/ [17:17:03] meh [17:17:13] :) [17:18:32] (03CR) 10BBlack: [C: 031] "LGTM, aside from --issuer and --subject both using "-s"" [puppet] - 10https://gerrit.wikimedia.org/r/181586 (owner: 10Faidon Liambotis) [17:19:41] (03PS2) 10Faidon Liambotis: nagios: use check_ssl for the check_ssl_ldap check [puppet] - 10https://gerrit.wikimedia.org/r/181587 [17:19:43] (03PS2) 10Faidon Liambotis: nagios: add new custom-written check "check_ssl" [puppet] - 10https://gerrit.wikimedia.org/r/181586 [17:19:45] (03PS2) 10Faidon Liambotis: nagios: use check_ssl for checking localssl caches [puppet] - 10https://gerrit.wikimedia.org/r/181589 [17:19:47] (03PS2) 10Faidon Liambotis: nagios: use check_ssl for checking lists/otrs/svn [puppet] - 10https://gerrit.wikimedia.org/r/181588 [17:19:49] (03PS2) 10Faidon Liambotis: nagios: kill check_ssl_cert, now unused [puppet] - 10https://gerrit.wikimedia.org/r/181590 [17:21:05] (03CR) 10Faidon Liambotis: [C: 032] nagios: kill currently unused check_cert check [puppet] - 10https://gerrit.wikimedia.org/r/181585 (owner: 10Faidon Liambotis) [17:21:19] (03CR) 10Faidon Liambotis: [C: 032] nagios: add new custom-written check "check_ssl" [puppet] - 
10https://gerrit.wikimedia.org/r/181586 (owner: 10Faidon Liambotis) [17:22:34] (03CR) 10Faidon Liambotis: [C: 032] nagios: use check_ssl for the check_ssl_ldap check [puppet] - 10https://gerrit.wikimedia.org/r/181587 (owner: 10Faidon Liambotis) [17:22:50] * paravoid deploys [17:23:06] just ldap for now [17:34:53] (03CR) 10Faidon Liambotis: [C: 032] nagios: use check_ssl for checking lists/otrs/svn [puppet] - 10https://gerrit.wikimedia.org/r/181588 (owner: 10Faidon Liambotis) [17:39:16] PROBLEM - git.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:41:02] Status Information: (Service check did not exit properly) [17:41:03] dammit [17:44:03] how do we feel about the 'foomodule_new' pattern? [17:44:07] i hate the redis module [17:44:27] but changing it is risky, i'd rather migrate things piecemeal [17:44:31] PROBLEM - Certificate expiration on neptunium is CRITICAL: (Service check did not exit properly) [17:45:30] (03PS1) 10Giuseppe Lavagetto: mediawiki: use role keyword in node defs, get rid of duplicate regexes [puppet] - 10https://gerrit.wikimedia.org/r/181596 [17:46:34] PROBLEM - Certificate expiration on labcontrol2001 is CRITICAL: (Service check did not exit properly) [17:46:41] <_joe_> ori: we feel _bad_, but if the original committer of the _new module sacrifices a goat to the Ancient Ones swearing on a precise expiration date, rare exceptions have been made [17:46:41] (03PS1) 10Faidon Liambotis: nagios: fix check_ssl_ldap arguments [puppet] - 10https://gerrit.wikimedia.org/r/181597 [17:46:54] _joe_: 'role foo' not 'role(foo)'? you can omit the parens? [17:47:01] <_joe_> yeah [17:47:07] (03CR) 10Faidon Liambotis: [C: 032 V: 032] nagios: fix check_ssl_ldap arguments [puppet] - 10https://gerrit.wikimedia.org/r/181597 (owner: 10Faidon Liambotis) [17:47:11] is that the case for all puppet functions?
[17:47:13] <_joe_> I realized that once I've seen include is a function [17:47:15] <_joe_> yes [17:47:16] <_joe_> :P [17:47:29] <_joe_> brb [17:47:44] http://i3.kym-cdn.com/photos/images/newsfeed/000/288/648/776.gif [17:48:00] i had no idea [17:49:04] believe it or not, neither the puppet authors did [17:50:23] wow icinga is seriously bottlenecked [17:50:30] checks lag for many minutes [17:57:35] PROBLEM - HTTPS on sodium is CRITICAL: (Service check did not exit properly) [17:57:53] (false alarm, this is the check_ssl stuff above, I'm debugging it) [17:58:44] PROBLEM - HTTPS on iodine is CRITICAL: (Service check did not exit properly) [17:59:33] PROBLEM - HTTPS on antimony is CRITICAL: (Service check did not exit properly) [17:59:43] PROBLEM - Certificate expiration on labcontrol2001 is CRITICAL: (Service check did not exit properly) [18:03:05] RECOVERY - Certificate expiration on labcontrol2001 is OK: SSL OK [18:04:08] neon: 18:03:57 up 1 day, 17 min, 4 users, load average: 358.42, 467.64, 433.61 [18:04:12] yeah I know [18:04:15] it was 600+ before [18:04:17] :) [18:04:25] RECOVERY - HTTPS on sodium is OK: SSL OK [18:04:44] robh: is rancid putting diffs in cvs ? [18:04:46] so the above errors were... [18:04:54] my plugin is not embedded-perl-compatible [18:04:59] matanya: uh, i have no idea [18:05:03] yeah [18:05:04] matanya: yes [18:05:10] I had to disable it in the old script too I think [18:05:10] oh, the pain [18:05:16] RECOVERY - HTTPS on iodine is OK: SSL OK [18:05:26] RECOVERY - Certificate expiration on neptunium is OK: SSL OK [18:05:29] why not move to a git based tool ? [18:05:38] there is none and noone cares [18:05:41] i just added myself so i get emailed diffs on network switch stuff [18:05:44] since i edit them a lot. [18:05:51] paravoid: there is [18:05:55] (i assume you asked me since i just changed rancid alias?) 
[18:06:08] i did [18:06:21] paravoid: to effectively disable embedded perl in the bgp ones, I used two comment lines: [18:06:24] +# nagios: -epn [18:06:26] yes [18:06:26] +# icinga: -epn [18:06:36] the nagios one alone works [18:06:37] paravoid: https://github.com/FluentTradeTechnologies/netconfigit [18:07:06] I think I added the icinga one for a reason, though. something about whether icinga looks at the nagios one depends on the version of icinga or something. [18:07:11] (03PS1) 10Faidon Liambotis: nagios: disable embedded perl for check_ssl [puppet] - 10https://gerrit.wikimedia.org/r/181600 [18:07:13] fair enough [18:07:15] I don't remember tbh [18:07:33] (03CR) 10Faidon Liambotis: [C: 032] nagios: disable embedded perl for check_ssl [puppet] - 10https://gerrit.wikimedia.org/r/181600 (owner: 10Faidon Liambotis) [18:07:44] (03CR) 10Faidon Liambotis: [V: 032] nagios: disable embedded perl for check_ssl [puppet] - 10https://gerrit.wikimedia.org/r/181600 (owner: 10Faidon Liambotis) [18:08:05] ok, I'm going to manually switch everything to check_ssl now [18:08:13] with puppet disabled [18:10:25] bblack: watch the load [18:10:29] I am [18:10:33] :)) [18:10:43] it's dropping pretty fast :) [18:10:49] 18:10:41 up 1 day, 24 min, 4 users, load average: 218.02, 393.32, 420.28 [18:11:16] (03PS3) 10Faidon Liambotis: nagios: use check_ssl for checking localssl caches [puppet] - 10https://gerrit.wikimedia.org/r/181589 [18:11:18] (03PS3) 10Faidon Liambotis: nagios: kill check_ssl_cert, now unused [puppet] - 10https://gerrit.wikimedia.org/r/181590 [18:11:34] (03CR) 10Faidon Liambotis: [C: 032] nagios: use check_ssl for checking localssl caches [puppet] - 10https://gerrit.wikimedia.org/r/181589 (owner: 10Faidon Liambotis) [18:12:09] check_ganglia's another big offender IIRC [18:12:35] I'll finish with check_ssl first [18:12:35] I seem to remember that it does something crazy like pull down all the data in each check just to filter it locally to the results for that check, then do 
it again for the next check, etc [18:12:40] wtf [18:12:43] yeah we should kill that entirely [18:12:48] I think only ottomata actually uses it [18:13:03] if you see top, it's full of check_ssl & check_ganglia [18:13:10] yeah [18:14:13] I'll write check_ssl_multi next I think [18:14:44] PROBLEM - HTTPS_unified on amssq56 is CRITICAL: Return code of 255 is out of bounds [18:14:53] root@neon:/etc/icinga# rm -rf /tmp/check_ssl_cert* [18:14:53] bash: /bin/rm: Argument list too long [18:15:15] heh [18:15:44] PROBLEM - HTTPS_unified on cp4001 is CRITICAL: Return code of 255 is out of bounds [18:15:53] PROBLEM - HTTPS_unified on cp4007 is CRITICAL: Return code of 255 is out of bounds [18:15:54] PROBLEM - HTTPS_unified on cp4019 is CRITICAL: Return code of 255 is out of bounds [18:15:54] PROBLEM - HTTPS_unified on amssq48 is CRITICAL: Return code of 255 is out of bounds [18:15:56] ouch [18:15:59] this is going to spam us [18:16:04] PROBLEM - HTTPS_unified on cp3019 is CRITICAL: Return code of 255 is out of bounds [18:16:04] PROBLEM - HTTPS_unified on cp1052 is CRITICAL: Return code of 255 is out of bounds [18:16:05] PROBLEM - HTTPS_unified on cp4012 is CRITICAL: Return code of 255 is out of bounds [18:16:19] PROBLEM - HTTPS_unified on amssq55 is CRITICAL: Return code of 255 is out of bounds [18:16:25] PROBLEM - HTTPS_unified on cp1048 is CRITICAL: Return code of 255 is out of bounds [18:16:26] PROBLEM - HTTPS_unified on amssq32 is CRITICAL: Return code of 255 is out of bounds [18:16:26] PROBLEM - HTTPS_unified on amssq51 is CRITICAL: Return code of 255 is out of bounds [18:16:26] PROBLEM - HTTPS_unified on amssq38 is CRITICAL: Return code of 255 is out of bounds [18:16:36] wow /tmp is ridiculous [18:17:35] <_joe_> bblack: I fixed that in check_ganglia some time ago [18:17:37] HTTPS_unified is check_ssl_http!*.wikipedia.org [18:17:50] yeah the unified check is mostly bogus right now [18:17:56] cannot handle international domains, please install Net::LibIDN, Net::IDN::Encode 
or URI at /usr/lib/nagios/plugins/check_ssl line 148 [18:17:59] lol [18:18:07] <_joe_> but I kinda remember it got reverted inadvertently or something [18:18:11] why does it think it's IDN? [18:18:30] because * is not a valid character in a domain [18:18:35] oh, right! [18:18:37] what are you trying to do with this check [18:18:56] nothing really, it's not effective in its current form, it's just a placeholder for "find a way to actually check unified" [18:19:01] we can fix the star easily [18:19:28] manifests/role/cache.pp: [18:19:28] $check_cert = $certname ? { [18:19:28] 'unified.wikimedia.org' => '*.wikipedia.org', [18:19:29] 'uni.wikimedia.org' => '*.wikipedia.org', [18:19:32] yeah I'm seeing this [18:19:37] I'm wondering what to convert it to [18:19:54] I'll install a newer libio-socket-ssl-perl and monkey-patch for a --no-sni argument later [18:20:00] just make the check_cert name be foo.wikipedia.org or whatever [18:20:02] (for now) [18:20:17] wikipedia.org? [18:20:49] any of the domains really [18:21:00] ah, right [18:21:05] it won't do anything right now [18:21:09] either way it's going to do SNI [18:21:12] right [18:21:46] maybe just get rid of that whole clause for now [18:22:01] (and just make $check_cert == $certname) [18:22:04] <_joe_> the load is still in the high hundreds [18:22:24] <_joe_> going down quickly, I'd have expected more [18:22:38] I'm working on cleaning out tmp in little chunks bash can handle [18:22:46] (the check_ssl_cert files, anyways) [18:23:14] xargs -n is your friend [18:23:16] <_joe_> bblack: find . -type f -name "foo*" -maxdepth 1 -delete doesn't work?
[18:23:27] that works too, although it's slow [18:23:41] I'm just using a for-loop and subsetting them based on the first char of the random part [18:23:43] <_joe_> paravoid: I was seeing that as a plus right now [18:23:44] just ls /tmp | grep ssl_cert | xargs -n 100 rm [18:23:54] or 1000 or something :) [18:23:59] <_joe_> paravoid: of course that's faster [18:24:05] <_joe_> paravoid: 1023 :P [18:24:54] (03PS1) 10Faidon Liambotis: nagios: fix check for unified certificate [puppet] - 10https://gerrit.wikimedia.org/r/181604 [18:25:12] <_joe_> I usually prefer find on high-load machines because it's actually a bit slower and gentler on the disk [18:25:37] the load is not I/O bound atm and your solution spawns more processes :P [18:25:44] (03CR) 10Faidon Liambotis: [C: 032] nagios: fix check for unified certificate [puppet] - 10https://gerrit.wikimedia.org/r/181604 (owner: 10Faidon Liambotis) [18:25:47] <_joe_> yes I was thinking of that [18:26:00] <_joe_> btw it is iobound now :P [18:26:00] (03PS4) 10Faidon Liambotis: nagios: kill check_ssl_cert, now unused [puppet] - 10https://gerrit.wikimedia.org/r/181590 [18:26:09] (03CR) 10Faidon Liambotis: [C: 032] nagios: kill check_ssl_cert, now unused [puppet] - 10https://gerrit.wikimedia.org/r/181590 (owner: 10Faidon Liambotis) [18:26:15] neon should really have a tmpfs /tmp [18:26:22] <_joe_> +1 [18:26:22] but let's clean up/drop the load first :) [18:27:00] <_joe_> as soon as cleaning tmp is done, it should be relatively easy [18:27:14] <_joe_> ok off for today, tomorrow I should be online in the afternoon [18:27:23] <_joe_> in the morning, not so much [18:27:46] <_joe_> paravoid: load on neon is below 100, good work :) [18:28:09] I think the crazy /tmp was slowing down the rest of the workload [18:28:22] <_joe_> admittedly, doing as bad as the old check is *hard* [18:29:53] I am assuming the page I just got is spurious? paravoid ? 
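The cleanup problem discussed above (`rm -rf /tmp/check_ssl_cert*` failing with "Argument list too long" because the shell expands the glob into one enormous argument list) can also be sidestepped by never building a command line at all. A minimal sketch, assuming Python 3.6+; the function name is hypothetical and this is not what was actually run on neon:

```python
import os
import shutil


def remove_by_prefix(directory, prefix):
    """Delete entries directly inside `directory` whose names start with `prefix`.

    Names are read with os.scandir, so no shell glob is expanded and the
    kernel's argument-length limit never comes into play. Returns the number
    of entries removed.
    """
    # Collect first, delete afterwards, so the directory is not modified
    # while it is still being iterated.
    with os.scandir(directory) as entries:
        targets = [
            (entry.path, entry.is_dir(follow_symlinks=False))
            for entry in entries
            if entry.name.startswith(prefix)
        ]
    for path, is_dir in targets:
        if is_dir:
            shutil.rmtree(path)  # mktemp -d leaves directories behind
        else:
            os.unlink(path)
    return len(targets)
```

Usage would be along the lines of `remove_by_prefix("/tmp", "check_ssl_cert")`. It is the same idea as the `ls | grep | xargs -n` and `find -delete` suggestions: stream the names instead of passing them all as arguments.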
[18:29:58] no it's not [18:30:05] well, I don't know if it is, but it's not related [18:30:27] although icinga is under I/O load, so this may have to do with that? [18:30:42] I'm also noticing that sdb is considerably slower than sda [18:30:44] it may be dying [18:30:44] heh "plugin timeout" could be related alright [18:30:51] sda 0.00 813.40 0.00 540.80 0.00 5364.00 19.84 4.63 7.81 0.00 7.81 0.34 18.48 [18:30:54] sdb 0.00 827.20 0.00 546.40 0.00 6257.60 22.90 147.49 281.50 0.00 281.50 1.83 100.00 [18:30:58] Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util [18:31:33] tmp is emptied of check_ssl_cert now [18:31:47] apparently that was basically everything, I see no files left now [18:32:20] !log restarting icing [18:32:24] Logged the message, Master [18:32:25] a :) [18:32:44] it's still syncing to disk I think [18:33:17] yeah [18:33:23] filesystem is not happy :) [18:34:44] I'm worried about a paging storm [18:34:54] yeah I'm trying "sync" now, and it's taking for-freaking-ever [18:35:05] I think it has to catch up on reclaiming all those inodes [18:50:42] sync finished [18:52:58] and now it's taking a while again [18:53:09] I wonder if this fs is broken beyond repair [18:53:34] my sync is still going from above [18:53:51] mostly we're not having much iowait anymore though, just lack of total cputime [18:54:45] iostat shows I/O at > 80% [18:54:52] still, even the 15m load now is sub-100, which is a big improvement [18:55:05] yeah but process iowait is holding down near zero [18:55:09] earlier it was ~30%+ [18:55:10] yeah [18:55:42] I'll write a new check later today or tomorrow [18:56:05] to check multiple certs in one execution? 
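The "multiple certs in one execution" idea can be sketched as a small aggregation step: run every per-certificate check inside one process and report a single status for the host. This is only an illustration under assumptions; the function name, the message format, and the example hostnames are hypothetical and do not describe the plugin that was later written.

```python
# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def aggregate(results):
    """Collapse per-certificate results into one (status, summary) pair.

    `results` maps a certificate name to a (status, message) tuple. The
    overall status is the worst individual status, so one bad certificate
    makes the whole host check fail, but only one alert is raised.
    """
    worst = max((status for status, _ in results.values()), default=OK)
    problems = [
        "%s: %s" % (name, message)
        for name, (status, message) in sorted(results.items())
        if status != OK
    ]
    if not problems:
        return worst, "SSL OK - %d certificates checked" % len(results)
    return worst, "SSL %s - %d of %d certificates have problems: %s" % (
        LABELS[worst], len(problems), len(results), "; ".join(problems))


if __name__ == "__main__":
    # Hypothetical per-certificate results for one cache host.
    status, summary = aggregate({
        "a.example.org": (OK, "valid"),
        "b.example.org": (CRITICAL, "expired"),
        "c.example.org": (WARNING, "expires soon"),
    })
    print(summary)
    raise SystemExit(status)
```

With one such check per host, certificates that expire together produce one alert per host instead of one per host per certificate.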
[18:56:06] that will eval check_ssl and do 26 checks [18:56:10] yeah [18:56:16] if you think about it, it's what we need too [18:56:32] these certs will expire all together or close together [18:56:41] that's 2600 alerts :P [18:56:51] not that 101 alerts is okay [18:56:53] then we'll be sure to notice I guess! [18:56:55] but it's significantly better [18:57:15] what we really care is "are the SSL certificates okay in this host" [18:57:20] ideally almost all complex checks should run distributed anyways [18:57:35] it's not about distributed here, though [18:57:55] check_sslx26 could be running on the cp host itself and just returning a simple status back to nagios over the network [18:57:56] conceptually, we don't want 26 more checks on every cp* host, we just need one [18:58:07] sure we can do that too [18:58:57] (if there's a real difference between hitting the host's external IP from itself or from elsewhere, we have other issues anyways) [18:59:03] it won't help all that much with neon's CPU though [18:59:16] well, it won't help if we run check_ssl distributed now [18:59:20] sure it would. it wouldn't have to actually do the crypto part of the SSL check [18:59:33] (or the launching of many perl procs) [18:59:58] you assume we aren't going to do NRPE over SSL :P [19:00:09] I don't think we do this right now, but we might :) [19:00:19] heh [19:00:59] we shouldn't need SSL for internal traffic in the general case because we have ipsec! :) [19:01:06] http://ganglia.wikimedia.org/latest/graph.php?r=4hr&z=xlarge&h=neon.wikimedia.org&m=cpu_report&s=by+name&mc=2&g=load_report&c=Miscellaneous+eqiad [19:01:10] not too bad [19:01:15] but I'll make it even better [19:01:24] I might even try making my plugin epn-compatible [19:02:25] oh man... we ship check_nrpe ? a binary from our puppet repo ? 
sigh [19:03:04] # WMF custom service checks [19:03:09] http://ganglia.wikimedia.org/latest/graph.php?r=year&z=xlarge&h=neon.wikimedia.org&m=cpu_report&s=by+name&mc=2&g=load_report&c=Miscellaneous+eqiad [19:03:12] there is nothing custom about check_nrpe :-( [19:03:13] I really wonder what's up with that [19:03:28] the first spike is before the SSL checks [19:03:34] well the second one is SSL [19:05:01] _joe_ was saying that the first one is when we switched to a new machine [19:05:08] or something :) [19:05:09] I wasn't here [19:05:36] oh yeah [19:05:43] that was an interesting event! [19:06:42] oh the machine replacement thing was the tiny spike at the end of july [19:07:18] the first spike is circa Sep 11th [19:07:23] I traced it to some labmon commits [19:07:25] which are gone now [19:07:55] I'm also suspecting that this gradual increase for all of December may be /tmp filling up [19:08:13] "Neon (Icinga) outage post-mortem" <- subject line from ops mailing list, circa July 30 [19:08:32] check_ssl_cert used mktemp, which would try to find a unique name [19:08:47] so it probably opendirs and scans the whole directory [19:08:57] (PS1) Manybubbles: Temporarily add Elasticsearch to einsteinium [puppet] - https://gerrit.wikimedia.org/r/181612 [19:09:04] and considering our syncs haven't finished yet... [19:15:27] (PS2) Anomie: Enable ApiFeatureUsage on Beta Labs [mediawiki-config] - https://gerrit.wikimedia.org/r/180529 [19:15:45] I'm suspecting we might have to rebuild neon's filesystem... [19:15:52] this is getting a bit ridiculous [19:16:34] (CR) BryanDavis: "Needed change in MediaWiki core has been merged so this is ready for testing in beta."
[mediawiki-config] - https://gerrit.wikimedia.org/r/181349 (owner: BryanDavis) [19:17:23] ok, off for dinner [19:17:24] we just did that in july, and supposedly we're on a pair of identical 1TB SSDs that are fresh from then as well [19:17:24] ttyl [19:17:36] (CR) Anomie: [C: 2] "Per MW Core meeting yesterday, let's do this." [mediawiki-config] - https://gerrit.wikimedia.org/r/180529 (owner: Anomie) [19:17:42] (Merged) jenkins-bot: Enable ApiFeatureUsage on Beta Labs [mediawiki-config] - https://gerrit.wikimedia.org/r/180529 (owner: Anomie) [19:17:46] enjoy the food! [19:27:10] (PS1) Rush: phab add ldap user field to userpage [puppet] - https://gerrit.wikimedia.org/r/181616 [19:28:40] (CR) Rush: [C: 2] phab add ldap user field to userpage [puppet] - https://gerrit.wikimedia.org/r/181616 (owner: Rush) [19:35:14] (PS4) Rush: Phabricator Sprint (0.6.1.4) [puppet] - https://gerrit.wikimedia.org/r/179155 (owner: Christopher Johnson (WMDE)) [19:36:44] (CR) Rush: [C: 2] Phabricator Sprint (0.6.1.4) [puppet] - https://gerrit.wikimedia.org/r/179155 (owner: Christopher Johnson (WMDE)) [19:40:25] !log updated phab sprint app to 0.6.1.4 [19:40:33] Logged the message, Master [19:51:51] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: There is one unmerged change in mediawiki_config (dir /srv/mediawiki-staging/). [19:52:15] anomie: ^ [19:53:04] !log anomie Synchronized wmf-config: Labs-only change (duration: 00m 06s) [19:53:08] Logged the message, Master [19:53:45] (CR) GWicke: [C: 1] Temporarily add Elasticsearch to einsteinium [puppet] - https://gerrit.wikimedia.org/r/181612 (owner: Manybubbles) [19:55:23] Reedy: maybe today ? [19:55:30] RECOVERY - Unmerged changes on repository mediawiki_config on tin is OK: No changes to merge. [19:58:41] Does carbon not speak ipv6?
:/ [20:01:43] doh, I used http_proxy not https proxy for wget [20:08:40] https://phabricator.wikimedia.org/T78537 :S [20:09:05] Think I'm filling up terbium with that :S [20:11:16] * hoo moves to tin :P [20:13:45] tin :p [20:13:59] I didn't come up with these... [20:14:21] hoo: they're far too creative for you ;) [20:14:43] :P [20:15:29] if you named them, they'd be 'hoo, wut, were, hen, y' :D [20:15:30] IMO we should name them after alcoholic drinks: "I just had a rough debugging session on vodka" [20:15:39] +1 indeed [20:16:54] we don't have an ethanol.wikimedia.org/ethanol.{$site}.wmnet do we? :( [20:17:18] Sadly not, no [20:17:27] But I think we're mostly doing elements [20:18:31] yeah but ethanol isn't exactly one [20:19:18] If we get ethanol, I demand "caffeine"! :D [20:19:29] Too many molecules, too few servers [20:20:42] hoo: do you know when will go out? [20:20:43] that's the beauty of chemistry, don't have enough elements? add a few carbons or hydrogens and you'll get a new one :p [20:21:04] hoo: as you can tell from , wikidata's composer calls are quite expensive [20:21:25] ori: Saw that exact thing earlier on :S [20:22:41] ori: Will go out with wmf13 [20:22:48] ah, cool [20:22:59] Although that's so trivial that we could even bring it earlier [20:23:39] ori: Did you see that AFAICT ishmael and graphite have no data atm [20:23:46] ishmael was broken before [20:23:50] but graphite is awry [20:25:55] hoo: who can help me with a server side upload ? [20:26:18] Sure thing! [20:26:25] godog: do you know about that? [20:26:38] Although I'm preparing one right now, that is larger than I expected [20:26:49] didn't even know you can upload 50gb files to googledrive [20:29:02] hoo: +1 for earlier [20:29:59] hoo: if i add you to my project on labs, you can pull from there ?
[20:30:20] (CR) Ottomata: [C: 1] Add abacist module & role; provision on stat1001 [puppet] - https://gerrit.wikimedia.org/r/181110 (owner: Ori.livneh) [20:30:28] matanya: Are those accessible via http or so? [20:30:35] scp [20:30:49] mh... would need to forward agent to do that (into production) [20:30:53] but doable [20:32:23] hoo: added you [20:32:56] matanya: How much data is that? [20:32:57] please try /data/project/wikimania2014/ready/Evaluation_I_Metrics.webm [20:33:08] total 1.5TB [20:33:12] wow [20:33:14] but i want to upload only one [20:33:19] to test the quality [20:33:29] How large is that one? [20:33:35] 4.7GB [20:33:40] Ok, that should be ok [20:33:50] Let me first finish T78537, though [20:33:55] + to know how much effort I'll need to put in splitting/merging etc [20:34:58] thanks hoo [20:35:51] (CR) Nikerabbit: Create a special alias router for RT (1 comment) [puppet] - https://gerrit.wikimedia.org/r/180641 (owner: Mark Bergsma) [20:51:05] PROBLEM - puppet last run on cp4007 is CRITICAL: CRITICAL: Puppet has 1 failures [21:01:55] RECOVERY - puppet last run on cp4007 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [21:05:56] matanya: 4294967296 bytes is the upper limit [21:06:11] so i'm above it ? [21:06:25] yeah, I guess :( [21:07:00] need to cut [21:07:10] ok, will poke again shortly [21:12:25] hoo: please take /data/project/wikimania2014/ready/Evaluation_I_Metrics_p1-001.webm [21:15:39] matanya: Other upload still ongoing... after that I'll give it a shot [21:15:49] thanks hoo [21:21:12] James_F: Hi! How's it going? :) I just opened an office IT support ticket for access to CollabWiki, and Joel Krauska said instead maybe I should ping you...
[21:27:17] gitblit may need to be kicked again: https://git.wikimedia.org/ [21:28:23] Request: GET http://git.wikimedia.org/, from 10.64.0.171 via cp1043 cp1043 ([10.64.0.171]:80), Varnish XID 1230447677 [21:28:23] Forwarded for: (redacted), 10.64.0.171 [21:28:23] Error: 503, Service Unavailable at Tue, 23 Dec 2014 21:26:55 GMT [21:30:30] (CR) Legoktm: [C: 1] "____" [puppet] - https://gerrit.wikimedia.org/r/122621 (owner: Reedy) [21:43:56] matanya: Ok, will take care of yours in a bit [21:44:12] thanks [21:45:46] (PS1) Kaldari: Enable WikiGrok roulette on enwiki Beta Labs [mediawiki-config] - https://gerrit.wikimedia.org/r/181682 [21:46:23] (CR) Kaldari: [C: 2] Enable WikiGrok roulette on enwiki Beta Labs [mediawiki-config] - https://gerrit.wikimedia.org/r/181682 (owner: Kaldari) [21:46:28] (Merged) jenkins-bot: Enable WikiGrok roulette on enwiki Beta Labs [mediawiki-config] - https://gerrit.wikimedia.org/r/181682 (owner: Kaldari) [21:46:30] (PS2) Alexandros Kosiaris: WIP: Reuse parsoid varnish for cxserver in beta [puppet] - https://gerrit.wikimedia.org/r/181613 [21:50:02] akosiaris: ping [21:50:09] gwicke: pong [21:50:10] matanya: Where is that file? [21:50:11] hey [21:50:17] I mean in which project? [21:50:21] hoo: video [21:50:22] (CR) Alexandros Kosiaris: "Mostly an effort to measure if it is plausible to reuse the parsoid varnishes to also serve other Services (mathoid/citoid/cxserver). The " [puppet] - https://gerrit.wikimedia.org/r/181613 (owner: Alexandros Kosiaris) [21:50:38] akosiaris: I thought that we'd retire the parsoid varnishes once restbase comes online [21:50:45] matanya: Ah [21:50:56] It used to be that I get notifications when someone adds me to a project [21:51:08] it is in pref's [21:51:19] Just wondered [21:51:24] thought it was maybe in tools [21:51:40] btw, can you ping stat1001 ?
[21:51:53] yep [21:52:00] akosiaris: https://phabricator.wikimedia.org/T78194 [21:52:42] gwicke: that would be cool TBH, cause those VCLs are a PITA. But in the meantime, I am trying to see if it makes sense to use them for cxserver. Turns out it is not the best idea as bblack has already said [21:53:34] akosiaris: I was wondering if we could start with an nginx-only entry point, as that should be easier to configure [21:54:05] http://nginx.org/en/docs/http/ngx_http_proxy_module.html [21:54:28] gwicke: well it would, but at the same time we do want the landscape to have some homogeneity as well [21:54:53] after all, we did move all other services behind misc-web [21:55:31] yeah, but misc-web is not supposed to be production-worthy it seems [21:55:31] gwicke: anyway, kind of late here, care to comment on that patch ? I have submitted it mostly to jumpstart a discussion than anything else [21:55:41] not production-worthy ? [21:55:45] who says that ? [21:55:52] faidon iirc [21:56:06] he did not say production-worthy [21:56:23] roan had a patch to hook up citoid using misc-web [21:56:33] yeah, I know he abandoned it [21:56:43] but it is not about production worthy or not [21:56:55] it is about putting all other "non-wp" services under there [21:57:57] gerrit, git, etherpad are misc services. parsoid/restbase/citoid/mathoid are not [21:58:10] okay [21:58:50] anyway, going to sleep, please do comment on that change, your input is valuable :-) [21:58:55] I was meaning to start that discussion too, but didn't get to it yet [21:59:15] kk, will comment & perhaps send a mail to the ops list [21:59:39] bblack: that goes for you too :-) [21:59:52] ttyl [21:59:58] akosiaris: goodnight! [22:00:54] matanya: hoo@bast1001:~$ rsync --progress -e 'ssh -a -o "ProxyCommand ssh -a -W %h:%p bastion.wmflabs.org"' encoding02.eqiad.wmflabs:/data/project/wikimania2014/ready/Evaluation_I_Metrics_p1-001.webm . 
[22:01:06] just for reference in case someone else needs that [22:01:17] nice :) ETA ? [22:01:26] 30s [22:01:35] copying at 33MiB/s [22:01:42] joy [22:01:50] hard work might pay at last [22:02:12] oh, where's the wikitext, btw? [22:02:22] Can also upload it w/o and you add that after, as it's only one file [22:02:26] in that dir [22:02:35] txt file [22:02:44] upload under my user name ? [22:02:46] rsync: link_stat "/data/project/wikimania2014/ready/Evaluation_I_Metrics_p1-001.webm.txt" failed: No such file or directory (2) [22:03:08] it is called Evaluation_I_Metrics.txt [22:03:30] ah, yep :) [22:03:37] Please fix that in the future, if you have more files [22:03:50] i will, didn't plan to spilt [22:03:55] *split [22:04:14] sadly, i have to, files range from 6.5 GB to 25 GB [22:04:22] :( [22:04:39] Not sure what's needed to raise that limit w/o blowing everything up [22:04:59] godog said iy will break swift [22:05:03] it [22:05:17] Feared that :( [22:05:30] user=matanya and file name as given? [22:05:40] yes [22:06:02] started [22:06:12] a week or so ? [22:06:47] mh? [22:07:06] will it take a week or so? [22:07:34] few minutes, at max. [22:09:24] done :) [22:09:34] yep! [22:11:35] hoo: want to be brave and do part 2 as well ? [22:11:47] the quality is lower than i expected [22:11:58] esp, for the file size [22:13:41] matanya: on that :) [22:14:09] i will open a task in phab to do all the rest once i split and add text to them [22:14:13] Same wikitext? [22:14:17] yes [22:14:20] i'll fix later [22:16:51] upload started [22:18:08] matanya: Quality really is not that good [22:18:21] is that coming from vp8 or just a crappy video? :( [22:18:34] I wonder why, it seems like the source isn't great [22:18:49] mh... do we allow VP9? [22:18:49] so i wonder why it is so huge [22:18:51] * hoo has no clue [22:18:56] not yet [22:19:18] done [22:19:23] hoo: you can see the source at /data/project/wikimania2014/ [22:19:31] depends on trusty update for the video scalers?
[22:19:39] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: There is one unmerged change in mediawiki_config (dir /srv/mediawiki-staging/). [22:20:21] not sure hoo you should ask the multimedia team [22:21:19] uah, bad cut :/ [22:21:36] !log hoo Synchronized wmf-config/: Syncing Kaldari's beta-only change (duration: 00m 07s) [22:21:41] 1:24 short [22:21:41] Logged the message, Master [22:21:51] 1:35 [22:22:04] matanya: Replacing them via shell upload is possible [22:22:17] i'll fix that [22:23:15] RECOVERY - Unmerged changes on repository mediawiki_config on tin is OK: No changes to merge. [22:24:59] hoo: please upload the newer version of both [22:26:36] hoo: sorry, hold it [22:26:46] killed the rsync [22:26:47] copy didn't finish [22:28:25] hoo: now done [22:29:15] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 57863 bytes in 0.823 second response time [22:37:57] PROBLEM - puppet last run on amssq38 is CRITICAL: CRITICAL: puppet fail [22:52:25] matanya: ... 
and we're done :) [22:52:25] RECOVERY - puppet last run on amssq38 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [22:52:33] thanks so much hoo [23:15:28] PROBLEM - puppet last run on cp4008 is CRITICAL: CRITICAL: Puppet has 1 failures [23:18:27] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 13.33% of data above the critical threshold [500.0] [23:20:09] PROBLEM - puppet last run on cp4014 is CRITICAL: CRITICAL: Puppet has 1 failures [23:20:19] PROBLEM - git.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:25:55] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [23:26:34] RECOVERY - puppet last run on cp4008 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [23:30:56] RECOVERY - puppet last run on cp4014 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [23:35:09] springle, can you take a look at the schema change at https://gerrit.wikimedia.org/r/180704 please? alternatively, maybe you know a way to avoid it? [23:48:11] PROBLEM - git.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:48:28] I'm trying to use gitblit and having great frustrations [23:50:09] Why does it keep killing itself? [23:51:13] gitblit sucks