[00:00:04] Deploy window HOLIDAY (observed) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170102T0000) [00:01:04] ohi jouncebot [00:04:13] PROBLEM - puppet last run on sca1004 is CRITICAL: CRITICAL: Puppet has 27 failures. Last run 2 minutes ago with 27 failures. Failed resources (up to 3 shown): Exec[eth0_v6_token],Package[wipe],Package[zotero/translators],Package[zotero/translation-server] [00:08:43] RECOVERY - puppet last run on lvs4003 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [00:17:23] PROBLEM - puppet last run on ms-be1015 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [00:33:13] RECOVERY - puppet last run on sca1004 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [00:46:23] RECOVERY - puppet last run on ms-be1015 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [01:10:13] PROBLEM - Disk space on iridium is CRITICAL: DISK CRITICAL - free space: / 350 MB (3% inode=54%) [01:16:18] twentyafterfour godog ^^ [01:17:58] Operations, Phabricator: Iridium: Disk space is low - https://phabricator.wikimedia.org/T154407#2910375 (Paladox) [01:18:09] Operations, Phabricator: Iridium: Disk space is low - https://phabricator.wikimedia.org/T154407#2910387 (Paladox) [01:18:42] (PS5) BryanDavis: l10nupdate: acquire scap lock before changing files [puppet] - https://gerrit.wikimedia.org/r/303923 (https://phabricator.wikimedia.org/T72752) [01:19:23] Operations, Phabricator: Iridium (phabricator): Disk space is low - https://phabricator.wikimedia.org/T154407#2910375 (Paladox) [01:20:17] (CR) BryanDavis: l10nupdate: acquire scap lock before changing files (1 comment) [puppet] - https://gerrit.wikimedia.org/r/303923 (https://phabricator.wikimedia.org/T72752) (owner: BryanDavis) [01:20:31] (PS6) BryanDavis: l10nupdate: acquire scap lock before changing files [puppet] - 
https://gerrit.wikimedia.org/r/303923 (https://phabricator.wikimedia.org/T72752) [01:25:15] (PS8) BryanDavis: Provision MediaWiki-Vagrant on Jessie hosts [puppet] - https://gerrit.wikimedia.org/r/245920 (https://phabricator.wikimedia.org/T154340) [01:25:33] PROBLEM - puppet last run on sca1003 is CRITICAL: CRITICAL: Puppet has 27 failures. Last run 2 minutes ago with 27 failures. Failed resources (up to 3 shown): Exec[eth0_v6_token],Package[wipe],Package[zotero/translators],Package[zotero/translation-server] [01:45:13] PROBLEM - check_ssl on barium is CRITICAL: SSL CRITICAL - Certificate civicrm.wikimedia.org valid until 2017-01-09 01:41:03 +0000 (expires in 6 days) [01:45:13] PROBLEM - check_ssl on mintaka is CRITICAL: SSL CRITICAL - Certificate civicrm.wikimedia.org valid until 2017-01-09 01:41:03 +0000 (expires in 6 days) [01:50:03] PROBLEM - check_ssl on barium is CRITICAL: SSL CRITICAL - Certificate civicrm.wikimedia.org valid until 2017-01-09 01:41:03 +0000 (expires in 6 days) [01:50:13] PROBLEM - check_ssl on mintaka is CRITICAL: SSL CRITICAL - Certificate civicrm.wikimedia.org valid until 2017-01-09 01:41:03 +0000 (expires in 6 days) [01:52:33] RECOVERY - puppet last run on sca1003 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [01:55:03] PROBLEM - check_ssl on barium is CRITICAL: SSL CRITICAL - Certificate civicrm.wikimedia.org valid until 2017-01-09 01:41:03 +0000 (expires in 6 days) [01:55:13] PROBLEM - check_ssl on mintaka is CRITICAL: SSL CRITICAL - Certificate civicrm.wikimedia.org valid until 2017-01-09 01:41:03 +0000 (expires in 6 days) [02:00:03] PROBLEM - check_ssl on barium is CRITICAL: SSL CRITICAL - Certificate civicrm.wikimedia.org valid until 2017-01-09 01:41:03 +0000 (expires in 6 days) [02:00:13] PROBLEM - check_ssl on mintaka is CRITICAL: SSL CRITICAL - Certificate civicrm.wikimedia.org valid until 2017-01-09 01:41:03 +0000 (expires in 6 days) [02:05:13] PROBLEM - check_ssl on barium is CRITICAL: SSL 
CRITICAL - Certificate civicrm.wikimedia.org valid until 2017-01-09 01:41:03 +0000 (expires in 6 days) [02:05:13] PROBLEM - check_ssl on mintaka is CRITICAL: SSL CRITICAL - Certificate civicrm.wikimedia.org valid until 2017-01-09 01:41:03 +0000 (expires in 6 days) [02:10:03] PROBLEM - check_ssl on barium is CRITICAL: SSL CRITICAL - Certificate civicrm.wikimedia.org valid until 2017-01-09 01:41:03 +0000 (expires in 6 days) [02:10:13] PROBLEM - check_ssl on mintaka is CRITICAL: SSL CRITICAL - Certificate civicrm.wikimedia.org valid until 2017-01-09 01:41:03 +0000 (expires in 6 days) [02:15:13] PROBLEM - check_ssl on barium is CRITICAL: SSL CRITICAL - Certificate civicrm.wikimedia.org valid until 2017-01-09 01:41:03 +0000 (expires in 6 days) [02:15:13] PROBLEM - check_ssl on mintaka is CRITICAL: SSL CRITICAL - Certificate civicrm.wikimedia.org valid until 2017-01-09 01:41:03 +0000 (expires in 6 days) [02:20:13] PROBLEM - check_ssl on barium is CRITICAL: SSL CRITICAL - Certificate civicrm.wikimedia.org valid until 2017-01-09 01:41:03 +0000 (expires in 6 days) [02:20:13] PROBLEM - check_ssl on mintaka is CRITICAL: SSL CRITICAL - Certificate civicrm.wikimedia.org valid until 2017-01-09 01:41:03 +0000 (expires in 6 days) [02:23:13] PROBLEM - Disk space on iridium is CRITICAL: DISK CRITICAL - free space: / 324 MB (3% inode=54%) [02:25:03] PROBLEM - check_ssl on barium is CRITICAL: SSL CRITICAL - Certificate civicrm.wikimedia.org valid until 2017-01-09 01:41:03 +0000 (expires in 6 days) [02:25:13] PROBLEM - check_ssl on mintaka is CRITICAL: SSL CRITICAL - Certificate civicrm.wikimedia.org valid until 2017-01-09 01:41:03 +0000 (expires in 6 days) [02:30:03] PROBLEM - check_ssl on barium is CRITICAL: SSL CRITICAL - Certificate civicrm.wikimedia.org valid until 2017-01-09 01:41:03 +0000 (expires in 6 days) [02:30:14] PROBLEM - check_ssl on mintaka is CRITICAL: SSL CRITICAL - Certificate civicrm.wikimedia.org valid until 2017-01-09 01:41:03 +0000 (expires in 6 days) [02:35:03] 
PROBLEM - check_ssl on barium is CRITICAL: SSL CRITICAL - Certificate civicrm.wikimedia.org valid until 2017-01-09 01:41:03 +0000 (expires in 6 days) [02:35:13] PROBLEM - check_ssl on mintaka is CRITICAL: SSL CRITICAL - Certificate civicrm.wikimedia.org valid until 2017-01-09 01:41:03 +0000 (expires in 6 days) [02:40:03] PROBLEM - check_ssl on barium is CRITICAL: SSL CRITICAL - Certificate civicrm.wikimedia.org valid until 2017-01-09 01:41:03 +0000 (expires in 6 days) [02:40:13] PROBLEM - check_ssl on mintaka is CRITICAL: SSL CRITICAL - Certificate civicrm.wikimedia.org valid until 2017-01-09 01:41:03 +0000 (expires in 6 days) [02:45:03] PROBLEM - check_ssl on barium is CRITICAL: SSL CRITICAL - Certificate civicrm.wikimedia.org valid until 2017-01-09 01:41:03 +0000 (expires in 6 days) [02:45:13] PROBLEM - check_ssl on mintaka is CRITICAL: SSL CRITICAL - Certificate civicrm.wikimedia.org valid until 2017-01-09 01:41:03 +0000 (expires in 6 days) [02:50:03] PROBLEM - check_ssl on barium is CRITICAL: SSL CRITICAL - Certificate civicrm.wikimedia.org valid until 2017-01-09 01:41:03 +0000 (expires in 6 days) [02:50:13] PROBLEM - check_ssl on mintaka is CRITICAL: SSL CRITICAL - Certificate civicrm.wikimedia.org valid until 2017-01-09 01:41:03 +0000 (expires in 6 days) [02:53:13] PROBLEM - Disk space on iridium is CRITICAL: DISK CRITICAL - free space: / 313 MB (3% inode=54%) [02:55:03] PROBLEM - check_ssl on barium is CRITICAL: SSL CRITICAL - Certificate civicrm.wikimedia.org valid until 2017-01-09 01:41:03 +0000 (expires in 6 days) [02:55:13] PROBLEM - check_ssl on mintaka is CRITICAL: SSL CRITICAL - Certificate civicrm.wikimedia.org valid until 2017-01-09 01:41:03 +0000 (expires in 6 days) [02:58:55] gee, someone should probably get those SSL certs checked [03:00:03] PROBLEM - check_ssl on barium is CRITICAL: SSL CRITICAL - Certificate civicrm.wikimedia.org valid until 2017-01-09 01:41:03 +0000 (expires in 6 days) [03:00:13] PROBLEM - check_ssl on mintaka is CRITICAL: SSL 
CRITICAL - Certificate civicrm.wikimedia.org valid until 2017-01-09 01:41:03 +0000 (expires in 6 days) [03:05:03] PROBLEM - check_ssl on barium is CRITICAL: SSL CRITICAL - Certificate civicrm.wikimedia.org valid until 2017-01-09 01:41:03 +0000 (expires in 6 days) [03:05:13] PROBLEM - check_ssl on mintaka is CRITICAL: SSL CRITICAL - Certificate civicrm.wikimedia.org valid until 2017-01-09 01:41:03 +0000 (expires in 6 days) [03:10:13] PROBLEM - check_ssl on mintaka is CRITICAL: SSL CRITICAL - Certificate civicrm.wikimedia.org valid until 2017-01-09 01:41:03 +0000 (expires in 6 days) [03:10:13] PROBLEM - check_ssl on barium is CRITICAL: SSL CRITICAL - Certificate civicrm.wikimedia.org valid until 2017-01-09 01:41:03 +0000 (expires in 6 days) [03:15:03] PROBLEM - check_ssl on barium is CRITICAL: SSL CRITICAL - Certificate civicrm.wikimedia.org valid until 2017-01-09 01:41:03 +0000 (expires in 6 days) [03:15:13] PROBLEM - check_ssl on mintaka is CRITICAL: SSL CRITICAL - Certificate civicrm.wikimedia.org valid until 2017-01-09 01:41:03 +0000 (expires in 6 days) [03:20:13] PROBLEM - check_ssl on barium is CRITICAL: SSL CRITICAL - Certificate civicrm.wikimedia.org valid until 2017-01-09 01:41:03 +0000 (expires in 6 days) [03:20:13] PROBLEM - check_ssl on mintaka is CRITICAL: SSL CRITICAL - Certificate civicrm.wikimedia.org valid until 2017-01-09 01:41:03 +0000 (expires in 6 days) [03:23:30] ACKNOWLEDGEMENT - check_ssl on barium is CRITICAL: SSL CRITICAL - Certificate civicrm.wikimedia.org valid until 2017-01-09 01:41:03 +0000 (expires in 6 days) Jeff_Green already in progress [03:25:13] PROBLEM - check_ssl on mintaka is CRITICAL: SSL CRITICAL - Certificate civicrm.wikimedia.org valid until 2017-01-09 01:41:03 +0000 (expires in 6 days) [03:25:46] ACKNOWLEDGEMENT - check_ssl on mintaka is CRITICAL: SSL CRITICAL - Certificate civicrm.wikimedia.org valid until 2017-01-09 01:41:03 +0000 (expires in 6 days) Jeff_Green already in progress [03:29:36] _joe_: Hey [03:30:01] _joe_: 
Can I please nuke your Commons user page so the global one shows up? [03:30:26] (your commons page is a bit useless, lol) [03:32:15] The template you used doesn’t work right here. [06:28:33] PROBLEM - Check HHVM threads for leakage on mw1169 is CRITICAL: CRITICAL: HHVM has more than double threads running or queued than apache has busy workers [06:44:23] PROBLEM - puppet last run on analytics1048 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:49:23] PROBLEM - Check HHVM threads for leakage on mw1259 is CRITICAL: CRITICAL: HHVM has more than double threads running or queued than apache has busy workers [06:50:13] PROBLEM - puppet last run on ms-be1021 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:54:33] PROBLEM - Check HHVM threads for leakage on mw1260 is CRITICAL: CRITICAL: HHVM has more than double threads running or queued than apache has busy workers [06:56:23] PROBLEM - Check HHVM threads for leakage on mw1168 is CRITICAL: CRITICAL: HHVM has more than double threads running or queued than apache has busy workers [07:12:23] RECOVERY - Check HHVM threads for leakage on mw1259 is OK: OK [07:14:23] RECOVERY - puppet last run on analytics1048 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [07:19:13] RECOVERY - puppet last run on ms-be1021 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [07:30:23] RECOVERY - Check HHVM threads for leakage on mw1168 is OK: OK [07:36:33] RECOVERY - Check HHVM threads for leakage on mw1169 is OK: OK [08:01:13] PROBLEM - puppet last run on mw1258 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [08:29:13] RECOVERY - puppet last run on mw1258 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [08:37:33] RECOVERY - Check HHVM threads for leakage on mw1260 is OK: OK [08:44:08] (CR) Gilles: "I don't think that the UX is comparable. Being able to see the alerts in Grafana and have them directly connected to the alerting is a sup" [puppet] - https://gerrit.wikimedia.org/r/328673 (https://phabricator.wikimedia.org/T153167) (owner: Gilles) [08:57:33] PROBLEM - Disk space on ms-be1006 is CRITICAL: DISK CRITICAL - /srv/swift-storage/sdk1 is not accessible: Input/output error [09:01:33] RECOVERY - Disk space on ms-be1006 is OK: DISK OK [09:13:33] PROBLEM - MegaRAID on ms-be1006 is CRITICAL: CRITICAL: 1 failed LD(s) (Offline) [09:13:44] ACKNOWLEDGEMENT - MegaRAID on ms-be1006 is CRITICAL: CRITICAL: 1 failed LD(s) (Offline) nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T154418 [09:13:47] Operations, ops-eqiad: Degraded RAID on ms-be1006 - https://phabricator.wikimedia.org/T154418#2910802 (ops-monitoring-bot) [09:16:34] PROBLEM - puppet last run on ms-be1006 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[mkfs-/dev/sdk1] [09:39:24] PROBLEM - puppet last run on restbase1015 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [10:07:24] RECOVERY - puppet last run on restbase1015 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [10:32:43] <_joe_> Revent: I'll fix it when I'm at a computer :) [10:37:18] (CR) Addshore: [C: 1] Add badge for "digitaldocument" in Wikibase [mediawiki-config] - https://gerrit.wikimedia.org/r/329453 (https://phabricator.wikimedia.org/T153186) (owner: Ladsgroup) [10:55:33] (PS1) Ema: varnishxcache: port to cachestats.CacheStatsSender [puppet] - https://gerrit.wikimedia.org/r/330111 (https://phabricator.wikimedia.org/T151643) [10:55:34] PROBLEM - puppet last run on mc1006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [11:04:25] (CR) Ema: [V: 2 C: 2] varnishxcache: port to cachestats.CacheStatsSender [puppet] - https://gerrit.wikimedia.org/r/330111 (https://phabricator.wikimedia.org/T151643) (owner: Ema) [11:09:57] (PS2) Elukey: Add the HHVM and Apache videoscaler clusters to Prometheus polling [puppet] - https://gerrit.wikimedia.org/r/328913 (https://phabricator.wikimedia.org/T147316) [11:10:50] jouncebot: next [11:10:51] In 26 hour(s) and 49 minute(s): European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170103T1400) [11:11:34] PROBLEM - puppet last run on db1020 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [11:23:34] RECOVERY - puppet last run on mc1006 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [11:38:34] PROBLEM - puppet last run on db1052 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [11:39:34] PROBLEM - parsoid on wtp2017 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:39:34] RECOVERY - puppet last run on db1020 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [11:40:24] RECOVERY - parsoid on wtp2017 is OK: HTTP OK: HTTP/1.1 200 OK - 1014 bytes in 4.380 second response time [11:40:42] hashar: Hi! Seems like all Wikibase CI builds are failing for some odd reason (see e.g. https://gerrit.wikimedia.org/r/#/c/329659/). Do you happen to have any idea what's going on? [11:44:05] leszek_wmde: yeah sorry I broke it :( [11:44:17] leszek_wmde: should be good again now [11:44:23] hashar: good to know :) Thanks! [11:44:34] PROBLEM - puppet last run on mw1201 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [11:44:35] leszek_wmde: pass the word in the wmde office [11:44:54] hashar: will do, thanks! [12:01:44] PROBLEM - puppet last run on sca2003 is CRITICAL: CRITICAL: Puppet has 27 failures. Last run 2 minutes ago with 27 failures. Failed resources (up to 3 shown): Exec[eth0_v6_token],Package[wipe],Package[zotero/translators],Package[zotero/translation-server] [12:07:34] RECOVERY - puppet last run on db1052 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [12:12:34] RECOVERY - puppet last run on mw1201 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [12:29:34] RECOVERY - puppet last run on sca2003 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [12:30:54] PROBLEM - Host mw1280 is DOWN: PING CRITICAL - Packet loss = 100% [13:09:54] PROBLEM - puppet last run on sca2004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [13:18:52] anybody restarting mw1280? 
[13:19:10] probably not, 40 mins ago, checking [13:23:34] RECOVERY - Host mw1280 is UP: PING OK - Packet loss = 0%, RTA = 0.23 ms [13:24:13] !log powercycled mw1280, not pingable and mgmt console frozen [13:24:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:26:13] has anything changed in how icinga alerts us for host down? I would have expected more alarms to get fired for mw1280 down [13:36:27] Hey, I have this super super straightforward patch in puppet for a module only being used in labs in one instance, can you take a look? https://gerrit.wikimedia.org/r/#/c/329316/ [13:36:54] RECOVERY - puppet last run on sca2004 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [13:58:04] PROBLEM - citoid endpoints health on scb1004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:58:30] (PS1) Ema: apt.w.o: redirect / to wikitech article [puppet] - https://gerrit.wikimedia.org/r/330140 [14:00:04] PROBLEM - citoid endpoints health on scb1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [14:00:54] RECOVERY - citoid endpoints health on scb1004 is OK: All endpoints are healthy [14:00:55] RECOVERY - citoid endpoints health on scb1003 is OK: All endpoints are healthy [14:05:14] PROBLEM - puppet last run on sca1004 is CRITICAL: CRITICAL: Puppet has 27 failures. Last run 2 minutes ago with 27 failures. Failed resources (up to 3 shown): Exec[eth0_v6_token],Package[wipe],Package[zotero/translators],Package[zotero/translation-server] [14:06:24] PROBLEM - puppet last run on mc1003 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [14:32:20] RECOVERY - puppet last run on sca1004 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [14:35:20] RECOVERY - puppet last run on mc1003 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [14:57:11] Operations, DBA, Gerrit, Release-Engineering-Team: Gerrit: Schedule downtime for T154205 (To do with data loss) - https://phabricator.wikimedia.org/T154327#2911141 (Paladox) [15:24:40] PROBLEM - puppet last run on sca1003 is CRITICAL: CRITICAL: Puppet has 27 failures. Last run 2 minutes ago with 27 failures. Failed resources (up to 3 shown): Exec[eth0_v6_token],Package[wipe],Package[zotero/translators],Package[zotero/translation-server] [15:25:20] Operations, Performance-Team, Thumbor: Implement DC-local cache failure limiter in Thumbor - https://phabricator.wikimedia.org/T151065#2911169 (Gilles) I now think that poolcounter is unsuitable for this because processing individual requests can take several seconds and using PC for failure throttli... [15:44:20] PROBLEM - Disk space on iridium is CRITICAL: DISK CRITICAL - free space: / 350 MB (3% inode=54%) [15:46:33] (PS1) Elukey: Add JVM Heap usage alarms for basic Hadoop daemons [puppet] - https://gerrit.wikimedia.org/r/330154 (https://phabricator.wikimedia.org/T88640) [15:47:38] (CR) jerkins-bot: [V: -1] Add JVM Heap usage alarms for basic Hadoop daemons [puppet] - https://gerrit.wikimedia.org/r/330154 (https://phabricator.wikimedia.org/T88640) (owner: Elukey) [15:47:52] thanks jenkins! First -1 of 2017 for me :/ [15:48:03] checking iridium.. 
[15:50:31] ah snap the root is 10G only [15:52:40] RECOVERY - puppet last run on sca1003 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [15:57:03] https://grafana.wikimedia.org/dashboard/file/server-board.json?var-server=iridium&var-network=eth0&panelId=17&fullscreen&from=now-7d&to=now - steady growth for the past days [15:58:19] elukey: there's 1.6G of stuff in /var/log/account [15:59:13] plus lots of kernels (with headers) [15:59:47] yeah... [16:00:47] "1.7G account" is weird in /var/log [16:01:01] (having the analytics standup, will write with a bit of lag) [16:01:55] no worries, I'll start doing some cleaning [16:03:16] thanks :) [16:05:20] !log removing old kernels and kernel headers from iridium to free up some disk space [16:05:20] RECOVERY - Disk space on iridium is OK: DISK OK [16:05:20] PROBLEM - DPKG on iridium is CRITICAL: DPKG CRITICAL dpkg reports broken packages [16:05:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:06:50] elukey, that's not very weird [16:07:20] RECOVERY - DPKG on iridium is OK: All packages OK [16:07:21] https://phabricator.wikimedia.org/T107052 [16:07:27] https://phabricator.wikimedia.org/T107617 [16:14:56] Krenair: sorry I didn't finish writing due to the standup, but from a quick glance I didn't see anything that would have summed to 1.7G in account (also noticed a deleted file still held by a process in lsof) [16:15:07] but probably I was not super precise [16:15:09] re-checking [16:16:56] yes my math was not correct, a lot of 40MB files at some point take space :D [16:27:28] Operations, Performance-Team, Thumbor: Implement rate limiter in Thumbor - https://phabricator.wikimedia.org/T151067#2911260 (Gilles) Upstream PR: https://github.com/thumbor/thumbor/pull/847 Can be worked around by looking at X-Forwarded-For manually. [16:59:41] PROBLEM - puppet last run on elastic1039 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [17:20:32] !log iridium: removed /var/log/account/pacct.2[0-9].gz to free up more disk space [17:20:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:23:24] \o/ [17:27:40] RECOVERY - puppet last run on elastic1039 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [17:44:09] (PS2) Madhuvishy: nfs: Dual mount misc projects from labstore-secondary cluster [puppet] - https://gerrit.wikimedia.org/r/329711 (https://phabricator.wikimedia.org/T154336) [17:48:41] Operations, Commons, TimedMediaHandler-Transcode, Wikimedia-Video, and 3 others: Commons video transcoders have over 6500 tasks in the backlog. - https://phabricator.wikimedia.org/T153488#2911374 (elukey) Adding some info about https://bz.apache.org/bugzilla/show_bug.cgi?id=56188, that we are pre... [17:54:37] (PS2) Elukey: Add JVM Heap usage alarms for basic Hadoop daemons [puppet] - https://gerrit.wikimedia.org/r/330154 (https://phabricator.wikimedia.org/T88640) [17:55:47] (CR) jerkins-bot: [V: -1] Add JVM Heap usage alarms for basic Hadoop daemons [puppet] - https://gerrit.wikimedia.org/r/330154 (https://phabricator.wikimedia.org/T88640) (owner: Elukey) [17:57:49] (PS3) Elukey: Add JVM Heap usage alarms for basic Hadoop daemons [puppet] - https://gerrit.wikimedia.org/r/330154 (https://phabricator.wikimedia.org/T88640) [18:13:42] p858snake, if you're around, please have a look at T154312. We need the IP(s) of the venue, you said it'll be announced but the first event is on January 4th. Do you have them? [18:13:42] T154312: Request for a temporary lift of account creation cap on IPs (2017-01-04,2017-01-06,2017-01-10) - https://phabricator.wikimedia.org/T154312 [18:13:53] Please see Dereckson's comment there too. [18:14:43] Hello. [18:15:13] Urbanecm: I fear p858snake triaged the task and we need to ask Mahitgar. 
[18:15:29] Dereckson, I noticed it a moment ago. Sorry p858snake! [18:16:11] Dereckson, thank you. [18:23:24] hmm, does anyone know where I can find the echo_event etc tables for ruwiki on the analytics cluster from stat1002? [18:34:30] PROBLEM - puppet last run on sca1004 is CRITICAL: CRITICAL: Puppet has 27 failures. Last run 2 minutes ago with 27 failures. Failed resources (up to 3 shown): Exec[eth0_v6_token],Package[wipe],Package[zotero/translators],Package[zotero/translation-server] [18:43:50] PROBLEM - mailman list info on fermium is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:44:00] PROBLEM - mailman archives on fermium is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:51:40] RECOVERY - mailman list info on fermium is OK: HTTP OK: HTTP/1.1 200 OK - 15503 bytes in 0.089 second response time [18:51:50] RECOVERY - mailman archives on fermium is OK: HTTP OK: HTTP/1.1 200 OK - 66747 bytes in 0.008 second response time [19:01:40] RECOVERY - puppet last run on sca1004 is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures [19:04:00] Operations, MediaWiki-Export-or-Import, Wikimedia-General-or-Unknown, Patch-For-Review: Special:Import error: "Import failed: Could not open import file" - https://phabricator.wikimedia.org/T17000#2911470 (FilipGCI) a:FilipGCI @TTO: Added $wgHTTPImportTimeout setting. [19:38:54] Operations, ops-codfw, Discovery, Discovery-Search, Elasticsearch: rack/setup/install elastic2025-2036 - https://phabricator.wikimedia.org/T154251#2911498 (Papaul) [19:40:18] Operations, ops-codfw: rack/setup/install mw2051-mw2060 - https://phabricator.wikimedia.org/T152698#2911500 (Papaul) [19:46:57] (PS1) Madhuvishy: nfs-mounts: Remove wikidata-quality from nfs-mount yaml [puppet] - https://gerrit.wikimedia.org/r/330173 [19:54:40] PROBLEM - puppet last run on sca1003 is CRITICAL: CRITICAL: Puppet has 27 failures. Last run 2 minutes ago with 27 failures. 
Failed resources (up to 3 shown): Exec[eth0_v6_token],Package[wipe],Package[zotero/translators],Package[zotero/translation-server] [19:58:41] PROBLEM - puppet last run on snapshot1006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [20:08:30] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0] [20:09:40] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0] [20:10:30] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [20:13:30] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0] [20:15:30] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [20:15:40] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [20:20:30] PROBLEM - puppet last run on scb1003 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [20:22:40] RECOVERY - puppet last run on sca1003 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [20:28:40] RECOVERY - puppet last run on snapshot1006 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [20:35:10] PROBLEM - check_ssl on thulium is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [20:35:10] PROBLEM - check_ssl on saiph is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [20:40:10] PROBLEM - check_ssl on thulium is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [20:40:10] PROBLEM - check_ssl on saiph is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [20:45:10] PROBLEM - check_ssl on thulium is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [20:45:10] PROBLEM - check_ssl on saiph is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [20:45:27] There seems to be a warning about ssl ^^ [20:47:50] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL: replication_delay is 633 600 - REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 2879157 keys, up 63 days 12 hours - replication_delay is 633 [20:48:50] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 2866941 keys, up 63 days 12 hours - replication_delay is 0 [20:49:30] RECOVERY - puppet last run on scb1003 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [20:50:10] PROBLEM - check_ssl on thulium is CRITICAL: SSL 
CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [20:50:10] PROBLEM - check_ssl on saiph is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [20:55:10] PROBLEM - check_ssl on thulium is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [20:55:10] PROBLEM - check_ssl on saiph is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [21:00:10] PROBLEM - check_ssl on saiph is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [21:00:10] PROBLEM - check_ssl on thulium is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [21:05:10] PROBLEM - check_ssl on saiph is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [21:05:10] PROBLEM - check_ssl on thulium is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [21:10:10] PROBLEM - check_ssl on thulium is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [21:10:10] PROBLEM - check_ssl on saiph is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [21:15:10] PROBLEM - check_ssl on thulium is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [21:15:10] PROBLEM - check_ssl on saiph is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) 
[21:20:10] PROBLEM - check_ssl on thulium is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [21:20:20] PROBLEM - check_ssl on saiph is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [21:25:10] PROBLEM - check_ssl on thulium is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [21:25:10] PROBLEM - check_ssl on saiph is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [21:30:10] PROBLEM - check_ssl on thulium is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [21:30:10] PROBLEM - check_ssl on saiph is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [21:35:10] PROBLEM - check_ssl on thulium is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [21:35:10] PROBLEM - check_ssl on saiph is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [21:40:10] PROBLEM - check_ssl on saiph is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [21:40:10] PROBLEM - check_ssl on thulium is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [21:45:10] PROBLEM - check_ssl on thulium is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [21:45:10] PROBLEM - check_ssl on saiph is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid 
until 2017-01-09 20:31:03 +0000 (expires in 6 days) [21:49:58] 06Operations, 10fundraising-tech-ops: SSL cert for payments-listener.wikimedia.org expires on 2017-01-09 (~6 days) - https://phabricator.wikimedia.org/T154448#2911652 (10Peachey88) [21:50:10] PROBLEM - check_ssl on thulium is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [21:50:10] PROBLEM - check_ssl on saiph is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [21:55:10] PROBLEM - check_ssl on thulium is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [21:55:10] PROBLEM - check_ssl on saiph is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [22:00:10] (03PS1) 10DatGuy: Redirect https://toolserver.org/~magnus/ [puppet] - 10https://gerrit.wikimedia.org/r/330178 (https://phabricator.wikimedia.org/T113696) [22:00:10] PROBLEM - check_ssl on thulium is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [22:00:10] PROBLEM - check_ssl on saiph is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [22:05:10] PROBLEM - check_ssl on thulium is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [22:05:10] PROBLEM - check_ssl on saiph is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [22:10:10] PROBLEM - check_ssl on thulium is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [22:10:10] PROBLEM - check_ssl on saiph is 
CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [22:15:10] PROBLEM - check_ssl on thulium is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [22:15:10] PROBLEM - check_ssl on saiph is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [22:20:10] PROBLEM - check_ssl on thulium is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [22:20:10] PROBLEM - check_ssl on saiph is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [22:25:10] PROBLEM - check_ssl on saiph is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [22:25:10] PROBLEM - check_ssl on thulium is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [22:30:10] PROBLEM - check_ssl on thulium is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [22:30:10] PROBLEM - check_ssl on saiph is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [22:32:24] (03CR) 10Tim Landscheidt: [C: 04-1] "No, the existing redirects seem to work fine and so there is no need to remove them. 
"Redirect" does not support regular expressions, but" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/330178 (https://phabricator.wikimedia.org/T113696) (owner: 10DatGuy) [22:35:10] PROBLEM - check_ssl on thulium is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [22:35:10] PROBLEM - check_ssl on saiph is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [22:40:10] PROBLEM - check_ssl on thulium is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [22:40:10] PROBLEM - check_ssl on saiph is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [22:45:10] PROBLEM - check_ssl on thulium is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [22:45:10] PROBLEM - check_ssl on saiph is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [22:50:10] PROBLEM - check_ssl on thulium is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [22:50:10] PROBLEM - check_ssl on saiph is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [22:52:33] (03PS2) 10DatGuy: Redirect https://toolserver.org/~magnus/ [puppet] - 10https://gerrit.wikimedia.org/r/330178 (https://phabricator.wikimedia.org/T113696) [22:55:10] PROBLEM - check_ssl on thulium is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [22:55:10] PROBLEM - check_ssl on saiph is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org 
valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [23:00:10] PROBLEM - check_ssl on thulium is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [23:00:10] PROBLEM - check_ssl on saiph is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [23:04:50] PROBLEM - puppet last run on sca1004 is CRITICAL: CRITICAL: Puppet has 7 failures. Last run 2 minutes ago with 7 failures. Failed resources (up to 3 shown): Package[tzdata],Service[zotero],Exec[zotero-admin_ensure_members],Exec[sc-admins_ensure_members] [23:05:10] PROBLEM - check_ssl on thulium is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [23:05:10] PROBLEM - check_ssl on saiph is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [23:09:24] !log Removed 2fa from an account, per T154450 [23:09:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:10:10] PROBLEM - check_ssl on thulium is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [23:10:10] PROBLEM - check_ssl on saiph is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [23:15:10] PROBLEM - check_ssl on thulium is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [23:15:10] PROBLEM - check_ssl on saiph is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [23:20:10] PROBLEM - check_ssl on saiph is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 
days) [23:20:10] PROBLEM - check_ssl on thulium is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [23:25:10] PROBLEM - check_ssl on thulium is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [23:25:10] PROBLEM - check_ssl on saiph is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [23:30:10] PROBLEM - check_ssl on thulium is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [23:30:10] PROBLEM - check_ssl on saiph is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [23:32:00] PROBLEM - puppet last run on sca2003 is CRITICAL: CRITICAL: Puppet has 27 failures. Last run 2 minutes ago with 27 failures. Failed resources (up to 3 shown): Exec[eth0_v6_token],Package[wipe],Package[zotero/translators],Package[zotero/translation-server] [23:33:00] RECOVERY - puppet last run on sca1004 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [23:35:10] PROBLEM - check_ssl on saiph is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [23:35:10] PROBLEM - check_ssl on thulium is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [23:40:10] PROBLEM - check_ssl on thulium is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [23:40:10] PROBLEM - check_ssl on saiph is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [23:45:10] PROBLEM - check_ssl on saiph is CRITICAL: SSL 
CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [23:45:10] PROBLEM - check_ssl on thulium is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [23:45:56] /win go #nasqueron-ops [23:50:10] PROBLEM - check_ssl on thulium is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [23:50:10] PROBLEM - check_ssl on saiph is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [23:55:10] PROBLEM - check_ssl on saiph is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) [23:55:10] PROBLEM - check_ssl on thulium is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days)