[00:23:30] RECOVERY - puppet last run on db2011 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:27:30] PROBLEM - Swift HTTP backend on ms-fe2002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:29:19] RECOVERY - Swift HTTP backend on ms-fe2002 is OK: HTTP OK: HTTP/1.1 200 OK - 396 bytes in 0.084 second response time [00:32:39] PROBLEM - cassandra CQL 10.64.32.160:9042 on restbase1004 is CRITICAL: Connection refused [00:45:50] PROBLEM - Check size of conntrack table on ms-be2019 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:47:30] PROBLEM - Disk space on ms-be2019 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:49:28] RECOVERY - Disk space on ms-be2019 is OK: DISK OK [00:49:30] RECOVERY - Check size of conntrack table on ms-be2019 is OK: OK: nf_conntrack is 0 % full [00:54:58] PROBLEM - DPKG on ms-be2019 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:55:19] PROBLEM - Disk space on ms-be2019 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:55:28] PROBLEM - configured eth on ms-be2019 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:55:29] PROBLEM - Check size of conntrack table on ms-be2019 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:58:58] PROBLEM - puppet last run on mw1001 is CRITICAL: CRITICAL: Puppet has 54 failures [01:04:08] PROBLEM - Swift HTTP backend on ms-fe2001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:05:58] RECOVERY - Swift HTTP backend on ms-fe2001 is OK: HTTP OK: HTTP/1.1 200 OK - 396 bytes in 0.097 second response time [01:06:49] PROBLEM - Disk space on restbase1008 is CRITICAL: DISK CRITICAL - free space: /srv 73994 MB (3% inode=99%) [01:16:29] PROBLEM - DPKG on mw1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [01:16:40] PROBLEM - puppet last run on mw1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [01:16:48] PROBLEM - RAID on mw1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [01:16:58] PROBLEM - configured eth on mw1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [01:17:10] PROBLEM - nutcracker process on mw1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [01:18:18] RECOVERY - DPKG on mw1002 is OK: All packages OK [01:18:29] RECOVERY - RAID on mw1002 is OK: OK: no RAID installed [01:18:30] RECOVERY - puppet last run on mw1002 is OK: OK: Puppet is currently enabled, last run 10 minutes ago with 0 failures [01:18:40] RECOVERY - configured eth on mw1002 is OK: OK - interfaces up [01:18:58] RECOVERY - nutcracker process on mw1002 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [01:28:20] RECOVERY - puppet last run on mw1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [01:28:52] ACKNOWLEDGEMENT - cassandra CQL 10.64.32.160:9042 on restbase1004 is CRITICAL: Connection refused eevans Decommission complete This node will be down for the remaining of the weekend. - The acknowledgement expires at: 2015-12-21 01:28:02. [01:30:39] ACKNOWLEDGEMENT - Disk space on restbase1008 is CRITICAL: DISK CRITICAL - free space: /srv 62089 MB (3% inode=99%): gwicke Keeping an eye on this one. 62G left with a big compaction at 92% will likely make it. 
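The acknowledgement just above is a quick capacity judgement: restbase1008 has about 62 GB free on /srv, the running Cassandra compaction is already 92% done, and the old SSTables are only released once it finishes, so the remaining writes should fit. A back-of-the-envelope sketch of that arithmetic; the 500 GB compaction size is purely illustrative, and treating compaction output as roughly equal to its input is a simplification.

```python
def compaction_headroom_ok(free_gb, compaction_total_gb, percent_done, safety=1.1):
    """Will the rest of an in-progress compaction fit in the remaining free space?"""
    remaining_gb = compaction_total_gb * (1 - percent_done / 100.0)
    return free_gb >= remaining_gb * safety

# Illustrative figures: 62 GB free (from the alert), a hypothetical 500 GB compaction at 92%.
print(compaction_headroom_ok(62, 500, 92))  # True: ~40 GB left to write, comfortably under 62 GB
```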
[01:30:49] PROBLEM - NTP on ms-be2019 is CRITICAL: NTP CRITICAL: No response from NTP server [02:18:59] RECOVERY - Disk space on restbase1008 is OK: DISK OK [02:21:55] !log mwdeploy@tin sync-l10n completed (1.27.0-wmf.9) (duration: 08m 59s) [02:22:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [02:28:49] !log l10nupdate@tin ResourceLoader cache refresh completed at Sun Dec 20 02:28:49 UTC 2015 (duration 6m 54s) [02:29:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [03:52:49] PROBLEM - puppet last run on mw2142 is CRITICAL: CRITICAL: Puppet has 1 failures [04:05:59] PROBLEM - Incoming network saturation on labstore1003 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [100000000.0] [04:18:19] RECOVERY - puppet last run on mw2142 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [04:44:39] (03PS1) 10Gergő Tisza: [DO NOT MERGE] Switch FlaggedRevs to "flagged protection" mode on huwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/260224 (https://phabricator.wikimedia.org/T121995) [05:14:19] PROBLEM - Incoming network saturation on labstore1003 is CRITICAL: CRITICAL: 12.00% of data above the critical threshold [100000000.0] [05:28:40] PROBLEM - puppet last run on mw1001 is CRITICAL: CRITICAL: Puppet has 60 failures [05:33:59] PROBLEM - puppet last run on mc2006 is CRITICAL: CRITICAL: puppet fail [05:57:59] RECOVERY - puppet last run on mw1001 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [06:01:28] RECOVERY - puppet last run on mc2006 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:05:18] RECOVERY - Incoming network saturation on labstore1003 is OK: OK: Less than 10.00% above the threshold [75000000.0] [06:29:09] PROBLEM - nutcracker process on mw1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:29:19] PROBLEM - RAID on mw1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:29:28] PROBLEM - dhclient process on mw1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:29:29] PROBLEM - puppet last run on mw1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:29:48] PROBLEM - SSH on mw1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:29:58] PROBLEM - Disk space on mw1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:29:59] PROBLEM - salt-minion processes on mw1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:30:29] PROBLEM - puppet last run on db1067 is CRITICAL: CRITICAL: Puppet has 1 failures [06:30:50] PROBLEM - DPKG on mw1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:30:58] PROBLEM - nutcracker port on mw1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:30:58] PROBLEM - configured eth on mw1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
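The labstore1003 alerts above are phrased as a fraction of recent datapoints above a threshold (critical at 100000000, recovery once fewer than 10% of points sit above 75000000). A minimal sketch of that "X% of data above the threshold" logic; the window, the 10% cut-off and the sample values are illustrative rather than the production check configuration.

```python
def saturation_state(datapoints, crit_threshold=100_000_000, crit_fraction=0.10):
    """Classify a window of samples by how many exceed the critical threshold."""
    values = [v for v in datapoints if v is not None]
    above = sum(1 for v in values if v > crit_threshold)
    fraction = above / len(values) if values else 0.0
    return ("CRITICAL" if fraction >= crit_fraction else "OK", fraction)

# Five illustrative samples, two of them above 100000000: 40% over threshold -> CRITICAL.
print(saturation_state([9.0e7, 1.2e8, 1.1e8, 8.0e7, 9.5e7]))  # ('CRITICAL', 0.4)
```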
[06:31:09] PROBLEM - puppet last run on mw2129 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:38] PROBLEM - puppet last run on mw1170 is CRITICAL: CRITICAL: Puppet has 2 failures [06:31:39] PROBLEM - puppet last run on mw1061 is CRITICAL: CRITICAL: Puppet has 1 failures [06:32:49] PROBLEM - puppet last run on mw2023 is CRITICAL: CRITICAL: Puppet has 1 failures [06:33:29] PROBLEM - puppet last run on mw2158 is CRITICAL: CRITICAL: Puppet has 4 failures [06:33:38] PROBLEM - puppet last run on mw2045 is CRITICAL: CRITICAL: Puppet has 2 failures [06:33:40] PROBLEM - puppet last run on mw2018 is CRITICAL: CRITICAL: Puppet has 3 failures [06:34:59] RECOVERY - nutcracker port on mw1001 is OK: TCP OK - 0.000 second response time on port 11212 [06:35:09] RECOVERY - nutcracker process on mw1001 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [06:35:19] RECOVERY - dhclient process on mw1001 is OK: PROCS OK: 0 processes with command name dhclient [06:35:48] RECOVERY - SSH on mw1001 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.3 (protocol 2.0) [06:36:39] RECOVERY - DPKG on mw1001 is OK: All packages OK [06:36:40] RECOVERY - configured eth on mw1001 is OK: OK - interfaces up [06:37:09] RECOVERY - RAID on mw1001 is OK: OK: no RAID installed [06:37:48] RECOVERY - Disk space on mw1001 is OK: DISK OK [06:37:49] RECOVERY - salt-minion processes on mw1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [06:54:58] RECOVERY - puppet last run on mw1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:56:19] RECOVERY - puppet last run on mw2023 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [06:56:58] RECOVERY - puppet last run on mw2158 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [06:56:59] RECOVERY - puppet last run on mw1170 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [06:56:59] RECOVERY - puppet last run on mw1061 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [06:57:00] RECOVERY - puppet last run on mw2045 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [06:57:09] RECOVERY - puppet last run on mw2018 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [06:57:59] RECOVERY - puppet last run on db1067 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:58:30] RECOVERY - puppet last run on mw2129 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [07:07:00] PROBLEM - puppet last run on mw1239 is CRITICAL: CRITICAL: puppet fail [07:32:19] RECOVERY - puppet last run on mw1239 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [07:51:29] PROBLEM - puppet last run on mw1253 is CRITICAL: CRITICAL: Puppet has 1 failures [08:16:58] RECOVERY - puppet last run on mw1253 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [08:17:19] PROBLEM - puppet last run on analytics1043 is CRITICAL: CRITICAL: Puppet has 1 failures [08:44:39] RECOVERY - puppet last run on analytics1043 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:00:21] !log powercycle ms-be2019, xfs lockup [09:00:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [09:03:28] RECOVERY - configured eth on ms-be2019 is OK: OK - interfaces up [09:03:28] RECOVERY - swift-container-replicator on 
ms-be2019 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-replicator [09:03:28] RECOVERY - Disk space on ms-be2019 is OK: DISK OK [09:03:29] RECOVERY - Check size of conntrack table on ms-be2019 is OK: OK: nf_conntrack is 0 % full [09:03:29] RECOVERY - swift-account-server on ms-be2019 is OK: PROCS OK: 41 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server [09:03:29] RECOVERY - swift-account-reaper on ms-be2019 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-reaper [09:03:29] RECOVERY - swift-account-replicator on ms-be2019 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-replicator [09:03:59] RECOVERY - RAID on ms-be2019 is OK: OK: Active: 6, Working: 6, Failed: 0, Spare: 0 [09:04:09] RECOVERY - swift-object-replicator on ms-be2019 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator [09:04:39] RECOVERY - swift-container-server on ms-be2019 is OK: PROCS OK: 41 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server [09:04:39] RECOVERY - swift-container-updater on ms-be2019 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-updater [09:04:39] RECOVERY - swift-account-auditor on ms-be2019 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-auditor [09:04:39] RECOVERY - very high load average likely xfs on ms-be2019 is OK: OK - load average: 9.48, 4.07, 1.52 [09:04:39] RECOVERY - swift-object-server on ms-be2019 is OK: PROCS OK: 101 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server [09:04:39] RECOVERY - dhclient process on ms-be2019 is OK: PROCS OK: 0 processes with command name dhclient [09:04:39] RECOVERY - swift-object-updater on ms-be2019 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-updater [09:04:40] RECOVERY - swift-object-auditor on ms-be2019 is OK: PROCS OK: 3 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor [09:05:10] RECOVERY - SSH on ms-be2019 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.3 (protocol 2.0) [09:05:10] RECOVERY - swift-container-auditor on ms-be2019 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [09:05:10] RECOVERY - DPKG on ms-be2019 is OK: All packages OK [09:05:10] RECOVERY - salt-minion processes on ms-be2019 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [09:52:13] 6operations, 10Wikimedia-Site-Requests, 7I18n, 7Tracking: Wikis waiting to be renamed (tracking) - https://phabricator.wikimedia.org/T21986#1893351 (10Nemo_bis) [10:35:08] PROBLEM - check_mysql on db1008 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 870 [10:40:08] PROBLEM - check_mysql on db1008 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1000 [10:45:09] RECOVERY - check_mysql on db1008 is OK: Uptime: 154151 Threads: 95 Questions: 8263215 Slow queries: 1648 Opens: 2061 Flush tables: 2 Open tables: 400 Queries per second avg: 53.604 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0 [12:46:59] (03PS1) 10Southparkfan: Correctly order MediaWiki servers in site.pp [puppet] - 10https://gerrit.wikimedia.org/r/260232 [12:54:19] PROBLEM - puppet last run on ms-be2008 is CRITICAL: CRITICAL: puppet fail [13:21:40] RECOVERY - puppet last run on ms-be2008 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 
0 failures [13:28:49] PROBLEM - HTTPS-toolserver on www.toolserver.org is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:IO::Socket::SSL: connect: Connection timed out [13:29:19] PROBLEM - puppet last run on db2002 is CRITICAL: CRITICAL: puppet fail [13:30:49] RECOVERY - HTTPS-toolserver on www.toolserver.org is OK: SSL OK - Certificate toolserver.org valid until 2016-06-30 17:56:02 +0000 (expires in 193 days) [13:43:01] 6operations, 10Traffic, 7Pybal: pybal fails to detect dead servers under production lb IPs for port 80 - https://phabricator.wikimedia.org/T113151#1893571 (10Aklapper) [13:54:40] RECOVERY - puppet last run on db2002 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [14:50:59] PROBLEM - HHVM rendering on mw1228 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:52:58] PROBLEM - puppet last run on mw1228 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [14:54:39] RECOVERY - HHVM rendering on mw1228 is OK: HTTP OK: HTTP/1.1 200 OK - 65334 bytes in 0.128 second response time [14:54:39] RECOVERY - puppet last run on mw1228 is OK: OK: Puppet is currently enabled, last run 29 minutes ago with 0 failures [14:58:49] PROBLEM - HHVM rendering on mw1228 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:59:39] PROBLEM - nutcracker process on mw1228 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [14:59:39] PROBLEM - RAID on mw1228 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [14:59:59] PROBLEM - DPKG on mw1228 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [14:59:59] PROBLEM - HHVM processes on mw1228 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [15:00:00] PROBLEM - salt-minion processes on mw1228 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [15:00:09] PROBLEM - Check size of conntrack table on mw1228 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [15:00:39] PROBLEM - configured eth on mw1228 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [15:00:49] PROBLEM - puppet last run on mw1228 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [15:00:58] PROBLEM - Disk space on mw1228 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [15:01:08] PROBLEM - nutcracker port on mw1228 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [15:01:18] PROBLEM - Apache HTTP on mw1228 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:01:38] RECOVERY - RAID on mw1228 is OK: OK: no RAID installed [15:01:38] RECOVERY - nutcracker process on mw1228 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [15:01:58] RECOVERY - HHVM processes on mw1228 is OK: PROCS OK: 6 processes with command name hhvm [15:01:59] RECOVERY - DPKG on mw1228 is OK: All packages OK [15:01:59] RECOVERY - salt-minion processes on mw1228 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [15:02:10] RECOVERY - Check size of conntrack table on mw1228 is OK: OK: nf_conntrack is 0 % full [15:02:38] RECOVERY - configured eth on mw1228 is OK: OK - interfaces up [15:02:40] RECOVERY - puppet last run on mw1228 is OK: OK: Puppet is currently enabled, last run 37 minutes ago with 0 failures [15:02:49] RECOVERY - Disk space on mw1228 is OK: DISK OK [15:02:59] RECOVERY - nutcracker port on mw1228 is OK: TCP OK - 0.000 second response time on port 11212 [15:08:04] did everything just die? 
https://grafana.wikimedia.org/dashboard/db/varnish-aggregate-client-status-codes
[15:08:38] bblack, mark ^
[15:08:53] yurik: like what die?
[15:08:58] aude, https://grafana.wikimedia.org/dashboard/db/varnish-aggregate-client-status-codes
[15:09:03] all status codes went to 0
[15:09:26] but it seems that wiki is still ok, so my guess that the reporting stuff died somehow
[15:09:35] aggregation died
[15:09:37] ?
[15:11:04] i guess so
[15:13:38] PROBLEM - HHVM processes on mw1228 is CRITICAL: PROCS CRITICAL: 0 processes with command name hhvm
[15:14:00] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0]
[15:15:20] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0]
[15:23:09] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[15:38:56] (PS1) Reedy: Remove wgArticlePath from InitialiseSettings as it's in CommonSettings [mediawiki-config] - https://gerrit.wikimedia.org/r/260242
[15:41:30] !log reedy@tin Purged l10n cache for 1.27.0-wmf.7
[15:41:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:42:06] !log mw1228 reporting readonly fs
[15:42:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:43:54] operations: mw1228 reporting readonly file system - https://phabricator.wikimedia.org/T122005#1893636 (Reedy) NEW
[15:45:46] operations: mw1228 reporting readonly file system - https://phabricator.wikimedia.org/T122005#1893651 (Reedy) ``` The authenticity of host 'mw1228.eqiad.wmnet ()' can't be established. ECDSA key fingerprint is SHA256:rdCU2vs6Jctc96R4kDXnIdrhl0DaKizk8ctz0vTcr2M. Are you sure you wa...
[15:46:54] operations: mw1228 reporting readonly file system - https://phabricator.wikimedia.org/T122005#1893652 (Reedy) ``` [14:50:59] PROBLEM - HHVM rendering on mw1228 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:52:06] --> govg (~govg@unaffiliated/govg) has joined #wikimedia-operations [14...
[15:50:20] !log reedy@tin Purged l10n cache for 1.27.0-wmf.6 (hanging due to mw1228 issue)
[15:50:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:50:29] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[15:50:46] Wonder what versions of MW can be deleted
[15:51:08] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[15:51:34] Reedy: Everything beyond 1.5 was bloat... so... *hides*
[15:51:53] hoo: I mean from tin :P
[15:53:50] !log reedy@tin Synchronized README: noop (duration: 00m 32s)
[15:53:53] Didn't we have that (maybe not 100% serious) page which asked to downgrade Wikipedia to 1.5 (or something along these lines)?
[15:53:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:54:24] I think we've had a few
[15:54:25] :D
[16:15:09] PROBLEM - Disk space on restbase1003 is CRITICAL: DISK CRITICAL - free space: /var 110978 MB (3% inode=99%)
[16:21:35] operations: Investigate idle/depooled eqiad appservers - https://phabricator.wikimedia.org/T116256#1893687 (Southparkfan) mw1061, mw1083, mw1161 and mw1169 are showing (normal) activity again, so that's good. mw1118, mw1141 and mw1196 still seem idle, and mw1228 is idle per a (likely) broken disk (T122005) s...
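The exchange above ("all status codes went to 0") is really about telling a dead metrics pipeline apart from a real outage: the wikis still respond, so the aggregation side is the suspect. A sketch of the kind of sanity check that pulls the raw series from Graphite's render API and looks for a flat zero/None tail; the Graphite host and the metric name are assumptions, not the exact series behind that Grafana dashboard.

```python
import json
from urllib.request import urlopen

def recent_values(metric, graphite="https://graphite.wikimedia.org", minutes=15):
    """Fetch the last few datapoints of a Graphite series as a flat list of values."""
    url = f"{graphite}/render?target={metric}&from=-{minutes}min&format=json"
    with urlopen(url, timeout=10) as resp:
        series = json.load(resp)
    return [value for value, _ts in series[0]["datapoints"]] if series else []

# Hypothetical metric name standing in for the aggregate client status code series.
values = recent_values("varnish.clients.status_codes.5xx.sum")
if values and all(v in (None, 0) for v in values):
    print("series is flat at zero/None: suspect the metrics pipeline, not the site")
```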
[16:26:49] PROBLEM - Disk space on restbase1003 is CRITICAL: DISK CRITICAL - free space: /var 111502 MB (3% inode=99%)
[16:54:02] (CR) Hoo man: [C: -1] "No longer relevant (code should be developed in operations/dumps/dcat now)" [puppet] - https://gerrit.wikimedia.org/r/251492 (https://phabricator.wikimedia.org/T117533) (owner: Lokal Profil)
[16:54:15] (CR) Hoo man: [C: -1] "No longer relevant (code should be developed in operations/dumps/dcat now)" [puppet] - https://gerrit.wikimedia.org/r/251493 (owner: Lokal Profil)
[17:00:44] (PS3) Hoo man: snapshot: mv wikidatadumps classes to autoloader layout [puppet] - https://gerrit.wikimedia.org/r/260186 (owner: Dzahn)
[17:10:48] PROBLEM - HTTPS on magnesium is CRITICAL: SSL CRITICAL - Certificate rt.wikimedia.org valid until 2016-01-09 09:48:57 +0000 (expires in 19 days)
[17:11:38] !log depool mw1228, reported ro fs
[17:11:41] Reedy: ^
[17:11:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[17:16:04] (PS1) Hoo man: snapshot: Deploy DCAT from operations/dumps/dcat [puppet] - https://gerrit.wikimedia.org/r/260247 (https://phabricator.wikimedia.org/T120932)
[17:16:41] (CR) Hoo man: "Note: I reviewed the changes between the version deployed before and the master of operations/dumps/dcat." [puppet] - https://gerrit.wikimedia.org/r/260247 (https://phabricator.wikimedia.org/T120932) (owner: Hoo man)
[17:24:34] godog: Thanks!
[17:24:45] I filed a task about it as ssh is down, so that should be good for now
[17:25:07] (PS5) Andrew Bogott: WIP: Set up special dhcp behavior for bare-metal boxes [puppet] - https://gerrit.wikimedia.org/r/259787
[17:25:58] godog: Wonder if it should go from the dsh lists for now too
[17:32:08] operations: mw1228 reporting readonly file system - https://phabricator.wikimedia.org/T122005#1893735 (Reedy) godog has now depooled it. Still needs investigation
[17:33:24] Reedy: yeah good idea
[17:33:46] are the dsh lists made dynamically now?
[17:34:01] quick glance and I couldn't see it in the puppet repo
[17:34:48] yeah it is in mediawiki-installation isn't it?
[17:35:02] yeah, that's the list
[17:36:13] (PS1) Filippo Giunchedi: scap: mw1228 reported ro fs [puppet] - https://gerrit.wikimedia.org/r/260251 (https://phabricator.wikimedia.org/T122005)
[17:36:30] (CR) Filippo Giunchedi: [C: 2 V: 2] scap: mw1228 reported ro fs [puppet] - https://gerrit.wikimedia.org/r/260251 (https://phabricator.wikimedia.org/T122005) (owner: Filippo Giunchedi)
[17:36:58] {{done}}
[17:47:38] thanks! :D
[17:47:40] !log reedy@tin Purged l10n cache for 1.27.0-wmf.6
[17:47:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[17:59:59] (PS5) Andrew Bogott: WIP: nova-network: have dnsmasq advertise the network host as a tftp server [puppet] - https://gerrit.wikimedia.org/r/259788
[18:00:01] (PS6) Andrew Bogott: Set up special dhcp behavior for bare-metal boxes [puppet] - https://gerrit.wikimedia.org/r/259787
[18:00:03] (PS5) Andrew Bogott: Insert dns entries for labs bare-metal systems. [puppet] - https://gerrit.wikimedia.org/r/260037
[18:13:18] (PS6) Andrew Bogott: WIP: nova-network: have dnsmasq advertise the network host as a tftp server [puppet] - https://gerrit.wikimedia.org/r/259788
[18:13:20] (PS7) Andrew Bogott: Set up special dhcp behavior for bare-metal boxes [puppet] - https://gerrit.wikimedia.org/r/259787
[18:13:22] (PS6) Andrew Bogott: Insert dns entries for labs bare-metal systems. [puppet] - https://gerrit.wikimedia.org/r/260037
[18:16:29] PROBLEM - mediawiki-installation DSH group on mw1228 is CRITICAL: Host mw1228 is not in mediawiki-installation dsh group
[18:20:38] hm, did I break jenkins?
[18:31:21] !log restarting stuck Jenkins
[18:31:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[18:34:25] godog: we broke icinga-wm :P
[18:43:00] (CR) Andrew Bogott: "recheck" [puppet] - https://gerrit.wikimedia.org/r/259787 (owner: Andrew Bogott)
[18:46:14] (CR) Andrew Bogott: "recheck" [puppet] - https://gerrit.wikimedia.org/r/260037 (owner: Andrew Bogott)
[18:46:55] !log graceful restart of zuul as per https://www.mediawiki.org/wiki/Continuous_integration/Zuul#Restart
[18:47:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[18:48:52] (CR) Andrew Bogott: [C: 2] Set up special dhcp behavior for bare-metal boxes [puppet] - https://gerrit.wikimedia.org/r/259787 (owner: Andrew Bogott)
[18:53:04] (PS1) Andrew Bogott: Revert "Set up special dhcp behavior for bare-metal boxes" [puppet] - https://gerrit.wikimedia.org/r/260252
[18:53:30] PROBLEM - puppet last run on mw1181 is CRITICAL: CRITICAL: puppet fail
[18:53:39] PROBLEM - puppet last run on mw1218 is CRITICAL: CRITICAL: puppet fail
[18:53:48] PROBLEM - puppet last run on restbase1002 is CRITICAL: CRITICAL: puppet fail
[18:53:59] PROBLEM - puppet last run on db2001 is CRITICAL: CRITICAL: puppet fail
[18:54:10] PROBLEM - puppet last run on analytics1036 is CRITICAL: CRITICAL: puppet fail
[18:54:28] PROBLEM - puppet last run on wtp1013 is CRITICAL: CRITICAL: puppet fail
[18:54:29] PROBLEM - puppet last run on wtp2009 is CRITICAL: CRITICAL: puppet fail
[18:54:30] PROBLEM - puppet last run on labnet1002 is CRITICAL: CRITICAL: puppet fail
[18:54:38] (CR) Andrew Bogott: [C: 2] Revert "Set up special dhcp behavior for bare-metal boxes" [puppet] - https://gerrit.wikimedia.org/r/260252 (owner: Andrew Bogott)
[18:54:40] PROBLEM - puppet last run on labvirt1005 is CRITICAL: CRITICAL: puppet fail
[18:54:49] PROBLEM - puppet last run on wtp1002 is CRITICAL: CRITICAL: puppet fail
[18:54:49] PROBLEM - puppet last run on wdqs1001 is CRITICAL: CRITICAL: puppet fail
[18:54:50] PROBLEM - puppet last run on ms-be2016 is CRITICAL: CRITICAL: puppet fail
[18:54:58] PROBLEM - puppet last run on db1062 is CRITICAL: CRITICAL: puppet fail
[18:54:59] PROBLEM - puppet last run on mw1134 is CRITICAL: CRITICAL: puppet fail
[18:55:09] PROBLEM - puppet last run on mw2027 is CRITICAL: CRITICAL: puppet fail
[18:55:09] PROBLEM - puppet last run on ocg1001 is CRITICAL: CRITICAL: puppet fail
[18:55:10] PROBLEM - puppet last run on mw2068 is CRITICAL: CRITICAL: puppet fail
[18:55:30] PROBLEM - puppet last run on db1010 is CRITICAL: CRITICAL: puppet fail
[18:55:38] PROBLEM - puppet last run on mw1116 is CRITICAL: CRITICAL: puppet fail
[18:55:38] PROBLEM - puppet last run on mw1245 is CRITICAL: CRITICAL: puppet fail
[18:55:39] PROBLEM - puppet last run on magnesium is CRITICAL: CRITICAL: puppet fail
[18:55:49] PROBLEM -
puppet last run on ms-be1004 is CRITICAL: CRITICAL: puppet fail [18:55:55] hm? [18:55:58] PROBLEM - puppet last run on mw2091 is CRITICAL: CRITICAL: puppet fail [18:56:00] PROBLEM - puppet last run on mw2026 is CRITICAL: CRITICAL: puppet fail [18:56:19] PROBLEM - puppet last run on labstore2001 is CRITICAL: CRITICAL: puppet fail [18:56:28] PROBLEM - puppet last run on ms-fe1003 is CRITICAL: CRITICAL: puppet fail [18:56:29] PROBLEM - puppet last run on mw2209 is CRITICAL: CRITICAL: puppet fail [18:56:31] PROBLEM - puppet last run on mw1001 is CRITICAL: CRITICAL: puppet fail [18:56:31] PROBLEM - puppet last run on dbstore1002 is CRITICAL: CRITICAL: puppet fail [18:56:38] PROBLEM - puppet last run on mw1045 is CRITICAL: CRITICAL: puppet fail [18:56:39] PROBLEM - puppet last run on db1011 is CRITICAL: CRITICAL: puppet fail [18:56:48] PROBLEM - puppet last run on db2034 is CRITICAL: CRITICAL: puppet fail [18:56:49] PROBLEM - puppet last run on mw1185 is CRITICAL: CRITICAL: puppet fail [18:56:58] PROBLEM - puppet last run on graphite1002 is CRITICAL: CRITICAL: puppet fail [18:56:59] PROBLEM - puppet last run on mc2004 is CRITICAL: CRITICAL: puppet fail [18:57:08] PROBLEM - puppet last run on ms-be1019 is CRITICAL: CRITICAL: puppet fail [18:57:08] PROBLEM - puppet last run on mw1174 is CRITICAL: CRITICAL: puppet fail [18:57:18] PROBLEM - puppet last run on cp3045 is CRITICAL: CRITICAL: puppet fail [18:57:18] PROBLEM - puppet last run on cp3012 is CRITICAL: CRITICAL: puppet fail [18:57:18] PROBLEM - puppet last run on cp3019 is CRITICAL: CRITICAL: puppet fail [18:57:19] PROBLEM - puppet last run on logstash1005 is CRITICAL: CRITICAL: puppet fail [18:57:19] PROBLEM - puppet last run on wdqs1002 is CRITICAL: CRITICAL: puppet fail [18:57:28] PROBLEM - puppet last run on mc1006 is CRITICAL: CRITICAL: puppet fail [18:57:39] PROBLEM - puppet last run on db2050 is CRITICAL: CRITICAL: puppet fail [18:57:39] PROBLEM - puppet last run on mw2057 is CRITICAL: CRITICAL: puppet fail [18:57:41] PROBLEM - puppet last run on mw2208 is CRITICAL: CRITICAL: puppet fail [18:57:41] PROBLEM - puppet last run on mc1016 is CRITICAL: CRITICAL: puppet fail [18:57:48] PROBLEM - puppet last run on db2002 is CRITICAL: CRITICAL: puppet fail [18:57:49] PROBLEM - puppet last run on mw1093 is CRITICAL: CRITICAL: puppet fail [18:57:58] PROBLEM - puppet last run on mw2118 is CRITICAL: CRITICAL: puppet fail [18:57:58] PROBLEM - puppet last run on mw1008 is CRITICAL: CRITICAL: puppet fail [18:57:58] PROBLEM - puppet last run on wtp1006 is CRITICAL: CRITICAL: puppet fail [18:57:59] PROBLEM - puppet last run on es2005 is CRITICAL: CRITICAL: puppet fail [18:57:59] PROBLEM - puppet last run on dbproxy1008 is CRITICAL: CRITICAL: puppet fail [18:58:00] PROBLEM - puppet last run on ganeti2003 is CRITICAL: CRITICAL: puppet fail [18:58:00] PROBLEM - puppet last run on analytics1053 is CRITICAL: CRITICAL: puppet fail [18:58:08] PROBLEM - puppet last run on kafka1018 is CRITICAL: CRITICAL: puppet fail [18:58:08] PROBLEM - puppet last run on rdb2003 is CRITICAL: CRITICAL: puppet fail [18:58:10] PROBLEM - puppet last run on elastic1021 is CRITICAL: CRITICAL: puppet fail [18:58:10] PROBLEM - puppet last run on fermium is CRITICAL: CRITICAL: puppet fail [18:58:18] PROBLEM - puppet last run on mw1203 is CRITICAL: CRITICAL: puppet fail [18:58:18] PROBLEM - puppet last run on mw1219 is CRITICAL: CRITICAL: puppet fail [18:58:19] PROBLEM - puppet last run on mc1018 is CRITICAL: CRITICAL: puppet fail [18:58:29] PROBLEM - puppet last run on mw1220 is 
CRITICAL: CRITICAL: puppet fail [18:58:30] PROBLEM - puppet last run on mc2001 is CRITICAL: CRITICAL: puppet fail [18:58:31] PROBLEM - puppet last run on mw1170 is CRITICAL: CRITICAL: puppet fail [18:58:31] PROBLEM - puppet last run on mw1112 is CRITICAL: CRITICAL: puppet fail [18:58:31] PROBLEM - puppet last run on mw2141 is CRITICAL: CRITICAL: puppet fail [18:58:31] PROBLEM - puppet last run on cp2002 is CRITICAL: CRITICAL: puppet fail [18:58:31] PROBLEM - puppet last run on wtp1016 is CRITICAL: CRITICAL: puppet fail [18:58:32] PROBLEM - puppet last run on elastic1012 is CRITICAL: CRITICAL: puppet fail [18:58:32] PROBLEM - puppet last run on dubnium is CRITICAL: CRITICAL: puppet fail [18:58:32] RECOVERY - puppet last run on labnet1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [18:58:38] PROBLEM - puppet last run on mw1090 is CRITICAL: CRITICAL: puppet fail [18:58:39] PROBLEM - puppet last run on mw2081 is CRITICAL: CRITICAL: puppet fail [18:58:39] PROBLEM - puppet last run on mw2021 is CRITICAL: CRITICAL: puppet fail [18:58:39] PROBLEM - puppet last run on mw1158 is CRITICAL: CRITICAL: puppet fail [18:58:59] PROBLEM - puppet last run on mc1002 is CRITICAL: CRITICAL: puppet fail [18:58:59] PROBLEM - puppet last run on cp4009 is CRITICAL: CRITICAL: puppet fail [18:58:59] PROBLEM - puppet last run on mc2005 is CRITICAL: CRITICAL: puppet fail [18:59:08] PROBLEM - puppet last run on db1056 is CRITICAL: CRITICAL: puppet fail [18:59:18] PROBLEM - puppet last run on cp3016 is CRITICAL: CRITICAL: puppet fail [18:59:19] PROBLEM - puppet last run on lvs3001 is CRITICAL: CRITICAL: puppet fail [18:59:29] PROBLEM - puppet last run on mw1060 is CRITICAL: CRITICAL: puppet fail [18:59:39] PROBLEM - puppet last run on sca1002 is CRITICAL: CRITICAL: puppet fail [18:59:39] PROBLEM - puppet last run on mw1110 is CRITICAL: CRITICAL: puppet fail [18:59:39] PROBLEM - puppet last run on db2058 is CRITICAL: CRITICAL: puppet fail [18:59:49] PROBLEM - puppet last run on holmium is CRITICAL: CRITICAL: puppet fail [18:59:49] PROBLEM - puppet last run on db2060 is CRITICAL: CRITICAL: puppet fail [18:59:59] PROBLEM - puppet last run on mw1009 is CRITICAL: CRITICAL: puppet fail [18:59:59] PROBLEM - puppet last run on mw2045 is CRITICAL: CRITICAL: puppet fail [19:00:00] PROBLEM - puppet last run on wtp1008 is CRITICAL: CRITICAL: puppet fail [19:00:08] PROBLEM - puppet last run on mc1017 is CRITICAL: CRITICAL: puppet fail [19:00:09] PROBLEM - puppet last run on aqs1002 is CRITICAL: CRITICAL: puppet fail [19:00:18] PROBLEM - puppet last run on mc2015 is CRITICAL: CRITICAL: puppet fail [19:00:19] PROBLEM - puppet last run on wtp1010 is CRITICAL: CRITICAL: puppet fail [19:00:19] PROBLEM - puppet last run on cp1054 is CRITICAL: CRITICAL: puppet fail [19:00:28] PROBLEM - puppet last run on db1018 is CRITICAL: CRITICAL: puppet fail [19:00:29] PROBLEM - puppet last run on cp2013 is CRITICAL: CRITICAL: puppet fail [19:00:38] PROBLEM - puppet last run on wtp2017 is CRITICAL: CRITICAL: puppet fail [19:00:39] PROBLEM - puppet last run on wtp2015 is CRITICAL: CRITICAL: puppet fail [19:01:00] PROBLEM - puppet last run on mw1039 is CRITICAL: CRITICAL: puppet fail [19:01:10] PROBLEM - puppet last run on cp4010 is CRITICAL: CRITICAL: puppet fail [19:01:10] PROBLEM - puppet last run on mw2146 is CRITICAL: CRITICAL: puppet fail [19:01:10] PROBLEM - puppet last run on db1028 is CRITICAL: CRITICAL: puppet fail [19:01:18] PROBLEM - puppet last run on mw2077 is CRITICAL: CRITICAL: puppet fail [19:01:19] 
PROBLEM - puppet last run on db1067 is CRITICAL: CRITICAL: puppet fail [19:01:28] PROBLEM - puppet last run on cp3017 is CRITICAL: CRITICAL: puppet fail [19:01:29] Has apache died on the puppetmaster again? [19:01:59] PROBLEM - puppet last run on mw2120 is CRITICAL: CRITICAL: puppet fail [19:03:08] Reedy: could be, although config-master.wm.o still seems up [19:03:14] Reedy: whatever it is seems to have passed [19:03:40] Reedy: oh, I know what it is — I merged a syntax error in hiera and then reverted. [19:03:43] Jenkins didn’t catch it, oddly [19:04:18] RECOVERY - puppet last run on mc1017 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [19:17:39] RECOVERY - puppet last run on restbase1002 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [19:19:58] RECOVERY - puppet last run on db2001 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [19:20:38] RECOVERY - puppet last run on labvirt1005 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [19:20:40] RECOVERY - puppet last run on wtp1002 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [19:20:40] RECOVERY - puppet last run on wdqs1001 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [19:20:49] RECOVERY - puppet last run on db1062 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:21:19] RECOVERY - puppet last run on db1010 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [19:21:28] RECOVERY - puppet last run on mw1245 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [19:21:29] RECOVERY - puppet last run on mw1218 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [19:22:08] RECOVERY - puppet last run on analytics1036 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:22:18] RECOVERY - puppet last run on wtp1013 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:22:19] RECOVERY - puppet last run on ms-fe1003 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [19:22:19] RECOVERY - puppet last run on labstore2001 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [19:22:20] RECOVERY - puppet last run on wtp2009 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:22:48] RECOVERY - puppet last run on ms-be2016 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [19:22:49] RECOVERY - puppet last run on mw1134 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [19:22:58] RECOVERY - puppet last run on mc2004 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [19:22:59] RECOVERY - puppet last run on ocg1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:23:08] RECOVERY - puppet last run on mw2027 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [19:23:09] RECOVERY - puppet last run on mw2068 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [19:23:09] RECOVERY - puppet last run on logstash1005 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:23:09] RECOVERY - puppet last run on cp3045 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:23:10] RECOVERY - puppet last 
run on wdqs1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:23:19] RECOVERY - puppet last run on mw1181 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:23:28] RECOVERY - puppet last run on mw1116 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:23:29] RECOVERY - puppet last run on magnesium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:23:29] RECOVERY - puppet last run on db2050 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [19:23:29] RECOVERY - puppet last run on mc1016 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [19:23:38] RECOVERY - puppet last run on mw2057 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [19:23:40] RECOVERY - puppet last run on ms-be1004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:23:49] RECOVERY - puppet last run on wtp1006 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [19:23:49] RECOVERY - puppet last run on mw2091 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [19:23:50] RECOVERY - puppet last run on es2005 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [19:23:58] RECOVERY - puppet last run on analytics1053 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [19:23:59] RECOVERY - puppet last run on mw2026 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:23:59] RECOVERY - puppet last run on ganeti2003 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [19:23:59] RECOVERY - puppet last run on kafka1018 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:24:08] RECOVERY - puppet last run on wtp1010 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [19:24:09] RECOVERY - puppet last run on mw1219 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [19:24:09] RECOVERY - puppet last run on mc1018 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [19:24:19] RECOVERY - puppet last run on mc2001 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [19:24:19] RECOVERY - puppet last run on mw1001 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [19:24:20] RECOVERY - puppet last run on dubnium is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [19:24:28] RECOVERY - puppet last run on dbstore1002 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [19:24:28] RECOVERY - puppet last run on mw2209 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:24:29] RECOVERY - puppet last run on mw1045 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:24:38] RECOVERY - puppet last run on db1011 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:24:40] RECOVERY - puppet last run on mw1185 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [19:24:40] RECOVERY - puppet last run on db2034 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:24:49] RECOVERY - puppet last run on mc1002 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures 
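The burst of "puppet fail" alerts and the recoveries around this point are explained a few lines up at [19:03:40]: a hiera change with a syntax error was merged and then reverted, and Jenkins did not flag it. A parse error of that kind can be caught with a plain YAML load before merge; a minimal sketch assuming PyYAML is available, with an illustrative hieradata glob rather than the actual CI job.

```python
import glob
import sys

import yaml  # PyYAML, assumed to be installed

def yaml_errors(paths):
    """Try to parse each file, collecting (path, first line of the error) for failures."""
    bad = []
    for path in paths:
        try:
            with open(path) as fh:
                yaml.safe_load(fh)
        except yaml.YAMLError as err:
            bad.append((path, str(err).splitlines()[0]))
    return bad

errors = yaml_errors(glob.glob("hieradata/**/*.yaml", recursive=True))
for path, message in errors:
    print(f"{path}: {message}")
sys.exit(1 if errors else 0)
```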
[19:24:49] RECOVERY - puppet last run on graphite1002 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [19:24:58] RECOVERY - puppet last run on ms-be1019 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:24:58] RECOVERY - puppet last run on mw1174 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [19:24:59] RECOVERY - puppet last run on cp4009 is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures [19:25:09] RECOVERY - puppet last run on cp3019 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:25:09] RECOVERY - puppet last run on cp3012 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:25:10] RECOVERY - puppet last run on cp3016 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [19:25:10] RECOVERY - puppet last run on lvs3001 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [19:25:10] RECOVERY - puppet last run on mc1006 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:25:19] RECOVERY - puppet last run on mw1060 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [19:25:28] RECOVERY - puppet last run on sca1002 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [19:25:39] RECOVERY - puppet last run on db2002 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [19:25:40] RECOVERY - puppet last run on mw1093 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:25:48] RECOVERY - puppet last run on db2060 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [19:25:49] RECOVERY - puppet last run on mw2118 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:25:49] RECOVERY - puppet last run on mw1009 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:25:49] RECOVERY - puppet last run on dbproxy1008 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:25:50] RECOVERY - puppet last run on wtp1008 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [19:25:59] RECOVERY - puppet last run on rdb2003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:25:59] RECOVERY - puppet last run on elastic1021 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:26:00] RECOVERY - puppet last run on fermium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:26:00] RECOVERY - puppet last run on mw1203 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [19:26:09] RECOVERY - puppet last run on cp1054 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:26:18] RECOVERY - puppet last run on db1018 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [19:26:18] RECOVERY - puppet last run on mw1220 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [19:26:19] RECOVERY - puppet last run on mw1112 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:26:19] RECOVERY - puppet last run on wtp1016 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:26:20] RECOVERY - puppet last run on elastic1012 is OK: OK: Puppet is currently enabled, last run 1 
minute ago with 0 failures [19:26:28] RECOVERY - puppet last run on mw2141 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [19:26:28] RECOVERY - puppet last run on cp2002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:26:28] RECOVERY - puppet last run on wtp2017 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [19:26:28] RECOVERY - puppet last run on mw1090 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:26:38] RECOVERY - puppet last run on mw2081 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [19:26:38] RECOVERY - puppet last run on mw2021 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [19:26:49] RECOVERY - puppet last run on cp4010 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [19:26:50] RECOVERY - puppet last run on db1056 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [19:26:50] RECOVERY - puppet last run on mc2005 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:26:59] RECOVERY - puppet last run on db1067 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [19:27:09] RECOVERY - puppet last run on cp3017 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [19:27:29] RECOVERY - puppet last run on mw1110 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [19:27:38] RECOVERY - puppet last run on mw2208 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:27:38] RECOVERY - puppet last run on db2058 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:27:40] RECOVERY - puppet last run on holmium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:27:49] RECOVERY - puppet last run on mw1008 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:27:59] RECOVERY - puppet last run on mw2045 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [19:28:00] RECOVERY - puppet last run on aqs1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:28:09] RECOVERY - puppet last run on mc2015 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:28:19] RECOVERY - puppet last run on mw1170 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:28:19] RECOVERY - puppet last run on cp2013 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:28:29] RECOVERY - puppet last run on wtp2015 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:28:38] RECOVERY - puppet last run on mw1158 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:28:49] RECOVERY - puppet last run on mw1039 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [19:28:50] RECOVERY - puppet last run on db1028 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [19:28:58] RECOVERY - puppet last run on mw2146 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [19:29:00] RECOVERY - puppet last run on mw2077 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [19:29:49] RECOVERY - puppet last run on mw2120 is OK: OK: Puppet is currently enabled, last 
run 1 minute ago with 0 failures
[19:59:17] (PS7) Andrew Bogott: WIP: nova-network: have dnsmasq advertise the network host as a tftp server [puppet] - https://gerrit.wikimedia.org/r/259788
[19:59:19] (PS7) Andrew Bogott: Insert dns entries for labs bare-metal systems. [puppet] - https://gerrit.wikimedia.org/r/260037
[19:59:21] (PS1) Andrew Bogott: Set up special dhcp behavior for bare-metal boxes [puppet] - https://gerrit.wikimedia.org/r/260253
[20:01:11] (PS8) Andrew Bogott: WIP: nova-network: have dnsmasq advertise the network host as a tftp server [puppet] - https://gerrit.wikimedia.org/r/259788
[20:01:13] (PS2) Andrew Bogott: Set up special dhcp behavior for bare-metal boxes [puppet] - https://gerrit.wikimedia.org/r/260253
[20:01:15] (PS8) Andrew Bogott: Insert dns entries for labs bare-metal systems. [puppet] - https://gerrit.wikimedia.org/r/260037
[20:01:17] (CR) jenkins-bot: [V: -1] Set up special dhcp behavior for bare-metal boxes [puppet] - https://gerrit.wikimedia.org/r/260253 (owner: Andrew Bogott)
[20:01:44] (CR) jenkins-bot: [V: -1] WIP: nova-network: have dnsmasq advertise the network host as a tftp server [puppet] - https://gerrit.wikimedia.org/r/259788 (owner: Andrew Bogott)
[20:03:02] (CR) Andrew Bogott: [C: 2] Set up special dhcp behavior for bare-metal boxes [puppet] - https://gerrit.wikimedia.org/r/260253 (owner: Andrew Bogott)
[20:25:59] PROBLEM - puppet last run on mw1228 is CRITICAL: CRITICAL: Puppet last ran 6 hours ago
[20:29:33] (CR) Alex Monk: [C: 1] "seems fine to me on deployment-bastion" [mediawiki-config] - https://gerrit.wikimedia.org/r/260242 (owner: Reedy)
[20:30:24] (CR) Reedy: "In theory, it should be removed from CommonSettings, maybe. But there'll be a reason it's in both (it didn't work in one?)" [mediawiki-config] - https://gerrit.wikimedia.org/r/260242 (owner: Reedy)
[20:34:39] RECOVERY - RAID on dataset1001 is OK: OK: optimal, 3 logical, 36 physical
[21:19:59] PROBLEM - PyBal backends health check on lvs1005 is CRITICAL: PYBAL CRITICAL - ocg_8000 - Could not depool server ocg1002.eqiad.wmnet because of too many down!
[21:20:00] PROBLEM - PyBal backends health check on lvs1008 is CRITICAL: PYBAL CRITICAL - ocg_8000 - Could not depool server ocg1001.eqiad.wmnet because of too many down!
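The two PyBal alerts just above show its safety valve: a backend that fails its health check is only depooled if enough of the pool would still be up afterwards, otherwise PyBal keeps it pooled and reports "too many down". A small sketch of that depool-threshold idea; the pool size, the count already down and the 0.5 ratio are illustrative, not PyBal's actual configuration or code.

```python
def can_depool(pool_size, currently_down, depool_threshold=0.5):
    """Allow a depool only if at least depool_threshold of the pool would remain up."""
    up_after = pool_size - currently_down - 1   # one more backend would be taken out
    return up_after >= pool_size * depool_threshold

# ocg_8000 is a small pool; with, say, 3 backends and 1 already down,
# removing another would leave only 1 of 3 up, below a 0.5 threshold.
print(can_depool(pool_size=3, currently_down=1))  # False -> "too many down"
```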
[21:23:49] RECOVERY - PyBal backends health check on lvs1005 is OK: PYBAL OK - All pools are healthy [21:23:58] RECOVERY - PyBal backends health check on lvs1008 is OK: PYBAL OK - All pools are healthy [21:56:40] PROBLEM - Disk space on restbase1003 is CRITICAL: DISK CRITICAL - free space: /var 111411 MB (3% inode=99%) [22:00:08] PROBLEM - check_raid on bellatrix is CRITICAL: CRITICAL: HPSA [P420i/slot0: OK, log_1: 16.4TB,RAID6 OK, phy_1I:1:12: Predictive Failure] [22:05:08] PROBLEM - check_raid on bellatrix is CRITICAL: CRITICAL: HPSA [P420i/slot0: OK, log_1: 16.4TB,RAID6 OK, phy_1I:1:12: Predictive Failure] [22:10:08] PROBLEM - check_raid on bellatrix is CRITICAL: CRITICAL: HPSA [P420i/slot0: OK, log_1: 16.4TB,RAID6 OK, phy_1I:1:12: Predictive Failure] [22:15:08] PROBLEM - check_raid on bellatrix is CRITICAL: CRITICAL: HPSA [P420i/slot0: OK, log_1: 16.4TB,RAID6 OK, phy_1I:1:12: Predictive Failure] [22:16:15] frack [22:16:35] codfw [22:20:09] PROBLEM - check_raid on bellatrix is CRITICAL: CRITICAL: HPSA [P420i/slot0: OK, log_1: 16.4TB,RAID6 OK, phy_1I:1:12: Predictive Failure] [22:25:09] PROBLEM - check_raid on bellatrix is CRITICAL: CRITICAL: HPSA [P420i/slot0: OK, log_1: 16.4TB,RAID6 OK, phy_1I:1:12: Predictive Failure] [22:30:08] PROBLEM - check_raid on bellatrix is CRITICAL: CRITICAL: HPSA [P420i/slot0: OK, log_1: 16.4TB,RAID6 OK, phy_1I:1:12: Predictive Failure] [22:35:09] PROBLEM - check_raid on bellatrix is CRITICAL: CRITICAL: HPSA [P420i/slot0: OK, log_1: 16.4TB,RAID6 OK, phy_1I:1:12: Predictive Failure] [22:40:08] PROBLEM - check_raid on bellatrix is CRITICAL: CRITICAL: HPSA [P420i/slot0: OK, log_1: 16.4TB,RAID6 OK, phy_1I:1:12: Predictive Failure] [22:45:08] PROBLEM - check_raid on bellatrix is CRITICAL: CRITICAL: HPSA [P420i/slot0: OK, log_1: 16.4TB,RAID6 OK, phy_1I:1:12: Predictive Failure] [22:50:09] PROBLEM - check_raid on bellatrix is CRITICAL: CRITICAL: HPSA [P420i/slot0: OK, log_1: 16.4TB,RAID6 OK, phy_1I:1:12: Predictive Failure] [22:55:08] PROBLEM - check_raid on bellatrix is CRITICAL: CRITICAL: HPSA [P420i/slot0: OK, log_1: 16.4TB,RAID6 OK, phy_1I:1:12: Predictive Failure] [23:00:08] PROBLEM - check_raid on bellatrix is CRITICAL: CRITICAL: HPSA [P420i/slot0: OK, log_1: 16.4TB,RAID6 OK, phy_1I:1:12: Predictive Failure] [23:05:09] PROBLEM - check_raid on bellatrix is CRITICAL: CRITICAL: HPSA [P420i/slot0: OK, log_1: 16.4TB,RAID6 OK, phy_1I:1:12: Predictive Failure] [23:10:08] PROBLEM - check_raid on bellatrix is CRITICAL: CRITICAL: HPSA [P420i/slot0: OK, log_1: 16.4TB,RAID6 OK, phy_1I:1:12: Predictive Failure] [23:15:08] PROBLEM - check_raid on bellatrix is CRITICAL: CRITICAL: HPSA [P420i/slot0: OK, log_1: 16.4TB,RAID6 OK, phy_1I:1:12: Predictive Failure] [23:20:10] PROBLEM - check_raid on bellatrix is CRITICAL: CRITICAL: HPSA [P420i/slot0: OK, log_1: 16.4TB,RAID6 OK, phy_1I:1:12: Predictive Failure] [23:24:25] !log Katie and Jeff paged about bellatrix [23:24:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:25:07] keeping an eye on the restbase node's disk space [23:25:08] PROBLEM - check_raid on bellatrix is CRITICAL: CRITICAL: HPSA [P420i/slot0: OK, log_1: 16.4TB,RAID6 OK, phy_1I:1:12: Predictive Failure] [23:25:41] 6operations, 10fundraising-tech-ops: bellatrix raid predictive failure - https://phabricator.wikimedia.org/T122026#1894061 (10Reedy) 3NEW [23:28:18] PROBLEM - puppet last run on mw2069 is CRITICAL: CRITICAL: puppet fail [23:30:08] PROBLEM - check_raid on bellatrix is CRITICAL: CRITICAL: HPSA [P420i/slot0: OK, log_1: 
16.4TB,RAID6 OK, phy_1I:1:12: Predictive Failure] [23:35:08] PROBLEM - check_raid on bellatrix is CRITICAL: CRITICAL: HPSA [P420i/slot0: OK, log_1: 16.4TB,RAID6 OK, phy_1I:1:12: Predictive Failure] [23:40:08] PROBLEM - check_raid on bellatrix is CRITICAL: CRITICAL: HPSA [P420i/slot0: OK, log_1: 16.4TB,RAID6 OK, phy_1I:1:12: Predictive Failure] [23:45:08] PROBLEM - check_raid on bellatrix is CRITICAL: CRITICAL: HPSA [P420i/slot0: OK, log_1: 16.4TB,RAID6 OK, phy_1I:1:12: Predictive Failure] [23:50:09] PROBLEM - check_raid on bellatrix is CRITICAL: CRITICAL: HPSA [P420i/slot0: OK, log_1: 16.4TB,RAID6 OK, phy_1I:1:12: Predictive Failure] [23:55:08] PROBLEM - check_raid on bellatrix is CRITICAL: CRITICAL: HPSA [P420i/slot0: OK, log_1: 16.4TB,RAID6 OK, phy_1I:1:12: Predictive Failure] [23:55:39] RECOVERY - puppet last run on mw2069 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures
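The repeating bellatrix alert above carries the whole story in its status string: the P420i logical drive is still OK, but physical drive 1I:1:12 reports "Predictive Failure", so the array keeps serving while a disk asks to be replaced. A sketch that pulls the failing components out of output shaped like that check's; the parsing is based only on the alert text shown here, not on the actual check_raid plugin.

```python
import re

def predictive_failures(hpsa_status):
    """Extract component names flagged as 'Predictive Failure' from an HPSA summary string."""
    inside = re.search(r"HPSA \[(.*)\]", hpsa_status)
    if not inside:
        return []
    parts = [p.strip() for p in inside.group(1).split(",")]
    return [p.rsplit(":", 1)[0].strip() for p in parts if p.endswith("Predictive Failure")]

status = ("CRITICAL: HPSA [P420i/slot0: OK, log_1: 16.4TB,RAID6 OK, "
          "phy_1I:1:12: Predictive Failure]")
print(predictive_failures(status))  # ['phy_1I:1:12']
```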