[00:19:54] PROBLEM - puppet last run on mw2082 is CRITICAL: CRITICAL: puppet fail
[00:45:53] RECOVERY - puppet last run on mw2082 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures
[00:53:22] PROBLEM - puppet last run on mw2152 is CRITICAL: CRITICAL: puppet fail
[01:20:24] RECOVERY - puppet last run on mw2152 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures
[01:42:08] PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 37.50% of data above the critical threshold [1000.0]
[01:45:29] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [1000.0]
[02:13:47] RECOVERY - Ulsfo HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[02:15:27] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[02:22:53] !log mwdeploy@tin sync-l10n completed (1.27.0-wmf.9) (duration: 09m 47s)
[02:22:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:29:49] !log l10nupdate@tin ResourceLoader cache refresh completed at Fri Dec 25 02:29:49 UTC 2015 (duration 6m 56s)
[02:29:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:51:29] operations, Performance-Team: jobrunner memory leaks - https://phabricator.wikimedia.org/T122069#1903886 (Krinkle) >>! In T122069#1895995, @aaron wrote: > The first change only affected JobChron. From http://graphite.wikimedia.org/render/?width=1887&height=960&_salt=1450730658.495&target=jobrunner.memory.*...
[03:37:29] (PS2) KartikMistry: CX: Use config.yaml to read registry [puppet] - https://gerrit.wikimedia.org/r/260575
[04:04:00] (PS3) KartikMistry: CX: Use config.yaml to read registry [puppet] - https://gerrit.wikimedia.org/r/260575
[04:11:52] PROBLEM - Incoming network saturation on labstore1003 is CRITICAL: CRITICAL: 14.29% of data above the critical threshold [100000000.0]
[04:14:32] PROBLEM - puppet last run on ms-be2007 is CRITICAL: CRITICAL: puppet fail
[04:37:26] RECOVERY - Incoming network saturation on labstore1003 is OK: OK: Less than 10.00% above the threshold [75000000.0]
[04:42:17] RECOVERY - puppet last run on ms-be2007 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:30:01] PROBLEM - puppet last run on einsteinium is CRITICAL: CRITICAL: puppet fail
[06:30:40] PROBLEM - puppet last run on ms-fe1004 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:31] PROBLEM - puppet last run on db2055 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:40] PROBLEM - puppet last run on db2060 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:40] PROBLEM - puppet last run on mw1119 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:51] PROBLEM - puppet last run on holmium is CRITICAL: CRITICAL: Puppet has 2 failures
[06:32:35] PROBLEM - puppet last run on mw2129 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:43] PROBLEM - puppet last run on mw2207 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:54] PROBLEM - puppet last run on mw1135 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:35:03] PROBLEM - puppet last run on mw2158 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:35:23] PROBLEM - puppet last run on analytics1021 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:35:53] PROBLEM - puppet last run on wtp2008 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:36:53] PROBLEM - puppet last run on mw2126 is CRITICAL: CRITICAL: Puppet has 1 failures
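
(Editor's note: the "HTTP 5xx reqs/min" and "Incoming network saturation" alerts above come from Icinga checks that query Graphite and go critical when some fraction of recent datapoints exceeds a threshold. Below is a minimal sketch of that logic; the Graphite render API with format=json is standard, but the metric name and trigger fraction here are hypothetical, not the production check's configuration.)

```python
# Sketch of a percent-over-threshold Graphite check, in the style of the
# "37.50% of data above the critical threshold [1000.0]" alerts above.
import json
import urllib.request

GRAPHITE = "http://graphite1001"   # host name taken from the log
METRIC = "reqstats.5xx"            # hypothetical metric name
CRITICAL_VALUE = 1000.0            # threshold quoted in the alert text
CRITICAL_PERCENT = 30.0            # assumed fraction that triggers CRITICAL

def check_5xx(metric: str = METRIC) -> str:
    url = f"{GRAPHITE}/render?target={metric}&from=-10min&format=json"
    with urllib.request.urlopen(url) as resp:
        series = json.load(resp)
    # Each series is {"target": ..., "datapoints": [[value, timestamp], ...]}.
    points = [v for s in series for v, _ in s["datapoints"] if v is not None]
    if not points:
        return "UNKNOWN: no datapoints"
    over = 100.0 * sum(1 for v in points if v > CRITICAL_VALUE) / len(points)
    if over >= CRITICAL_PERCENT:
        return f"CRITICAL: {over:.2f}% of data above the critical threshold [{CRITICAL_VALUE}]"
    return f"OK: Less than {CRITICAL_PERCENT:.2f}% above the threshold [{CRITICAL_VALUE}]"
```
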
[06:38:33] PROBLEM - puppet last run on bast4001 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:55:35] RECOVERY - puppet last run on db2060 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures
[06:55:54] RECOVERY - puppet last run on ms-fe1004 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures
[06:56:03] RECOVERY - puppet last run on holmium is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures
[06:56:44] RECOVERY - puppet last run on einsteinium is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures
[06:57:04] RECOVERY - puppet last run on mw2129 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures
[06:57:14] RECOVERY - puppet last run on wtp2008 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:14] RECOVERY - puppet last run on mw2207 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:15] RECOVERY - puppet last run on db2055 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:34] RECOVERY - puppet last run on mw1135 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures
[06:57:34] RECOVERY - puppet last run on mw1119 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:14] RECOVERY - puppet last run on mw2158 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:14] RECOVERY - puppet last run on mw2126 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:34] RECOVERY - puppet last run on analytics1021 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:03:10] RECOVERY - puppet last run on bast4001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:35:53] PROBLEM - puppet last run on mw2166 is CRITICAL: CRITICAL: puppet fail
[07:59:14] RECOVERY - puppet last run on mw2166 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures
[08:40:56] PROBLEM - Host cp4007 is DOWN: PING CRITICAL - Packet loss = 100%
[08:43:36] PROBLEM - IPsec on cp1049 is CRITICAL: Strongswan CRITICAL - ok: 58 not-conn: cp4007_v4, cp4007_v6
[08:43:36] PROBLEM - IPsec on cp1074 is CRITICAL: Strongswan CRITICAL - ok: 58 not-conn: cp4007_v4, cp4007_v6
[08:43:36] PROBLEM - IPsec on cp1099 is CRITICAL: Strongswan CRITICAL - ok: 58 not-conn: cp4007_v4, cp4007_v6
[08:43:45] PROBLEM - IPsec on cp1061 is CRITICAL: Strongswan CRITICAL - ok: 58 not-conn: cp4007_v4, cp4007_v6
[08:43:45] PROBLEM - IPsec on kafka1020 is CRITICAL: Strongswan CRITICAL - ok: 164 not-conn: cp4007_v4, cp4007_v6
[08:43:47] PROBLEM - IPsec on kafka1014 is CRITICAL: Strongswan CRITICAL - ok: 164 not-conn: cp4007_v4, cp4007_v6
[08:44:06] PROBLEM - IPsec on cp1063 is CRITICAL: Strongswan CRITICAL - ok: 58 not-conn: cp4007_v4, cp4007_v6
[08:44:06] PROBLEM - IPsec on cp1073 is CRITICAL: Strongswan CRITICAL - ok: 58 not-conn: cp4007_v4, cp4007_v6
[08:44:06] PROBLEM - IPsec on cp1050 is CRITICAL: Strongswan CRITICAL - ok: 58 not-conn: cp4007_v4, cp4007_v6
[08:44:17] PROBLEM - IPsec on cp1062 is CRITICAL: Strongswan CRITICAL - ok: 58 not-conn: cp4007_v4, cp4007_v6
[08:44:26] PROBLEM - IPsec on kafka1013 is CRITICAL: Strongswan CRITICAL - ok: 164 not-conn: cp4007_v4, cp4007_v6
[08:44:35] PROBLEM - IPsec on cp1048 is CRITICAL: Strongswan CRITICAL - ok: 58 not-conn: cp4007_v4, cp4007_v6
[08:44:36] PROBLEM - IPsec on cp1051 is CRITICAL: Strongswan CRITICAL - ok: 58 not-conn: cp4007_v4, cp4007_v6
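
(Editor's note: the IPsec alerts above fire on every peer of cp4007 once the host goes down: each caching and Kafka host reports the cp4007 Strongswan tunnels as not connected. A rough sketch of such a check follows, assuming `ipsec status` output where established security associations appear by connection name with an ESTABLISHED state; the exact parsing, and counting connections rather than ESP child SAs as the real plugin does, are simplifications.)

```python
# Sketch of a Strongswan connectivity check in the spirit of
# "Strongswan CRITICAL - ok: 58 not-conn: cp4007_v4, cp4007_v6".
import subprocess

EXPECTED_CONNS = ["cp4007_v4", "cp4007_v6"]  # connection names from the log

def check_ipsec(expected: list[str]) -> str:
    out = subprocess.run(["ipsec", "status"],
                         capture_output=True, text=True).stdout
    # Established SAs print as e.g. "cp4007_v4[57]: ESTABLISHED 2 hours ago, ...";
    # take the connection name before the bracketed SA number.
    established = {
        line.split("[")[0].strip()
        for line in out.splitlines()
        if "ESTABLISHED" in line
    }
    not_conn = [c for c in expected if c not in established]
    if not_conn:
        return f"Strongswan CRITICAL - ok: {len(established)} not-conn: {', '.join(not_conn)}"
    return f"Strongswan OK - {len(established)} ESP OK"
```
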
[08:44:55] PROBLEM - IPsec on cp1064 is CRITICAL: Strongswan CRITICAL - ok: 58 not-conn: cp4007_v4, cp4007_v6
[08:44:56] PROBLEM - IPsec on cp1072 is CRITICAL: Strongswan CRITICAL - ok: 58 not-conn: cp4007_v4, cp4007_v6
[08:44:56] PROBLEM - IPsec on kafka1012 is CRITICAL: Strongswan CRITICAL - ok: 164 not-conn: cp4007_v4, cp4007_v6
[08:44:56] PROBLEM - IPsec on kafka1022 is CRITICAL: Strongswan CRITICAL - ok: 164 not-conn: cp4007_v4, cp4007_v6
[08:44:56] PROBLEM - IPsec on kafka1018 is CRITICAL: Strongswan CRITICAL - ok: 164 not-conn: cp4007_v4, cp4007_v6
[08:45:16] PROBLEM - IPsec on cp1071 is CRITICAL: Strongswan CRITICAL - ok: 58 not-conn: cp4007_v4, cp4007_v6
[09:15:01] cp4007 is unresponsive to ssh, ping, management console, trying to powercycle now
[09:15:16] PROBLEM - check_mysql on db1008 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 662
[09:17:18] !log powercycling cp4007 (unresponsive to ssh, ping, serial console)
[09:17:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:19:16] RECOVERY - Host cp4007 is UP: PING OK - Packet loss = 0%, RTA = 79.72 ms
[09:19:26] RECOVERY - IPsec on cp1061 is OK: Strongswan OK - 60 ESP OK
[09:19:37] RECOVERY - IPsec on cp1071 is OK: Strongswan OK - 60 ESP OK
[09:19:38] RECOVERY - IPsec on cp1063 is OK: Strongswan OK - 60 ESP OK
[09:19:38] RECOVERY - IPsec on cp1062 is OK: Strongswan OK - 60 ESP OK
[09:19:56] RECOVERY - IPsec on kafka1013 is OK: Strongswan OK - 166 ESP OK
[09:19:56] RECOVERY - IPsec on cp1049 is OK: Strongswan OK - 60 ESP OK
[09:19:57] RECOVERY - IPsec on cp1074 is OK: Strongswan OK - 60 ESP OK
[09:19:57] RECOVERY - IPsec on cp1073 is OK: Strongswan OK - 60 ESP OK
[09:20:06] RECOVERY - IPsec on cp1048 is OK: Strongswan OK - 60 ESP OK
[09:20:17] RECOVERY - IPsec on kafka1014 is OK: Strongswan OK - 166 ESP OK
[09:20:17] RECOVERY - IPsec on kafka1020 is OK: Strongswan OK - 166 ESP OK
[09:20:17] RECOVERY - IPsec on cp1099 is OK: Strongswan OK - 60 ESP OK
[09:20:17] RECOVERY - IPsec on cp1051 is OK: Strongswan OK - 60 ESP OK
[09:20:27] RECOVERY - IPsec on cp1050 is OK: Strongswan OK - 60 ESP OK
[09:20:57] RECOVERY - IPsec on cp1064 is OK: Strongswan OK - 60 ESP OK
[09:21:06] RECOVERY - IPsec on cp1072 is OK: Strongswan OK - 60 ESP OK
[09:21:07] RECOVERY - IPsec on kafka1022 is OK: Strongswan OK - 166 ESP OK
[09:21:07] RECOVERY - IPsec on kafka1012 is OK: Strongswan OK - 166 ESP OK
[09:21:07] RECOVERY - IPsec on kafka1018 is OK: Strongswan OK - 166 ESP OK
[09:26:47] operations, Performance-Team: jobrunner memory leaks - https://phabricator.wikimedia.org/T122069#1903975 (jcrespo) I do not know if the test is over (but puppet is still disabled). If that is not the case, mw1015 has just frozen again right now (presumably because of OOM).
[09:27:28] PROBLEM - configured eth on mw1015 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[09:27:37] PROBLEM - dhclient process on mw1015 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[09:27:38] PROBLEM - nutcracker port on mw1015 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[09:27:47] PROBLEM - RAID on mw1015 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[09:27:57] PROBLEM - nutcracker process on mw1015 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[09:28:07] PROBLEM - Disk space on mw1015 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
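
(Editor's note: the db1008 check_mysql alerts above are driven by replica health from SHOW SLAVE STATUS. A minimal sketch of that side of the check follows, assuming the PyMySQL client and made-up thresholds; the Slave_IO_Running, Slave_SQL_Running and Seconds_Behind_Master columns are standard MySQL.)

```python
# Sketch of a slave-lag probe in the style of the
# "SLOW_SLAVE CRITICAL: ... Seconds Behind Master: 662" alerts above.
import pymysql

CRIT_LAG = 600  # assumed critical lag threshold, in seconds

def check_slave(host: str) -> str:
    conn = pymysql.connect(host=host, user="nagios", password="...",
                           cursorclass=pymysql.cursors.DictCursor)
    try:
        with conn.cursor() as cur:
            cur.execute("SHOW SLAVE STATUS")
            status = cur.fetchone()
    finally:
        conn.close()
    if status is None:
        return "OK: not a replica"
    io, sql = status["Slave_IO_Running"], status["Slave_SQL_Running"]
    lag = status["Seconds_Behind_Master"]  # None when replication is broken
    if io != "Yes" or sql != "Yes":
        return f"CRITICAL: Slave IO: {io} Slave SQL: {sql}"
    if lag is None or lag >= CRIT_LAG:
        return (f"SLOW_SLAVE CRITICAL: Slave IO: {io} Slave SQL: {sql} "
                f"Seconds Behind Master: {lag}")
    return f"OK: Seconds Behind Master: {lag}"
```
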
[09:28:17] PROBLEM - SSH on mw1015 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:28:36] PROBLEM - salt-minion processes on mw1015 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[09:28:57] PROBLEM - DPKG on mw1015 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[09:29:26] RECOVERY - configured eth on mw1015 is OK: OK - interfaces up
[09:29:27] RECOVERY - dhclient process on mw1015 is OK: PROCS OK: 0 processes with command name dhclient
[09:29:36] RECOVERY - nutcracker port on mw1015 is OK: TCP OK - 0.000 second response time on port 11212
[09:29:37] RECOVERY - RAID on mw1015 is OK: OK: no RAID installed
[09:29:56] RECOVERY - nutcracker process on mw1015 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker
[09:30:06] RECOVERY - Disk space on mw1015 is OK: DISK OK
[09:30:16] RECOVERY - check_mysql on db1008 is OK: Uptime: 320003 Threads: 119 Questions: 14916576 Slow queries: 4026 Opens: 20467 Flush tables: 2 Open tables: 409 Queries per second avg: 46.613 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0
[09:30:16] RECOVERY - SSH on mw1015 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.3 (protocol 2.0)
[09:30:28] RECOVERY - salt-minion processes on mw1015 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[09:30:57] RECOVERY - DPKG on mw1015 is OK: All packages OK
[09:45:19] PROBLEM - Host cp3010 is DOWN: PING CRITICAL - Packet loss = 100%
[09:48:20] PROBLEM - IPsec on kafka1018 is CRITICAL: Strongswan CRITICAL - ok: 164 not-conn: cp3010_v4, cp3010_v6
[09:48:30] PROBLEM - IPsec on cp1053 is CRITICAL: Strongswan CRITICAL - ok: 56 not-conn: cp3010_v4, cp3010_v6
[09:48:50] PROBLEM - IPsec on cp1066 is CRITICAL: Strongswan CRITICAL - ok: 56 not-conn: cp3010_v4, cp3010_v6
[09:48:51] PROBLEM - IPsec on cp1054 is CRITICAL: Strongswan CRITICAL - ok: 56 not-conn: cp3010_v4, cp3010_v6
[09:49:01] PROBLEM - IPsec on cp1052 is CRITICAL: Strongswan CRITICAL - ok: 56 not-conn: cp3010_v4, cp3010_v6
[09:49:01] PROBLEM - IPsec on cp1065 is CRITICAL: Strongswan CRITICAL - ok: 56 not-conn: cp3010_v4, cp3010_v6
[09:49:10] PROBLEM - IPsec on kafka1013 is CRITICAL: Strongswan CRITICAL - ok: 164 not-conn: cp3010_v4, cp3010_v6
[09:50:13] PROBLEM - IPsec on cp1067 is CRITICAL: Strongswan CRITICAL - ok: 56 not-conn: cp3010_v4, cp3010_v6
[09:50:14] PROBLEM - IPsec on kafka1022 is CRITICAL: Strongswan CRITICAL - ok: 164 not-conn: cp3010_v4, cp3010_v6
[09:50:14] PROBLEM - IPsec on kafka1012 is CRITICAL: Strongswan CRITICAL - ok: 164 not-conn: cp3010_v4, cp3010_v6
[09:50:15] PROBLEM - IPsec on kafka1014 is CRITICAL: Strongswan CRITICAL - ok: 164 not-conn: cp3010_v4, cp3010_v6
[09:50:15] PROBLEM - IPsec on kafka1020 is CRITICAL: Strongswan CRITICAL - ok: 164 not-conn: cp3010_v4, cp3010_v6
[09:50:29] PROBLEM - IPsec on cp1068 is CRITICAL: Strongswan CRITICAL - ok: 56 not-conn: cp3010_v4, cp3010_v6
[09:50:30] PROBLEM - IPsec on cp1055 is CRITICAL: Strongswan CRITICAL - ok: 56 not-conn: cp3010_v4, cp3010_v6
[09:57:31] PROBLEM - NTP on cp4007 is CRITICAL: NTP CRITICAL: Offset unknown
[10:25:17] PROBLEM - check_mysql on db1008 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 682
[10:31:24] (CR) Muehlenhoff: "Puppet compiler bails on this, but I'm not sure why (it looks ok to me)?" [puppet] - https://gerrit.wikimedia.org/r/260926 (https://phabricator.wikimedia.org/T122396) (owner: Faidon Liambotis)
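
(Editor's note: the "NTP CRITICAL: Offset unknown" on cp4007 above is expected right after a power cycle; ntpd has not resynced yet, so the plugin cannot compute an offset, and the check recovers at 14:50 with an offset of about 44 ms. Below is a sketch of an offset probe using the third-party ntplib package, which is an assumption; judging by the message format the production plugin is a stock check_ntp-style check.)

```python
# Sketch of an NTP offset probe like the "NTP on cp4007" check above.
import ntplib

MAX_OFFSET = 0.5  # assumed critical offset, in seconds

def check_ntp(host: str) -> str:
    try:
        response = ntplib.NTPClient().request(host, version=3, timeout=5)
    except (ntplib.NTPException, OSError):
        # No answer or no sync yet: the plugin cannot compute an offset.
        return "NTP CRITICAL: Offset unknown"
    offset = response.offset  # local clock offset vs. the server, in seconds
    if abs(offset) > MAX_OFFSET:
        return f"NTP CRITICAL: Offset {offset} secs"
    return f"NTP OK: Offset {offset} secs"
```
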
[10:50:14] PROBLEM - check_mysql on db1008 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 699
[10:51:14] !log powercycle cp3010
[10:51:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:53:54] RECOVERY - Host cp3010 is UP: PING OK - Packet loss = 0%, RTA = 89.32 ms
[10:54:14] RECOVERY - IPsec on cp1055 is OK: Strongswan OK - 58 ESP OK
[10:54:14] RECOVERY - IPsec on cp1068 is OK: Strongswan OK - 58 ESP OK
[10:54:25] RECOVERY - IPsec on kafka1020 is OK: Strongswan OK - 166 ESP OK
[10:54:35] RECOVERY - IPsec on cp1065 is OK: Strongswan OK - 58 ESP OK
[10:54:35] RECOVERY - IPsec on kafka1012 is OK: Strongswan OK - 166 ESP OK
[10:54:45] RECOVERY - IPsec on cp1054 is OK: Strongswan OK - 58 ESP OK
[10:54:55] RECOVERY - IPsec on kafka1022 is OK: Strongswan OK - 166 ESP OK
[10:55:04] RECOVERY - IPsec on kafka1013 is OK: Strongswan OK - 166 ESP OK
[10:55:05] RECOVERY - IPsec on cp1053 is OK: Strongswan OK - 58 ESP OK
[10:55:05] RECOVERY - IPsec on kafka1018 is OK: Strongswan OK - 166 ESP OK
[10:55:06] RECOVERY - IPsec on cp1066 is OK: Strongswan OK - 58 ESP OK
[10:55:14] PROBLEM - check_mysql on db1008 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 875
[10:55:24] RECOVERY - IPsec on cp1052 is OK: Strongswan OK - 58 ESP OK
[10:55:35] RECOVERY - IPsec on cp1067 is OK: Strongswan OK - 58 ESP OK
[10:55:45] RECOVERY - IPsec on kafka1014 is OK: Strongswan OK - 166 ESP OK
[11:00:14] PROBLEM - check_mysql on db1008 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1176
[11:05:08] operations, DBA, Tracking: Migrate MySQLs to use ROW-based replication (tracking) - https://phabricator.wikimedia.org/T109179#1904049 (jcrespo)
[11:05:13] PROBLEM - check_mysql on db1008 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1476
[11:10:13] PROBLEM - check_mysql on db1008 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 770
[11:20:13] RECOVERY - check_mysql on db1008 is OK: Uptime: 326602 Threads: 117 Questions: 15035994 Slow queries: 4207 Opens: 21586 Flush tables: 2 Open tables: 410 Queries per second avg: 46.037 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0
[13:19:20] !log setting db2018's binlog_format as MIXED
[13:19:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[13:22:20] operations, DBA, MediaWiki-Special-pages, Performance: Batch update of special pages creates slave lag on s3 over WAN - https://phabricator.wikimedia.org/T122429#1904079 (jcrespo) I cannot say for sure if it is the Special pages or wbc_entity_usage updates, one of the two: {F3146353}
[13:22:46] operations, DBA, MediaWiki-Special-pages, Performance: Batch update of special pages creates slave lag on s3 over WAN - https://phabricator.wikimedia.org/T122429#1904081 (jcrespo) Setting db2018 as MIXED temporarily to see if that helps.
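
(Editor's note: jcrespo's switch of db2018 to binlog_format=MIXED targets the slave lag tracked in T122429: with ROW replication every row touched by a batch update is written to the binlog and shipped over the WAN to the remote datacenter, while MIXED falls back to statement-based logging where that is safe, shrinking the replication stream. A sketch of the change via PyMySQL follows; the client and credentials are assumptions, since the !log entry does not say how it was applied.)

```python
# Sketch of the binlog_format switch applied to db2018.
import pymysql

def set_binlog_format(host: str, fmt: str = "MIXED") -> str:
    assert fmt in ("ROW", "MIXED", "STATEMENT")
    conn = pymysql.connect(host=host, user="root", password="...")
    try:
        with conn.cursor() as cur:
            # ROW logs every changed row; batch updates touching millions of
            # rows can generate gigabytes of binlog that must cross the WAN.
            # MIXED logs the statement itself where that is deterministic.
            cur.execute("SET GLOBAL binlog_format = %s", (fmt,))
            cur.execute("SELECT @@GLOBAL.binlog_format")
            return cur.fetchone()[0]  # verify the new value
    finally:
        conn.close()
```

Note that SET GLOBAL only affects sessions started afterwards, which is one reason such a change is labeled temporary and verified against the ongoing lag graphs.
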
[13:27:13] operations, DBA, MediaWiki-Special-pages, Performance: Batch updated create slave lag on s3 over WAN - https://phabricator.wikimedia.org/T122429#1904082 (jcrespo)
[13:32:34] operations, DBA, MediaWiki-Special-pages, Performance: Batch updates create slave lag on s3 over WAN - https://phabricator.wikimedia.org/T122429#1904087 (jcrespo)
[13:32:43] (CR) Faidon Liambotis: [C: -1] "Good catch. I know why: inline_template() returns a string, not an array, so "each" in the erb fails. Either I need to convert that into p" [puppet] - https://gerrit.wikimedia.org/r/260926 (https://phabricator.wikimedia.org/T122396) (owner: Faidon Liambotis)
[14:09:17] PROBLEM - puppet last run on mw1043 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:34:10] RECOVERY - puppet last run on mw1043 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures
[14:50:30] RECOVERY - NTP on cp4007 is OK: NTP OK: Offset 0.04382479191 secs
[15:50:17] !log testing new mariadb packages on db2070
[15:50:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:51:15] Merry Christmas, jynus
[15:51:38] merry christmas, Coren
[15:52:26] and a prosperous New Year
[16:57:40] (PS1) Luke081515: dewikibooks: Set $wgRestrictDisplayTitle to false [mediawiki-config] - https://gerrit.wikimedia.org/r/260964 (https://phabricator.wikimedia.org/T122433)
[20:09:11] PROBLEM - puppet last run on cp1067 is CRITICAL: CRITICAL: Puppet has 1 failures
[20:35:52] RECOVERY - puppet last run on cp1067 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[21:24:02] PROBLEM - Disk space on restbase1003 is CRITICAL: DISK CRITICAL - free space: /var 111195 MB (3% inode=99%)
[22:31:59] Any phab admin here? It's urgent
[22:33:10] Can solve it on my own
[23:03:27] PROBLEM - puppet last run on mw2146 is CRITICAL: CRITICAL: Puppet has 1 failures
[23:29:06] RECOVERY - puppet last run on mw2146 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[23:56:08] PROBLEM - puppet last run on mw2107 is CRITICAL: CRITICAL: Puppet has 1 failures
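
(Editor's note: Faidon's [C: -1] self-review above pinpoints why Muehlenhoff's puppet-compiler run failed: inline_template() returns a string, not an array, so iterating it with each inside the ERB fails; his list of fix options is cut off mid-sentence in the log. The same pitfall, sketched here in Python terms with hypothetical names since the actual change is Puppet/ERB:)

```python
# A helper renders a comma-joined *string*; the caller iterates it
# expecting list elements and gets single characters instead.
def render_hosts(hosts: list[str]) -> str:
    # Analogous to inline_template(): the return type is str, not list.
    return ",".join(hosts)

rendered = render_hosts(["cp1052", "cp1053", "cp1054"])

for item in rendered:
    pass  # iterates "c", "p", "1", ... one character at a time: the bug

# The fix mirrors the Puppet-side options: split the string back into a
# list before iterating, or never flatten it to a string in the first place.
for host in rendered.split(","):
    pass  # "cp1052", "cp1053", "cp1054" as intended
```
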