[00:24:57] PROBLEM - SSH on kraz is CRITICAL: Server answer
[00:26:57] RECOVERY - SSH on kraz is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[00:40:56] PROBLEM - NTP on kraz is CRITICAL: NTP CRITICAL: No response from NTP server
[00:49:16] PROBLEM - SSH on kraz is CRITICAL: Server answer
[00:53:16] RECOVERY - SSH on kraz is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[01:15:36] PROBLEM - SSH on kraz is CRITICAL: Server answer
[01:19:28] RECOVERY - SSH on kraz is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[01:35:56] PROBLEM - SSH on kraz is CRITICAL: Server answer
[01:43:48] RECOVERY - SSH on kraz is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[01:49:47] PROBLEM - SSH on kraz is CRITICAL: Server answer
[01:53:47] RECOVERY - SSH on kraz is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[02:06:07] PROBLEM - SSH on kraz is CRITICAL: Server answer
[02:08:07] RECOVERY - SSH on kraz is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[02:22:29] !log mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.2) (duration: 09m 57s)
[02:22:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:31:13] !log l10nupdate@tin ResourceLoader cache refresh completed at Sun May 22 02:31:13 UTC 2016 (duration 8m 45s)
[02:31:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:36:16] PROBLEM - SSH on kraz is CRITICAL: Server answer
[02:48:07] RECOVERY - SSH on kraz is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[03:08:17] PROBLEM - SSH on kraz is CRITICAL: Server answer
[03:12:17] RECOVERY - SSH on kraz is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[03:24:07] PROBLEM - SSH on kraz is CRITICAL: Server answer
[03:26:07] RECOVERY - SSH on kraz is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[03:36:17] PROBLEM - SSH on kraz is CRITICAL: Server answer
[03:40:16] RECOVERY - SSH on kraz is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[04:00:47] PROBLEM - puppet last run on mw1222 is CRITICAL: CRITICAL: Puppet has 1 failures
[04:02:17] PROBLEM - SSH on kraz is CRITICAL: Server answer
[04:08:16] RECOVERY - SSH on kraz is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[04:14:08] PROBLEM - SSH on kraz is CRITICAL: Server answer
[04:14:34] growing replication lag:
[04:14:36] "host": "db1056",
[04:14:36] "lag": 408
[04:14:42] commons.wikimedia.org
[04:16:16] RECOVERY - SSH on kraz is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[04:22:08] PROBLEM - SSH on kraz is CRITICAL: Server answer
[04:24:47] RECOVERY - puppet last run on mw1222 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures
[04:36:18] RECOVERY - SSH on kraz is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[04:42:17] PROBLEM - SSH on kraz is CRITICAL: Server answer
[04:46:18] RECOVERY - SSH on kraz is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[04:49:57] PROBLEM - puppet last run on mw2009 is CRITICAL: CRITICAL: Puppet has 1 failures
[04:52:17] PROBLEM - SSH on kraz is CRITICAL: Server answer
[05:01:12] 06Operations, 10Wikimedia-IRC-RC-Server: Kraz (irc.wikimedia.org) has been flapping on IRC most of day - https://phabricator.wikimedia.org/T135930#2315645 (10Peachey88)
[05:10:37] RECOVERY - SSH on kraz is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[05:14:16] RECOVERY - puppet last run on mw2009 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures
[05:16:37] PROBLEM - SSH on kraz is CRITICAL: Server answer
[05:25:05] 06Operations: kvm on ganeti instances getting stuck - https://phabricator.wikimedia.org/T134242#2259160 (10Peachey88) looks like {T135930} might be another possible case
[05:46:36] RECOVERY - SSH on kraz is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[05:52:27] PROBLEM - SSH on kraz is CRITICAL: Server answer
[05:56:27] RECOVERY - SSH on kraz is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[06:02:37] PROBLEM - SSH on kraz is CRITICAL: Server answer
[06:14:36] RECOVERY - SSH on kraz is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[06:24:27] PROBLEM - SSH on kraz is CRITICAL: Server answer
[06:28:36] RECOVERY - SSH on kraz is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[06:30:07] PROBLEM - puppet last run on lvs1003 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:17] PROBLEM - puppet last run on kafka1002 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:57] PROBLEM - puppet last run on mc2007 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:34:27] PROBLEM - puppet last run on mw2158 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:34:46] PROBLEM - SSH on kraz is CRITICAL: Server answer
[06:35:37] PROBLEM - puppet last run on mw1170 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:44:37] RECOVERY - SSH on kraz is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[06:50:48] PROBLEM - SSH on kraz is CRITICAL: Server answer
[06:54:50] 06Operations, 10Traffic, 07HTTPS: Secure redirect service for large count of non-canonical / junk domains - https://phabricator.wikimedia.org/T133548#2315697 (10Nemo_bis) "Secure redirect service" is grammatically unclear to me, I don't understand what is verb/noun/adjective. Does the summary just mean "Swit...
[06:56:18] RECOVERY - puppet last run on lvs1003 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures
[06:56:36] RECOVERY - puppet last run on mw2158 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures
[06:56:37] RECOVERY - puppet last run on kafka1002 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures
[06:57:37] RECOVERY - puppet last run on mw1170 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:17] RECOVERY - puppet last run on mc2007 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:04:57] RECOVERY - SSH on kraz is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[07:10:48] PROBLEM - SSH on kraz is CRITICAL: Server answer
[07:18:47] RECOVERY - SSH on kraz is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[07:24:46] PROBLEM - SSH on kraz is CRITICAL: Server answer
[07:26:46] RECOVERY - SSH on kraz is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[07:32:57] PROBLEM - SSH on kraz is CRITICAL: Server answer
[07:36:56] RECOVERY - SSH on kraz is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[07:42:47] PROBLEM - SSH on kraz is CRITICAL: Server answer
[07:46:48] RECOVERY - SSH on kraz is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[08:00:56] PROBLEM - SSH on kraz is CRITICAL: Server answer
[08:33:26] RECOVERY - SSH on kraz is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[08:39:26] PROBLEM - SSH on kraz is CRITICAL: Server answer
[08:52:27] PROBLEM - Disk space on planet2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:52:28] PROBLEM - Check size of conntrack table on planet2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:52:36] PROBLEM - RAID on planet2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:52:56] PROBLEM - configured eth on planet2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:53:16] PROBLEM - puppet last run on planet2001 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:53:17] PROBLEM - DPKG on planet2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:53:18] PROBLEM - dhclient process on planet2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:53:37] PROBLEM - salt-minion processes on planet2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[09:10:16] PROBLEM - check_mysql on lutetium is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 611
[09:20:16] RECOVERY - check_mysql on lutetium is OK: Uptime: 994398 Threads: 1 Questions: 17096431 Slow queries: 14667 Opens: 99511 Flush tables: 2 Open tables: 64 Queries per second avg: 17.192 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0
[09:25:48] RECOVERY - SSH on kraz is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[09:32:07] PROBLEM - SSH on kraz is CRITICAL: Server answer
[09:42:36] PROBLEM - NTP on planet2001 is CRITICAL: NTP CRITICAL: No response from NTP server
[09:46:47] PROBLEM - SSH on planet2001 is CRITICAL: Server answer
[09:47:47] PROBLEM - puppet last run on elastic1004 is CRITICAL: CRITICAL: Puppet has 1 failures
[09:48:47] RECOVERY - SSH on planet2001 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[09:58:07] RECOVERY - SSH on kraz is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[10:08:18] PROBLEM - SSH on kraz is CRITICAL: Server answer
[10:14:06] RECOVERY - puppet last run on elastic1004 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures
[10:29:17] PROBLEM - SSH on planet2001 is CRITICAL: Server answer
[10:31:36] RECOVERY - SSH on planet2001 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[10:43:47] PROBLEM - SSH on planet2001 is CRITICAL: Server answer
[10:56:07] RECOVERY - SSH on planet2001 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[11:02:36] PROBLEM - SSH on planet2001 is CRITICAL: Server answer
[11:04:37] RECOVERY - SSH on planet2001 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[11:10:47] PROBLEM - SSH on planet2001 is CRITICAL: Server answer
[11:18:56] RECOVERY - SSH on planet2001 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[11:20:33] 06Operations, 10ops-eqiad: Rack and Set up new application servers mw1284-1306 - https://phabricator.wikimedia.org/T134309#2315876 (10Southparkfan) @Joe did you already install one of those servers? noticed https://ganglia.wikimedia.org/latest/?c=Application%20servers%20eqiad&h=mw1305.eqiad.wmnet&m=cpu_report&...
[11:25:38] PROBLEM - RAID on ms-be2012 is CRITICAL: CRITICAL: 1 failed LD(s) (Offline)
[11:27:17] PROBLEM - Disk space on ms-be2012 is CRITICAL: DISK CRITICAL - /srv/swift-storage/sdk1 is not accessible: Input/output error
[11:35:36] PROBLEM - SSH on planet2001 is CRITICAL: Server answer
[11:41:46] RECOVERY - SSH on planet2001 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[11:45:16] PROBLEM - puppet last run on ms-be2012 is CRITICAL: CRITICAL: Puppet has 1 failures
[12:02:37] RECOVERY - Disk space on ms-be2012 is OK: DISK OK
[12:06:26] PROBLEM - SSH on planet2001 is CRITICAL: Server answer
[12:08:26] RECOVERY - SSH on planet2001 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[12:09:02] 06Operations, 10Traffic, 07HTTPS: Secure redirect service for large count of non-canonical / junk domains - https://phabricator.wikimedia.org/T133548#2315930 (10BBlack) It means a distinct service, running separately from our normal infrastructure for the canonical domains, which does nothing but handle the...
[12:30:48] PROBLEM - SSH on planet2001 is CRITICAL: Server answer
[12:34:48] RECOVERY - SSH on planet2001 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[13:18:04] 06Operations, 10DBA, 10MediaWiki-Database: Preserve InnoDB table auto_increment on restart - https://phabricator.wikimedia.org/T135851#2315983 (10jcrespo) This would be a major change though I know, but I would still push for it. For the time being this is a non-issue (aka low) because dbs are never restart...
[13:28:07] PROBLEM - SSH on planet2001 is CRITICAL: Server answer
[13:30:26] RECOVERY - SSH on planet2001 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[13:36:37] PROBLEM - SSH on planet2001 is CRITICAL: Server answer
[14:02:40] !log performing schema change on s6 T130692
[14:02:41] T130692: Add new indexes from eec016ece6d2b30addcdf3d3efcc2ba59b10e858 to production databases - https://phabricator.wikimedia.org/T130692
[14:02:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[14:05:17] RECOVERY - SSH on planet2001 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[14:11:18] PROBLEM - SSH on planet2001 is CRITICAL: Server answer
[14:33:46] RECOVERY - SSH on planet2001 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[14:38:17] !log trying to restart kraz and planet2001 (both service and console unresponsive)
[14:38:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[14:42:06] RECOVERY - configured eth on planet2001 is OK: OK - interfaces up
[14:42:17] RECOVERY - dhclient process on planet2001 is OK: PROCS OK: 0 processes with command name dhclient
[14:42:17] RECOVERY - DPKG on planet2001 is OK: All packages OK
[14:42:27] RECOVERY - salt-minion processes on planet2001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[14:42:47] RECOVERY - NTP on planet2001 is OK: NTP OK: Offset -0.002482891083 secs
[14:43:37] RECOVERY - Disk space on planet2001 is OK: DISK OK
[14:43:37] RECOVERY - RAID on planet2001 is OK: OK: no RAID installed
[14:43:37] RECOVERY - Check size of conntrack table on planet2001 is OK: OK: nf_conntrack is 0 % full
[14:44:26] PROBLEM - Host kraz is DOWN: PING CRITICAL - Packet loss = 100%
[14:45:57] RECOVERY - RAID on kraz is OK: OK: no RAID installed
[14:45:57] RECOVERY - salt-minion processes on kraz is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[14:46:07] RECOVERY - Host kraz is UP: PING OK - Packet loss = 0%, RTA = 37.02 ms
[14:46:26] RECOVERY - puppet last run on planet2001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[14:46:36] RECOVERY - Check size of conntrack table on kraz is OK: OK: nf_conntrack is 0 % full
[14:46:36] RECOVERY - Disk space on kraz is OK: DISK OK
[14:46:56] RECOVERY - DPKG on kraz is OK: All packages OK
[14:46:56] RECOVERY - configured eth on kraz is OK: OK - interfaces up
[14:46:57] RECOVERY - dhclient process on kraz is OK: PROCS OK: 0 processes with command name dhclient
[14:47:18] RECOVERY - SSH on kraz is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u2 (protocol 2.0)
[14:48:06] RECOVERY - puppet last run on kraz is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures
[15:23:21] irc.wikimedia.org is broken or something
[15:24:18] 06Operations, 10Phabricator, 10Phabricator-Upstream: PHD ensuring umask goodness - https://phabricator.wikimedia.org/T91648#2316165 (10Aklapper) Anyone knows if this is actually still an issue or if T128009 fixed this? @akosiaris, @chasemp or anyone else? Asking as this task hasn't seen an update since Dec...
[15:30:34] 06Operations, 10Wikimedia-IRC-RC-Server: irc.wikimedia.org is not sending out changes - https://phabricator.wikimedia.org/T135948#2316189 (10Glaisher) Adding #operations as I'm not sure who else has access to this.
[15:46:57] 06Operations: kvm on ganeti instances getting stuck - https://phabricator.wikimedia.org/T134242#2316205 (10jcrespo) I've restarted planet2001 and kraz today. The symptoms were different this time: they did not wake up on connection. SSH and console were down/overloaded. I restarted them and the services came to...
[15:54:07] Is anyone looking into https://phabricator.wikimedia.org/T135948 this UBN? It has broken almost all the anti-vandalism bots we have.
[15:54:53] _joe_: akosiaris paravoid ^
[16:00:18] Glaisher, check reconnecting now
[16:02:13] jynus: working now, thanks
[16:02:28] can I resolve T135948?
[16:02:28] T135948: irc.wikimedia.org is not sending out changes - https://phabricator.wikimedia.org/T135948
[16:02:44] I think so
[16:03:49] 06Operations, 10Wikimedia-IRC-RC-Server: irc.wikimedia.org is not sending out changes - https://phabricator.wikimedia.org/T135948#2316214 (10jcrespo) 05Open>03Resolved a:03jcrespo It seems the server overloaded/bugged out, then it was hit by T134875.
[16:03:51] 06Operations, 10Wikimedia-IRC-RC-Server: irc.wikimedia.org is not sending out changes - https://phabricator.wikimedia.org/T135948#2316171 (10Dzahn) I checked and the IRC bot is running as of now. Either somebody restarted the service (it's T134875 that it needed that) or it came back a little delayed. It's nor...
[16:04:13] heh ;)
[16:04:51] 06Operations, 10Wikimedia-IRC-RC-Server: irc.wikimedia.org is not sending out changes - https://phabricator.wikimedia.org/T135948#2316221 (10jcrespo) No, I have to do it manually, we need to resolve that.
[16:06:46] 06Operations: kvm on ganeti instances getting stuck - https://phabricator.wikimedia.org/T134242#2316223 (10jcrespo) Now it does, it hit T134875.
[16:15:19] 06Operations: kvm on ganeti instances getting stuck - https://phabricator.wikimedia.org/T134242#2316227 (10Dzahn) >>! In T134242#2316205, @jcrespo wrote: > the instructions are not up to date: https://wikitech.wikimedia.org/wiki/IRCD#Starting_the_bot updated.
[16:16:22] 06Operations: kvm on ganeti instances getting stuck - https://phabricator.wikimedia.org/T134242#2316228 (10jcrespo) Actually, I added them too, more complete: IRCD#Services
[16:18:36] PROBLEM - puppet last run on cp3010 is CRITICAL: CRITICAL: puppet fail
[16:31:49] 06Operations, 06Commons, 10media-storage: Update rsvg on the image scalers - https://phabricator.wikimedia.org/T112421#2316232 (10Aklapper)
[16:33:16] 06Operations, 06Commons, 10media-storage: Update rsvg on the image scalers - https://phabricator.wikimedia.org/T112421#1634588 (10Aklapper)
[16:33:37] 06Operations, 06Commons, 10media-storage: Update rsvg on the image scalers - https://phabricator.wikimedia.org/T112421#1634588 (10Aklapper)
[16:34:30] 06Operations, 06Commons, 10media-storage: Update rsvg on the image scalers - https://phabricator.wikimedia.org/T112421#2316238 (10Aklapper) @Dvorapa: I do not know more than what's written in T84950, sorry. :(
[16:45:16] RECOVERY - puppet last run on cp3010 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:16:14] !log defragmenting db1028
[17:16:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[17:36:13] <_joe_> Glaisher: sorry, I wasn't around
[17:36:17] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL: replication_delay is 676 600 - REDIS on 10.192.48.44:6479 has 1 databases (db0) with 6488837 keys - replication_delay is 676
[17:36:33] _joe_: np, it has been fixed now
[17:37:11] <_joe_> Glaisher: I know, I am just sorry I didn't get the ping
[17:38:16] people can't always be around on IRC ;-)
[18:19:17] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS on 10.192.48.44:6479 has 1 databases (db0) with 6480537 keys - replication_delay is 0
[18:35:57] (03PS1) 10Dzahn: ircd/ircecho: add icinga process monitoring [puppet] - 10https://gerrit.wikimedia.org/r/290077 (https://phabricator.wikimedia.org/T135948)
[18:36:08] you are right, i'm adding monitoring ^
[18:37:40] (03CR) 10jenkins-bot: [V: 04-1] ircd/ircecho: add icinga process monitoring [puppet] - 10https://gerrit.wikimedia.org/r/290077 (https://phabricator.wikimedia.org/T135948) (owner: 10Dzahn)
[18:40:18] (03PS2) 10Dzahn: ircd/ircecho: add icinga process monitoring [puppet] - 10https://gerrit.wikimedia.org/r/290077 (https://phabricator.wikimedia.org/T135948)
[18:41:33] (03CR) 10jenkins-bot: [V: 04-1] ircd/ircecho: add icinga process monitoring [puppet] - 10https://gerrit.wikimedia.org/r/290077 (https://phabricator.wikimedia.org/T135948) (owner: 10Dzahn)
[18:42:13] (03CR) 10Jcrespo: "Actually, both processes were running when down, I would do a more functional approach by trying to join #enwiki, etc. (or easier, checkin" [puppet] - 10https://gerrit.wikimedia.org/r/290077 (https://phabricator.wikimedia.org/T135948) (owner: 10Dzahn)
[18:42:15] (03PS3) 10Dzahn: ircd/ircecho: add icinga process monitoring [puppet] - 10https://gerrit.wikimedia.org/r/290077 (https://phabricator.wikimedia.org/T135948)
[18:45:33] (03PS4) 10Dzahn: ircd/ircecho: add icinga process monitoring [puppet] - 10https://gerrit.wikimedia.org/r/290077 (https://phabricator.wikimedia.org/T135948)
[18:48:43] (03CR) 10Jcrespo: "See comment." (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/290077 (https://phabricator.wikimedia.org/T135948) (owner: 10Dzahn)
[18:56:14] 06Operations, 06Commons, 10MediaWiki-Page-deletion, 10media-storage, and 3 others: Unable to delete file pages on commons: MWException/LocalFileLockError: "Could not acquire lock" - https://phabricator.wikimedia.org/T132921#2316347 (10Steinsplitter) See also https://commons.wikimedia.org/wiki/Category:Dele...
[18:57:47] (03CR) 10Dzahn: ircd/ircecho: add icinga process monitoring (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/290077 (https://phabricator.wikimedia.org/T135948) (owner: 10Dzahn)
[18:58:36] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL: replication_delay is 663 600 - REDIS on 10.192.48.44:6479 has 1 databases (db0) with 6485884 keys - replication_delay is 663
[19:01:57] (03PS1) 10Dzahn: ircd: make check_ircd a critical (paging) icinga check [puppet] - 10https://gerrit.wikimedia.org/r/290078 (https://phabricator.wikimedia.org/T135948)
[19:03:32] (03CR) 10jenkins-bot: [V: 04-1] ircd: make check_ircd a critical (paging) icinga check [puppet] - 10https://gerrit.wikimedia.org/r/290078 (https://phabricator.wikimedia.org/T135948) (owner: 10Dzahn)
[19:06:09] (03PS2) 10Dzahn: ircd: make check_ircd a critical (paging) icinga check [puppet] - 10https://gerrit.wikimedia.org/r/290078 (https://phabricator.wikimedia.org/T135948)
[19:06:32] (03PS3) 10Dzahn: ircd: make check_ircd a critical (paging) icinga check [puppet] - 10https://gerrit.wikimedia.org/r/290078 (https://phabricator.wikimedia.org/T135948)
[19:07:41] (03CR) 10jenkins-bot: [V: 04-1] ircd: make check_ircd a critical (paging) icinga check [puppet] - 10https://gerrit.wikimedia.org/r/290078 (https://phabricator.wikimedia.org/T135948) (owner: 10Dzahn)
[19:08:45] (03PS4) 10Dzahn: ircd: make check_ircd a critical (paging) icinga check [puppet] - 10https://gerrit.wikimedia.org/r/290078 (https://phabricator.wikimedia.org/T135948)
[19:13:06] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS on 10.192.48.44:6479 has 1 databases (db0) with 6482866 keys - replication_delay is 0
[19:18:29] (03PS5) 10Dzahn: ircd: make check_ircd a critical (paging) icinga check [puppet] - 10https://gerrit.wikimedia.org/r/290078 (https://phabricator.wikimedia.org/T135948)
[19:18:44] (03PS6) 10Dzahn: ircd: make check_ircd a critical (paging) icinga check [puppet] - 10https://gerrit.wikimedia.org/r/290078 (https://phabricator.wikimedia.org/T135948)
[19:19:33] (03CR) 10jenkins-bot: [V: 04-1] ircd: make check_ircd a critical (paging) icinga check [puppet] - 10https://gerrit.wikimedia.org/r/290078 (https://phabricator.wikimedia.org/T135948) (owner: 10Dzahn)
[19:19:40] (03CR) 10Dzahn: "there is also https://gerrit.wikimedia.org/r/#/c/135074/7" [puppet] - 10https://gerrit.wikimedia.org/r/290077 (https://phabricator.wikimedia.org/T135948) (owner: 10Dzahn)
[19:22:12] 06Operations, 10Wikimedia-IRC-RC-Server: Kraz (irc.wikimedia.org) has been flapping on IRC most of day - https://phabricator.wikimedia.org/T135930#2316371 (10Dzahn) these were effects of T134242 it's not happening anymore since the VM got restarted i'd consider it merged into the above and more or less a dup...
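The review exchange on Gerrit change 290077 above contrasts a process-presence check with the more functional approach Jcrespo suggests: actually speak the IRC protocol and confirm the server answers (or even join a channel). The sketch below illustrates that idea under stated assumptions — plain sockets, Nagios-style exit codes, and an invented script name and nick; it is not the plugin under review:

```python
#!/usr/bin/env python
"""Hypothetical functional check for an ircd, in the spirit of the review
comment above. Illustrative only; not the reviewed Icinga plugin."""
import socket
import sys

OK, CRITICAL = 0, 2  # conventional Nagios/Icinga plugin exit codes


def registration_complete(line):
    """True when a server reply carries numeric 001 (RPL_WELCOME),
    i.e. the ircd actually registered our connection rather than
    merely accepting the TCP handshake."""
    parts = line.split()
    return len(parts) >= 2 and parts[1] == "001"


def check_ircd(host="irc.wikimedia.org", port=6667, timeout=10):
    """Connect, register a throwaway client, and wait for RPL_WELCOME."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
        conn.sendall(b"NICK icinga-check\r\nUSER icinga 0 * :icinga check\r\n")
        data = b""
        while len(data) < 65536:  # bounded read so the check always ends
            chunk = conn.recv(4096)
            if not chunk:
                break
            data += chunk
            for line in data.decode("utf-8", "replace").splitlines():
                if registration_complete(line):
                    return OK
        return CRITICAL
    except (socket.error, socket.timeout):
        return CRITICAL


if __name__ == "__main__":
    sys.exit(check_ircd())
```

Going one step further and joining #en.wikipedia to wait for RC traffic, as the review suggests, would also catch the failure mode seen earlier in the day, where both processes were running but no changes were being relayed.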
[19:24:14] 06Operations, 10Wikimedia-IRC-RC-Server: udpmxircecho should write stats of messages processed and we should alert when that drops to zero - https://phabricator.wikimedia.org/T134326#2316374 (10Dzahn) We have this https://gerrit.wikimedia.org/r/#/c/135074/7 that already sends user and channel count to statsd...
[19:29:25] (03CR) 10Luke081515: "Seems like gerrit restored PS3." [puppet] - 10https://gerrit.wikimedia.org/r/290078 (https://phabricator.wikimedia.org/T135948) (owner: 10Dzahn)
[19:31:49] (03PS7) 10Dzahn: ircd: make check_ircd a critical (paging) icinga check [puppet] - 10https://gerrit.wikimedia.org/r/290078 (https://phabricator.wikimedia.org/T135948)
[19:44:34] 06Operations, 06Commons, 10MediaWiki-Page-deletion, 10media-storage, and 3 others: Unable to delete file pages on commons: MWException/LocalFileLockError: "Could not acquire lock" - https://phabricator.wikimedia.org/T132921#2316391 (10Riley_Huntley) Well, that's a record.. 32 images in a row that I could not...
[20:36:56] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL: replication_delay is 665 600 - REDIS on 10.192.48.44:6479 has 1 databases (db0) with 6496467 keys - replication_delay is 665
[21:13:36] (03PS7) 10Dereckson: Adjust groups permissions on fa.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/290000 (https://phabricator.wikimedia.org/T135774) (owner: 10Urbanecm)
[21:13:56] (03CR) 10Dereckson: "PS7: +signed-off-by per PS1 comment" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/290000 (https://phabricator.wikimedia.org/T135774) (owner: 10Urbanecm)
[21:19:26] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS on 10.192.48.44:6479 has 1 databases (db0) with 6489689 keys - replication_delay is 0
[21:19:44] Dereckson: Hi, do you have permission to approve translations for this template https://www.mediawiki.org/wiki/Template:WikimediaDownload please?
[21:24:07] paladox: done I think
[21:24:34] Dereckson: Thanks
[21:25:02] You're welcome.
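T134326 above asks for udpmxircecho to report message/user/channel stats so an alert can fire when they drop to zero, and Dzahn points at a patch that already sends user and channel counts to statsd. As a toy illustration of that reporting path, here is a minimal sketch of emitting such counts as statsd gauges over UDP; the metric names and statsd host are invented for illustration and are not taken from the linked patch:

```python
"""Toy sketch: push ircd stats to statsd as gauges (illustrative names)."""
import socket


def statsd_gauge(metric, value):
    """Format one statsd gauge datagram, e.g. 'ircd.users:42|g'."""
    return "%s:%d|g" % (metric, value)


def send_stats(stats, host="statsd.example.wmnet", port=8125):
    # statsd is fire-and-forget UDP: one small datagram per metric,
    # so a dead statsd never blocks or crashes the bot being measured.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        for metric, value in stats.items():
            sock.sendto(statsd_gauge(metric, value).encode("ascii"),
                        (host, port))
    finally:
        sock.close()
```

With gauges like these in place, the alert the task asks for reduces to an Icinga threshold on the reported value hitting zero.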
[21:31:52] Dereckson: Would you also be able to enable https://www.mediawiki.org/wiki/Template:WikimediaDownloadOld for translation, please?
[21:31:52] :)
[21:32:58] 06Operations, 06Commons, 10MediaWiki-Page-deletion, 10media-storage, and 3 others: Unable to delete file pages on commons: MWException/LocalFileLockError: "Could not acquire lock" - https://phabricator.wikimedia.org/T132921#2316535 (10matmarex) @Riley_Huntley That sounds like a separate bug, Wikidata-relat...
[21:33:02] 06Operations, 06Performance-Team, 10Thumbor: Package and backport Thumbor dependencies in Debian - https://phabricator.wikimedia.org/T134485#2316537 (10Gilles)
[21:33:43] No changes to review. Marking this page for translation will not edit the page nor any existing translation unit.
[21:34:13] The page Template:WikimediaDownloadOld has been marked up for translation with 13 translation units. The page can now be translated. Please import any pre-existing translations: you can use Special:PageMigration for this purpose.
[21:35:43] https://www.mediawiki.org/wiki/Special:AllPages?from=WikimediaDownloadOld&to=&namespace=10 there wasn't any
[21:37:37] Thanks Dereckson
[21:52:32] yw
[21:53:57] PROBLEM - Ensure mysql credential creation for tools users is running on labstore1001 is CRITICAL: CRITICAL - Expecting active but unit create-dbusers is failed
[22:18:17] RECOVERY - Ensure mysql credential creation for tools users is running on labstore1001 is OK: OK - create-dbusers is active
[22:28:34] 06Operations, 14Spam: invalid ops task - https://phabricator.wikimedia.org/T78602#2316595 (10Danny_B)
[22:50:37] PROBLEM - Disk space on kafka1022 is CRITICAL: DISK CRITICAL - free space: /var/spool/kafka/b 73805 MB (3% inode=99%)
[23:17:14] 06Operations, 10Wikimedia-IRC-RC-Server: Kraz (irc.wikimedia.org) has been flapping on IRC most of day - https://phabricator.wikimedia.org/T135930#2316812 (10Peachey88) 05Open>03Resolved a:03jcrespo > !log trying to restart kraz and planet2001 (both service and console unresponsive) >>! In T1359...
[23:25:06] 06Operations, 10ArchCom-RfC, 06Services, 07RfC: Service Ownership and Maintenance - https://phabricator.wikimedia.org/T122825#2316826 (10Danny_B)
[23:25:33] 06Operations, 10Analytics, 10ArchCom-RfC, 06Discovery, and 8 others: EventBus MVP - https://phabricator.wikimedia.org/T114443#2316831 (10Danny_B)
[23:25:53] 06Operations, 10ArchCom-RfC, 10Architecture, 10Incident-20150423-Commons, and 7 others: RFC: Re-evaluate varnish-level request-restart behavior on 5xx - https://phabricator.wikimedia.org/T97206#2316840 (10Danny_B)