[00:02:32] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw2256 is CRITICAL: connect to address 10.192.16.55 and port 443: Connection refused
[00:02:32] <icinga-wm>	 PROBLEM - Check the NTP synchronisation status of timesyncd on mw2242 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[00:02:32] <icinga-wm>	 PROBLEM - Check systemd state on mw2256 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[00:02:32] <icinga-wm>	 PROBLEM - Check size of conntrack table on mw2255 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[00:04:12] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on mw2242 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[00:04:12] <icinga-wm>	 PROBLEM - configured eth on mw2242 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[00:04:12] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw2255 is CRITICAL: connect to address 10.192.16.54 and port 443: Connection refused
[00:04:12] <icinga-wm>	 PROBLEM - Check the NTP synchronisation status of timesyncd on mw2256 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[00:04:12] <icinga-wm>	 PROBLEM - Check systemd state on mw2255 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[00:05:53] <icinga-wm>	 PROBLEM - DPKG on mw2242 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[00:05:53] <icinga-wm>	 PROBLEM - dhclient process on mw2242 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[00:05:53] <icinga-wm>	 PROBLEM - Check the NTP synchronisation status of timesyncd on mw2255 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[00:05:53] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on mw2256 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[00:05:53] <icinga-wm>	 PROBLEM - configured eth on mw2256 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[00:07:42] <icinga-wm>	 PROBLEM - mediawiki-installation DSH group on mw2242 is CRITICAL: Host mw2242 is not in mediawiki-installation dsh group
[00:07:42] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on mw2255 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[00:07:42] <icinga-wm>	 PROBLEM - configured eth on mw2255 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[00:07:42] <icinga-wm>	 PROBLEM - dhclient process on mw2256 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[00:07:42] <icinga-wm>	 PROBLEM - DPKG on mw2256 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[00:09:22] <icinga-wm>	 PROBLEM - mediawiki-installation DSH group on mw2256 is CRITICAL: Host mw2256 is not in mediawiki-installation dsh group
[00:09:22] <icinga-wm>	 PROBLEM - DPKG on mw2255 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[00:09:22] <icinga-wm>	 PROBLEM - dhclient process on mw2255 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[00:09:22] <icinga-wm>	 PROBLEM - Disk space on mw2256 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[00:11:02] <icinga-wm>	 PROBLEM - mediawiki-installation DSH group on mw2255 is CRITICAL: Host mw2255 is not in mediawiki-installation dsh group
[00:11:02] <icinga-wm>	 PROBLEM - Disk space on mw2255 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[00:11:03] <icinga-wm>	 PROBLEM - HHVM processes on mw2256 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[00:11:03] <icinga-wm>	 PROBLEM - nutcracker port on mw2256 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[00:11:12] <icinga-wm>	 PROBLEM - HHVM rendering on mw2242 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:12:43] <icinga-wm>	 PROBLEM - nutcracker process on mw2256 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[00:14:42] <icinga-wm>	 PROBLEM - HHVM rendering on mw2255 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:14:52] <icinga-wm>	 PROBLEM - HHVM rendering on mw2256 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:16:22] <icinga-wm>	 PROBLEM - Apache HTTP on mw2242 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:16:22] <icinga-wm>	 PROBLEM - MD RAID on mw2242 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[00:16:23] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on mw2242 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[00:16:23] <icinga-wm>	 PROBLEM - configured eth on mw2242 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[00:16:32] <icinga-wm>	 PROBLEM - nutcracker port on mw2242 is CRITICAL: connect to address 127.0.0.1 and port 11212: Connection refused
[00:17:02] <icinga-wm>	 RECOVERY - dhclient process on mw2242 is OK: PROCS OK: 0 processes with command name dhclient
[00:17:02] <icinga-wm>	 RECOVERY - DPKG on mw2242 is OK: All packages OK
[00:17:12] <icinga-wm>	 PROBLEM - nutcracker process on mw2242 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 113 (nutcracker), command name nutcracker
[00:17:13] <icinga-wm>	 RECOVERY - MD RAID on mw2242 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0
[00:17:22] <icinga-wm>	 RECOVERY - Check whether ferm is active by checking the default input chain on mw2242 is OK: OK ferm input default policy is set
[00:17:22] <icinga-wm>	 RECOVERY - configured eth on mw2242 is OK: OK - interfaces up
[00:18:03] <icinga-wm>	 PROBLEM - Apache HTTP on mw2256 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:19:42] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw2242 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 325 bytes in 0.191 second response time
[00:19:42] <icinga-wm>	 PROBLEM - Check systemd state on mw2242 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[00:19:52] <icinga-wm>	 PROBLEM - Apache HTTP on mw2255 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:20:12] <icinga-wm>	 PROBLEM - Disk space on mw2255 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[00:20:22] <icinga-wm>	 PROBLEM - Check systemd state on mw2255 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[00:20:32] <icinga-wm>	 RECOVERY - dhclient process on mw2255 is OK: PROCS OK: 0 processes with command name dhclient
[00:20:32] <icinga-wm>	 RECOVERY - DPKG on mw2255 is OK: All packages OK
[00:20:42] <icinga-wm>	 PROBLEM - nutcracker process on mw2255 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 113 (nutcracker), command name nutcracker
[00:20:43] <icinga-wm>	 RECOVERY - Check size of conntrack table on mw2255 is OK: OK: nf_conntrack is 0 % full
[00:21:10] <icinga-wm>	 RECOVERY - Disk space on mw2255 is OK: DISK OK
[00:22:39] <icinga-wm>	 PROBLEM - Disk space on mw2256 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[00:22:49] <icinga-wm>	 PROBLEM - nutcracker port on mw2255 is CRITICAL: connect to address 127.0.0.1 and port 11212: Connection refused
[00:22:50] <icinga-wm>	 PROBLEM - Check systemd state on mw2256 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[00:22:59] <icinga-wm>	 RECOVERY - Check whether ferm is active by checking the default input chain on mw2256 is OK: OK ferm input default policy is set
[00:23:00] <icinga-wm>	 RECOVERY - configured eth on mw2256 is OK: OK - interfaces up
[00:23:11] <icinga-wm>	 PROBLEM - nutcracker port on mw2256 is CRITICAL: connect to address 127.0.0.1 and port 11212: Connection refused
[00:23:11] <icinga-wm>	 RECOVERY - HHVM processes on mw2256 is OK: PROCS OK: 6 processes with command name hhvm
[00:23:19] <icinga-wm>	 RECOVERY - nutcracker process on mw2242 is OK: PROCS OK: 1 process with UID = 113 (nutcracker), command name nutcracker
[00:23:30] <icinga-wm>	 RECOVERY - Disk space on mw2256 is OK: DISK OK
[00:23:39] <icinga-wm>	 RECOVERY - nutcracker port on mw2242 is OK: TCP OK - 0.000 second response time on 127.0.0.1 port 11212
[00:23:40] <icinga-wm>	 RECOVERY - Check systemd state on mw2242 is OK: OK - running: The system is fully operational
[00:23:49] <icinga-wm>	 RECOVERY - Nginx local proxy to apache on mw2242 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 619 bytes in 3.987 second response time
[00:24:10] <icinga-wm>	 RECOVERY - HHVM rendering on mw2242 is OK: HTTP OK: HTTP/1.1 200 OK - 75522 bytes in 0.402 second response time
[00:24:46] <wikibugs_>	 (CR) Chad: [C: 2] Drop MEDIAWIKI_DBLIST_DIR, no longer used [mediawiki-config] - https://gerrit.wikimedia.org/r/428844 (owner: Chad)
[00:24:49] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw2257 is CRITICAL: connect to address 10.192.16.56 and port 443: Connection refused
[00:24:49] <icinga-wm>	 PROBLEM - Check systemd state on mw2257 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[00:25:13] <no_justification>	 What's up with those codfw apaches?
[00:25:19] <icinga-wm>	 RECOVERY - Apache HTTP on mw2242 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 616 bytes in 0.148 second response time
[00:25:41] <wikibugs_>	 (CR) Chad: [C: 2] Drop MEDIAWIKI_DIRECTORY_REGEX & MEDIAWIKI_VERSION_REGEX unused [mediawiki-config] - https://gerrit.wikimedia.org/r/428845 (owner: Chad)
[00:26:16] <wikibugs_>	 (Merged) jenkins-bot: Drop MEDIAWIKI_DBLIST_DIR, no longer used [mediawiki-config] - https://gerrit.wikimedia.org/r/428844 (owner: Chad)
[00:26:30] <icinga-wm>	 PROBLEM - Check the NTP synchronisation status of timesyncd on mw2257 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[00:26:57] <wikibugs_>	 (Merged) jenkins-bot: Drop MEDIAWIKI_DIRECTORY_REGEX & MEDIAWIKI_VERSION_REGEX unused [mediawiki-config] - https://gerrit.wikimedia.org/r/428845 (owner: Chad)
[00:27:00] <wikibugs_>	 (PS4) Chad: Apache: Move all private wikis to a single vhost block [puppet] - https://gerrit.wikimedia.org/r/422571
[00:27:39] <icinga-wm>	 RECOVERY - Check systemd state on mw2255 is OK: OK - running: The system is fully operational
[00:27:50] <icinga-wm>	 RECOVERY - nutcracker process on mw2255 is OK: PROCS OK: 1 process with UID = 113 (nutcracker), command name nutcracker
[00:27:59] <icinga-wm>	 RECOVERY - nutcracker port on mw2255 is OK: TCP OK - 0.000 second response time on 127.0.0.1 port 11212
[00:27:59] <icinga-wm>	 RECOVERY - Apache HTTP on mw2255 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 618 bytes in 7.810 second response time
[00:28:19] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on mw2257 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[00:28:19] <icinga-wm>	 PROBLEM - configured eth on mw2257 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[00:28:39] <icinga-wm>	 RECOVERY - Nginx local proxy to apache on mw2255 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 617 bytes in 0.202 second response time
[00:28:50] <icinga-wm>	 RECOVERY - HHVM rendering on mw2255 is OK: HTTP OK: HTTP/1.1 200 OK - 75522 bytes in 0.400 second response time
[00:29:10] <wikibugs_>	 (CR) jenkins-bot: Drop MEDIAWIKI_DBLIST_DIR, no longer used [mediawiki-config] - https://gerrit.wikimedia.org/r/428844 (owner: Chad)
[00:30:00] <icinga-wm>	 RECOVERY - Check systemd state on mw2256 is OK: OK - running: The system is fully operational
[00:30:00] <icinga-wm>	 PROBLEM - dhclient process on mw2257 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[00:30:00] <icinga-wm>	 PROBLEM - DPKG on mw2257 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[00:30:00] <icinga-wm>	 RECOVERY - DPKG on mw2256 is OK: All packages OK
[00:30:10] <wikibugs_>	 (Abandoned) Chad: WIP: Add git::config{} for calling `git config` on repositories. [puppet] - https://gerrit.wikimedia.org/r/416200 (owner: Chad)
[00:30:11] <icinga-wm>	 RECOVERY - nutcracker port on mw2256 is OK: TCP OK - 0.000 second response time on 127.0.0.1 port 11212
[00:30:49] <icinga-wm>	 RECOVERY - HHVM rendering on mw2256 is OK: HTTP OK: HTTP/1.1 200 OK - 75524 bytes in 5.871 second response time
[00:31:00] <icinga-wm>	 RECOVERY - Nginx local proxy to apache on mw2256 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 617 bytes in 0.202 second response time
[00:31:00] <logmsgbot>	 !log demon@tin Synchronized multiversion/defines.php: rm unused defines (duration: 01m 16s)
[00:31:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:31:40] <icinga-wm>	 PROBLEM - mediawiki-installation DSH group on mw2257 is CRITICAL: Host mw2257 is not in mediawiki-installation dsh group
[00:31:40] <icinga-wm>	 PROBLEM - Disk space on mw2257 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[00:32:10] <icinga-wm>	 RECOVERY - Apache HTTP on mw2256 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 618 bytes in 3.397 second response time
[00:32:29] <icinga-wm>	 RECOVERY - Check the NTP synchronisation status of timesyncd on mw2242 is OK: OK: synced at Wed 2018-04-25 00:32:26 UTC.
[00:33:20] <icinga-wm>	 PROBLEM - nutcracker port on mw2257 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[00:33:20] <icinga-wm>	 PROBLEM - HHVM processes on mw2257 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[00:34:10] <icinga-wm>	 RECOVERY - Check the NTP synchronisation status of timesyncd on mw2256 is OK: OK: synced at Wed 2018-04-25 00:34:06 UTC.
[00:34:36] <wikibugs_>	 (PS1) Chad: Add gerrit.wmfusercontent.org DNS entry [dns] - https://gerrit.wikimedia.org/r/428869
[00:34:49] <icinga-wm>	 RECOVERY - dhclient process on mw2256 is OK: PROCS OK: 0 processes with command name dhclient
[00:35:09] <icinga-wm>	 PROBLEM - HHVM rendering on mw2257 is CRITICAL: connect to address 10.192.16.56 and port 80: Connection refused
[00:35:09] <icinga-wm>	 PROBLEM - nutcracker process on mw2257 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[00:35:59] <icinga-wm>	 RECOVERY - Check the NTP synchronisation status of timesyncd on mw2255 is OK: OK: synced at Wed 2018-04-25 00:35:52 UTC.
[00:36:29] <icinga-wm>	 RECOVERY - Check whether ferm is active by checking the default input chain on mw2255 is OK: OK ferm input default policy is set
[00:36:29] <icinga-wm>	 RECOVERY - nutcracker process on mw2256 is OK: PROCS OK: 1 process with UID = 113 (nutcracker), command name nutcracker
[00:36:50] <icinga-wm>	 PROBLEM - puppet last run on mw2257 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[00:40:00] <icinga-wm>	 RECOVERY - configured eth on mw2255 is OK: OK - interfaces up
[00:40:20] <icinga-wm>	 PROBLEM - MD RAID on mw2257 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[00:41:59] <icinga-wm>	 PROBLEM - Check size of conntrack table on mw2257 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[00:44:20] <icinga-wm>	 PROBLEM - HHVM rendering on mw2257 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:45:29] <icinga-wm>	 PROBLEM - Apache HTTP on mw2257 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:51:20] <icinga-wm>	 PROBLEM - nutcracker process on mw2257 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[00:51:20] <icinga-wm>	 PROBLEM - dhclient process on mw2257 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[00:51:20] <icinga-wm>	 PROBLEM - DPKG on mw2257 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[00:51:29] <icinga-wm>	 RECOVERY - Apache HTTP on mw2257 is OK: HTTP OK: HTTP/1.1 200 OK - 10975 bytes in 0.073 second response time
[00:51:29] <icinga-wm>	 PROBLEM - MD RAID on mw2257 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[00:51:30] <icinga-wm>	 PROBLEM - configured eth on mw2257 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[00:51:30] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on mw2257 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[00:51:39] <icinga-wm>	 PROBLEM - nutcracker port on mw2257 is CRITICAL: connect to address 127.0.0.1 and port 11212: Connection refused
[00:51:39] <icinga-wm>	 RECOVERY - HHVM processes on mw2257 is OK: PROCS OK: 6 processes with command name hhvm
[00:51:49] <icinga-wm>	 RECOVERY - Disk space on mw2257 is OK: DISK OK
[00:52:09] <icinga-wm>	 RECOVERY - Check size of conntrack table on mw2257 is OK: OK: nf_conntrack is 0 % full
[00:52:09] <icinga-wm>	 PROBLEM - Check systemd state on mw2257 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[00:52:19] <icinga-wm>	 RECOVERY - dhclient process on mw2257 is OK: PROCS OK: 0 processes with command name dhclient
[00:52:19] <icinga-wm>	 RECOVERY - DPKG on mw2257 is OK: All packages OK
[00:52:19] <icinga-wm>	 PROBLEM - HHVM rendering on mw2257 is CRITICAL: connect to address 10.192.16.56 and port 80: Connection refused
[00:52:29] <icinga-wm>	 RECOVERY - MD RAID on mw2257 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0
[00:52:30] <icinga-wm>	 RECOVERY - Check whether ferm is active by checking the default input chain on mw2257 is OK: OK ferm input default policy is set
[00:52:30] <icinga-wm>	 RECOVERY - configured eth on mw2257 is OK: OK - interfaces up
[00:55:10] <icinga-wm>	 RECOVERY - Nginx local proxy to apache on mw2257 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 619 bytes in 3.926 second response time
[00:55:29] <icinga-wm>	 RECOVERY - HHVM rendering on mw2257 is OK: HTTP OK: HTTP/1.1 200 OK - 75535 bytes in 7.664 second response time
[00:56:41] <icinga-wm>	 RECOVERY - Check the NTP synchronisation status of timesyncd on mw2257 is OK: OK: synced at Wed 2018-04-25 00:56:35 UTC.
[00:59:31] <icinga-wm>	 RECOVERY - Long running screen/tmux on restbase1010 is OK: OK: No SCREEN or tmux processes detected.
[01:00:22] <icinga-wm>	 RECOVERY - nutcracker process on mw2257 is OK: PROCS OK: 1 process with UID = 113 (nutcracker), command name nutcracker
[01:00:51] <icinga-wm>	 RECOVERY - nutcracker port on mw2257 is OK: TCP OK - 0.000 second response time on 127.0.0.1 port 11212
[01:01:12] <icinga-wm>	 RECOVERY - Check systemd state on mw2257 is OK: OK - running: The system is fully operational
[01:07:01] <icinga-wm>	 RECOVERY - puppet last run on mw2257 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[01:14:57] <wikibugs_>	 Operations, Performance-Team, Patch-For-Review: Move coal from graphite#001 nodes to webperf#001 - https://phabricator.wikimedia.org/T159354#4156092 (Krinkle) @Imarlier I landed it as-is.  Nevermind about using the `/etc/wikimedia-cluster` file ([puppet](https://github.com/wikimedia/puppet/blob/0b915...
[01:34:25] <wikibugs_>	 (PS2) Bstorm: wiki replicas: index script should be able to operate on one DB [puppet] - https://gerrit.wikimedia.org/r/428550
[01:34:50] <wikibugs_>	 (CR) Bstorm: wiki replicas: index script should be able to operate on one DB (1 comment) [puppet] - https://gerrit.wikimedia.org/r/428550 (owner: Bstorm)
[01:34:59] <wikibugs_>	 (PS3) Bstorm: wiki replicas: index script should be able to operate on one DB [puppet] - https://gerrit.wikimedia.org/r/428550
[01:36:31] <icinga-wm>	 PROBLEM - Disk space on labtestnet2001 is CRITICAL: DISK CRITICAL - free space: / 350 MB (3% inode=75%)
[02:01:11] <icinga-wm>	 RECOVERY - cassandra-c CQL 10.64.0.116:9042 on restbase1010 is OK: TCP OK - 0.000 second response time on 10.64.0.116 port 9042
[02:31:41] <icinga-wm>	 RECOVERY - mediawiki-installation DSH group on mw2257 is OK: OK
[02:36:02] <icinga-wm>	 PROBLEM - cassandra-a CQL 10.64.0.114:9042 on restbase1010 is CRITICAL: connect to address 10.64.0.114 and port 9042: Connection refused
[02:36:31] <icinga-wm>	 PROBLEM - cassandra-a SSL 10.64.0.114:7001 on restbase1010 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused
[02:46:32] <icinga-wm>	 RECOVERY - cassandra-a SSL 10.64.0.114:7001 on restbase1010 is OK: SSL OK - Certificate restbase1010-a valid until 2018-08-17 16:11:05 +0000 (expires in 114 days)
[02:47:02] <icinga-wm>	 RECOVERY - cassandra-a CQL 10.64.0.114:9042 on restbase1010 is OK: TCP OK - 0.000 second response time on 10.64.0.114 port 9042
[02:48:03] <wikibugs_>	 Operations, ops-eqiad, Cassandra, hardware-requests, and 3 others: Replace 5 Samsung SSD 850 devices w/ 4 1.6T Intel or HP SSDs - https://phabricator.wikimedia.org/T189822#4156115 (Eevans) All 3 instances of 1010 have been bootstrapped.
[02:55:24] <logmsgbot>	 !log l10nupdate@tin scap sync-l10n completed (1.31.0-wmf.30) (duration: 07m 23s)
[02:55:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:07:41] <icinga-wm>	 RECOVERY - mediawiki-installation DSH group on mw2242 is OK: OK
[03:09:21] <icinga-wm>	 RECOVERY - mediawiki-installation DSH group on mw2256 is OK: OK
[03:11:01] <icinga-wm>	 RECOVERY - mediawiki-installation DSH group on mw2255 is OK: OK
[03:26:11] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 708.46 seconds
[03:37:13] <yannf>	 https://phabricator.wikimedia.org/T192866
[03:37:59] <yannf>	 could someone look into this please? It is important, and it is preventing work
[03:38:37] <yannf>	 I am available if testing is needed
[03:52:34] <ori>	 yannf: the files are not identical
[03:53:06] <yannf>	 ori, which files?
[03:53:41] <ori>	 Nouveau_Larousse_illustré,_1898,_IV.djvu is about 25k larger than Nouveau_Larousse_illustré,_1898,_IV_test.djvu
[03:57:25] <wikibugs_>	 (CR) BryanDavis: [C: 1] wiki replicas: index script should be able to operate on one DB [puppet] - https://gerrit.wikimedia.org/r/428550 (owner: Bstorm)
[03:59:51] <wikibugs_>	 (CR) Bstorm: [C: 2] wiki replicas: index script should be able to operate on one DB [puppet] - https://gerrit.wikimedia.org/r/428550 (owner: Bstorm)
[04:04:12] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 208.30 seconds
[04:08:49] <yannf>	 ori, yes, some metadata was changed, otherwise the uploader couldn't upload it
[04:09:37] <yannf>	 I reuploaded the old version here: https://commons.wikimedia.org/wiki/File:Nouveau_Larousse_illustr%C3%A9,_1898,_IV_test.djvu
[04:10:27] <yannf>	 other files from the same series are still not OK: https://commons.wikimedia.org/wiki/File:Nouveau_Larousse_illustr%C3%A9,_1898,_V.djvu
[04:11:33] <yannf>	 and even when the file looks OK on Commons, it doesn't work on WS: https://fr.wikisource.org/wiki/Livre:Nouveau_Larousse_illustr%C3%A9,_1898,_IV.djvu
[05:26:15] <wikibugs_>	 Operations, ops-eqiad, DBA, Patch-For-Review: Rack and setup db1116 - db1123 - https://phabricator.wikimedia.org/T191792#4156182 (Marostegui)
[05:27:45] <wikibugs_>	 Operations, ops-eqiad, DBA, Patch-For-Review: Rack and setup db1116 - db1123 - https://phabricator.wikimedia.org/T191792#4116638 (Marostegui)
[05:35:15] <wikibugs_>	 Operations, ops-eqiad, DBA, Patch-For-Review: Rack and setup db1116 - db1123 - https://phabricator.wikimedia.org/T191792#4156186 (Marostegui)
[05:36:03] <wikibugs_>	 Operations, ops-eqiad, DBA, Patch-For-Review: Rack and setup db1116 - db1123 - https://phabricator.wikimedia.org/T191792#4116638 (Marostegui) @Cmjohnson I have confirmed that all the hosts with the exception of db1120 as you mentioned, are up and ready - let's keep this opened till it is fixed. T...
[05:48:07] <wikibugs_>	 (PS1) Marostegui: mariadb: Convert db1116 as sanitarium multi-instance [puppet] - https://gerrit.wikimedia.org/r/428872 (https://phabricator.wikimedia.org/T192979)
[05:49:31] <wikibugs_>	 (PS2) Marostegui: mariadb: Convert db1116 as sanitarium multi-instance [puppet] - https://gerrit.wikimedia.org/r/428872 (https://phabricator.wikimedia.org/T192979)
[05:50:25] <wikibugs_>	 (PS3) Marostegui: mariadb: Convert db1116 as sanitarium multi-instance [puppet] - https://gerrit.wikimedia.org/r/428872 (https://phabricator.wikimedia.org/T192979)
[06:13:26] <wikibugs_>	 (CR) Marostegui: [C: 2] mariadb: Convert db1116 as sanitarium multi-instance [puppet] - https://gerrit.wikimedia.org/r/428872 (https://phabricator.wikimedia.org/T192979) (owner: Marostegui)
[06:14:39] <wikibugs_>	 (PS1) Marostegui: Revert "db-eqiad.php: Depool db1113:3316" [mediawiki-config] - https://gerrit.wikimedia.org/r/428873
[06:14:43] <wikibugs_>	 (PS2) Marostegui: Revert "db-eqiad.php: Depool db1113:3316" [mediawiki-config] - https://gerrit.wikimedia.org/r/428873
[06:18:32] <wikibugs_>	 (PS1) Marostegui: db1116.yaml: Give it the correct shards [puppet] - https://gerrit.wikimedia.org/r/428874
[06:19:20] <wikibugs_>	 (CR) Marostegui: [C: 2] db1116.yaml: Give it the correct shards [puppet] - https://gerrit.wikimedia.org/r/428874 (owner: Marostegui)
[06:27:50] <wikibugs_>	 Operations, Prometheus-metrics-monitoring, User-fgiunchedi: Upgrade mysqld_exporter to 0.10.0 - https://phabricator.wikimedia.org/T161296#4156221 (jcrespo) Note I was not asking it, the main improvement of 0.10.0 is multisource support, which we are moving away from. We can wait for buster.
[06:29:56] <icinga-wm>	 PROBLEM - puppet last run on labmon1002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/modprobe.d/nf_conntrack.conf]
[06:30:26] <icinga-wm>	 PROBLEM - puppet last run on labvirt1014 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/rsyslog.d/10-puppet-agent.conf]
[06:40:05] <wikibugs_>	 (PS1) Marostegui: db-eqiad.php: Depool db1106 [mediawiki-config] - https://gerrit.wikimedia.org/r/428876 (https://phabricator.wikimedia.org/T190704)
[06:41:38] <wikibugs_>	 (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1106 [mediawiki-config] - https://gerrit.wikimedia.org/r/428876 (https://phabricator.wikimedia.org/T190704) (owner: Marostegui)
[06:42:57] <wikibugs_>	 (Merged) jenkins-bot: db-eqiad.php: Depool db1106 [mediawiki-config] - https://gerrit.wikimedia.org/r/428876 (https://phabricator.wikimedia.org/T190704) (owner: Marostegui)
[06:44:34] <logmsgbot>	 !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1106 (duration: 01m 21s)
[06:44:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:45:06] <icinga-wm>	 RECOVERY - Disk space on labtestnet2001 is OK: DISK OK
[06:53:03] <moritzm>	 !log reimaging mw1314, mw1315, mw1316 (API servers) to stretch
[06:53:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:59:56] <icinga-wm>	 RECOVERY - puppet last run on labmon1002 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures
[07:00:26] <icinga-wm>	 RECOVERY - puppet last run on labvirt1014 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[07:03:47] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on ping1001 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly
[07:03:56] <icinga-wm>	 PROBLEM - Check systemd state on ping1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[07:05:31] <akosiaris>	 !log starting a very slow rolling reboot of all VMs on codfw ganeti cluster, row_C nodegroup, excluding poolcounter1001 and puppetdb1001. T150532
[07:05:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:05:38] <stashbot>	 T150532: Upgrade qemu on ganeti clusters to 2.8 - https://phabricator.wikimedia.org/T150532
[07:05:54] <akosiaris>	 elukey: bohrium is on row_A so ^ this won't affect you for the next few hours
[07:11:14] <elukey>	 ack!
[07:11:22] <wikibugs_>	 (PS3) Muehlenhoff: Remove obsolete fontconfig/imagemagick code from mediawiki::multimedia [puppet] - https://gerrit.wikimedia.org/r/428300
[07:13:09] <wikibugs_>	 Operations, vm-requests: Site: 4 VM request for pdf-render/proton - https://phabricator.wikimedia.org/T192983#4156279 (akosiaris)
[07:13:42] <wikibugs_>	 Operations, Electron-PDFs, Proton, Readers-Web-Backlog, and 3 others: New service request: chromium-render/deploy - https://phabricator.wikimedia.org/T186748#4156290 (akosiaris)
[07:13:44] <wikibugs_>	 Operations, vm-requests: Site: 4 VM request for pdf-render/proton - https://phabricator.wikimedia.org/T192983#4156289 (akosiaris)
[07:16:40] <wikibugs_>	 (CR) Muehlenhoff: [C: 2] Remove obsolete fontconfig/imagemagick code from mediawiki::multimedia [puppet] - https://gerrit.wikimedia.org/r/428300 (owner: Muehlenhoff)
[07:23:00] <wikibugs_>	 (PS3) Marostegui: Revert "db-eqiad.php: Depool db1113:3316" [mediawiki-config] - https://gerrit.wikimedia.org/r/428873
[07:24:22] <wikibugs_>	 (CR) Marostegui: [C: 2] Revert "db-eqiad.php: Depool db1113:3316" [mediawiki-config] - https://gerrit.wikimedia.org/r/428873 (owner: Marostegui)
[07:25:37] <wikibugs_>	 (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1113:3316" [mediawiki-config] - https://gerrit.wikimedia.org/r/428873 (owner: Marostegui)
[07:27:26] <logmsgbot>	 !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1113:3316 after alter table (duration: 01m 16s)
[07:27:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:29:30] <wikibugs_>	 (PS1) Marostegui: db-eqiad.php: Depool db1085 [mediawiki-config] - https://gerrit.wikimedia.org/r/428877 (https://phabricator.wikimedia.org/T190148)
[07:31:39] <wikibugs_>	 (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1085 [mediawiki-config] - https://gerrit.wikimedia.org/r/428877 (https://phabricator.wikimedia.org/T190148) (owner: Marostegui)
[07:32:51] <wikibugs_>	 (Merged) jenkins-bot: db-eqiad.php: Depool db1085 [mediawiki-config] - https://gerrit.wikimedia.org/r/428877 (https://phabricator.wikimedia.org/T190148) (owner: Marostegui)
[07:34:21] <logmsgbot>	 !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1085 for alter table (duration: 01m 16s)
[07:34:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:35:08] <marostegui>	 !log Deploy schema change on db1085 with replication (this will generate lag on labsdb hosts on s6) - T191519 T188299 T190148
[07:35:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:35:16] <stashbot>	 T191519: Schema change for rc_namespace_title_timestamp index - https://phabricator.wikimedia.org/T191519
[07:35:16] <stashbot>	 T190148: Change DEFAULT 0 for rev_text_id on production DBs - https://phabricator.wikimedia.org/T190148
[07:35:16] <stashbot>	 T188299: Schema change for refactored actor storage - https://phabricator.wikimedia.org/T188299
[07:36:43] <wikibugs_>	 (PS1) Urbanecm: New throttle rule for cswiki Wikipedia event [mediawiki-config] - https://gerrit.wikimedia.org/r/428878 (https://phabricator.wikimedia.org/T192898)
[07:42:59] <wikibugs_>	 (PS6) Muehlenhoff: mediawiki::packages::fonts: Consistently use require_package [puppet] - https://gerrit.wikimedia.org/r/420670
[07:44:31] <wikibugs_>	 (CR) Muehlenhoff: [C: 2] mediawiki::packages::fonts: Consistently use require_package [puppet] - https://gerrit.wikimedia.org/r/420670 (owner: Muehlenhoff)
[07:58:27] <wikibugs_>	 (PS1) Jcrespo: mariadb-backups: Fix configuration error on eqiad backups [puppet] - https://gerrit.wikimedia.org/r/428879
[07:59:30] <wikibugs_>	 Operations: rack/setup/install ms-be104[0-3].eqiad.wmnet - https://phabricator.wikimedia.org/T190081#4156344 (fgiunchedi)
[07:59:38] <wikibugs_>	 Operations, Patch-For-Review: Rack and setup ms-be1040-1043 - https://phabricator.wikimedia.org/T191896#4156346 (fgiunchedi)
[07:59:38] <wikibugs_>	 (CR) Jcrespo: [C: 2] mariadb-backups: Fix configuration error on eqiad backups [puppet] - https://gerrit.wikimedia.org/r/428879 (owner: Jcrespo)
[08:00:29] <wikibugs_>	 Operations: rack/setup/install ms-be104[0-3].eqiad.wmnet - https://phabricator.wikimedia.org/T190081#4062129 (fgiunchedi) a:RobH>fgiunchedi
[08:07:57] <wikibugs_>	 (PS3) Muehlenhoff: Remove mediawiki::firejail [puppet] - https://gerrit.wikimedia.org/r/428382
[08:12:02] <wikibugs_>	 (CR) Muehlenhoff: [C: 2] Remove mediawiki::firejail [puppet] - https://gerrit.wikimedia.org/r/428382 (owner: Muehlenhoff)
[08:13:22] <icinga-wm>	 PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is CRITICAL: Traceback (most recent call last)
[08:13:32] <icinga-wm>	 PROBLEM - IPv4 ping to ulsfo on ripe-atlas-ulsfo is CRITICAL: Traceback (most recent call last)
[08:14:01] <icinga-wm>	 PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is CRITICAL: Traceback (most recent call last)
[08:14:11] <icinga-wm>	 PROBLEM - IPv4 ping to codfw on ripe-atlas-codfw is CRITICAL: Traceback (most recent call last)
[08:14:34] <wikibugs_>	 (03PS1) 10Elukey: Enable meminfo_numa collector on Druid and Hadoop workers [puppet] - 10https://gerrit.wikimedia.org/r/428881
[08:14:51] <icinga-wm>	 PROBLEM - IPv4 ping to eqsin on ripe-atlas-eqsin is CRITICAL: Traceback (most recent call last)
[08:17:54] <icinga-wm>	 PROBLEM - IPv4 ping to eqiad on ripe-atlas-eqiad is CRITICAL: Traceback (most recent call last)
[08:18:34] <icinga-wm>	 PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: Traceback (most recent call last)
[08:19:01] <wikibugs_>	 (03PS1) 10Jcrespo: mariadb: Depool db1090 to upgrade it and clone it to db1122 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428882
[08:19:53] <icinga-wm>	 RECOVERY - IPv4 ping to eqiad on ripe-atlas-eqiad is OK: OK - failed 0 probes of 322 (alerts on 19) - https://atlas.ripe.net/measurements/1790945/#!map
[08:19:53] <icinga-wm>	 RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is OK: OK - failed 8 probes of 300 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[08:19:53] <icinga-wm>	 RECOVERY - IPv4 ping to codfw on ripe-atlas-codfw is OK: OK - failed 0 probes of 319 (alerts on 19) - https://atlas.ripe.net/measurements/1791210/#!map
[08:19:53] <icinga-wm>	 RECOVERY - IPv4 ping to eqsin on ripe-atlas-eqsin is OK: OK - failed 0 probes of 317 (alerts on 19) - https://atlas.ripe.net/measurements/11645085/#!map
[08:23:16] <moritzm>	 !log reimaging mw1247, mw1248, mw1249 (app servers) to stretch
[08:23:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:23:50] <godog>	 !log eqiad-prod: add ms-be104[0-3] with minimal weight - T190081
[08:23:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:23:56] <stashbot>	 T190081: rack/setup/install ms-be104[0-3].eqiad.wmnet - https://phabricator.wikimedia.org/T190081
[08:25:46] <wikibugs_>	 (03CR) 10Filippo Giunchedi: [C: 031] Enable meminfo_numa collector on Druid and Hadoop workers [puppet] - 10https://gerrit.wikimedia.org/r/428881 (owner: 10Elukey)
[08:26:06] <wikibugs_>	 (03CR) 10Elukey: [C: 032] Enable meminfo_numa collector on Druid and Hadoop workers [puppet] - 10https://gerrit.wikimedia.org/r/428881 (owner: 10Elukey)
[08:26:33] <icinga-wm>	 RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 7 probes of 301 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map
[08:28:24] <icinga-wm>	 RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is OK: OK - failed 5 probes of 301 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[08:28:47] <icinga-wm>	 RECOVERY - IPv4 ping to ulsfo on ripe-atlas-ulsfo is OK: OK - failed 0 probes of 320 (alerts on 19) - https://atlas.ripe.net/measurements/1791307/#!map
[08:34:56] <wikibugs_>	 (03CR) 10Jcrespo: [C: 032] mariadb: Depool db1090 to upgrade it and clone it to db1122 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428882 (owner: 10Jcrespo)
[08:34:59] <wikibugs_>	 (03PS1) 10Alexandros Kosiaris: Introduce poolcounter1003 [dns] - 10https://gerrit.wikimedia.org/r/428883 (https://phabricator.wikimedia.org/T187297)
[08:35:01] <wikibugs_>	 (03PS1) 10Alexandros Kosiaris: Introduce proton{1,2}00{1,2} VMs [dns] - 10https://gerrit.wikimedia.org/r/428884 (https://phabricator.wikimedia.org/T192983)
[08:36:11] <wikibugs_>	 (03Merged) 10jenkins-bot: mariadb: Depool db1090 to upgrade it and clone it to db1122 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428882 (owner: 10Jcrespo)
[08:37:22] <wikibugs_>	 (03CR) 10Jonas Kress (WMDE): [C: 031] Set SPARQL services to use internal cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428722 (https://phabricator.wikimedia.org/T192942) (owner: 10Smalyshev)
[08:38:41] <marostegui>	 !log Drop user_old and user_temp tables from s3 - T172664
[08:38:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:44:35] <wikibugs_>	 (03PS1) 10Muehlenhoff: Remove obsolete mediawiki::packages::fonts from mediawiki::multimedia [puppet] - 10https://gerrit.wikimedia.org/r/428886
[08:51:24] <logmsgbot>	 !log jynus@tin Synchronized wmf-config/db-eqiad.php: Depool db1090 (duration: 01m 17s)
[08:51:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:55:48] <wikibugs_>	 (03CR) 10Alexandros Kosiaris: [C: 032] Introduce poolcounter1003 [dns] - 10https://gerrit.wikimedia.org/r/428883 (https://phabricator.wikimedia.org/T187297) (owner: 10Alexandros Kosiaris)
[08:56:04] <wikibugs_>	 (03CR) 10Alexandros Kosiaris: [C: 032] Introduce proton{1,2}00{1,2} VMs [dns] - 10https://gerrit.wikimedia.org/r/428884 (https://phabricator.wikimedia.org/T192983) (owner: 10Alexandros Kosiaris)
[08:59:31] <icinga-wm>	 PROBLEM - Apache HTTP on mwdebug1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:00:22] <icinga-wm>	 RECOVERY - Apache HTTP on mwdebug1001 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 621 bytes in 0.067 second response time
[09:01:41] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1247 is CRITICAL: connect to address 10.64.48.82 and port 443: Connection refused
[09:01:41] <icinga-wm>	 PROBLEM - Check size of conntrack table on mw1247 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[09:01:41] <icinga-wm>	 PROBLEM - Check the NTP synchronisation status of timesyncd on mw1248 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[09:01:41] <icinga-wm>	 PROBLEM - configured eth on mw1248 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[09:01:41] <icinga-wm>	 PROBLEM - mediawiki-installation DSH group on mw1249 is CRITICAL: Host mw1249 is not in mediawiki-installation dsh group
[09:01:41] <icinga-wm>	 PROBLEM - DPKG on mw1249 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[09:03:22] <icinga-wm>	 PROBLEM - Check systemd state on mw1247 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[09:03:22] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on mw1248 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[09:03:22] <icinga-wm>	 PROBLEM - dhclient process on mw1248 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[09:03:22] <icinga-wm>	 PROBLEM - Disk space on mw1249 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[09:03:22] <icinga-wm>	 PROBLEM - nutcracker port on mw1249 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[09:04:51] <moritzm>	 ^reimage, silencing
[09:05:11] <icinga-wm>	 PROBLEM - mediawiki-installation DSH group on mw1248 is CRITICAL: Host mw1248 is not in mediawiki-installation dsh group
[09:05:11] <icinga-wm>	 PROBLEM - DPKG on mw1248 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[09:08:52] <wikibugs_>	 (03PS1) 10Vgutierrez: Rename lvs[2004-2006] interface dependent hostnames [dns] - 10https://gerrit.wikimedia.org/r/428888 (https://phabricator.wikimedia.org/T191897)
[09:09:45] <jynus>	 !log stopping db1090 for maintenance
[09:09:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:11:42] <icinga-wm>	 RECOVERY - Check size of conntrack table on mw1247 is OK: OK: nf_conntrack is 0 % full
[09:12:32] <icinga-wm>	 RECOVERY - Disk space on mw1249 is OK: DISK OK
[09:12:42] <icinga-wm>	 RECOVERY - configured eth on mw1248 is OK: OK - interfaces up
[09:12:42] <icinga-wm>	 RECOVERY - DPKG on mw1249 is OK: All packages OK
[09:13:12] <icinga-wm>	 RECOVERY - DPKG on mw1248 is OK: All packages OK
[09:13:32] <icinga-wm>	 RECOVERY - Check whether ferm is active by checking the default input chain on mw1248 is OK: OK ferm input default policy is set
[09:13:32] <icinga-wm>	 RECOVERY - dhclient process on mw1248 is OK: PROCS OK: 0 processes with command name dhclient
[09:14:31] <icinga-wm>	 RECOVERY - Check whether ferm is active by checking the default input chain on ping1001 is OK: OK ferm input default policy is set
[09:14:51] <icinga-wm>	 RECOVERY - Check systemd state on ping1001 is OK: OK - running: The system is fully operational
[09:15:22] <wikibugs_>	 10Operations, 10OTRS, 10User-notice: Update OTRS to the latest stable version (6.x.x) - https://phabricator.wikimedia.org/T187984#4156588 (10Scoopfinder)
[09:16:25] <wikibugs_>	 10Operations, 10OTRS, 10User-notice: Update OTRS to the latest stable version (6.x.x) - https://phabricator.wikimedia.org/T187984#3992110 (10Scoopfinder)
[09:16:41] <wikibugs_>	 10Operations, 10OTRS, 10User-notice: Update OTRS to the latest stable version (6.x.x) - https://phabricator.wikimedia.org/T187984#3992110 (10Scoopfinder)
[09:17:41] <icinga-wm>	 RECOVERY - Check systemd state on mw1247 is OK: OK - running: The system is fully operational
[09:17:51] <icinga-wm>	 RECOVERY - Nginx local proxy to apache on mw1247 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 617 bytes in 0.067 second response time
[09:18:21] <wikibugs_>	 (03CR) 10Vgutierrez: [C: 032] Reset waitIndex on etcd error 401 [debs/pybal] - 10https://gerrit.wikimedia.org/r/428303 (https://phabricator.wikimedia.org/T169765) (owner: 10Vgutierrez)
[09:18:25] <wikibugs_>	 (03PS3) 10Vgutierrez: Reset waitIndex on etcd error 401 [debs/pybal] - 10https://gerrit.wikimedia.org/r/428303 (https://phabricator.wikimedia.org/T169765)
[09:18:41] <icinga-wm>	 RECOVERY - nutcracker port on mw1249 is OK: TCP OK - 0.000 second response time on 127.0.0.1 port 11212
[09:18:51] <wikibugs_>	 (03PS1) 10Jcrespo: mariadb: Setup db1122 as an s2 core eqiad database [puppet] - 10https://gerrit.wikimedia.org/r/428890 (https://phabricator.wikimedia.org/T192979)
[09:20:28] <wikibugs_>	 (03CR) 10Jcrespo: [C: 032] mariadb: Setup db1122 as an s2 core eqiad database [puppet] - 10https://gerrit.wikimedia.org/r/428890 (https://phabricator.wikimedia.org/T192979) (owner: 10Jcrespo)
[09:25:04] <wikibugs_>	 (03PS2) 10Mark Bergsma: Move BGP constants into their own module [debs/pybal] - 10https://gerrit.wikimedia.org/r/424580
[09:25:07] <wikibugs_>	 (03PS2) 10Mark Bergsma: Split off bgp.FSM into its own module [debs/pybal] - 10https://gerrit.wikimedia.org/r/424581
[09:25:36] <mark>	 bah valentin just ahead of me, i need to rebase again ;-)
[09:25:53] <vgutierrez>	 /o\
[09:26:16] <wikibugs_>	 (03PS3) 10Mark Bergsma: Move BGP constants into their own module [debs/pybal] - 10https://gerrit.wikimedia.org/r/424580
[09:26:18] <wikibugs_>	 (03PS3) 10Mark Bergsma: Split off bgp.FSM into its own module [debs/pybal] - 10https://gerrit.wikimedia.org/r/424581
[09:31:43] <icinga-wm>	 RECOVERY - Check the NTP synchronisation status of timesyncd on mw1248 is OK: OK: synced at Wed 2018-04-25 09:31:35 UTC.
[09:32:38] <wikibugs_>	 (03CR) 10Mark Bergsma: [C: 031] Move BGP constants into their own module [debs/pybal] - 10https://gerrit.wikimedia.org/r/424580 (owner: 10Mark Bergsma)
[09:32:54] <wikibugs_>	 (03PS1) 10Jcrespo: mariadb: Allow only reimage of db1116, db1120 + upgrade of >db1089 [puppet] - 10https://gerrit.wikimedia.org/r/428891 (https://phabricator.wikimedia.org/T192979)
[09:37:22] <wikibugs_>	 (03CR) 10Vgutierrez: [C: 031] Split off bgp.FSM into its own module [debs/pybal] - 10https://gerrit.wikimedia.org/r/424581 (owner: 10Mark Bergsma)
[09:37:50] <wikibugs_>	 (03CR) 10Marostegui: mariadb: Allow only reimage of db1116, db1120 + upgrade of >db1089 (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/428891 (https://phabricator.wikimedia.org/T192979) (owner: 10Jcrespo)
[09:38:28] <wikibugs_>	 (03CR) 10Vgutierrez: [C: 031] Move BGP constants into their own module [debs/pybal] - 10https://gerrit.wikimedia.org/r/424580 (owner: 10Mark Bergsma)
[09:38:46] <wikibugs_>	 (03CR) 10Mark Bergsma: [C: 032] Move BGP constants into their own module [debs/pybal] - 10https://gerrit.wikimedia.org/r/424580 (owner: 10Mark Bergsma)
[09:39:17] <wikibugs_>	 (03Merged) 10jenkins-bot: Move BGP constants into their own module [debs/pybal] - 10https://gerrit.wikimedia.org/r/424580 (owner: 10Mark Bergsma)
[09:41:04] <wikibugs_>	 (03CR) 10Jcrespo: "From the comment:" [puppet] - 10https://gerrit.wikimedia.org/r/428891 (https://phabricator.wikimedia.org/T192979) (owner: 10Jcrespo)
[09:42:16] <wikibugs_>	 (03PS6) 10Ema: VCL: only parse X-Connection-Properties if available [puppet] - 10https://gerrit.wikimedia.org/r/428580
[09:42:22] <wikibugs_>	 (03CR) 10Mark Bergsma: [C: 032] Split off bgp.FSM into its own module [debs/pybal] - 10https://gerrit.wikimedia.org/r/424581 (owner: 10Mark Bergsma)
[09:42:54] <wikibugs_>	 (03Merged) 10jenkins-bot: Split off bgp.FSM into its own module [debs/pybal] - 10https://gerrit.wikimedia.org/r/424581 (owner: 10Mark Bergsma)
[09:43:23] <wikibugs_>	 (03CR) 10Marostegui: [C: 031] "Thanks, missed that part :)" [puppet] - 10https://gerrit.wikimedia.org/r/428891 (https://phabricator.wikimedia.org/T192979) (owner: 10Jcrespo)
[09:43:43] <wikibugs_>	 (03CR) 10Jcrespo: [C: 032] mariadb: Allow only reimage of db1116, db1120 + upgrade of >db1089 [puppet] - 10https://gerrit.wikimedia.org/r/428891 (https://phabricator.wikimedia.org/T192979) (owner: 10Jcrespo)
[09:44:51] <wikibugs_>	 (03CR) 10Marostegui: "Will you do the prometheus addition in a different commit? Just asking to make sure it is not forgotten :)" [puppet] - 10https://gerrit.wikimedia.org/r/428890 (https://phabricator.wikimedia.org/T192979) (owner: 10Jcrespo)
[09:45:30] <wikibugs_>	 (03CR) 10jenkins-bot: Drop MEDIAWIKI_DIRECTORY_REGEX & MEDIAWIKI_VERSION_REGEX unused [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428845 (owner: 10Chad)
[09:45:34] <wikibugs_>	 (03CR) 10jenkins-bot: db-eqiad.php: Depool db1106 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428876 (https://phabricator.wikimedia.org/T190704) (owner: 10Marostegui)
[09:45:38] <wikibugs_>	 (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1113:3316" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428873 (owner: 10Marostegui)
[09:45:43] <wikibugs_>	 (03CR) 10jenkins-bot: db-eqiad.php: Depool db1085 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428877 (https://phabricator.wikimedia.org/T190148) (owner: 10Marostegui)
[09:45:48] <wikibugs_>	 (03CR) 10jenkins-bot: mariadb: Depool db1090 to upgrade it and clone it to db1122 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428882 (owner: 10Jcrespo)
[09:48:28] <wikibugs_>	 (03PS1) 10Jcrespo: mariadb-auto_reimage: Reimage db1090 into stretch [puppet] - 10https://gerrit.wikimedia.org/r/428892
[09:48:36] <wikibugs_>	 (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1085" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428893
[09:48:39] <wikibugs_>	 (03PS2) 10Marostegui: Revert "db-eqiad.php: Depool db1085" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428893
[09:49:03] <wikibugs_>	 (03CR) 10Jcrespo: [C: 032] mariadb-auto_reimage: Reimage db1090 into stretch [puppet] - 10https://gerrit.wikimedia.org/r/428892 (owner: 10Jcrespo)
[09:49:28] <wikibugs_>	 (03CR) 10Elukey: [C: 031] Switch scap proxy in A7 to mw1268 [puppet] - 10https://gerrit.wikimedia.org/r/428655 (owner: 10Muehlenhoff)
[09:50:03] <wikibugs_>	 (03CR) 10Elukey: [C: 031] Switch scap proxy in B6 to mw1285 [puppet] - 10https://gerrit.wikimedia.org/r/428683 (owner: 10Muehlenhoff)
[09:50:20] <wikibugs_>	 (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1085" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428893 (owner: 10Marostegui)
[09:51:06] <wikibugs_>	 (03PS5) 10Jcrespo: base: Disable atop daemon everywhere [puppet] - 10https://gerrit.wikimedia.org/r/428579 (https://phabricator.wikimedia.org/T192551)
[09:51:10] <wikibugs_>	 (03CR) 10Gilles: [C: 031] Remove obsolete mediawiki::packages::fonts from mediawiki::multimedia [puppet] - 10https://gerrit.wikimedia.org/r/428886 (owner: 10Muehlenhoff)
[09:51:36] <wikibugs_>	 (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1085" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428893 (owner: 10Marostegui)
[09:51:47] <wikibugs_>	 (03CR) 10Jcrespo: [C: 032] base: Disable atop daemon everywhere [puppet] - 10https://gerrit.wikimedia.org/r/428579 (https://phabricator.wikimedia.org/T192551) (owner: 10Jcrespo)
[09:52:41] <wikibugs_>	 (03PS1) 10Alexandros Kosiaris: Depool poolcounter1001 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428894 (https://phabricator.wikimedia.org/T150532)
[09:53:40] <wikibugs_>	 (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1106" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428895
[09:53:50] <wikibugs_>	 (03PS2) 10Marostegui: Revert "db-eqiad.php: Depool db1106" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428895
[09:54:06] <wikibugs_>	 (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1085" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428893 (owner: 10Marostegui)
[09:54:10] <logmsgbot>	 !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1085 after alter table (duration: 01m 30s)
[09:54:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:54:39] <wikibugs_>	 (03PS2) 10Alexandros Kosiaris: Depool poolcounter1001 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428894 (https://phabricator.wikimedia.org/T150532)
[09:54:41] <wikibugs_>	 (03PS1) 10Alexandros Kosiaris: Revert "Depool poolcounter1001" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428896 (https://phabricator.wikimedia.org/T150532)
[09:54:43] <wikibugs_>	 (03PS1) 10Alexandros Kosiaris: Add poolcounter1003 to $wmfAllServices [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428897 (https://phabricator.wikimedia.org/T150532)
[09:55:11] <wikibugs_>	 (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1106" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428895 (owner: 10Marostegui)
[09:56:47] <wikibugs_>	 (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1106" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428895 (owner: 10Marostegui)
[09:58:04] <elukey>	 !log reimage analytics106[1,2] to Debian Stretch
[09:58:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:58:19] <logmsgbot>	 !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1106 (duration: 01m 16s)
[09:58:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:59:42] <wikibugs_>	 (03CR) 10Jcrespo: [C: 032] "I will, I don't want to add it yet (same with dblists) until the server is up." [puppet] - 10https://gerrit.wikimedia.org/r/428890 (https://phabricator.wikimedia.org/T192979) (owner: 10Jcrespo)
[10:00:41] <wikibugs_>	 (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1106" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428895 (owner: 10Marostegui)
[10:01:43] <icinga-wm>	 RECOVERY - mediawiki-installation DSH group on mw1249 is OK: OK
[10:02:41] <wikibugs_>	 (03PS1) 10Alexandros Kosiaris: Install params for poolcounter1003 [puppet] - 10https://gerrit.wikimedia.org/r/428898 (https://phabricator.wikimedia.org/T187297)
[10:05:13] <icinga-wm>	 RECOVERY - mediawiki-installation DSH group on mw1248 is OK: OK
[10:13:10] <wikibugs_>	 (03PS1) 10Elukey: Add the possibility to configure UDF blacklist in Hive 2 server [puppet/cdh] - 10https://gerrit.wikimedia.org/r/428899
[10:13:21] <wikibugs_>	 (03PS2) 10Muehlenhoff: Don't include mediawiki::multimedia on labweb* [puppet] - 10https://gerrit.wikimedia.org/r/428298
[10:15:01] <wikibugs_>	 (03PS2) 10Alexandros Kosiaris: Install params for poolcounter1003 [puppet] - 10https://gerrit.wikimedia.org/r/428898 (https://phabricator.wikimedia.org/T187297)
[10:15:08] <wikibugs_>	 (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] Install params for poolcounter1003 [puppet] - 10https://gerrit.wikimedia.org/r/428898 (https://phabricator.wikimedia.org/T187297) (owner: 10Alexandros Kosiaris)
[10:16:00] <wikibugs_>	 10Operations, 10ops-eqiad, 10Traffic, 10Patch-For-Review: rack/setup/install lvs101[3-6] - https://phabricator.wikimedia.org/T184293#4156787 (10Vgutierrez) @Cmjohnson we will go with stretch and raid1-lvm (modules/install_server/files/autoinstall/netboot.cfg). Could you add the production dns entries for l...
[10:19:15] <akosiaris>	 !starting a slow rolling reboot of all VMs on eqiad ganeti cluster, row_A nodegroup, excluding bohrium. T150532
[10:19:16] <stashbot>	 T150532: Upgrade qemu on ganeti clusters to 2.8 - https://phabricator.wikimedia.org/T150532
[10:23:29] <wikibugs_>	 (03CR) 10EddieGP: [C: 031] Apache: Move all private wikis to a single vhost block [puppet] - 10https://gerrit.wikimedia.org/r/422571 (owner: 10Chad)
[10:27:48] <eddiegp>	 I'd appreciate if anyone could do https://gerrit.wikimedia.org/r/#/c/425967/ . Unfortunately I couldn't be here for puppet swat yesterday, and won't be able to tomorrow either.
[10:28:30] <wikibugs_>	 (03PS5) 10Mark Bergsma: Introduce server.is_pooled and make server.pooled usage more consistent [debs/pybal] - 10https://gerrit.wikimedia.org/r/421053
[10:28:32] <wikibugs_>	 (03PS1) 10Mark Bergsma: Rename server.pooled to .pool to indicate intent [debs/pybal] - 10https://gerrit.wikimedia.org/r/428900
[10:28:34] <wikibugs_>	 (03PS1) 10Mark Bergsma: Remove server.is_pooled as it isn't actually used [debs/pybal] - 10https://gerrit.wikimedia.org/r/428901
[10:28:47] <wikibugs_>	 (03PS1) 10Alexandros Kosiaris: Install params for proton[12]00[12] [puppet] - 10https://gerrit.wikimedia.org/r/428902 (https://phabricator.wikimedia.org/T192983)
[10:29:19] <jynus>	 !log stopping replication, running optimize table on dbstore2001:s8
[10:29:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:29:26] <wikibugs_>	 (03CR) 10jenkins-bot: [V: 04-1] Install params for proton[12]00[12] [puppet] - 10https://gerrit.wikimedia.org/r/428902 (https://phabricator.wikimedia.org/T192983) (owner: 10Alexandros Kosiaris)
[10:48:27] <wikibugs_>	 10Operations, 10monitoring, 10Patch-For-Review, 10Upstream: atop on stretch overloading a host - https://phabricator.wikimedia.org/T192551#4156925 (10jcrespo) with https://gerrit.wikimedia.org/r/428579 deployed, we could close this as resolved, and reevaluate later if to drop the package entirely or to ree...
[10:49:45] <wikibugs_>	 10Operations, 10monitoring, 10Patch-For-Review, 10Upstream: atop on stretch overloading a host - https://phabricator.wikimedia.org/T192551#4156928 (10Marostegui) 05Open>03Resolved a:03jcrespo
[10:50:25] <wikibugs_>	 10Operations, 10monitoring, 10Patch-For-Review, 10Upstream: atop on stretch overloading a host - https://phabricator.wikimedia.org/T192551#4142726 (10Marostegui) For easy access:  Bug submitted to Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=896767 Bug submitted to upstream: https://...
[10:55:15] <wikibugs_>	 (03PS4) 10EddieGP: Remove wikipedia.org vhost [puppet] - 10https://gerrit.wikimedia.org/r/398396
[10:57:02] <wikibugs_>	 (03CR) 10EddieGP: "Actually I agree with what Krinkle wrote. Let's make wikipedia.org > www.wikipedia.org a plain redirect." [puppet] - 10https://gerrit.wikimedia.org/r/398396 (owner: 10EddieGP)
[11:01:27] <elukey>	 eddiegp: I am checking https://gerrit.wikimedia.org/r/#/c/425967/, but the commit description puzzles me - isn't this code only for terbium/wasat ?
[11:03:52] <wikibugs_>	 (03PS2) 10Elukey: mediawiki: Disable updateArticleCount cron [puppet] - 10https://gerrit.wikimedia.org/r/425967 (https://phabricator.wikimedia.org/T192139) (owner: 10EddieGP)
[11:03:57] <wikibugs_>	 (03PS3) 10Elukey: mediawiki: Disable updateArticleCount cron [puppet] - 10https://gerrit.wikimedia.org/r/425967 (https://phabricator.wikimedia.org/T192139) (owner: 10EddieGP)
[11:04:07] <wikibugs_>	 (03CR) 10Jcrespo: [C: 031] mediawiki: Disable updateArticleCount cron [puppet] - 10https://gerrit.wikimedia.org/r/425967 (https://phabricator.wikimedia.org/T192139) (owner: 10EddieGP)
[11:04:09] <elukey>	 I updated the commit msg
[11:04:55] <wikibugs_>	 (03CR) 10Elukey: [C: 032] mediawiki: Disable updateArticleCount cron [puppet] - 10https://gerrit.wikimedia.org/r/425967 (https://phabricator.wikimedia.org/T192139) (owner: 10EddieGP)
[11:06:24] <eddiegp>	 elukey: Yes, you're! Sorry for the confusion.
[11:06:37] <eddiegp>	 *you're right even
[11:08:01] <elukey>	 np! Merged and ran on terbium/wasat
[11:09:32] <elukey>	 eddiegp: another side note - as it was discovered by other opsens adding inline comments to httpd's config might lead to unexpected results (for example ServerAlias doesn't stop when it sees a "#")
[11:09:57] <elukey>	 I still need to open a bug upstream, but in the meantime let's try to avoid them
[11:10:10] <elukey>	 (I saw one in a code review for a rewrite rule)
[11:11:45] <eddiegp>	 elukey: Good catch, I'll have a look at my open apache changes. Could easily be that I used that somewhere, not sure where though.
[11:11:57] <eddiegp>	 And thanks for the merge!
[11:12:01] <wikibugs_>	 (03PS1) 10Jcrespo: mariadb: Reenable notifications on db1122, add to prometheus [puppet] - 10https://gerrit.wikimedia.org/r/428903 (https://phabricator.wikimedia.org/T192979)
[11:12:06] <icinga-wm>	 PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1970 bytes in 0.093 second response time
[11:12:52] <elukey>	 eddiegp: thank you for the cleanup work :)
[11:13:04] <eddiegp>	 :)
[11:13:25] <wikibugs_>	 (03CR) 10Jcrespo: [C: 032] mariadb: Reenable notifications on db1122, add to prometheus [puppet] - 10https://gerrit.wikimedia.org/r/428903 (https://phabricator.wikimedia.org/T192979) (owner: 10Jcrespo)
[11:15:47] <wikibugs_>	 (03PS1) 10Jcrespo: mariadb: Add db1122 to s2 host list [software] - 10https://gerrit.wikimedia.org/r/428904
[11:17:41] <wikibugs_>	 (03CR) 10Jcrespo: [C: 032] mariadb: Add db1122 to s2 host list [software] - 10https://gerrit.wikimedia.org/r/428904 (owner: 10Jcrespo)
[11:19:35] <moritzm>	 !log reimaging mw1228, mw1229, mw1230 (API servers) to stretch
[11:19:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:21:21] <wikibugs_>	 (03PS1) 10Jcrespo: mariadb: Add, but not pool yet, new server db1122 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428906 (https://phabricator.wikimedia.org/T192979)
[11:22:06] <icinga-wm>	 RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1974 bytes in 0.093 second response time
[11:23:44] <wikibugs_>	 (03CR) 10Jcrespo: [C: 032] mariadb: Add, but not pool yet, new server db1122 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428906 (https://phabricator.wikimedia.org/T192979) (owner: 10Jcrespo)
[11:24:58] <wikibugs_>	 (03Merged) 10jenkins-bot: mariadb: Add, but not pool yet, new server db1122 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428906 (https://phabricator.wikimedia.org/T192979) (owner: 10Jcrespo)
[11:25:27] <wikibugs_>	 (03CR) 10jenkins-bot: mariadb: Add, but not pool yet, new server db1122 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428906 (https://phabricator.wikimedia.org/T192979) (owner: 10Jcrespo)
[11:29:16] <logmsgbot>	 !log jynus@tin Synchronized wmf-config/db-codfw.php: Add db1122 (duration: 03m 24s)
[11:29:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:29:35] <icinga-wm>	 PROBLEM - HTTP on install1002 is CRITICAL: connect to address 208.80.154.22 and port 80: Connection refused
[11:29:45] <icinga-wm>	 PROBLEM - TFTP service on install1002 is CRITICAL: Return code of 255 is out of bounds
[11:29:45] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on install1002 is CRITICAL: Return code of 255 is out of bounds
[11:29:51] <jynus>	 1228,29,30 failed as expected
[11:29:55] <icinga-wm>	 PROBLEM - Check systemd state on install1002 is CRITICAL: CRITICAL - starting: Late bootup, before the job queue becomes idle for the first time, or one of the rescue targets are reached.
[11:30:35] <icinga-wm>	 RECOVERY - HTTP on install1002 is OK: HTTP OK: HTTP/1.1 302 Moved Temporarily - 381 bytes in 0.001 second response time
[11:30:45] <icinga-wm>	 RECOVERY - TFTP service on install1002 is OK: PROCS OK: 1 process with UID = 65534 (nobody), regex args .*/usr/sbin/atftpd .*
[11:30:45] <icinga-wm>	 RECOVERY - Check whether ferm is active by checking the default input chain on install1002 is OK: OK ferm input default policy is set
[11:31:55] <icinga-wm>	 RECOVERY - Check systemd state on install1002 is OK: OK - running: The system is fully operational
[11:31:55] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s6 on db1102 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 6946.37 seconds
[11:32:11] <logmsgbot>	 !log jynus@tin Synchronized wmf-config/db-eqiad.php: Add db1122 (duration: 01m 16s)
[11:32:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:32:49] <jynus>	 I guess db1102 s6 is an expired downtime due to a schema change, marostegui?
[11:57:15] <icinga-wm>	 PROBLEM - Disk space on logstash1007 is CRITICAL: Return code of 255 is out of bounds
[11:57:35] <icinga-wm>	 PROBLEM - Check size of conntrack table on logstash1007 is CRITICAL: Return code of 255 is out of bounds
[11:57:36] <icinga-wm>	 PROBLEM - logstash JSON linesTCP port on logstash1007 is CRITICAL: connect to address 127.0.0.1 and port 11514: Connection refused
[11:57:55] <icinga-wm>	 PROBLEM - puppet last run on logstash1007 is CRITICAL: CRITICAL: Puppet has 11 failures. Last run 1 minute ago with 11 failures. Failed resources (up to 3 shown): Service[ssh],Service[exim4],Service[prometheus-elasticsearch-exporter],Service[kibana]
[11:58:15] <wikibugs_>	 (03CR) 10Mark Bergsma: [C: 031] "Ema indicated a preference for renaming .pooled to .pool, and not having .is_pooled at all as it's not actually needed in production code." [debs/pybal] - 10https://gerrit.wikimedia.org/r/421053 (owner: 10Mark Bergsma)
[11:58:33] <icinga-wm>	 RECOVERY - Check size of conntrack table on logstash1007 is OK: OK: nf_conntrack is 0 % full
[11:58:42] <icinga-wm>	 RECOVERY - logstash JSON linesTCP port on logstash1007 is OK: TCP OK - 0.000 second response time on 127.0.0.1 port 11514
[12:00:03] <icinga-wm>	 RECOVERY - Disk space on logstash1007 is OK: DISK OK
[12:02:53] <icinga-wm>	 RECOVERY - puppet last run on logstash1007 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[12:06:13] <icinga-wm>	 PROBLEM - logstash log4j TCP port on logstash1008 is CRITICAL: connect to address 127.0.0.1 and port 4560: Connection refused
[12:07:13] <icinga-wm>	 RECOVERY - logstash log4j TCP port on logstash1008 is OK: TCP OK - 0.000 second response time on 127.0.0.1 port 4560
[12:08:22] <moritzm>	 !log reimaging mw1251, mw1252, mw1253 (app servers) to stretch
[12:08:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:14:48] <wikibugs_>	 (PS2) Muehlenhoff: Switch scap proxy in A7 to mw1268 [puppet] - https://gerrit.wikimedia.org/r/428655
[12:14:51] <wikibugs_>	 (CR) BBlack: [C: 1] VCL: only parse X-Connection-Properties if available [puppet] - https://gerrit.wikimedia.org/r/428580 (owner: Ema)
[12:15:43] <wikibugs_>	 (CR) Muehlenhoff: [C: 2] Switch scap proxy in A7 to mw1268 [puppet] - https://gerrit.wikimedia.org/r/428655 (owner: Muehlenhoff)
[12:15:45] <wikibugs_>	 (CR) BBlack: [C: 1] VCL: 400 on empty/unparseable Host header values [puppet] - https://gerrit.wikimedia.org/r/428594 (owner: Ema)
[12:21:27] <wikibugs_>	 (PS2) EddieGP: mediawiki: Remove updateArticleCount cron [puppet] - https://gerrit.wikimedia.org/r/425968 (https://phabricator.wikimedia.org/T192139)
[12:23:57] <wikibugs_>	 (CR) Ema: [C: 1] Rename server.pooled to .pool to indicate intent [debs/pybal] - https://gerrit.wikimedia.org/r/428900 (owner: Mark Bergsma)
[12:25:31] <wikibugs_>	 (CR) Ema: [C: 1] Remove server.is_pooled as it isn't actually used [debs/pybal] - https://gerrit.wikimedia.org/r/428901 (owner: Mark Bergsma)
[12:27:13] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s6 on db1102 is OK: OK slave_sql_lag Replication lag: 0.14 seconds
[12:28:12] <wikibugs_>	 (PS2) Alexandros Kosiaris: Install params for proton[12]00[12] [puppet] - https://gerrit.wikimedia.org/r/428902 (https://phabricator.wikimedia.org/T192983)
[12:28:48] <wikibugs_>	 (CR) jerkins-bot: [V: -1] Install params for proton[12]00[12] [puppet] - https://gerrit.wikimedia.org/r/428902 (https://phabricator.wikimedia.org/T192983) (owner: Alexandros Kosiaris)
[12:31:28] <wikibugs_>	 (CR) Alexandros Kosiaris: [V: 2 C: 2] "Removing jenkins-bots -1, it's about including standard, which is fine for now, we will undo it anyway soon" [puppet] - https://gerrit.wikimedia.org/r/428902 (https://phabricator.wikimedia.org/T192983) (owner: Alexandros Kosiaris)
[12:34:16] <icinga-wm>	 PROBLEM - Disk space on mw1253 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[12:34:16] <icinga-wm>	 PROBLEM - nutcracker port on mw1253 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[12:35:56] <icinga-wm>	 PROBLEM - HHVM processes on mw1253 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[12:35:56] <icinga-wm>	 PROBLEM - nutcracker process on mw1253 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[12:37:45] <icinga-wm>	 PROBLEM - HHVM rendering on mw1253 is CRITICAL: connect to address 10.64.48.88 and port 80: Connection refused
[12:37:45] <icinga-wm>	 PROBLEM - puppet last run on mw1253 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[12:40:05] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes1001 is CRITICAL: 1.668e+04 ge 1.5e+04 https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[12:41:05] <icinga-wm>	 RECOVERY - kubelet operational latencies on kubernetes1001 is OK: (C)1.5e+04 ge (W)1e+04 ge 4781 https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[12:42:55] <icinga-wm>	 PROBLEM - Apache HTTP on mw1253 is CRITICAL: connect to address 10.64.48.88 and port 80: Connection refused
[12:43:55] <moritzm>	 ^ silencing
[12:44:35] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1253 is CRITICAL: connect to address 10.64.48.88 and port 443: Connection refused
[12:44:35] <icinga-wm>	 PROBLEM - Check size of conntrack table on mw1253 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[12:45:51] <marostegui>	 jynus: yeah, sorry, expired downtime on db1102
[12:46:09] <marostegui>	 Going to downtime it again
[12:46:33] <marostegui>	 Ah, actually it finished 
[12:46:55] <icinga-wm>	 RECOVERY - Apache HTTP on mw1253 is OK: HTTP OK: HTTP/1.1 200 OK - 10975 bytes in 0.001 second response time
[12:47:09] <akosiaris>	 !log reboot puppetdb1001 for T150532
[12:47:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:47:16] <stashbot>	 T150532: Upgrade qemu on ganeti clusters to 2.8 - https://phabricator.wikimedia.org/T150532
[12:50:01] <Urbanecm>	 jouncebot, next
[12:50:01] <jouncebot>	 In 0 hour(s) and 9 minute(s): European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180425T1300)
[12:51:06] <wikibugs_>	 (CR) Vgutierrez: [C: 1] Remove server.is_pooled as it isn't actually used [debs/pybal] - https://gerrit.wikimedia.org/r/428901 (owner: Mark Bergsma)
[12:51:35] <wikibugs_>	 (CR) Vgutierrez: [C: 1] Rename server.pooled to .pool to indicate intent [debs/pybal] - https://gerrit.wikimedia.org/r/428900 (owner: Mark Bergsma)
[12:53:01] <wikibugs_>	 Operations, Design-Research: Edit optoutresearch@ mailing list recipients - https://phabricator.wikimedia.org/T100860#4157400 (Aklapper)
[12:55:05] <icinga-wm>	 RECOVERY - HHVM processes on mw1253 is OK: PROCS OK: 1 process with command name hhvm
[12:55:25] <icinga-wm>	 RECOVERY - Disk space on mw1253 is OK: DISK OK
[12:55:36] <icinga-wm>	 RECOVERY - Check size of conntrack table on mw1253 is OK: OK: nf_conntrack is 0 % full
[12:55:45] <gehel>	 !log starting elasticsearch codfw rolling restart for plugin update and NUMA config - T191543 / T191236
[12:55:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:55:52] <stashbot>	 T191543: Deploy updated search/extra plugin with Slovak Stemmer - https://phabricator.wikimedia.org/T191543
[12:55:53] <stashbot>	 T191236: Resolve elasticsearch latency alerts - https://phabricator.wikimedia.org/T191236
[12:57:45] <icinga-wm>	 RECOVERY - puppet last run on mw1253 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[12:58:45] <icinga-wm>	 RECOVERY - Nginx local proxy to apache on mw1253 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 619 bytes in 4.663 second response time
[12:58:54] <icinga-wm>	 RECOVERY - HHVM rendering on mw1253 is OK: HTTP OK: HTTP/1.1 200 OK - 73517 bytes in 6.932 second response time
[12:59:26] <wikibugs_>	 (PS1) Jcrespo: mariadb: Repool with low load db1090, db1122 [mediawiki-config] - https://gerrit.wikimedia.org/r/428913 (https://phabricator.wikimedia.org/T192979)
[13:00:04] <jouncebot>	 addshore, hashar, anomie, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: Dear deployers, time to do the European Mid-day SWAT(Max 8 patches) deploy. Dont look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180425T1300).
[13:00:04] <jouncebot>	 Urbanecm and Amir1: A patch you scheduled for European Mid-day SWAT(Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[13:00:10] <Urbanecm>	 present
[13:00:16] <Amir1>	 o/
[13:00:20] <zeljkof>	 I can SWAT today
[13:01:05] <zeljkof>	 Amir1: you can start with your config change while I get ready, then you have backports, I guess we can deploy in parallel
[13:01:16] <Amir1>	 zeljkof: cool
[13:01:34] <zeljkof>	 Amir1: let me know when you are done with the config change
[13:01:42] <Amir1>	 sure
[13:01:43] <wikibugs_>	 (PS2) Ladsgroup: Remove xx-uca-fa for Persian Wikis except Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/428626
[13:01:45] <wikibugs_>	 (PS1) Filippo Giunchedi: Add puppetization for mcrouter_exporter [puppet] - https://gerrit.wikimedia.org/r/428914 (https://phabricator.wikimedia.org/T192763)
[13:02:23] <wikibugs_>	 (CR) jerkins-bot: [V: -1] Add puppetization for mcrouter_exporter [puppet] - https://gerrit.wikimedia.org/r/428914 (https://phabricator.wikimedia.org/T192763) (owner: Filippo Giunchedi)
[13:02:40] <wikibugs_>	 (CR) Ladsgroup: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/428626 (owner: Ladsgroup)
[13:03:34] <icinga-wm>	 RECOVERY - nutcracker port on mw1253 is OK: TCP OK - 0.000 second response time on 127.0.0.1 port 11212
[13:03:56] <wikibugs_>	 (Merged) jenkins-bot: Remove xx-uca-fa for Persian Wikis except Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/428626 (owner: Ladsgroup)
[13:04:03] <wikibugs_>	 (PS2) Filippo Giunchedi: Add puppetization for mcrouter_exporter [puppet] - https://gerrit.wikimedia.org/r/428914 (https://phabricator.wikimedia.org/T192763)
[13:04:05] <icinga-wm>	 PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1972 bytes in 0.123 second response time
[13:04:14] <icinga-wm>	 RECOVERY - nutcracker process on mw1253 is OK: PROCS OK: 1 process with UID = 113 (nutcracker), command name nutcracker
[13:04:43] <wikibugs_>	 (CR) jerkins-bot: [V: -1] Add puppetization for mcrouter_exporter [puppet] - https://gerrit.wikimedia.org/r/428914 (https://phabricator.wikimedia.org/T192763) (owner: Filippo Giunchedi)
[13:05:21] <wikibugs_>	 (CR) jenkins-bot: Remove xx-uca-fa for Persian Wikis except Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/428626 (owner: Ladsgroup)
[13:05:44] <Amir1>	 zeljkof: the config change is not correct, I need to make a follow up :/
[13:05:46] <Amir1>	 sorry
[13:05:52] <zeljkof>	 Amir1: ok
[13:06:04] <marostegui>	 !log Deploy schema change on s2 codfw master (db2035) - this will generate lag on codfw - T191519 T188299 T190148
[13:06:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:06:12] <stashbot>	 T191519: Schema change for rc_namespace_title_timestamp index - https://phabricator.wikimedia.org/T191519
[13:06:12] <stashbot>	 T190148: Change DEFAULT 0 for rev_text_id on production DBs - https://phabricator.wikimedia.org/T190148
[13:06:12] <stashbot>	 T188299: Schema change for refactored actor storage - https://phabricator.wikimedia.org/T188299
[13:06:14] <wikibugs_>	 (PS7) Ema: VCL: only parse X-Connection-Properties if available [puppet] - https://gerrit.wikimedia.org/r/428580
[13:07:16] <wikibugs_>	 (CR) Ema: [C: 2] VCL: only parse X-Connection-Properties if available [puppet] - https://gerrit.wikimedia.org/r/428580 (owner: Ema)
[13:07:30] <wikibugs_>	 (PS1) Ladsgroup: Use the right uca for Persian Wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/428916
[13:07:48] <wikibugs_>	 (CR) Ladsgroup: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/428916 (owner: Ladsgroup)
[13:09:12] <wikibugs_>	 (Merged) jenkins-bot: Use the right uca for Persian Wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/428916 (owner: Ladsgroup)
[13:12:07] <logmsgbot>	 !log ladsgroup@tin Synchronized wmf-config/InitialiseSettings.php: [[gerrit:428626|Remove xx-uca-fa for Persian Wikis except Wikipedia]] (duration: 01m 17s)
[13:12:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:12:16] <wikibugs_>	 (CR) jenkins-bot: Use the right uca for Persian Wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/428916 (owner: Ladsgroup)
[13:12:53] <Amir1>	 zeljkof: done
[13:14:00] <zeljkof>	 Amir1: ok, deploying a couple of config changes, I guess you can merge your backports, they could take a while
[13:14:17] <wikibugs_>	 (CR) Zfilipin: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/428878 (https://phabricator.wikimedia.org/T192898) (owner: Urbanecm)
[13:14:31] <zeljkof>	 Urbanecm: deploying throttle commit first, 428878
[13:14:32] <Amir1>	 zeljkof: well, they will fail cause phan fails on branches (the same old issue)
[13:14:36] <wikibugs_>	 (CR) Mark Bergsma: [C: 2] Introduce server.is_pooled and make server.pooled usage more consistent [debs/pybal] - https://gerrit.wikimedia.org/r/421053 (owner: Mark Bergsma)
[13:14:49] <zeljkof>	 Urbanecm: I will ping you when the second commit is at mwdebug
[13:14:55] <Urbanecm>	 ack
[13:15:08] <wikibugs_>	 (Merged) jenkins-bot: Introduce server.is_pooled and make server.pooled usage more consistent [debs/pybal] - https://gerrit.wikimedia.org/r/421053 (owner: Mark Bergsma)
[13:15:14] <zeljkof>	 Amir1: uh oh, it's up to you :)
[13:15:40] <wikibugs_>	 (Merged) jenkins-bot: New throttle rule for cswiki Wikipedia event [mediawiki-config] - https://gerrit.wikimedia.org/r/428878 (https://phabricator.wikimedia.org/T192898) (owner: Urbanecm)
[13:16:34] <wikibugs_>	 (CR) Alexandros Kosiaris: [C: 2] Depool poolcounter1001 [mediawiki-config] - https://gerrit.wikimedia.org/r/428894 (https://phabricator.wikimedia.org/T150532) (owner: Alexandros Kosiaris)
[13:16:38] <wikibugs_>	 (CR) Ottomata: [C: 1] Add the possibility to configure UDF blacklist in Hive 2 server [puppet/cdh] - https://gerrit.wikimedia.org/r/428899 (owner: Elukey)
[13:16:40] <wikibugs_>	 (PS2) Zfilipin: Enable Mapframe for bgwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/428630 (https://phabricator.wikimedia.org/T192895) (owner: Urbanecm)
[13:17:03] <wikibugs_>	 (CR) Elukey: [C: 2] Add the possibility to configure UDF blacklist in Hive 2 server [puppet/cdh] - https://gerrit.wikimedia.org/r/428899 (owner: Elukey)
[13:17:33] <logmsgbot>	 !log zfilipin@tin Synchronized wmf-config/throttle.php: SWAT: [[gerrit:428878|New throttle rule for cswiki Wikipedia event (T192898)]] (duration: 01m 16s)
[13:17:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:17:40] <stashbot>	 T192898: Please lift the IP cap on 2018-05-03 - https://phabricator.wikimedia.org/T192898
[13:17:53] <zeljkof>	 Urbanecm: 428878 deployed
[13:17:57] <Urbanecm>	 ack
[13:18:13] <zeljkof>	 herron: a minor scap hiccup during eu swat today
[13:18:28] <akosiaris>	 ?
[13:18:30] <zeljkof>	 13:17:29 ['/usr/bin/scap', 'pull', '--no-update-l10n', '--include', 'wmf-config', '--include', 'wmf-config/throttle.php', 'mw1268.eqiad.wmnet', 'mw1284.eqiad.wmnet', 'mw1319.eqiad.wmnet', 'mw2290.codfw.wmnet', 'mw2215.codfw.wmnet', 'mw2254.codfw.wmnet', 'mw2187.codfw.wmnet', 'mw1250.eqiad.wmnet', 'mw1313.eqiad.wmnet'] on mw1230.eqiad.wmnet returned [255]: Host key verification failed.
[13:18:31] <wikibugs_>	 (CR) jenkins-bot: New throttle rule for cswiki Wikipedia event [mediawiki-config] - https://gerrit.wikimedia.org/r/428878 (https://phabricator.wikimedia.org/T192898) (owner: Urbanecm)
[13:18:36] <wikibugs_>	 (CR) jenkins-bot: Depool poolcounter1001 [mediawiki-config] - https://gerrit.wikimedia.org/r/428894 (https://phabricator.wikimedia.org/T150532) (owner: Alexandros Kosiaris)
[13:18:39] <zeljkof>	 13:17:29 ['/usr/bin/scap', 'pull', '--no-update-l10n', '--include', 'wmf-config', '--include', 'wmf-config/throttle.php', 'mw1268.eqiad.wmnet', 'mw1284.eqiad.wmnet', 'mw1319.eqiad.wmnet', 'mw2290.codfw.wmnet', 'mw2215.codfw.wmnet', 'mw2254.codfw.wmnet', 'mw2187.codfw.wmnet', 'mw1250.eqiad.wmnet', 'mw1313.eqiad.wmnet'] on mw1228.eqiad.wmnet returned [255]: Host key verification failed.
[13:19:14] <logmsgbot>	 !log akosiaris@tin Synchronized wmf-config/ProductionServices.php: depool poolcounter1001 T150532 (duration: 01m 17s)
[13:19:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:19:20] <stashbot>	 T150532: Upgrade qemu on ganeti clusters to 2.8 - https://phabricator.wikimedia.org/T150532
[13:19:21] <moritzm>	 zeljkof: one of the scap proxies changed earlier, but puppet should have fixed that by now?
[13:19:31] <akosiaris>	 zeljkof: https://tools.wmflabs.org/sal/log/AWLS9gcSCdtJF08990fa
[13:19:39] <moritzm>	 zeljkof: ah, no. wait
[13:19:54] <akosiaris>	 some of these hosts are in the reimaging process from what I gather ?
[13:19:57] <zeljkof>	 herron, akosiaris, moritzm: should I continue with scap? or wait?
[13:20:10] <wikibugs_>	 (PS3) Filippo Giunchedi: Add puppetization for mcrouter_exporter [puppet] - https://gerrit.wikimedia.org/r/428914 (https://phabricator.wikimedia.org/T192763)
[13:20:20] <moritzm>	 yeah, mw1230 was reimaged earlier, but seems there was a problem with wmf-reimage, I'll set it as deactivated
[13:20:21] <zeljkof>	 Amir1: did you get the same error messages while deploying?
[13:20:29] <moritzm>	 zeljkof: give me a minute, then you can proceed
[13:20:31] <Amir1>	 nope, it was fine
[13:20:41] <zeljkof>	 moritzm: ok, waiting cc Urbanecm 
[13:21:07] <Urbanecm>	 moritzm, zeljkof, what's happening? Something's broken?
[13:21:23] <wikibugs_>	 (CR) Jcrespo: [C: 2] mariadb: Repool with low load db1090, db1122 [mediawiki-config] - https://gerrit.wikimedia.org/r/428913 (https://phabricator.wikimedia.org/T192979) (owner: Jcrespo)
[13:21:28] <moritzm>	 zeljkof: please try again, I've set mw1228-mw1230 as deactivated
[13:21:43] <zeljkof>	 Urbanecm: a couple of servers say Host key verification failed. 
[13:21:44] <akosiaris>	 jynus: wait, don't deploy that yet
[13:21:58] <akosiaris>	 scap issues
[13:22:05] <zeljkof>	 moritzm: ok, deploying
[13:22:06] <jynus>	 akosiaris: waiting
[13:22:06] <Urbanecm>	 Ok, let's wait, plenty of time :)
[13:22:36] <moritzm>	 Urbanecm: mw1228-mw1230 were reimaged, but it seems wmf-auto-reimage had a problem with the IPMI command triggering the reboot
[13:23:01] <moritzm>	 but they are no longer considered by scap for now, so should be fine now
[13:23:32] <logmsgbot>	 !log zfilipin@tin Synchronized wmf-config/throttle.php: SWAT: [[gerrit:428878|New throttle rule for cswiki Wikipedia event (T192898)]] (duration: 01m 16s)
[13:23:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:23:41] <stashbot>	 T192898: Please lift the IP cap on 2018-05-03 - https://phabricator.wikimedia.org/T192898
[13:23:58] <zeljkof>	 akosiaris, moritzm: no problems, thanks, continuing with swat
[13:24:03] <wikibugs_>	 (CR) Zfilipin: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/428630 (https://phabricator.wikimedia.org/T192895) (owner: Urbanecm)
[13:24:10] <akosiaris>	 ok, thanks!
[13:24:11] <akosiaris>	 jynus: ^
[13:24:27] <moritzm>	 zeljkof: ack, sorry the interruption
[13:24:40] <zeljkof>	 Urbanecm: merging 428630, will ping you when it's at mwdebug
[13:24:45] <Urbanecm>	 ack
[13:24:59] <zeljkof>	 moritzm: no problem, thanks for the quick help!
[13:25:04] <wikibugs_>	 (PS1) Elukey: profile::hive::client: blacklist a UDF builtin for CVE-2018-1284 [puppet] - https://gerrit.wikimedia.org/r/428919
[13:25:20] <wikibugs_>	 (Merged) jenkins-bot: Enable Mapframe for bgwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/428630 (https://phabricator.wikimedia.org/T192895) (owner: Urbanecm)
[13:26:09] <zeljkof>	 Urbanecm: 428630 is at mwdebug
[13:26:22] <wikibugs_>	 (CR) jenkins-bot: Enable Mapframe for bgwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/428630 (https://phabricator.wikimedia.org/T192895) (owner: Urbanecm)
[13:26:27] <Urbanecm>	 zeljkof, going to test
[13:27:21] <wikibugs_>	 (CR) Mark Bergsma: Rename server.pooled to .pool to indicate intent (1 comment) [debs/pybal] - https://gerrit.wikimedia.org/r/428900 (owner: Mark Bergsma)
[13:27:28] <wikibugs_>	 (PS1) Filippo Giunchedi: Initial debianization [debs/prometheus-mcrouter-exporter] - https://gerrit.wikimedia.org/r/428920
[13:27:33] <wikibugs_>	 (CR) Elukey: "https://puppet-compiler.wmflabs.org/compiler02/11034/analytics1003.eqiad.wmnet/" [puppet] - https://gerrit.wikimedia.org/r/428919 (owner: Elukey)
[13:28:37] <Urbanecm>	 zeljkof, working, please deploy
[13:28:44] <zeljkof>	 Urbanecm: deploying
[13:29:43] <Urbanecm>	 ack
[13:30:01] <logmsgbot>	 !log zfilipin@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:428630|Enable Mapframe for bgwiki (T192895)]] (duration: 01m 15s)
[13:30:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:30:07] <wikibugs_>	 Operations, monitoring, Patch-For-Review, Upstream: atop on stretch overloading a host - https://phabricator.wikimedia.org/T192551#4157551 (faidon) Resolved>Open My two cents: - I don't see this hiera knob used anywhere in the tree right now; has anyone expressed interest in using it in i...
[13:30:07] <stashbot>	 T192895: Enable Kartographer on the Bulgarian Wikipedia - https://phabricator.wikimedia.org/T192895
[13:30:21] <wikibugs_>	 (PS2) Elukey: profile::hive::client: blacklist a UDF builtin for CVE-2018-1284 [puppet] - https://gerrit.wikimedia.org/r/428919
[13:30:34] <zeljkof>	 Urbanecm: 428630 is deployed, please check and thanks for deploying with #releng! :)
[13:30:38] <zeljkof>	 Amir1: swat is yours
[13:30:49] <Amir1>	 \o/
[13:30:51] <Amir1>	 Thanks!
[13:30:57] <Urbanecm>	 Working, thank you for the deploy!
[13:31:35] <wikibugs_>	 (PS2) Mark Bergsma: Rename server.pooled to .pool to indicate intent [debs/pybal] - https://gerrit.wikimedia.org/r/428900
[13:31:37] <wikibugs_>	 (PS2) Mark Bergsma: Remove server.is_pooled as it isn't actually used [debs/pybal] - https://gerrit.wikimedia.org/r/428901
[13:33:48] <wikibugs_>	 (CR) Mark Bergsma: [C: 2] Rename server.pooled to .pool to indicate intent [debs/pybal] - https://gerrit.wikimedia.org/r/428900 (owner: Mark Bergsma)
[13:34:12] <wikibugs_>	 (CR) Elukey: "https://puppet-compiler.wmflabs.org/compiler02/11035/analytics1003.eqiad.wmnet/" [puppet] - https://gerrit.wikimedia.org/r/428919 (owner: Elukey)
[13:34:19] <wikibugs_>	 (Merged) jenkins-bot: Rename server.pooled to .pool to indicate intent [debs/pybal] - https://gerrit.wikimedia.org/r/428900 (owner: Mark Bergsma)
[13:34:33] <wikibugs_>	 (CR) Elukey: [C: 2] profile::hive::client: blacklist a UDF builtin for CVE-2018-1284 [puppet] - https://gerrit.wikimedia.org/r/428919 (owner: Elukey)
[13:38:14] <wikibugs_>	 (CR) Gehel: Set SPARQL services to use internal cluster (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/428722 (https://phabricator.wikimedia.org/T192942) (owner: Smalyshev)
[13:38:55] <wikibugs_>	 (PS2) Ottomata: Set PXE boot to Debian Stretch for kafka[12]00[123] [puppet] - https://gerrit.wikimedia.org/r/428575 (https://phabricator.wikimedia.org/T192832) (owner: Elukey)
[13:40:22] <wikibugs_>	 (PS7) Ema: VCL: 400 on empty/unparseable Host header values [puppet] - https://gerrit.wikimedia.org/r/428594
[13:40:46] <akosiaris>	 !log reboot poolcounter1001 for T150532
[13:40:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:40:54] <stashbot>	 T150532: Upgrade qemu on ganeti clusters to 2.8 - https://phabricator.wikimedia.org/T150532
[13:40:55] <wikibugs_>	 (CR) Ottomata: [C: 2] Set PXE boot to Debian Stretch for kafka[12]00[123] [puppet] - https://gerrit.wikimedia.org/r/428575 (https://phabricator.wikimedia.org/T192832) (owner: Elukey)
[13:42:38] <wikibugs_>	 Operations, monitoring, Patch-For-Review, Upstream: atop on stretch overloading a host - https://phabricator.wikimedia.org/T192551#4157574 (Marostegui) I would prefer option #2 (remove atop). My reasoning for it is that we now have to remove "-R" from it, but what could happen in the future? Mayb...
[13:43:23] <logmsgbot>	 !log ladsgroup@tin Synchronized php-1.31.0-wmf.30/extensions/Wikibase/lib/includes/Changes: [[gerrit:428907|Make sure statements in EntityDiffChangedAspects are not passed around as stdClass (T192085)]] (duration: 01m 17s)
[13:43:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:43:29] <stashbot>	 T192085: PHP Fatal in AffectedPagesFinder::getChangedAspects - https://phabricator.wikimedia.org/T192085
[13:43:55] <wikibugs_>	 Operations, monitoring, Patch-For-Review, Upstream: atop on stretch overloading a host - https://phabricator.wikimedia.org/T192551#4157581 (jcrespo) I tried to do the least amount of impact regarding atop, and offer a way to enable it to who could complain about it. If I was the one to decide, I...
[13:44:21] <Amir1>	 I need some minutes to make sure this doesn't make the infra to fall over
[13:44:59] <wikibugs_>	 (PS8) Ema: VCL: 400 on empty/unparseable Host header values [puppet] - https://gerrit.wikimedia.org/r/428594
[13:45:47] <wikibugs_>	 (CR) Ema: [C: 2] VCL: 400 on empty/unparseable Host header values [puppet] - https://gerrit.wikimedia.org/r/428594 (owner: Ema)
[13:45:49] <wikibugs_>	 (CR) Alexandros Kosiaris: [C: 2] Revert "Depool poolcounter1001" [mediawiki-config] - https://gerrit.wikimedia.org/r/428896 (https://phabricator.wikimedia.org/T150532) (owner: Alexandros Kosiaris)
[13:46:41] <wikibugs_>	 (PS2) Alexandros Kosiaris: Add poolcounter1003 to $wmfAllServices [mediawiki-config] - https://gerrit.wikimedia.org/r/428897 (https://phabricator.wikimedia.org/T187297)
[13:47:03] <wikibugs_>	 (Merged) jenkins-bot: Revert "Depool poolcounter1001" [mediawiki-config] - https://gerrit.wikimedia.org/r/428896 (https://phabricator.wikimedia.org/T150532) (owner: Alexandros Kosiaris)
[13:47:57] <wikibugs_>	 (CR) jerkins-bot: [V: -1] Add poolcounter1003 to $wmfAllServices [mediawiki-config] - https://gerrit.wikimedia.org/r/428897 (https://phabricator.wikimedia.org/T187297) (owner: Alexandros Kosiaris)
[13:48:06] <wikibugs_>	 Operations, monitoring, Patch-For-Review, Upstream: atop on stretch overloading a host - https://phabricator.wikimedia.org/T192551#4142726 (akosiaris) > If I was the one to decide, I would personally remove it from everwhere, too.  FWIW, this has my +1.
[13:49:48] <logmsgbot>	 !log akosiaris@tin Synchronized wmf-config/ProductionServices.php: repool poolcounter1001 T150532 (duration: 01m 16s)
[13:49:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:49:55] <stashbot>	 T150532: Upgrade qemu on ganeti clusters to 2.8 - https://phabricator.wikimedia.org/T150532
[13:51:12] <wikibugs_>	 Operations, Analytics, ChangeProp, EventBus, and 5 others: Migrate CirrusSearch jobs to Kafka queue - https://phabricator.wikimedia.org/T189137#4032605 (Pchelolo)
[13:51:38] <wikibugs_>	 (PS3) Alexandros Kosiaris: Add poolcounter1003 to $wmfAllServices [mediawiki-config] - https://gerrit.wikimedia.org/r/428897 (https://phabricator.wikimedia.org/T187297)
[13:53:20] <logmsgbot>	 !log ladsgroup@tin Synchronized php-1.32.0-wmf.1/extensions/Wikibase/lib/includes/Changes: [[gerrit:428908|Make sure statements in EntityDiffChangedAspects are not passed around as stdClass (T192085)]] (duration: 01m 16s)
[13:53:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:53:26] <stashbot>	 T192085: PHP Fatal in AffectedPagesFinder::getChangedAspects - https://phabricator.wikimedia.org/T192085
[13:53:48] <wikibugs_>	 (CR) Alexandros Kosiaris: [C: 2] Add poolcounter1003 to $wmfAllServices [mediawiki-config] - https://gerrit.wikimedia.org/r/428897 (https://phabricator.wikimedia.org/T187297) (owner: Alexandros Kosiaris)
[13:53:49] <Amir1>	 !log EU SWAT is done!
[13:53:51] <wikibugs_>	 Operations, ops-eqiad, fundraising-tech-ops: rack frbast1001 - https://phabricator.wikimedia.org/T187363#4157629 (Jgreen) No display output after the host started pxeboot sequence, turns out it needed "Redirection After Boot" enabled in BIOS.
[13:53:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:54:59] <wikibugs_>	 (Merged) jenkins-bot: Add poolcounter1003 to $wmfAllServices [mediawiki-config] - https://gerrit.wikimedia.org/r/428897 (https://phabricator.wikimedia.org/T187297) (owner: Alexandros Kosiaris)
[13:57:18] <logmsgbot>	 !log akosiaris@tin Synchronized wmf-config/ProductionServices.php: pool poolcounter1003 T187297 (duration: 01m 16s)
[13:57:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:57:26] <stashbot>	 T187297: VM for poolcounter1002 - https://phabricator.wikimedia.org/T187297
[13:57:35] <wikibugs_>	 Operations, DBA, Goal, Patch-For-Review: Convert all sanitarium hosts to multi-instance and increase its reliability/redundancy - https://phabricator.wikimedia.org/T190704#4157635 (Marostegui)
[13:58:09] <wikibugs_>	 (CR) Muehlenhoff: "Looks good, some random comments" (5 comments) [debs/prometheus-mcrouter-exporter] - https://gerrit.wikimedia.org/r/428920 (owner: Filippo Giunchedi)
[14:02:55] <wikibugs_>	 Operations, Analytics, ChangeProp, EventBus, and 5 others: Migrate CirrusSearch jobs to Kafka queue - https://phabricator.wikimedia.org/T189137#4157665 (dcausse) I don't have strong opinions on which wikis we should migrate next. My sole concerns right now is regarding write freezes when we resta...
[14:02:59] <wikibugs_>	 (PS1) Muehlenhoff: Reimage mwdebug servers with stretch [puppet] - https://gerrit.wikimedia.org/r/428923 (https://phabricator.wikimedia.org/T174431)
[14:04:03] <wikibugs_>	 Operations, vm-requests, Patch-For-Review: VM for poolcounter1002 - https://phabricator.wikimedia.org/T187297#4157669 (akosiaris) Open>Resolved a:akosiaris poolcounter1003 is up and running fine and serving connections for the mediawiki fleet. I 'll resolve this and create a decom task fo...
[14:04:18] <wikibugs_>	 Operations, DBA, Goal, Patch-For-Review: Convert all sanitarium hosts to multi-instance and increase its reliability/redundancy - https://phabricator.wikimedia.org/T190704#4157672 (jcrespo)
[14:05:37] <wikibugs_>	 Operations, Analytics, ChangeProp, EventBus, and 5 others: Migrate CirrusSearch jobs to Kafka queue - https://phabricator.wikimedia.org/T189137#4157685 (mobrovac) >>! In T189137#4157665, @dcausse wrote: > I don't have strong opinions on which wikis we should migrate next.  group1 could be a good...
[14:05:41] <wikibugs_>	 (PS1) Alexandros Kosiaris: Depool poolcounter1002 [mediawiki-config] - https://gerrit.wikimedia.org/r/428924 (https://phabricator.wikimedia.org/T193025)
[14:08:56] <wikibugs_>	 (PS1) Alexandros Kosiaris: Assign role spare to poolcounter1002 [puppet] - https://gerrit.wikimedia.org/r/428925 (https://phabricator.wikimedia.org/T193025)
[14:09:28] <wikibugs_>	 (CR) Alexandros Kosiaris: [C: 2] Depool poolcounter1002 [mediawiki-config] - https://gerrit.wikimedia.org/r/428924 (https://phabricator.wikimedia.org/T193025) (owner: Alexandros Kosiaris)
[14:09:32] <wikibugs_>	 Operations, Analytics, ChangeProp, EventBus, and 5 others: Migrate CirrusSearch jobs to Kafka queue - https://phabricator.wikimedia.org/T189137#4157701 (Pchelolo) The subtasks that were created to fix issues discovered during the first iteration of the switch were resolved, and I don't see any lo...
[14:09:36] <wikibugs_>	 (CR) Hoo man: Set SPARQL services to use internal cluster (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/428722 (https://phabricator.wikimedia.org/T192942) (owner: Smalyshev)
[14:10:11] <wikibugs_>	 Operations, Code-Stewardship-Reviews, Services (watching): zotero translation server: code stewardship request - https://phabricator.wikimedia.org/T187194#4157703 (faidon) @danstillman this is very useful information (and good news!), thank you for the detailed updated! It still seems like the option...
[14:10:43] <wikibugs_>	 (Merged) jenkins-bot: Depool poolcounter1002 [mediawiki-config] - https://gerrit.wikimedia.org/r/428924 (https://phabricator.wikimedia.org/T193025) (owner: Alexandros Kosiaris)
[14:11:31] <wikibugs_>	 Operations, Analytics, ChangeProp, EventBus, and 5 others: Migrate CirrusSearch jobs to Kafka queue - https://phabricator.wikimedia.org/T189137#4157706 (dcausse) >>! In T189137#4157685, @mobrovac wrote: >>>! In T189137#4157665, @dcausse wrote: >> I don't have strong opinions on which wikis we sho...
[14:12:05] <wikibugs_>	 (CR) Alexandros Kosiaris: [C: 2] Assign role spare to poolcounter1002 [puppet] - https://gerrit.wikimedia.org/r/428925 (https://phabricator.wikimedia.org/T193025) (owner: Alexandros Kosiaris)
[14:12:16] <wikibugs_>	 (PS1) Ottomata: Add IPv6 entries for kafka[12]00[123] [dns] - https://gerrit.wikimedia.org/r/428926 (https://phabricator.wikimedia.org/T192832)
[14:12:47] <logmsgbot>	 !log akosiaris@tin Synchronized wmf-config/ProductionServices.php: depool poolcounter1002 T193025 (duration: 01m 16s)
[14:12:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:12:53] <stashbot>	 T193025: Decommision poolcounter1002 - https://phabricator.wikimedia.org/T193025
[14:13:12] <wikibugs_>	 10Operations, 10Analytics, 10ChangeProp, 10EventBus, and 5 others: Migrate CirrusSearch jobs to Kafka queue - https://phabricator.wikimedia.org/T189137#4157711 (10mobrovac) Given the numbers above, going with everything but enwiki, wikidata and commons should be a good next round.
[14:14:19] <wikibugs_>	 10Operations, 10Analytics, 10ChangeProp, 10EventBus, and 5 others: Migrate CirrusSearch jobs to Kafka queue - https://phabricator.wikimedia.org/T189137#4157713 (10Pchelolo) > When we freeze writes we start to push ElasticaWrite jobs that contain the full page doc which can be relatively large. We had to ra...
[14:16:16] <wikibugs_>	 10Operations, 10ops-eqiad, 10Patch-For-Review: Decommision poolcounter1002 - https://phabricator.wikimedia.org/T193025#4157673 (10akosiaris)
[14:16:58] <wikibugs_>	 10Operations, 10Analytics, 10ChangeProp, 10EventBus, and 5 others: Migrate CirrusSearch jobs to Kafka queue - https://phabricator.wikimedia.org/T189137#4032605 (10Ottomata) I already feel like 4Mb messages are a lot, and would much prefer not to increase the max message size more.  Can these jobs be split up?
[14:17:21] <wikibugs_>	 10Operations, 10Analytics, 10ChangeProp, 10EventBus, and 5 others: Migrate CirrusSearch jobs to Kafka queue - https://phabricator.wikimedia.org/T189137#4157722 (10dcausse) >>! In T189137#4157713, @Pchelolo wrote: >> When we freeze writes we start to push ElasticaWrite jobs that contain the full page doc wh...
[14:25:27] <wikibugs_>	 (03CR) 10jenkins-bot: Revert "Depool poolcounter1001" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428896 (https://phabricator.wikimedia.org/T150532) (owner: 10Alexandros Kosiaris)
[14:25:33] <wikibugs_>	 (03CR) 10jenkins-bot: Add poolcounter1003 to $wmfAllServices [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428897 (https://phabricator.wikimedia.org/T187297) (owner: 10Alexandros Kosiaris)
[14:25:38] <wikibugs_>	 (03CR) 10jenkins-bot: Depool poolcounter1002 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428924 (https://phabricator.wikimedia.org/T193025) (owner: 10Alexandros Kosiaris)
[14:26:23] <wikibugs_>	 10Operations, 10Analytics, 10ChangeProp, 10EventBus, and 5 others: Migrate CirrusSearch jobs to Kafka queue - https://phabricator.wikimedia.org/T189137#4157730 (10Gehel) >>! In T189137#4157706, @dcausse wrote: >>>! In T189137#4157685, @mobrovac wrote: >>>>! In T189137#4157665, @dcausse wrote: >>> My sole c...
[14:26:27] <wikibugs_>	 (03PS1) 10Ottomata: Add add_ip6_mapped to main-codfw hosts. [puppet] - 10https://gerrit.wikimedia.org/r/428928 (https://phabricator.wikimedia.org/T192832)
[14:26:45] <wikibugs_>	 10Operations, 10Analytics, 10ChangeProp, 10EventBus, and 5 others: Migrate CirrusSearch jobs to Kafka queue - https://phabricator.wikimedia.org/T189137#4157731 (10Pchelolo) > If there is a way to monitor such errors I guess we can pick-up known large pages and modify them while the write are frozen?  There...
[14:26:58] <wikibugs_>	 (03CR) 10jerkins-bot: [V: 04-1] Add add_ip6_mapped to main-codfw hosts. [puppet] - 10https://gerrit.wikimedia.org/r/428928 (https://phabricator.wikimedia.org/T192832) (owner: 10Ottomata)
[14:32:59] <ema>	 !log cp3030: upgrade varnish to 5.1.3-1wm7 T192368
[14:33:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:33:06] <stashbot>	 T192368: Unconditional return(deliver) in vcl_hit - https://phabricator.wikimedia.org/T192368
[14:34:12] <akosiaris>	 !log reboot bohrium T150532
[14:34:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:34:19] <stashbot>	 T150532: Upgrade qemu on ganeti clusters to 2.8 - https://phabricator.wikimedia.org/T150532
[14:34:57] <wikibugs_>	 10Operations, 10Patch-For-Review: Upgrade qemu on ganeti clusters to 2.8 - https://phabricator.wikimedia.org/T150532#4157757 (10akosiaris) 05Open>03Resolved a:03akosiaris And we are at qemu 2.8 and this can finally be closed.
[14:35:12] <wikibugs_>	 (03PS1) 10Jcrespo: standard_packages: Remove atop for every WMF machine [puppet] - 10https://gerrit.wikimedia.org/r/428930 (https://phabricator.wikimedia.org/T192551)
[14:35:24] <jynus>	 is mediawiki deploy free again?
[14:35:33] <akosiaris>	 yes
[14:35:41] <wikibugs_>	 10Operations, 10vm-requests, 10Patch-For-Review: Site: 4 VM request for pdf-render/proton - https://phabricator.wikimedia.org/T192983#4157761 (10akosiaris) p:05Triage>03Normal
[14:36:57] <wikibugs_>	 10Operations, 10monitoring, 10Patch-For-Review, 10Upstream: atop on stretch overloading a host - https://phabricator.wikimedia.org/T192551#4157762 (10jcrespo) a:05jcrespo>03faidon Created T192551, because as I said, the problem was not technical.
[14:36:59] <elukey>	 !log restart hive-server2 on analytics1003 to pick up settings in https://gerrit.wikimedia.org/r/428919
[14:37:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:37:20] <wikibugs_>	 10Operations, 10monitoring, 10Patch-For-Review, 10Upstream: atop on stretch overloading a host - https://phabricator.wikimedia.org/T192551#4157767 (10jcrespo) I meant https://gerrit.wikimedia.org/r/428930
[14:39:19] <wikibugs_>	 (03CR) 10Faidon Liambotis: [C: 031] standard_packages: Remove atop for every WMF machine [puppet] - 10https://gerrit.wikimedia.org/r/428930 (https://phabricator.wikimedia.org/T192551) (owner: 10Jcrespo)
[14:39:37] <wikibugs_>	 (03CR) 10Filippo Giunchedi: "> Patch Set 1:" (035 comments) [debs/prometheus-mcrouter-exporter] - 10https://gerrit.wikimedia.org/r/428920 (owner: 10Filippo Giunchedi)
[14:39:53] <wikibugs_>	 (03PS2) 10Filippo Giunchedi: Initial debianization [debs/prometheus-mcrouter-exporter] - 10https://gerrit.wikimedia.org/r/428920 (https://phabricator.wikimedia.org/T192763)
[14:40:33] <wikibugs_>	 (03PS1) 10Elukey: cassandra: add percentile metrics to 2.2's prometheus jmx config [puppet] - 10https://gerrit.wikimedia.org/r/428931 (https://phabricator.wikimedia.org/T193017)
[14:41:43] <wikibugs_>	 (03CR) 10Imarlier: graphite: allow data requests from performance.wikimedia.org (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/428836 (https://phabricator.wikimedia.org/T191994) (owner: 10Imarlier)
[14:42:41] <wikibugs_>	 (03PS2) 10Elukey: cassandra: add percentile metrics to 2.x's prometheus jmx config [puppet] - 10https://gerrit.wikimedia.org/r/428931 (https://phabricator.wikimedia.org/T193017)
[14:46:07] <wikibugs_>	 (03CR) 10Muehlenhoff: standard_packages: Remove atop for every WMF machine (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/428930 (https://phabricator.wikimedia.org/T192551) (owner: 10Jcrespo)
[14:48:57] <wikibugs_>	 (03PS2) 10Muehlenhoff: Switch scap proxy in B6 to mw1285 [puppet] - 10https://gerrit.wikimedia.org/r/428683
[14:49:37] <wikibugs_>	 (03PS1) 10Cmjohnson: Removing db1039 site.pp entry [puppet] - 10https://gerrit.wikimedia.org/r/428932 (https://phabricator.wikimedia.org/T184262)
[14:50:10] <wikibugs_>	 (03CR) 10Cmjohnson: [C: 032] Removing db1039 site.pp entry [puppet] - 10https://gerrit.wikimedia.org/r/428932 (https://phabricator.wikimedia.org/T184262) (owner: 10Cmjohnson)
[14:51:19] <wikibugs_>	 (03CR) 10Muehlenhoff: [C: 032] Switch scap proxy in B6 to mw1285 [puppet] - 10https://gerrit.wikimedia.org/r/428683 (owner: 10Muehlenhoff)
[14:51:24] <wikibugs_>	 (03PS3) 10Muehlenhoff: Switch scap proxy in B6 to mw1285 [puppet] - 10https://gerrit.wikimedia.org/r/428683
[14:52:23] <wikibugs_>	 (03PS2) 10Jcrespo: standard_packages: Remove atop from every WMF machine [puppet] - 10https://gerrit.wikimedia.org/r/428930 (https://phabricator.wikimedia.org/T192551)
[14:52:43] <wikibugs_>	 (03CR) 10Jcrespo: standard_packages: Remove atop from every WMF machine (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/428930 (https://phabricator.wikimedia.org/T192551) (owner: 10Jcrespo)
[14:53:13] <wikibugs_>	 (03PS1) 10Ottomata: Blacklist job and change-prop topics from lag check for main -> analytics [puppet] - 10https://gerrit.wikimedia.org/r/428933
[14:53:29] <wikibugs_>	 (03PS2) 10Ottomata: Blacklist job and change-prop topics from lag check for main -> analytics [puppet] - 10https://gerrit.wikimedia.org/r/428933
[14:53:37] <wikibugs_>	 (03CR) 10Marostegui: [C: 031] standard_packages: Remove atop from every WMF machine [puppet] - 10https://gerrit.wikimedia.org/r/428930 (https://phabricator.wikimedia.org/T192551) (owner: 10Jcrespo)
[14:54:35] <wikibugs_>	 (03CR) 10Ottomata: [C: 032] Blacklist job and change-prop topics from lag check for main -> analytics [puppet] - 10https://gerrit.wikimedia.org/r/428933 (owner: 10Ottomata)
[14:57:34] <wikibugs_>	 10Operations, 10ops-eqiad, 10hardware-requests, 10Patch-For-Review: Decommision poolcounter1002 - https://phabricator.wikimedia.org/T193025#4157805 (10akosiaris)
[14:58:53] <wikibugs_>	 10Operations, 10Analytics, 10ChangeProp, 10EventBus, and 5 others: Migrate CirrusSearch jobs to Kafka queue - https://phabricator.wikimedia.org/T189137#4157809 (10Pchelolo) I've run some analysis on the logs and indeed sometimes the `cirrusSearchElasticWrite` is too large. Here're the sizes in bytes for al...
[15:00:01] <wikibugs_>	 (03PS1) 10Muehlenhoff: Switch scap proxy for D5 to mw1251 [puppet] - 10https://gerrit.wikimedia.org/r/428934
[15:00:14] <wikibugs_>	 (03CR) 10Filippo Giunchedi: [C: 031] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/428931 (https://phabricator.wikimedia.org/T193017) (owner: 10Elukey)
[15:02:25] <wikibugs_>	 (03PS1) 10Andrew Bogott: nova: Add labvirt1016 to the scheduling pool [puppet] - 10https://gerrit.wikimedia.org/r/428935
[15:02:27] <wikibugs_>	 (03PS1) 10Andrew Bogott: nova: repool labvirt1015 [puppet] - 10https://gerrit.wikimedia.org/r/428936 (https://phabricator.wikimedia.org/T192422)
[15:03:11] <wikibugs_>	 (03PS2) 10Ottomata: Add add_ip6_mapped to main-codfw hosts. [puppet] - 10https://gerrit.wikimedia.org/r/428928 (https://phabricator.wikimedia.org/T192832)
[15:03:37] <wikibugs_>	 (03CR) 10jerkins-bot: [V: 04-1] Add add_ip6_mapped to main-codfw hosts. [puppet] - 10https://gerrit.wikimedia.org/r/428928 (https://phabricator.wikimedia.org/T192832) (owner: 10Ottomata)
[15:04:02] <wikibugs_>	 (03PS2) 10Andrew Bogott: nova: Add labvirt1016 to the scheduling pool [puppet] - 10https://gerrit.wikimedia.org/r/428935
[15:04:07] <icinga-wm>	 RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1971 bytes in 0.079 second response time
[15:04:17] <andrewbogott>	 !log adding labvirt1016 to the nova-compute scheduling pool
[15:04:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:04:50] <wikibugs_>	 (03CR) 10Andrew Bogott: [C: 032] nova: Add labvirt1016 to the scheduling pool [puppet] - 10https://gerrit.wikimedia.org/r/428935 (owner: 10Andrew Bogott)
[15:05:23] <ottomata>	 !log temp disabling puppet, applying ipv6 mapped on kafka200*
[15:05:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:05:43] <wikibugs_>	 (03CR) 10Ottomata: [V: 032 C: 032] Add add_ip6_mapped to main-codfw hosts. [puppet] - 10https://gerrit.wikimedia.org/r/428928 (https://phabricator.wikimedia.org/T192832) (owner: 10Ottomata)
[15:06:00] <wikibugs_>	 (03PS3) 10Ottomata: Add add_ip6_mapped to main-codfw hosts. [puppet] - 10https://gerrit.wikimedia.org/r/428928 (https://phabricator.wikimedia.org/T192832)
[15:06:02] <wikibugs_>	 (03CR) 10Ottomata: [V: 032 C: 032] Add add_ip6_mapped to main-codfw hosts. [puppet] - 10https://gerrit.wikimedia.org/r/428928 (https://phabricator.wikimedia.org/T192832) (owner: 10Ottomata)
[15:12:01] <wikibugs_>	 10Operations, 10monitoring, 10Patch-For-Review, 10Upstream: atop on stretch overloading a host - https://phabricator.wikimedia.org/T192551#4157908 (10Dzahn) >>! In T192551#4150850, @Dzahn wrote: > +1 to remove the daemon/cron, keeping the package itself.  I only said to keep the package out of a similar mo...
[15:13:40] <wikibugs_>	 (03PS2) 10Andrew Bogott: nova: repool labvirt1015 [puppet] - 10https://gerrit.wikimedia.org/r/428936 (https://phabricator.wikimedia.org/T192422)
[15:13:51] <wikibugs_>	 (03PS3) 10Elukey: cassandra: add percentile metrics to 2.x's prometheus jmx config [puppet] - 10https://gerrit.wikimedia.org/r/428931 (https://phabricator.wikimedia.org/T193017)
[15:14:18] <wikibugs_>	 (03CR) 10Andrew Bogott: [C: 032] nova: repool labvirt1015 [puppet] - 10https://gerrit.wikimedia.org/r/428936 (https://phabricator.wikimedia.org/T192422) (owner: 10Andrew Bogott)
[15:14:37] <wikibugs_>	 (03PS4) 10Elukey: cassandra: add percentile metrics to 2.x's prometheus jmx config [puppet] - 10https://gerrit.wikimedia.org/r/428931 (https://phabricator.wikimedia.org/T193017)
[15:14:56] <elukey>	 gehel: o/
[15:15:07] <gehel>	 elukey: \o
[15:15:30] <wikibugs_>	 10Operations, 10Analytics, 10ChangeProp, 10EventBus, and 5 others: Migrate CirrusSearch jobs to Kafka queue - https://phabricator.wikimedia.org/T189137#4157931 (10Ottomata) I don't love it!  I feel like 4Mb is already huge.  Consider troubleshooting some problem with `kafkacat -C | jq .`.  Gotta consume a...
[15:15:50] <elukey>	 gehel: do you have anything against https://gerrit.wikimedia.org/r/428931 ?
[15:15:59] <elukey>	 not sure if you guys are using the jmx exporter for cassandra
[15:16:02] <elukey>	 in the maps cluster
[15:16:55] <wikibugs_>	 (03CR) 10Gehel: [C: 031] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/428931 (https://phabricator.wikimedia.org/T193017) (owner: 10Elukey)
[15:17:27] <gehel>	 elukey: not sure I parsed that regex correctly...
[15:17:53] <gehel>	 elukey: I don't look at those metrics as much as I should, so feel free to break whatever is on the maps side, and I'll fix it when I need it
[15:17:57] <gehel>	 s/when/if/
[15:19:56] <icinga-wm>	 PROBLEM - Kafka Broker Replica Max Lag on kafka1001 is CRITICAL: 5.179e+05 ge 5e+05 https://grafana.wikimedia.org/dashboard/db/kafka?panelId=16&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops&var-kafka_cluster=main-eqiad&var-kafka_broker=kafka1001
[15:19:57] <icinga-wm>	 PROBLEM - Kafka Broker Replica Max Lag on kafka1003 is CRITICAL: 5.067e+05 ge 5e+05 https://grafana.wikimedia.org/dashboard/db/kafka?panelId=16&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops&var-kafka_cluster=main-eqiad&var-kafka_broker=kafka1003
[15:20:26] <elukey>	 gehel: sorry, I forgot to add some context :) I am working on finding a way to use the same dashboards for all the cassandra clusters, since 3.x changed metric names (Sigh). The one that I am adding is a copy from the 3.x one, that wasn't "backported" afaict
[15:20:56] <elukey>	 task is T193017
[15:20:56] <stashbot>	 T193017: Unify, if possible, AQS and Restbase's cassandra dashboards - https://phabricator.wikimedia.org/T193017
[15:21:11] <ottomata>	 ^^^ ? looking
[15:21:15] <ottomata>	 1001?
[15:21:22] <ottomata>	 maybe more elasticawrite...?
[15:21:40] <gehel>	 ottomata: might be related to the cluster restart in progress...
[15:22:21] <anomie>	 !log Running populateRevisionLength.php on group 0 for T192189
[15:22:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:22:27] <stashbot>	 T192189: RevisionArchiveRecord incorrectly changes null ar_len to 0 - https://phabricator.wikimedia.org/T192189
[15:22:34] <wikibugs_>	 10Operations, 10Analytics, 10ChangeProp, 10EventBus, and 5 others: Migrate CirrusSearch jobs to Kafka queue - https://phabricator.wikimedia.org/T189137#4158003 (10Pchelolo) > Consider troubleshooting some problem with kafkacat -C | jq .  Haha :)  > That said, I'm not opposed, as I don't know of any practic...
[15:22:37] <ottomata>	 gehel:  https://grafana-admin.wikimedia.org/dashboard/db/kafka?refresh=5m&orgId=1&var-datasource=eqiad%20prometheus%2Fops&var-kafka_cluster=main-eqiad&var-cluster=eventbus&var-kafka_broker=All&from=now-6h&to=now
[15:22:40] <ottomata>	 i think we have to fix this
[15:22:53] <ottomata>	 when this happens, there is a huge jump in large messages
[15:23:35] <gehel>	 ottomata: looks like it  matches with https://grafana.wikimedia.org/dashboard/db/elasticsearch-percentiles?orgId=1&from=now-6h&to=now&var-cluster=codfw&var-smoothing=1&panelId=64&fullscreen&refresh=1m
[15:24:12] <gehel>	 so yes, elasticsearch is probably the culprit (cc: ebernhardson dcausse)
[15:24:23] <wikibugs_>	 (03PS2) 10Jcrespo: mariadb: Repool with low load db1090, db1122 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428913 (https://phabricator.wikimedia.org/T192979)
[15:24:33] <wikibugs_>	 (03CR) 10Jcrespo: [C: 032] mariadb: Repool with low load db1090, db1122 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428913 (https://phabricator.wikimedia.org/T192979) (owner: 10Jcrespo)
[15:25:09] <gehel>	 ottomata: that's the first time I've seen this issue during an elasticsearch cluster restart (I might just have missed it all other times)
[15:25:57] <ottomata>	 gehel:  not sure if it is always restart, but we've seen really bursty message sizes from elasticwrite over the last few days-weekish
[15:26:15] <wikibugs_>	 (03Merged) 10jenkins-bot: mariadb: Repool with low load db1090, db1122 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428913 (https://phabricator.wikimedia.org/T192979) (owner: 10Jcrespo)
[15:26:42] <gehel>	 ottomata: we're going to need input from dcausse / ebernhardson on this
[15:26:46] <ottomata>	 aye
[15:27:01] <ottomata>	 i think they/we are already kinda talking about it https://phabricator.wikimedia.org/T189137#4157731
[15:27:11] <ottomata>	 it isn't super urgent, but will likely keep making alerts flap
[15:27:22] <ottomata>	 (oh you are on that ticket too)
[15:27:23] <ottomata>	 :)
[15:27:35] <elukey>	 gehel: shall I disable puppet on maps* and then let you run/check in there first?
[15:27:46] <gehel>	 yep, seems at least related
[15:28:21] <gehel>	 elukey: please do! In a meeting, but I'll check / re-enable soon
[15:28:41] <logmsgbot>	 !log jynus@tin Synchronized wmf-config/db-eqiad.php: Pool db1122, db1090 with low load (duration: 01m 14s)
[15:28:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:30:10] <wikibugs_>	 10Operations, 10ops-eqiad, 10DBA, 10Patch-For-Review: Rack and setup db1116 - db1123 - https://phabricator.wikimedia.org/T191792#4158048 (10Cmjohnson)
[15:30:55] <wikibugs_>	 (03CR) 10jenkins-bot: mariadb: Repool with low load db1090, db1122 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428913 (https://phabricator.wikimedia.org/T192979) (owner: 10Jcrespo)
[15:31:18] <wikibugs_>	 10Operations, 10DBA, 10Patch-For-Review: Rack and setup db1116 - db1123 - https://phabricator.wikimedia.org/T191792#4116638 (10Cmjohnson) a:05Cmjohnson>03Marostegui @marostegui db1120 is fixed, i had the ethernet cable in the wrong port :-(.   Assigning to you and removing ops-eqiad tag
[15:33:11] <wikibugs_>	 10Operations, 10ops-eqiad, 10netops, 10Patch-For-Review: Rack/cable/configure asw2-c-eqiad switch stack - https://phabricator.wikimedia.org/T187962#4158084 (10Marostegui)
[15:33:14] <wikibugs_>	 10Operations, 10DBA, 10Patch-For-Review: Rack and setup db1116 - db1123 - https://phabricator.wikimedia.org/T191792#4158081 (10Marostegui) 05Open>03Resolved a:05Marostegui>03Cmjohnson Confirmed db1120 looks good! Thanks @Cmjohnson!
[15:33:29] <wikibugs_>	 10Operations, 10DBA, 10Patch-For-Review: Rack and setup db1116 - db1123 - https://phabricator.wikimedia.org/T191792#4158086 (10Marostegui)
[15:35:15] <wikibugs_>	 10Operations, 10Availability (MediaWiki-MultiDC), 10Patch-For-Review, 10Performance-Team (Radar), and 2 others: Create a prometheus exporter for mcrouter - https://phabricator.wikimedia.org/T192763#4158093 (10fgiunchedi) I sent some changes upstream that I think would be beneficial, https://github.com/Dev2...
[15:36:25] <wikibugs_>	 (03CR) 10Andrew Bogott: "Reedy suggests that we might still need libvips-tools -- that include should probably be moved elsewhere" [puppet] - 10https://gerrit.wikimedia.org/r/428298 (owner: 10Muehlenhoff)
[15:38:47] <wikibugs_>	 10Operations, 10ops-codfw, 10ops-eqiad, 10netops: Audit switch ports/descriptions/enable - https://phabricator.wikimedia.org/T189519#4158122 (10Cmjohnson)
[15:47:35] <wikibugs_>	 (03PS1) 10Jcrespo: flaggedreviews-maintenance: Avoid cronspam by sending error output to /dev/null [puppet] - 10https://gerrit.wikimedia.org/r/428942 (https://phabricator.wikimedia.org/T192340)
[15:47:53] <wikibugs_>	 (03PS2) 10Jcrespo: flaggedreviews-maintenance: Avoid cronspam by sending error output to /dev/null [puppet] - 10https://gerrit.wikimedia.org/r/428942 (https://phabricator.wikimedia.org/T192340)
[15:48:12] <gehel>	 ottomata: I still need to continue that cluster restart, and I don't have a quick fix to not send large documents to elasticawrite... how bad is it on the kafka side?
[15:48:19] <wikibugs_>	 (03CR) 10jerkins-bot: [V: 04-1] flaggedreviews-maintenance: Avoid cronspam by sending error output to /dev/null [puppet] - 10https://gerrit.wikimedia.org/r/428942 (https://phabricator.wikimedia.org/T192340) (owner: 10Jcrespo)
[15:49:24] <mutante>	 if there is a user on wikitech wiki and it says in the logs "has been created automatically" this means they are logging in with a SUL user, right? (they can now on wikitech?) but they don't have an LDAP user. is that right?
[15:49:44] <mutante>	 the user keeps insisting they have an LDAP user (wikitech user) but i can't find them anywhere in LDAP
[15:49:49] <wikibugs_>	 (03PS3) 10Jcrespo: flaggedreviews-maintenance: Avoid cronspam by sending error output to /dev/null [puppet] - 10https://gerrit.wikimedia.org/r/428942 (https://phabricator.wikimedia.org/T192340)
[15:51:28] <wikibugs_>	 10Operations, 10Electron-PDFs, 10Proton, 10Readers-Web-Backlog, and 3 others: New service request: chromium-render/deploy - https://phabricator.wikimedia.org/T186748#4158225 (10akosiaris)
[15:51:31] <wikibugs_>	 10Operations, 10vm-requests, 10Patch-For-Review: Site: 4 VM request for pdf-render/proton - https://phabricator.wikimedia.org/T192983#4158222 (10akosiaris) 05Open>03Resolved a:03akosiaris VMs are up and running, but without a role yet applied. Resolving this
[15:52:29] <wikibugs_>	 (03CR) 10Alexandros Kosiaris: [C: 031] flaggedreviews-maintenance: Avoid cronspam by sending error output to /dev/null [puppet] - 10https://gerrit.wikimedia.org/r/428942 (https://phabricator.wikimedia.org/T192340) (owner: 10Jcrespo)
[15:53:19] <wikibugs_>	 10Operations, 10ops-codfw, 10ops-eqiad, 10netops: Audit switch ports/descriptions/enable - https://phabricator.wikimedia.org/T189519#4158245 (10ayounsi)
[15:53:57] <icinga-wm>	 RECOVERY - Kafka Broker Replica Max Lag on kafka1001 is OK: (C)5e+05 ge (W)1e+05 ge 9.166e+04 https://grafana.wikimedia.org/dashboard/db/kafka?panelId=16&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops&var-kafka_cluster=main-eqiad&var-kafka_broker=kafka1001
[15:54:07] <icinga-wm>	 RECOVERY - Kafka Broker Replica Max Lag on kafka1003 is OK: (C)5e+05 ge (W)1e+05 ge 9.686e+04 https://grafana.wikimedia.org/dashboard/db/kafka?panelId=16&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops&var-kafka_cluster=main-eqiad&var-kafka_broker=kafka1003
[15:54:48] <wikibugs_>	 (03CR) 10Jcrespo: [C: 032] flaggedreviews-maintenance: Avoid cronspam by sending error output to /dev/null [puppet] - 10https://gerrit.wikimedia.org/r/428942 (https://phabricator.wikimedia.org/T192340) (owner: 10Jcrespo)
[15:54:54] <wikibugs_>	 10Operations, 10ops-codfw, 10ops-eqiad, 10netops: Audit switch ports/descriptions/enable - https://phabricator.wikimedia.org/T189519#4158258 (10Cmjohnson)
[15:55:36] <wikibugs_>	 10Operations, 10ops-eqiad, 10netops, 10Patch-For-Review: Rack/cable/configure asw2-c-eqiad switch stack - https://phabricator.wikimedia.org/T187962#4158271 (10Marostegui) >>! In T187962#4119429, @Marostegui wrote: >>>! In T187962#4119423, @jcrespo wrote: >> I would honestly move x1 replica (or the master d...
[15:57:25] <wikibugs_>	 10Operations, 10ops-eqiad, 10netops, 10Patch-For-Review: Rack/cable/configure asw2-c-eqiad switch stack - https://phabricator.wikimedia.org/T187962#4158278 (10jcrespo) I agree, first one will probably be a direct decommision, but next one could be used for that.
[16:00:00] <wikibugs_>	 10Operations, 10Patch-For-Review: Tracking and Reducing cron-spam to root@ - https://phabricator.wikimedia.org/T132324#4158304 (10jcrespo)
[16:00:49] <wikibugs_>	 (03CR) 10Eevans: [C: 031] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/428931 (https://phabricator.wikimedia.org/T193017) (owner: 10Elukey)
[16:01:48] <wikibugs_>	 10Operations, 10ops-codfw, 10ops-eqiad, 10netops: Audit switch ports/descriptions/enable - https://phabricator.wikimedia.org/T189519#4158327 (10ayounsi)
[16:08:02] <wikibugs_>	 (03PS1) 10Filippo Giunchedi: profile: install SMART checks after 'raid' fact is available. [puppet] - 10https://gerrit.wikimedia.org/r/428947
[16:09:14] <mutante>	 !log re-imaging mw2258, mw2163, mw2164 ff.
[16:09:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:13:08] <wikibugs_>	 (03PS2) 10Dzahn: admins: add arlolra, cscott to releasers-parsoid [puppet] - 10https://gerrit.wikimedia.org/r/427954 (https://phabricator.wikimedia.org/T192684)
[16:13:52] <wikibugs_>	 (03CR) 10Dzahn: [C: 032] "as requested by subbu" [puppet] - 10https://gerrit.wikimedia.org/r/427954 (https://phabricator.wikimedia.org/T192684) (owner: 10Dzahn)
[16:19:38] <wikibugs_>	 (03PS2) 10Filippo Giunchedi: profile: install SMART checks after 'raid' fact is available. [puppet] - 10https://gerrit.wikimedia.org/r/428947 (https://phabricator.wikimedia.org/T132324)
[16:20:00] <wikibugs_>	 (03CR) 10BryanDavis: Don't include mediawiki::multimedia on labweb* (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/428298 (owner: 10Muehlenhoff)
[16:20:05] <wikibugs_>	 10Operations, 10Ops-Access-Requests, 10Patch-For-Review: add arlo and scott to parsoid releasers admin group - https://phabricator.wikimedia.org/T192684#4158416 (10Dzahn) [releases1001:~] $ id arlolra uid=3381(arlolra) gid=500(wikidev) groups=500(wikidev),802(releasers-parsoid) [releases1001:~] $ id cscott u...
[16:20:30] <wikibugs_>	 10Operations, 10Parsoid, 10Patch-For-Review, 10Release-Engineering-Team (Watching / External): Provide an archive endpoint for older Parsoid debs (on releases.wikimedia.org or elsewhere) - https://phabricator.wikimedia.org/T150672#4158420 (10Dzahn)
[16:20:34] <wikibugs_>	 10Operations, 10Ops-Access-Requests, 10Patch-For-Review: add arlo and scott to parsoid releasers admin group - https://phabricator.wikimedia.org/T192684#4158417 (10Dzahn) 05Open>03Resolved
[16:20:49] <wikibugs_>	 (03PS1) 10Ladsgroup: Change fawiki's uca to the right one [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428951
[16:20:59] <wikibugs_>	 (03CR) 10jerkins-bot: [V: 04-1] Change fawiki's uca to the right one [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428951 (owner: 10Ladsgroup)
[16:21:25] <wikibugs_>	 10Operations, 10Parsoid, 10Patch-For-Review, 10Release-Engineering-Team (Watching / External): Provide an archive endpoint for older Parsoid debs (on releases.wikimedia.org or elsewhere) - https://phabricator.wikimedia.org/T150672#2792988 (10Dzahn) > 12:41 < subbu> could you add arlo and scott to that grou...
[16:21:53] <wikibugs_>	 (03PS4) 10Dzahn: admins: create shell account for mepps, add to analytics-privatedata [puppet] - 10https://gerrit.wikimedia.org/r/427944 (https://phabricator.wikimedia.org/T192472)
[16:24:04] <wikibugs_>	 10Operations, 10HHVM, 10Patch-For-Review, 10User-Elukey: Upgrade mw* servers to Debian Stretch (using HHVM) - https://phabricator.wikimedia.org/T174431#3561778 (10fgiunchedi) While investigating cronspam from recent reimages I took a look at mw1247 (for example) and noticed it has two disks but no software...
[16:24:32] <godog>	 mutante moritzm ^ FYI
[16:27:20] <wikibugs_>	 (03PS1) 10Urbanecm: Add all Hindi projects as import sources for hiwikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428952 (https://phabricator.wikimedia.org/T188366)
[16:29:18] <wikibugs_>	 (03CR) 10Filippo Giunchedi: "This isn't yielding the result I want according to PCC, https://puppet-compiler.wmflabs.org/compiler02/11036/" [puppet] - 10https://gerrit.wikimedia.org/r/428947 (https://phabricator.wikimedia.org/T132324) (owner: 10Filippo Giunchedi)
[16:30:39] <mutante>	 godog: thanks! i'll check the ones i reinstalled. first one i have says "# / was on /dev/md1 during installation"
[16:32:25] <mutante>	 sda (sda1, sda2), sdb (sdb1, sdb2), md0, md1    are all in /proc/partitions
[16:32:44] <godog>	 mutante: ack, thanks! yeah only some hosts are affected iirc, there's a list in the task I linked
[16:33:17] <mutante>	 yep, i saw the list. i'll just go through the ones i reinstalled and check 
[16:34:10] <godog>	 sweet
[16:35:14] <wikibugs_>	 (03PS2) 10Ladsgroup: Change fawiki's uca to the right one [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428951
[16:36:23] <wikibugs_>	 (03PS5) 10Elukey: cassandra: add percentile metrics to 2.x's prometheus jmx config [puppet] - 10https://gerrit.wikimedia.org/r/428931 (https://phabricator.wikimedia.org/T193017)
[16:36:51] <wikibugs_>	 (03CR) 10Elukey: [C: 032] cassandra: add percentile metrics to 2.x's prometheus jmx config [puppet] - 10https://gerrit.wikimedia.org/r/428931 (https://phabricator.wikimedia.org/T193017) (owner: 10Elukey)
[16:39:57] <wikibugs_>	 (03PS5) 10Dzahn: admins: create shell account for mepps, add to analytics-privatedata [puppet] - 10https://gerrit.wikimedia.org/r/427944 (https://phabricator.wikimedia.org/T192472)
[16:40:45] <wikibugs_>	 (03CR) 10Dzahn: [C: 032] admins: create shell account for mepps, add to analytics-privatedata [puppet] - 10https://gerrit.wikimedia.org/r/427944 (https://phabricator.wikimedia.org/T192472) (owner: 10Dzahn)
[16:41:19] <wikibugs_>	 (03PS1) 10Urbanecm: Fix pixelization of new wiki logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428953 (https://phabricator.wikimedia.org/T193028)
[16:44:15] <elukey>	 gehel: merge done, aqs looks good from what I can see, I'll let you do maps or do you want me to?
[16:45:42] <wikibugs_>	 10Operations, 10Ops-Access-Requests, 10Patch-For-Review: Requesting access to analytics servers for mepps - https://phabricator.wikimedia.org/T192472#4158586 (10Dzahn) Hi @mepps Your user has been created now and you are in the requested group.  On one of the bastion hosts:  [bast1002:~] $ id mepps uid=16947...
[16:45:54] <wikibugs_>	 (03PS2) 10Urbanecm: Add all Hindi projects plus meta as import sources for hiwikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428952 (https://phabricator.wikimedia.org/T188366)
[16:46:00] <wikibugs_>	 10Operations, 10Ops-Access-Requests, 10Patch-For-Review: Requesting access to analytics servers for mepps - https://phabricator.wikimedia.org/T192472#4158587 (10Dzahn) 05Open>03Resolved
[16:46:48] <gehel>	 elukey: I'll do maps around 8pm CEST if that's early enough for you
[16:47:22] <gehel>	 elukey: and thanks for the cleanup!
[16:47:29] <wikibugs_>	 10Operations, 10Ops-Access-Requests, 10Release-Engineering-Team, 10User-Urbanecm: Requesting access to production for SWAT deploy for Urbanecm - https://phabricator.wikimedia.org/T192830#4158589 (10Dzahn)
[16:52:18] <wikibugs_>	 (03PS2) 10Muehlenhoff: Reimage mwdebug servers with stretch [puppet] - 10https://gerrit.wikimedia.org/r/428923 (https://phabricator.wikimedia.org/T174431)
[16:53:48] <icinga-wm>	 PROBLEM - Host wdqs1004 is DOWN: PING CRITICAL - Packet loss = 100%
[16:54:03] <elukey>	 gehel: good for me, I might not be around but if you ping me on hangouts I'll join
[16:56:01] <mutante>	 wdqs1004 is actually running
[16:56:04] <mutante>	 networking went down?
[16:56:21] <gehel>	 can't SSH into it, trying admin console
[16:56:39] <mutante>	 the admin console tells me it's running
[16:56:52] <mutante>	 at login
[16:57:17] <mutante>	 doesn't mean i can login though
[16:57:40] <gehel>	 mutante: since you're already there, can you check if you can login and what state the network is in?
[16:57:48] <mutante>	 gehel: i can't login
[16:57:54] <wikibugs_>	 (03CR) 10Muehlenhoff: [C: 032] Reimage mwdebug servers with stretch [puppet] - 10https://gerrit.wikimedia.org/r/428923 (https://phabricator.wikimedia.org/T174431) (owner: 10Muehlenhoff)
[16:58:08] <gehel>	 mutante: ok, not good :/    powercycle?
[16:58:16] <mutante>	 sure, cycling it
[17:00:01] <mutante>	 !log powercycling wdqs1004
[17:00:04] <jouncebot>	 addshore, hashar, anomie, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for Morning SWAT (Max 8 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180425T1700).
[17:00:04] <jouncebot>	 subbu and Amir1: A patch you scheduled for Morning SWAT (Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[17:00:05] <cmjohnson1>	 gehel: that was me
[17:00:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:00:21] <cmjohnson1>	 i am working in the rack and accidentally pulled your network cable
[17:00:42] <subbu>	 o/
[17:01:06] <gehel>	 cmjohnson1: ok, no problem! Good to know it's minor!
[17:01:07] <Amir1>	 o/
[17:02:19] <icinga-wm>	 PROBLEM - nutcracker port on mw2258 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[17:02:19] <icinga-wm>	 PROBLEM - HHVM processes on mw2258 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[17:03:31] <icinga-wm>	 ACKNOWLEDGEMENT - HHVM processes on mw2258 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. daniel_zahn reinstall
[17:03:31] <icinga-wm>	 ACKNOWLEDGEMENT - HHVM rendering on mw2258 is CRITICAL: connect to address 10.192.16.57 and port 80: Connection refused daniel_zahn reinstall
[17:03:31] <icinga-wm>	 ACKNOWLEDGEMENT - nutcracker port on mw2258 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. daniel_zahn reinstall
[17:03:31] <icinga-wm>	 ACKNOWLEDGEMENT - nutcracker process on mw2258 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. daniel_zahn reinstall
[17:03:58] <icinga-wm>	 PROBLEM - Apache HTTP on mw2163 is CRITICAL: connect to address 10.192.32.51 and port 80: Connection refused
[17:04:08] <icinga-wm>	 PROBLEM - nutcracker process on mw2165 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[17:04:08] <icinga-wm>	 PROBLEM - HHVM processes on mw2165 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[17:04:28] <icinga-wm>	 PROBLEM - nutcracker port on mw2165 is CRITICAL: connect to address 127.0.0.1 and port 11212: Connection refused
[17:04:40] <mutante>	 oh come on, yes
[17:04:40] <thcipriani>	 I can SWAT
[17:04:59] <icinga-wm>	 RECOVERY - HHVM processes on mw2165 is OK: PROCS OK: 6 processes with command name hhvm
[17:05:08] <icinga-wm>	 ACKNOWLEDGEMENT - Apache HTTP on mw2163 is CRITICAL: connect to address 10.192.32.51 and port 80: Connection refused daniel_zahn reinstall
[17:05:08] <icinga-wm>	 ACKNOWLEDGEMENT - Nginx local proxy to apache on mw2163 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 325 bytes in 0.167 second response time daniel_zahn reinstall
[17:05:26] <SMalyshev>	 gehel: looks like wdqs1004 is dead, could you take a look?
[17:05:34] <wikibugs_>	 (03PS2) 10Thcipriani: Enable RemexHtml on frwikiquote [mediawiki-config] - 10https://gerrit.wikimedia.org/r/427181 (https://phabricator.wikimedia.org/T192301) (owner: 10Subramanya Sastry)
[17:05:39] <icinga-wm>	 PROBLEM - HHVM rendering on mw2165 is CRITICAL: connect to address 10.192.32.53 and port 80: Connection refused
[17:05:41] <gehel>	 SMalyshev: yep, cable issue, coming back
[17:05:50] <wikibugs_>	 (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/427181 (https://phabricator.wikimedia.org/T192301) (owner: 10Subramanya Sastry)
[17:06:04] <SMalyshev>	 gehel: cool, thanks!
[17:06:31] <gehel>	 cmjohnson1: can you ping me when the cable is back in place so I can check things work fine?
[17:06:34] <subbu>	 Amir1, or anyone else swatting?
[17:06:51] <Amir1>	 thcipriani: is SWATing
[17:06:52] <cmjohnson1>	 gehel it's been back in place..I immediately put it back as soon as I realized my mistake
[17:06:56] <cmjohnson1>	 let me check it
[17:07:00] <Amir1>	 mine is not testable 
[17:07:06] <gehel>	 cmjohnson1: I still can't SSH...
[17:07:10] <Amir1>	 we need to run updateCollation though 
[17:07:13] <wikibugs_>	 (03Merged) 10jenkins-bot: Enable RemexHtml on frwikiquote [mediawiki-config] - 10https://gerrit.wikimedia.org/r/427181 (https://phabricator.wikimedia.org/T192301) (owner: 10Subramanya Sastry)
[17:07:50] <wikibugs_>	 (03PS3) 10Muehlenhoff: Don't include mediawiki::multimedia on labweb* [puppet] - 10https://gerrit.wikimedia.org/r/428298
[17:07:57] <thcipriani>	 subbu: RemexHtml on frwikiquote is on mwdebug1002, check please
[17:07:59] <icinga-wm>	 RECOVERY - Apache HTTP on mw2163 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 616 bytes in 0.154 second response time
[17:08:12] <subbu>	 thanks. testing.
[17:08:30] <wikibugs_>	 (03CR) 10Krinkle: [C: 031] graphite: allow data requests from performance.wikimedia.org (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/428836 (https://phabricator.wikimedia.org/T191994) (owner: 10Imarlier)
[17:08:40] <subbu>	 thcipriani, looks good.
[17:09:14] <thcipriani>	 subbu: okie doke, going live
[17:09:25] <subbu>	 k
[17:09:51] <wikibugs_>	 (03PS2) 10Arturo Borrero Gonzalez: labs_bootstrapvz: remove /var/lib/puppet/ssl in firstboot.sh script [puppet] - 10https://gerrit.wikimedia.org/r/428694 (https://phabricator.wikimedia.org/T181523)
[17:11:08] <icinga-wm>	 RECOVERY - nutcracker process on mw2165 is OK: PROCS OK: 1 process with UID = 113 (nutcracker), command name nutcracker
[17:11:08] <icinga-wm>	 RECOVERY - Host wdqs1004 is UP: PING OK - Packet loss = 0%, RTA = 0.30 ms
[17:11:17] <gehel>	 cmjohnson1: ok, wdqs1004 is back, I can SSH
[17:11:28] <icinga-wm>	 RECOVERY - nutcracker port on mw2165 is OK: TCP OK - 0.000 second response time on 127.0.0.1 port 11212
[17:11:29] <cmjohnson1>	 sorry about that
[17:11:30] <logmsgbot>	 !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:427181|Enable RemexHtml on frwikiquote]] T192301 (duration: 01m 17s)
[17:11:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:11:37] <stashbot>	 T192301: Enable RemexHTML on frwikiquote - https://phabricator.wikimedia.org/T192301
[17:11:42] <thcipriani>	 subbu: ^ live everywhere
[17:11:49] <icinga-wm>	 RECOVERY - HHVM rendering on mw2165 is OK: HTTP OK: HTTP/1.1 200 OK - 73606 bytes in 8.565 second response time
[17:11:50] <subbu>	 \o/ k
[17:11:53] <subbu>	 thakns.
[17:11:55] <subbu>	 *thanks
[17:12:04] <wikibugs_>	 (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428951 (owner: 10Ladsgroup)
[17:12:35] <wikibugs_>	 10Operations, 10ops-codfw, 10ops-eqiad, 10netops: Audit switch ports/descriptions/enable - https://phabricator.wikimedia.org/T189519#4158687 (10Cmjohnson)
[17:12:53] <thcipriani>	 Amir1: are you going to run updateCollation after your change is deployed? Or do you need me to?
[17:13:10] <wikibugs_>	 (03CR) 10jenkins-bot: Enable RemexHtml on frwikiquote [mediawiki-config] - 10https://gerrit.wikimedia.org/r/427181 (https://phabricator.wikimedia.org/T192301) (owner: 10Subramanya Sastry)
[17:13:17] <Amir1>	 thcipriani: I will do it, probably tomorrow
[17:13:28] <icinga-wm>	 PROBLEM - WDQS HTTP on wdqs1004 is CRITICAL: connect to address 10.64.0.17 and port 80: Connection refused
[17:13:28] <icinga-wm>	 PROBLEM - WDQS SPARQL on wdqs1004 is CRITICAL: connect to address 10.64.0.17 and port 80: Connection refused
[17:13:29] <icinga-wm>	 PROBLEM - WDQS HTTP Port on wdqs1004 is CRITICAL: connect to address 127.0.0.1 and port 80: Connection refused
[17:13:29] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on wdqs1004 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly
[17:13:29] <icinga-wm>	 PROBLEM - Check systemd state on wdqs1004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[17:13:33] <thcipriani>	 Amir1: ok
[17:14:09] <gehel>	 ^ wdqs1004 is just getting back up, should be good in a minute, but I'm checking
[17:14:26] <wikibugs_>	 (03CR) 10Thcipriani: Change fawiki's uca to the right one [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428951 (owner: 10Ladsgroup)
[17:14:28] <icinga-wm>	 PROBLEM - puppet last run on wdqs1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[17:14:29] <wikibugs_>	 (03PS3) 10Thcipriani: Change fawiki's uca to the right one [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428951 (owner: 10Ladsgroup)
[17:14:37] <wikibugs_>	 (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428951 (owner: 10Ladsgroup)
[17:15:28] <icinga-wm>	 RECOVERY - WDQS HTTP on wdqs1004 is OK: HTTP OK: HTTP/1.1 200 OK - 16541 bytes in 0.001 second response time
[17:15:28] <icinga-wm>	 RECOVERY - WDQS SPARQL on wdqs1004 is OK: HTTP OK: HTTP/1.1 200 OK - 16541 bytes in 0.001 second response time
[17:15:29] <icinga-wm>	 RECOVERY - WDQS HTTP Port on wdqs1004 is OK: HTTP OK: HTTP/1.1 200 OK - 434 bytes in 0.023 second response time
[17:15:55] <wikibugs_>	 (03Merged) 10jenkins-bot: Change fawiki's uca to the right one [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428951 (owner: 10Ladsgroup)
[17:16:28] <icinga-wm>	 RECOVERY - Check whether ferm is active by checking the default input chain on wdqs1004 is OK: OK ferm input default policy is set
[17:16:28] <icinga-wm>	 RECOVERY - Check systemd state on wdqs1004 is OK: OK - running: The system is fully operational
[17:16:53] <thcipriani>	 Amir1: nothing to test, correct?
[17:17:03] <Amir1>	 yeah
[17:17:14] <Amir1>	 also tested it in other Persian Wikis before 
[17:17:54] <thcipriani>	 ok, going live
[17:18:13] <wikibugs_>	 10Operations, 10HHVM, 10Patch-For-Review, 10User-Elukey: Upgrade mw* servers to Debian Stretch (using HHVM) - https://phabricator.wikimedia.org/T174431#3561778 (10Dzahn) I checked and all the mw22* are getting RAID due to this:          mw22*) echo partman/mw-raid1.cfg ;; \  But mw216* hosts like mw2163, 2...
[17:18:18] <icinga-wm>	 RECOVERY - puppet last run on wdqs1004 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[17:20:04] <Amir1>	 Thanks
[17:20:08] <wikibugs_>	 (03CR) 10jenkins-bot: Change fawiki's uca to the right one [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428951 (owner: 10Ladsgroup)
[17:20:38] <icinga-wm>	 RECOVERY - HHVM processes on mw2258 is OK: PROCS OK: 6 processes with command name hhvm
[17:20:42] <logmsgbot>	 !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:428951|Change fawiki uca to the right one]] (duration: 01m 17s)
[17:20:47] <thcipriani>	 ^ Amir1 live everywhere
[17:20:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:22:17] <wikibugs_>	 (03PS1) 10Dzahn: install_server: let mw21[6-9] have software RAID [puppet] - 10https://gerrit.wikimedia.org/r/428961 (https://phabricator.wikimedia.org/T174431)
[17:22:31] <wikibugs_>	 (03PS2) 10Dzahn: install_server: let mw21[6-9] have software RAID [puppet] - 10https://gerrit.wikimedia.org/r/428961 (https://phabricator.wikimedia.org/T174431)
[17:23:55] <wikibugs_>	 (03PS3) 10Dzahn: install_server: let mw21[6-9][0-9] have software RAID [puppet] - 10https://gerrit.wikimedia.org/r/428961 (https://phabricator.wikimedia.org/T174431)
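(Note on the three patch sets above: the title changed from mw21[6-9] to mw21[6-9][0-9], presumably because a bracket expression matches exactly one character, so the shorter pattern cannot match a six-character hostname such as mw2163. The sketch below illustrates this with Python's fnmatch module as a stand-in for the shell case patterns quoted earlier in the log; the helper name `matching` and the host mw2199 are hypothetical, and the actual install_server file may behave differently.)

```python
from fnmatch import fnmatchcase

# Illustrative host list: mw2163 and mw2258 appear in the log,
# mw2199 is a made-up name used only for this example.
hosts = ["mw2163", "mw2199", "mw2258"]


def matching(pattern, names):
    """Return the names that match a shell-style glob pattern."""
    return [name for name in names if fnmatchcase(name, pattern)]


# "[6-9]" consumes exactly one character, so this pattern only fits
# five-character names and matches none of the hosts.
print(matching("mw21[6-9]", hosts))

# Adding "[0-9]" covers the final digit of mw2160 through mw2199.
print(matching("mw21[6-9][0-9]", hosts))

# The existing rule quoted in the log uses a trailing wildcard instead.
print(matching("mw22*", hosts))
```

(Under these assumptions the first call returns an empty list, the second returns mw2163 and mw2199, and the third returns mw2258.)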
[17:30:54] <wikibugs_>	 (03CR) 10Imarlier: graphite: allow data requests from performance.wikimedia.org (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/428836 (https://phabricator.wikimedia.org/T191994) (owner: 10Imarlier)
[17:35:20] <urandom>	 !log starting cleanups on row 'a' Cassandra nodes -- T189822
[17:35:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:35:26] <stashbot>	 T189822: Replace 5 Samsung SSD 850 devices w/ 4 1.6T Intel or HP SSDs - https://phabricator.wikimedia.org/T189822
[17:37:48] <icinga-wm>	 RECOVERY - nutcracker port on mw2258 is OK: TCP OK - 0.000 second response time on 127.0.0.1 port 11212
[17:40:31] <wikibugs_>	 (03Abandoned) 10Muehlenhoff: Inline role::mediawiki::scaler [puppet] - 10https://gerrit.wikimedia.org/r/428295 (owner: 10Muehlenhoff)
[17:50:44] <Amir1>	 thcipriani: Thank you!
[17:50:52] <thcipriani>	 yw :)
[17:52:38] <wikibugs_>	 (03PS1) 10Ottomata: Blacklist job|change-prop proper mirror maker instance for main -> analytics [puppet] - 10https://gerrit.wikimedia.org/r/428962
[17:53:25] <wikibugs_>	 (03CR) 10Ottomata: [C: 032] Blacklist job|change-prop proper mirror maker instance for main -> analytics [puppet] - 10https://gerrit.wikimedia.org/r/428962 (owner: 10Ottomata)
[17:53:57] <ottomata>	 moritzm: merging your mwdebug stretch change
[17:56:23] <wikibugs_>	 (03PS1) 10Ottomata: Add add_ip6_mapped to kafka100* [puppet] - 10https://gerrit.wikimedia.org/r/428963 (https://phabricator.wikimedia.org/T192832)
[17:56:41] <wikibugs_>	 (03PS2) 10Ottomata: Add add_ip6_mapped to kafka100* [puppet] - 10https://gerrit.wikimedia.org/r/428963 (https://phabricator.wikimedia.org/T192832)
[18:00:04] <jouncebot>	 Deploy window Pre MediaWiki train sanity break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180425T1800)
[18:06:54] <wikibugs_>	 (03PS1) 10Niharika29: Graduate CodeMirror from beta on 2017 Wikitext Editor for all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428968 (https://phabricator.wikimedia.org/T191923)
[18:08:15] <wikibugs_>	 (03CR) 10Herron: [C: 032] coal: Point systemd and uwsgi config to scap-deployed version [puppet] - 10https://gerrit.wikimedia.org/r/428659 (https://phabricator.wikimedia.org/T191994) (owner: 10Imarlier)
[18:08:20] <wikibugs_>	 (03PS5) 10Herron: coal: Point systemd and uwsgi config to scap-deployed version [puppet] - 10https://gerrit.wikimedia.org/r/428659 (https://phabricator.wikimedia.org/T191994) (owner: 10Imarlier)
[18:20:31] <icinga-wm>	 PROBLEM - Kafka Broker Replica Max Lag on kafka1003 is CRITICAL: 5.028e+05 ge 5e+05 https://grafana.wikimedia.org/dashboard/db/kafka?panelId=16&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops&var-kafka_cluster=main-eqiad&var-kafka_broker=kafka1003
[18:20:42] <wikibugs_>	 (03PS2) 10Niharika29: Enable CodeMirror on RTL wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428968 (https://phabricator.wikimedia.org/T191923)
[18:21:21] <icinga-wm>	 PROBLEM - Kafka Broker Replica Max Lag on kafka1001 is CRITICAL: 5.059e+05 ge 5e+05 https://grafana.wikimedia.org/dashboard/db/kafka?panelId=16&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops&var-kafka_cluster=main-eqiad&var-kafka_broker=kafka1001
[18:24:37] <wikibugs_>	 (03PS1) 10Ppchelko: Disable Redis queue for most of jobs for test wikis. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428972 (https://phabricator.wikimedia.org/T190327)
[18:25:25] <wikibugs_>	 (03CR) 10jerkins-bot: [V: 04-1] Disable Redis queue for most of jobs for test wikis. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428972 (https://phabricator.wikimedia.org/T190327) (owner: 10Ppchelko)
[18:28:02] <icinga-wm>	 PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 224, down: 1, dormant: 0, excluded: 0, unused: 0
[18:28:12] <icinga-wm>	 PROBLEM - Router interfaces on cr1-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 37, down: 1, dormant: 0, excluded: 0, unused: 0
[18:28:58] <wikibugs_>	 (03PS2) 10Ppchelko: Disable Redis queue for most of jobs for test wikis. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428972 (https://phabricator.wikimedia.org/T190327)
[18:30:11] <wikibugs_>	 (03CR) 10jerkins-bot: [V: 04-1] Disable Redis queue for most of jobs for test wikis. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428972 (https://phabricator.wikimedia.org/T190327) (owner: 10Ppchelko)
[18:30:17] <wikibugs_>	 (03PS3) 10Ppchelko: Disable Redis queue for most of jobs for test wikis. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428972 (https://phabricator.wikimedia.org/T190327)
[18:31:28] <wikibugs_>	 (03CR) 10jerkins-bot: [V: 04-1] Disable Redis queue for most of jobs for test wikis. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428972 (https://phabricator.wikimedia.org/T190327) (owner: 10Ppchelko)
[18:32:40] <wikibugs_>	 (03PS4) 10Ppchelko: Disable Redis queue for most of jobs for test wikis. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428972 (https://phabricator.wikimedia.org/T190327)
[18:33:52] <wikibugs_>	 (03CR) 10jerkins-bot: [V: 04-1] Disable Redis queue for most of jobs for test wikis. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428972 (https://phabricator.wikimedia.org/T190327) (owner: 10Ppchelko)
[18:36:51] <wikibugs_>	 (03PS5) 10Ppchelko: Disable Redis queue for most of jobs for test wikis. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428972 (https://phabricator.wikimedia.org/T190327)
[18:41:12] <icinga-wm>	 RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 226, down: 0, dormant: 0, excluded: 0, unused: 0
[18:41:22] <icinga-wm>	 RECOVERY - Router interfaces on cr1-eqord is OK: OK: host 208.80.154.198, interfaces up: 39, down: 0, dormant: 0, excluded: 0, unused: 0
[18:44:21] <icinga-wm>	 PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 224, down: 1, dormant: 0, excluded: 0, unused: 0
[18:44:22] <icinga-wm>	 PROBLEM - Router interfaces on cr1-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 37, down: 1, dormant: 0, excluded: 0, unused: 0
[18:47:13] <wikibugs_>	 (03PS3) 10Ottomata: Add add_ip6_mapped to kafka100* [puppet] - 10https://gerrit.wikimedia.org/r/428963 (https://phabricator.wikimedia.org/T192832)
[18:47:17] <wikibugs_>	 (03CR) 10Ottomata: [V: 032 C: 032] Add add_ip6_mapped to kafka100* [puppet] - 10https://gerrit.wikimedia.org/r/428963 (https://phabricator.wikimedia.org/T192832) (owner: 10Ottomata)
[18:49:18] <wikibugs_>	 (03PS6) 10Ppchelko: Disable Redis queue for most of jobs for test wikis. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428972 (https://phabricator.wikimedia.org/T190327)
[18:49:22] <icinga-wm>	 RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 226, down: 0, dormant: 0, excluded: 0, unused: 0
[18:49:31] <icinga-wm>	 RECOVERY - Router interfaces on cr1-eqord is OK: OK: host 208.80.154.198, interfaces up: 39, down: 0, dormant: 0, excluded: 0, unused: 0
[18:52:21] <wikibugs_>	 10Operations, 10Wikispeech, 10Wikispeech-WMSE: TTS server deployment strategy - https://phabricator.wikimedia.org/T193072#4159055 (10Reedy)
[18:55:45] <logmsgbot>	 !log imarlier@tin Started deploy [performance/coal@1e79c79]: deploy fix for coal-web
[18:55:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:55:51] <logmsgbot>	 !log imarlier@tin Finished deploy [performance/coal@1e79c79]: deploy fix for coal-web (duration: 00m 06s)
[18:55:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:00:04] <jouncebot>	 no_justification: #bothumor I � Unicode. All rise for MediaWiki train deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180425T1900).
[19:01:28] <wikibugs_>	 (03PS1) 10Imarlier: coal: remove files that aren't needed any longer [puppet] - 10https://gerrit.wikimedia.org/r/428980 (https://phabricator.wikimedia.org/T191994)
[19:05:41] <icinga-wm>	 PROBLEM - Router interfaces on cr1-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 37, down: 1, dormant: 0, excluded: 0, unused: 0
[19:06:31] <icinga-wm>	 PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 224, down: 1, dormant: 0, excluded: 0, unused: 0
[19:09:47] <logmsgbot>	 !log otto@tin Started deploy [eventlogging/eventbus@f562c1b]: Fix for logging error https://gerrit.wikimedia.org/r/#/c/428982/
[19:09:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:11:13] <logmsgbot>	 !log otto@tin Started deploy [eventlogging/eventbus@f0783bb]: Fix for logging error https://gerrit.wikimedia.org/r/#/c/428982/
[19:11:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:11:23] <logmsgbot>	 !log otto@tin Finished deploy [eventlogging/eventbus@f0783bb]: Fix for logging error https://gerrit.wikimedia.org/r/#/c/428982/ (duration: 00m 11s)
[19:11:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:12:44] <urandom>	 !log altering timeline tables for 6 month TTL -- T192689
[19:12:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:12:50] <stashbot>	 T192689: Unchecked storage growth(?) - https://phabricator.wikimedia.org/T192689
[19:15:59] <wikibugs_>	 (03CR) 10Jforrester: [C: 031] "Good to go once 1.32.0-wmf.1 is everywhere (so, Thursday evening SWAT)." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428968 (https://phabricator.wikimedia.org/T191923) (owner: 10Niharika29)
[19:16:41] <icinga-wm>	 RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 226, down: 0, dormant: 0, excluded: 0, unused: 0
[19:16:42] <icinga-wm>	 RECOVERY - Router interfaces on cr1-eqord is OK: OK: host 208.80.154.198, interfaces up: 39, down: 0, dormant: 0, excluded: 0, unused: 0
[19:18:23] <marlier>	 Would anyone be able to merge https://gerrit.wikimedia.org/r/#/c/428836/ for me?
[19:20:42] <icinga-wm>	 PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 224, down: 1, dormant: 0, excluded: 0, unused: 0
[19:20:51] <icinga-wm>	 PROBLEM - Router interfaces on cr1-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 37, down: 1, dormant: 0, excluded: 0, unused: 0
[19:21:42] <icinga-wm>	 RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 226, down: 0, dormant: 0, excluded: 0, unused: 0
[19:21:51] <icinga-wm>	 RECOVERY - Router interfaces on cr1-eqord is OK: OK: host 208.80.154.198, interfaces up: 39, down: 0, dormant: 0, excluded: 0, unused: 0
[19:28:54] <wikibugs_>	 (03CR) 10Ottomata: [C: 032] Add IPv6 entries for kafka[12]00[123] [dns] - 10https://gerrit.wikimedia.org/r/428926 (https://phabricator.wikimedia.org/T192832) (owner: 10Ottomata)
[19:29:43] <ottomata>	 looking :)
[19:31:10] <ottomata>	 marlier:  did you want to do what timo said and remove http support?
[19:32:00] <marlier>	 Yeah, I can pull that out.  Give me one second.
[19:33:09] <wikibugs_>	 (03PS2) 10Imarlier: graphite: allow data requests from performance.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/428836 (https://phabricator.wikimedia.org/T191994)
[19:33:24] <marlier>	 ottomata: Just pushed that up, will take a couple of minutes for the tests to run, I assume.
[19:33:50] <wikibugs_>	 (03CR) 10Niharika29: "Thanks James. It's scheduled for tomorrow evening." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428968 (https://phabricator.wikimedia.org/T191923) (owner: 10Niharika29)
[19:34:01] <wikibugs_>	 (03CR) 10Hashar: "I guess we would need to carry some other packages as well:" [puppet] - 10https://gerrit.wikimedia.org/r/428314 (owner: 10Muehlenhoff)
[19:37:17] <logmsgbot>	 !log otto@tin Started deploy [eventlogging/eventbus@f0783bb]: Fix for logging error https://gerrit.wikimedia.org/r/#/c/428982/
[19:37:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:39:03] <logmsgbot>	 !log otto@tin Finished deploy [eventlogging/eventbus@f0783bb]: Fix for logging error https://gerrit.wikimedia.org/r/#/c/428982/ (duration: 01m 45s)
[19:39:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:39:25] <wikibugs_>	 (03CR) 10Ottomata: [C: 032] graphite: allow data requests from performance.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/428836 (https://phabricator.wikimedia.org/T191994) (owner: 10Imarlier)
[19:39:41] <ottomata>	 merged marlier
[19:39:56] <marlier>	 ottomata: sweet, thanks!
[19:49:51] <wikibugs_>	 10Operations, 10ops-codfw, 10ops-eqiad, 10netops: Audit switch ports/descriptions/enable - https://phabricator.wikimedia.org/T189519#4159243 (10ayounsi) a:05ayounsi>03None
[19:53:40] <wikibugs_>	 10Operations, 10hardware-requests: Eqiad: hardware request for 2 HP D3600 Enclosures - https://phabricator.wikimedia.org/T193079#4159250 (10chasemp)
[20:00:04] <jouncebot>	 cscott, arlolra, subbu, bearND, halfak, and Amir1: How many deployers does it take to do Services – Parsoid / Citoid / Mobileapps / ORES / … deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180425T2000).
[20:00:37] <subbu>	 no parsoid deploy today
[20:09:33] <wikibugs_>	 10Operations, 10ops-ulsfo, 10Traffic, 10netops: Rack/cable/configure ulsfo MX204 - https://phabricator.wikimedia.org/T189552#4159296 (10ayounsi)
[20:11:16] <awight>	 Nothing for ORES.
[20:11:36] <wikibugs_>	 (03PS1) 10Ottomata: Enable eventbus Kafka producer snappy compression [puppet] - 10https://gerrit.wikimedia.org/r/429007 (https://phabricator.wikimedia.org/T193080)
[20:11:57] <logmsgbot>	 !log bsitzmann@tin Started deploy [mobileapps/deploy@5a4a282]: Config: Start up to 4 workers in parallel during start-up
[20:12:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:17:34] <wikibugs_>	 (03PS1) 10Ayounsi: Ping offload: remove test VIP [puppet] - 10https://gerrit.wikimedia.org/r/429012 (https://phabricator.wikimedia.org/T190090)
[20:18:45] <logmsgbot>	 !log bsitzmann@tin Finished deploy [mobileapps/deploy@5a4a282]: Config: Start up to 4 workers in parallel during start-up (duration: 06m 48s)
[20:18:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:19:33] <wikibugs_>	 (03PS1) 10Ayounsi: Ping offload: remove test VIP from DNS [dns] - 10https://gerrit.wikimedia.org/r/429013 (https://phabricator.wikimedia.org/T190090)
[20:19:56] <wikibugs_>	 (03CR) 10Ayounsi: [C: 032] Ping offload: remove test VIP [puppet] - 10https://gerrit.wikimedia.org/r/429012 (https://phabricator.wikimedia.org/T190090) (owner: 10Ayounsi)
[20:21:30] <XioNoX>	 !log remove test VIP for eqiad ping offload server - T190090
[20:21:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:21:36] <stashbot>	 T190090: Offload pings to dedicated server - https://phabricator.wikimedia.org/T190090
[20:22:14] <wikibugs_>	 (03CR) 10Ayounsi: [C: 032] Ping offload: remove test VIP from DNS [dns] - 10https://gerrit.wikimedia.org/r/429013 (https://phabricator.wikimedia.org/T190090) (owner: 10Ayounsi)
[20:29:40] <wikibugs_>	 10Operations, 10Traffic, 10netops, 10Patch-For-Review: Offload pings to dedicated server - https://phabricator.wikimedia.org/T190090#4159322 (10ayounsi)
[20:34:00] <wikibugs_>	 (03PS1) 10MaxSem: Redeploy ArticleCreationWorkflow, part 1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/429017 (https://phabricator.wikimedia.org/T192455)
[20:34:24] <wikibugs_>	 (03CR) 10jerkins-bot: [V: 04-1] Redeploy ArticleCreationWorkflow, part 1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/429017 (https://phabricator.wikimedia.org/T192455) (owner: 10MaxSem)
[20:41:46] <wikibugs_>	 (03PS1) 10Chad: group1 to 1.32.0-wmf.1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/429096
[20:42:47] <wikibugs_>	 10Operations, 10Cloud-VPS, 10hardware-requests: Codfw: (1) hardware access request for labtestnet1001 refresh - https://phabricator.wikimedia.org/T193081#4159335 (10chasemp) p:05Triage>03Normal
[20:44:04] <wikibugs_>	 10Operations, 10Cloud-VPS, 10hardware-requests: eqiad: (2) systems for labvirt expansion (labvirt1023 & labvirt1024) - https://phabricator.wikimedia.org/T192119#4159357 (10chasemp) p:05Triage>03Normal a:03RobH
[20:53:34] <wikibugs_>	 (03PS2) 10MaxSem: Redeploy ArticleCreationWorkflow, part 1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/429017 (https://phabricator.wikimedia.org/T192455)
[20:53:50] <mutante>	 Krinkle: i just saw you edited the commit message on https://gerrit.wikimedia.org/r/#/c/392030/ a couple days ago. does this mean we are ready for that though?
[20:55:48] <wikibugs_>	 (03PS1) 10Ayounsi: Assign IP for ping2001.codfw.wmnet [dns] - 10https://gerrit.wikimedia.org/r/429099 (https://phabricator.wikimedia.org/T190090)
[20:57:54] <wikibugs_>	 (03CR) 10Chad: [C: 032] group1 to 1.32.0-wmf.1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/429096 (owner: 10Chad)
[20:57:58] <wikibugs_>	 (03PS1) 10MaxSem: Redeploy ArticleCreationWorkflow, part 2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/429100 (https://phabricator.wikimedia.org/T192455)
[20:58:33] <hasharAway>	 !log on tin: rebased php-1.31.0-wmf.30 for https://gerrit.wikimedia.org/r/#/c/429018/
[20:58:35] <wikibugs_>	 (03CR) 10Ayounsi: [C: 032] Assign IP for ping2001.codfw.wmnet [dns] - 10https://gerrit.wikimedia.org/r/429099 (https://phabricator.wikimedia.org/T190090) (owner: 10Ayounsi)
[20:58:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:59:21] <wikibugs_>	 (03Merged) 10jenkins-bot: group1 to 1.32.0-wmf.1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/429096 (owner: 10Chad)
[20:59:45] <wikibugs_>	 (03PS1) 10Eevans: cassandra: increase `vm.max_map_count` to 1048575 [puppet] - 10https://gerrit.wikimedia.org/r/429101 (https://phabricator.wikimedia.org/T193083)
[21:00:37] <wikibugs_>	 (03PS2) 10Eevans: cassandra: increase `vm.max_map_count` to 1048575 [puppet] - 10https://gerrit.wikimedia.org/r/429101 (https://phabricator.wikimedia.org/T193083)
[21:00:42] <wikibugs_>	 (03CR) 10jenkins-bot: group1 to 1.32.0-wmf.1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/429096 (owner: 10Chad)
[21:01:17] <logmsgbot>	 !log demon@tin Synchronized php: symlink bump (duration: 01m 16s)
[21:01:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:03:55] <logmsgbot>	 !log demon@tin rebuilt and synchronized wikiversions files: group1 to wmf.1
[21:04:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:05:07] <wikibugs_>	 (03CR) 10Eevans: "I'm not certain this is needed on the other clusters, *BUT*, I'm also not certain that would hurt to be applied on the other clusters.  Be" [puppet] - 10https://gerrit.wikimedia.org/r/429101 (https://phabricator.wikimedia.org/T193083) (owner: 10Eevans)
[21:06:55] <Krinkle>	 mutante: No, it is not ready.
[21:06:58] <Krinkle>	 still -1
[21:07:34] <Krinkle>	 we'll probably do it a different way, and move navtiming/coal separately, and also remove it from hafnium at the same time.
[21:08:11] <Krinkle>	 also, possibly moving to webperf1001 before adding webperf1002 to make sure everything is clean and fine on the new host, separately from multi-dc concerns.
[21:12:32] <icinga-wm>	 PROBLEM - mediawiki-installation DSH group on mw2163 is CRITICAL: Host mw2163 is not in mediawiki-installation dsh group
[21:12:48] <wikibugs_>	 (03CR) 10Eevans: "PC output: http://puppet-compiler.wmflabs.org/11039" [puppet] - 10https://gerrit.wikimedia.org/r/429101 (https://phabricator.wikimedia.org/T193083) (owner: 10Eevans)
[21:15:31] <icinga-wm>	 PROBLEM - mediawiki-installation DSH group on mw2164 is CRITICAL: Host mw2164 is not in mediawiki-installation dsh group
[21:16:59] <mutante>	 Krinkle: ok, thanks. it sounds like i should just abandon it then.
[21:17:21] <mutante>	 mw2163 and mw2164 will be reinstalled a second time in order to get the RAID they should have
[21:17:21] <Krinkle>	 mutante: That's fine yeah, we might re-open it at some point, or use it as starting point.
[21:17:30] <Krinkle>	 Thanks though!
[21:18:06] <wikibugs_>	 (03Abandoned) 10Dzahn: webperf1001/2001 start using webperf role [puppet] - 10https://gerrit.wikimedia.org/r/392030 (https://phabricator.wikimedia.org/T186774) (owner: 10Dzahn)
[21:18:26] <mutante>	 you're welcome, no worries
[21:19:31] <icinga-wm>	 PROBLEM - mediawiki-installation DSH group on mw2165 is CRITICAL: Host mw2165 is not in mediawiki-installation dsh group
[21:19:47] <wikibugs_>	 (03PS1) 10Ayounsi: Ping offload, dhcp, partman and puppet for ping2001 [puppet] - 10https://gerrit.wikimedia.org/r/429106 (https://phabricator.wikimedia.org/T190090)
[21:20:57] <wikibugs_>	 (03CR) 10Ayounsi: [C: 032] Ping offload, dhcp, partman and puppet for ping2001 [puppet] - 10https://gerrit.wikimedia.org/r/429106 (https://phabricator.wikimedia.org/T190090) (owner: 10Ayounsi)
[21:33:00] <wikibugs_>	 (03CR) 10Imarlier: "@Aaron Still worth looking at this?" [debs/dynomite] - 10https://gerrit.wikimedia.org/r/421447 (owner: 10Aaron Schulz)
[21:33:37] <wikibugs_>	 (03PS4) 10Dzahn: install_server: let mw21[6-9][0-9] have software RAID [puppet] - 10https://gerrit.wikimedia.org/r/428961 (https://phabricator.wikimedia.org/T174431)
[21:34:30] <wikibugs_>	 (03CR) 10Dzahn: [C: 032] install_server: let mw21[6-9][0-9] have software RAID [puppet] - 10https://gerrit.wikimedia.org/r/428961 (https://phabricator.wikimedia.org/T174431) (owner: 10Dzahn)
[21:54:56] <wikibugs_>	 (03PS1) 10Dzahn: base: update version of gen_fingerprints script [puppet] - 10https://gerrit.wikimedia.org/r/429114
[22:00:04] <jouncebot>	 samwilson and MaxSem: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for GlobalPreferences test deployment . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180425T2200).
[22:05:31] <wikibugs_>	 (03PS2) 10Samwilson: Deploy GlobalPreferences to test wikis and mw.org (third time) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428554
[22:07:13] <wikibugs_>	 (03CR) 10Samwilson: [C: 032] Deploy GlobalPreferences to test wikis and mw.org (third time) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428554 (owner: 10Samwilson)
[22:08:29] <wikibugs_>	 (03Merged) 10jenkins-bot: Deploy GlobalPreferences to test wikis and mw.org (third time) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428554 (owner: 10Samwilson)
[22:09:34] <wikibugs_>	 (03CR) 10jenkins-bot: Deploy GlobalPreferences to test wikis and mw.org (third time) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/428554 (owner: 10Samwilson)
[22:09:42] <wikibugs_>	 (03CR) 10Mobrovac: "While I agree that no harm would come from increasing the map count overall, we can also modify the value in RB's role Hiera keeping the o" [puppet] - 10https://gerrit.wikimedia.org/r/429101 (https://phabricator.wikimedia.org/T193083) (owner: 10Eevans)
[22:15:22] <wikibugs_>	 10Operations, 10Traffic, 10netops, 10Patch-For-Review: Offload pings to dedicated server - https://phabricator.wikimedia.org/T190090#4159636 (10ayounsi)
[22:19:31] <icinga-wm>	 PROBLEM - mediawiki-installation DSH group on mw2258 is CRITICAL: Host mw2258 is not in mediawiki-installation dsh group
[22:21:54] <logmsgbot>	 !log samwilson@tin Synchronized wmf-config/InitialiseSettings.php: Deploy GlobalPreferences T189806 (duration: 01m 18s)
[22:22:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:22:01] <stashbot>	 T189806: Deploy GlobalPrefs on production - https://phabricator.wikimedia.org/T189806
[22:32:48] <wikibugs_>	 (03PS1) 10Samwilson: Revert "Deploy GlobalPreferences to test wikis and mw.org (third time)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/429119
[22:51:47] <wikibugs_>	 (03CR) 10MaxSem: [C: 032] Revert "Deploy GlobalPreferences to test wikis and mw.org (third time)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/429119 (owner: 10Samwilson)
[22:53:05] <wikibugs_>	 (03Merged) 10jenkins-bot: Revert "Deploy GlobalPreferences to test wikis and mw.org (third time)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/429119 (owner: 10Samwilson)
[22:53:20] <wikibugs_>	 (03CR) 10jenkins-bot: Revert "Deploy GlobalPreferences to test wikis and mw.org (third time)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/429119 (owner: 10Samwilson)
[22:55:45] <logmsgbot>	 !log samwilson@tin Synchronized wmf-config/InitialiseSettings.php: Undeploy GlobalPreferences T184121 (duration: 01m 16s)
[22:55:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:55:52] <stashbot>	 T184121: Deploy checklist for GlobalPreferences on production - https://phabricator.wikimedia.org/T184121
[23:00:04] <jouncebot>	 addshore, hashar, anomie, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: #bothumor I � Unicode. All rise for Evening SWAT (Max 8 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180425T2300).
[23:00:04] <jouncebot>	 No GERRIT patches in the queue for this window AFAICS.
[23:17:47] <wikibugs_>	 (03CR) 10Aaron Schulz: "We still need to redo session storage (currently redis) to be more multi-dc robust (specifically with regard to server failures and re-shar" [debs/dynomite] - 10https://gerrit.wikimedia.org/r/421447 (owner: 10Aaron Schulz)
[23:19:32] <icinga-wm>	 RECOVERY - mediawiki-installation DSH group on mw2258 is OK: OK
[23:31:48] <icinga-wm>	 PROBLEM - Disk space on labtestnet2001 is CRITICAL: DISK CRITICAL - free space: / 350 MB (3% inode=75%)
[23:33:50] <Jayprakash12345>	 jouncebot: now
[23:33:50] <jouncebot>	 For the next 0 hour(s) and 26 minute(s): Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180425T2300)