[00:00:48] !log switching apt.wikimedia.org from carbon to install1002 - there might be a short time until the LE SSL cert is also adjusted [00:00:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:01:48] Operations, Patch-For-Review: Split carbon's install/mirror roles, provision install1001 - https://phabricator.wikimedia.org/T132757#3018660 (Dzahn) 16:02 < mutante> !log switching apt.wikimedia.org from carbon to install1002 - there might be a short time until the LE SSL cert is also adjusted [00:06:08] RECOVERY - puppet last run on labvirt1006 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [00:07:18] RECOVERY - puppet last run on cp3038 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [00:09:03] (PS1) Dzahn: Revert "switch apt.wm.org from carbon to install1002" [dns] - https://gerrit.wikimedia.org/r/337195 [00:09:32] (CR) Dzahn: [C: 2] "doing this on Monday instead. i had not run authdns-update yet" [dns] - https://gerrit.wikimedia.org/r/337195 (owner: Dzahn) [00:12:23] (PS1) Dzahn: switch apt.wm.org from carbon to install1002 [dns] - https://gerrit.wikimedia.org/r/337196 [00:13:08] (PS2) Dzahn: switch apt.wm.org from carbon to install1002 [dns] - https://gerrit.wikimedia.org/r/337196 (https://phabricator.wikimedia.org/T132757) [00:13:38] Operations, MediaWiki-Vagrant, Release-Engineering-Team, Epic: [EPIC] Migrate base image to Debian Jessie - https://phabricator.wikimedia.org/T136429#3018671 (bd808) [00:13:48] Operations, MediaWiki-Vagrant, Release-Engineering-Team, Epic: [EPIC] Migrate base image to Debian Jessie - https://phabricator.wikimedia.org/T136429#2334744 (bd808) [00:15:05] (PS1) Dzahn: remove carbon from puppet [puppet] - https://gerrit.wikimedia.org/r/337197 [00:15:18] PROBLEM - puppet last run on fluorine is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [00:18:19] (PS1) Dzahn: let install1002 be the new source for APT data rsync [puppet] - https://gerrit.wikimedia.org/r/337198 [00:22:20] (PS2) Dzahn: install: remove carbon from puppet and netboot [puppet] - https://gerrit.wikimedia.org/r/337197 (https://phabricator.wikimedia.org/T132757) [00:23:11] Operations, MediaWiki-Vagrant, Release-Engineering-Team, Epic: [EPIC] Migrate base image to Debian Jessie - https://phabricator.wikimedia.org/T136429#3018679 (bd808) [00:23:28] RECOVERY - configured eth on d-i-test is OK: OK - interfaces up [00:23:28] RECOVERY - DPKG on d-i-test is OK: All packages OK [00:23:38] RECOVERY - dhclient process on d-i-test is OK: PROCS OK: 0 processes with command name dhclient [00:23:48] RECOVERY - Disk space on d-i-test is OK: DISK OK [00:23:48] RECOVERY - puppet last run on d-i-test is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [00:23:58] PROBLEM - puppet last run on mw1297 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [00:24:57] Operations, Gerrit, Patch-For-Review: Investigate seemingly random Gerrit slow-downs - https://phabricator.wikimedia.org/T148478#3018680 (demon) I've got a pretty strong suspicion that this will fix the overall issue. 
[00:26:15] (PS1) Dzahn: delete install1001/2001 from Hiera data [puppet] - https://gerrit.wikimedia.org/r/337199 (https://phabricator.wikimedia.org/T157840) [00:40:16] (CR) Faidon Liambotis: [C: -1] Don't enable the Diamond ntpd collector if systemd-timesyncd is used (1 comment) [puppet] - https://gerrit.wikimedia.org/r/337009 (https://phabricator.wikimedia.org/T157794) (owner: Muehlenhoff) [00:44:18] RECOVERY - puppet last run on fluorine is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [00:49:37] (CR) Dzahn: [C: 2] delete install1001/2001 from Hiera data [puppet] - https://gerrit.wikimedia.org/r/337199 (https://phabricator.wikimedia.org/T157840) (owner: Dzahn) [00:50:59] (PS2) Dzahn: delete install1001/2001 from Hiera data [puppet] - https://gerrit.wikimedia.org/r/337199 (https://phabricator.wikimedia.org/T157840) [00:51:27] (CR) Dzahn: [V: 2 C: 2] delete install1001/2001 from Hiera data [puppet] - https://gerrit.wikimedia.org/r/337199 (https://phabricator.wikimedia.org/T157840) (owner: Dzahn) [00:52:58] RECOVERY - puppet last run on mw1297 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [01:01:10] (Abandoned) Dzahn: install: copy/move apt.wm.org setup to aptrepo module [puppet] - https://gerrit.wikimedia.org/r/325864 (https://phabricator.wikimedia.org/T132757) (owner: Dzahn) [01:06:38] PROBLEM - puppet last run on elastic1043 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [01:07:37] (PS1) Dzahn: lint: 'include base::firewall' -> 'include ::base::firewall' [puppet] - https://gerrit.wikimedia.org/r/337201 [01:08:08] PROBLEM - puppet last run on db1023 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [01:08:49] (CR) jerkins-bot: [V: -1] lint: 'include base::firewall' -> 'include ::base::firewall' [puppet] - https://gerrit.wikimedia.org/r/337201 (owner: Dzahn) [01:10:18] (PS1) Dzahn: lint: 'include standard' -> 'include ::standard' [puppet] - https://gerrit.wikimedia.org/r/337202 [01:11:17] (PS2) Dzahn: lint: 'include base::firewall' -> 'include ::base::firewall' [puppet] - https://gerrit.wikimedia.org/r/337201 [01:13:29] (PS3) Faidon Liambotis: salt: use SHA256 master key fingerprint on newer systems [puppet] - https://gerrit.wikimedia.org/r/337189 [01:16:17] (CR) Faidon Liambotis: [C: 2] salt: use SHA256 master key fingerprint on newer systems [puppet] - https://gerrit.wikimedia.org/r/337189 (owner: Faidon Liambotis) [01:16:34] (PS1) Dzahn: contint: drop npm settings for precise [puppet] - https://gerrit.wikimedia.org/r/337203 [01:17:48] (CR) jerkins-bot: [V: -1] contint: drop npm settings for precise [puppet] - https://gerrit.wikimedia.org/r/337203 (owner: Dzahn) [01:18:08] RECOVERY - salt-minion processes on d-i-test is OK: PROCS OK: 3 processes with regex args ^/usr/bin/python /usr/bin/salt-minion [01:18:18] RECOVERY - Check systemd state on d-i-test is OK: OK - running: The system is fully operational [01:18:25] (PS1) Dzahn: mariadb/prometheus: remove workaround for precise [puppet] - https://gerrit.wikimedia.org/r/337204 [01:21:22] Operations: systemd-timedated starting up every minute - https://phabricator.wikimedia.org/T157797#3016701 (faidon) timedated is a socket-activated daemon. systemd spawns it every time its socket gets connected to, and timedatectl is doing that. We call timedatectl from an Icinga/NRPE check (check_timedatect... 
[01:22:05] (PS2) Dzahn: contint: drop npm settings for precise [puppet] - https://gerrit.wikimedia.org/r/337203 [01:24:09] Operations, Patch-For-Review: Evaluate use of systemd-timesyncd on jessie for clock synchronisation - https://phabricator.wikimedia.org/T150257#2779478 (faidon) It looks like timesyncd is enabled out of the box on new stretch installs. Our test system, which doesn't have the hiera flag set, thus exhibits... [01:24:18] (PS1) Dzahn: labs_vagrant: drop precise support [puppet] - https://gerrit.wikimedia.org/r/337205 [01:25:01] Operations, Monitoring, Traffic, Patch-For-Review: diamond crashing on hosts using systemd-timesyncd - https://phabricator.wikimedia.org/T157794#3016635 (faidon) FWIW, stretch's version (4.0.515-3) doesn't crash, but complains about being unable to connect to the NTP server every few minutes. [01:29:20] (PS1) Dzahn: toollabs: drop precise-related monitoring check [puppet] - https://gerrit.wikimedia.org/r/337207 [01:35:38] RECOVERY - puppet last run on elastic1043 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [01:35:45] Operations: Replace nrpe 2.15 (& evaluate alternatives) - https://phabricator.wikimedia.org/T157853#3018830 (faidon) [01:37:08] RECOVERY - puppet last run on db1023 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [01:37:40] Operations, Wikimedia-Logstash: Get 5xx logs into kibana/logstash - https://phabricator.wikimedia.org/T149451#3018849 (Tgr) [02:01:11] (PS1) Faidon Liambotis: Remove jzerebecki from Icinga contact groups [puppet] - https://gerrit.wikimedia.org/r/337209 [02:21:27] (CR) Gergő Tisza: "Yes. See inline comments. 
'+' is handled at https://github.com/wikimedia/mediawiki/blob/d67197fa116acc366419faedeeacd91158a98f8b/includes/" (3 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/336747 (https://phabricator.wikimedia.org/T157656) (owner: Gergő Tisza) [02:22:08] PROBLEM - Disk space on elastic2001 is CRITICAL: DISK CRITICAL - free space: / 0 MB (0% inode=98%) [02:30:28] PROBLEM - Disk space on elastic2025 is CRITICAL: DISK CRITICAL - free space: / 0 MB (0% inode=98%) [02:31:51] !log l10nupdate@tin scap sync-l10n completed (1.29.0-wmf.11) (duration: 11m 31s) [02:31:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:37:10] !log l10nupdate@tin ResourceLoader cache refresh completed at Sat Feb 11 02:37:10 UTC 2017 (duration 5m 19s) [02:37:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:39:38] PROBLEM - puppet last run on lvs3002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [03:04:58] PROBLEM - Postgres Replication Lag on maps1003 is CRITICAL: CRITICAL - Rep Delay is: 1809.673338 Seconds [03:05:58] RECOVERY - Postgres Replication Lag on maps1003 is OK: OK - Rep Delay is: 24.36108 Seconds [03:07:38] RECOVERY - puppet last run on lvs3002 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [03:30:58] PROBLEM - puppet last run on restbase1008 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [03:45:38] PROBLEM - Host cp1052 is DOWN: PING CRITICAL - Packet loss = 100% [03:51:18] PROBLEM - IPsec on cp2023 is CRITICAL: Strongswan CRITICAL - ok: 54 connecting: cp1052_v4, cp1052_v6 [03:51:18] PROBLEM - IPsec on cp2010 is CRITICAL: Strongswan CRITICAL - ok: 54 connecting: cp1052_v4, cp1052_v6 [03:51:28] PROBLEM - IPsec on cp2007 is CRITICAL: Strongswan CRITICAL - ok: 54 connecting: cp1052_v4, cp1052_v6 [03:51:38] PROBLEM - IPsec on cp4009 is CRITICAL: Strongswan CRITICAL - ok: 42 connecting: cp1052_v4, cp1052_v6 [03:51:38] PROBLEM - IPsec on cp4018 is CRITICAL: Strongswan CRITICAL - ok: 42 connecting: cp1052_v4, cp1052_v6 [03:51:38] PROBLEM - IPsec on cp4010 is CRITICAL: Strongswan CRITICAL - ok: 42 connecting: cp1052_v4, cp1052_v6 [03:51:38] PROBLEM - IPsec on cp4017 is CRITICAL: Strongswan CRITICAL - ok: 42 connecting: cp1052_v4, cp1052_v6 [03:51:39] PROBLEM - IPsec on cp4016 is CRITICAL: Strongswan CRITICAL - ok: 42 connecting: cp1052_v4, cp1052_v6 [03:51:39] PROBLEM - IPsec on cp3030 is CRITICAL: Strongswan CRITICAL - ok: 42 connecting: cp1052_v4, cp1052_v6 [03:51:39] PROBLEM - IPsec on cp3033 is CRITICAL: Strongswan CRITICAL - ok: 42 connecting: cp1052_v4, cp1052_v6 [03:51:39] PROBLEM - IPsec on cp3042 is CRITICAL: Strongswan CRITICAL - ok: 42 connecting: cp1052_v4, cp1052_v6 [03:51:40] PROBLEM - IPsec on cp3040 is CRITICAL: Strongswan CRITICAL - ok: 42 connecting: cp1052_v4, cp1052_v6 [03:51:40] PROBLEM - IPsec on cp4008 is CRITICAL: Strongswan CRITICAL - ok: 42 connecting: cp1052_v4, cp1052_v6 [03:51:48] PROBLEM - IPsec on cp3041 is CRITICAL: Strongswan CRITICAL - ok: 42 connecting: cp1052_v4, cp1052_v6 [03:51:48] PROBLEM - IPsec on cp3031 is CRITICAL: Strongswan CRITICAL - ok: 42 connecting: cp1052_v4, cp1052_v6 [03:51:58] PROBLEM - IPsec on cp2013 is CRITICAL: Strongswan CRITICAL - ok: 54 connecting: cp1052_v4, cp1052_v6 [03:51:58] PROBLEM - IPsec on cp2001 is CRITICAL: Strongswan CRITICAL - ok: 
54 connecting: cp1052_v4, cp1052_v6 [03:51:58] PROBLEM - IPsec on cp2004 is CRITICAL: Strongswan CRITICAL - ok: 54 connecting: cp1052_v4, cp1052_v6 [03:51:58] PROBLEM - IPsec on cp2019 is CRITICAL: Strongswan CRITICAL - ok: 54 connecting: cp1052_v4, cp1052_v6 [03:51:58] PROBLEM - IPsec on cp3043 is CRITICAL: Strongswan CRITICAL - ok: 42 connecting: cp1052_v4, cp1052_v6 [03:52:04] zhuyifei1999_: ^^ ? [03:52:08] PROBLEM - IPsec on cp2016 is CRITICAL: Strongswan CRITICAL - ok: 54 connecting: cp1052_v4, cp1052_v6 [03:52:08] PROBLEM - IPsec on cp3032 is CRITICAL: Strongswan CRITICAL - ok: 42 connecting: cp1052_v4, cp1052_v6 [03:52:28] ? [03:58:58] RECOVERY - puppet last run on restbase1008 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [04:00:09] a text varnish went down [04:00:20] no impact, afaict [04:15:58] PROBLEM - mailman I/O stats on fermium is CRITICAL: CRITICAL - I/O stats: Transfers/Sec=2997.90 Read Requests/Sec=3076.50 Write Requests/Sec=6.60 KBytes Read/Sec=32710.40 KBytes_Written/Sec=2970.40 [04:26:58] RECOVERY - mailman I/O stats on fermium is OK: OK - I/O stats: Transfers/Sec=80.00 Read Requests/Sec=202.90 Write Requests/Sec=252.90 KBytes Read/Sec=1868.40 KBytes_Written/Sec=2066.80 [06:46:38] PROBLEM - puppet last run on cp3047 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:57:08] PROBLEM - puppet last run on labnet1002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[tzdata] [07:09:38] PROBLEM - puppet last run on lvs3002 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [07:14:38] RECOVERY - puppet last run on cp3047 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [07:15:08] PROBLEM - check_mysql on frdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 2201 [07:19:58] PROBLEM - puppet last run on mc1012 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [07:20:08] RECOVERY - check_mysql on frdb2001 is OK: Uptime: 1002652 Threads: 1 Questions: 20447294 Slow queries: 5269 Opens: 7939 Flush tables: 1 Open tables: 574 Queries per second avg: 20.393 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0 [07:25:08] RECOVERY - puppet last run on labnet1002 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [07:37:38] RECOVERY - puppet last run on lvs3002 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [07:45:28] PROBLEM - puppet last run on elastic1018 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [07:46:58] RECOVERY - puppet last run on mc1012 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [08:00:28] PROBLEM - puppet last run on elastic2001 is CRITICAL: CRITICAL: Puppet last ran 6 hours ago [08:14:28] RECOVERY - puppet last run on elastic1018 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [08:24:28] PROBLEM - puppet last run on mc1006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [08:29:28] PROBLEM - puppet last run on elastic2025 is CRITICAL: CRITICAL: Puppet last ran 6 hours ago [08:49:58] PROBLEM - puppet last run on db1068 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [08:53:28] RECOVERY - puppet last run on mc1006 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [09:02:48] PROBLEM - puppet last run on puppetmaster2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:09:28] !log cleanup logs on elastic20(01|25) - T139043 [09:09:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:09:33] T139043: nested RemoteTransportExceptions filled the disk on elastic1036 and elastic1045 during a rolling restart - https://phabricator.wikimedia.org/T139043 [09:10:28] RECOVERY - puppet last run on elastic2001 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [09:11:28] RECOVERY - Disk space on elastic2025 is OK: DISK OK [09:11:28] RECOVERY - puppet last run on elastic2025 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [09:13:18] (PS1) Gehel: elasticsearch - reimage elastic20(33|34|35|36) to jessie and move data to /srv [puppet] - https://gerrit.wikimedia.org/r/337218 (https://phabricator.wikimedia.org/T151326) [09:14:28] (CR) Gehel: [C: 2] elasticsearch - reimage elastic20(33|34|35|36) to jessie and move data to /srv [puppet] - https://gerrit.wikimedia.org/r/337218 (https://phabricator.wikimedia.org/T151326) (owner: Gehel) [09:15:23] !log gehel@puppetmaster1001 conftool action : set/pooled=no; selector: name=elastic20(33|34|35|36).codfw.wmnet [09:15:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:16:27] Operations, CirrusSearch, Discovery, Elasticsearch, and 2 others: Upgrade cirrus / elasticsearch to Jessie - https://phabricator.wikimedia.org/T151326#3019090 (ops-monitoring-bot) Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts: ``` ['elastic2034.codfw.wmnet'] ```... 
[09:16:30] Operations, CirrusSearch, Discovery, Elasticsearch, and 2 others: Upgrade cirrus / elasticsearch to Jessie - https://phabricator.wikimedia.org/T151326#3019091 (ops-monitoring-bot) Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts: ``` ['elastic2033.codfw.wmnet'] ```... [09:16:47] Operations, CirrusSearch, Discovery, Elasticsearch, and 2 others: Upgrade cirrus / elasticsearch to Jessie - https://phabricator.wikimedia.org/T151326#3019092 (ops-monitoring-bot) Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts: ``` ['elastic2035.codfw.wmnet'] ```... [09:16:51] Operations, CirrusSearch, Discovery, Elasticsearch, and 2 others: Upgrade cirrus / elasticsearch to Jessie - https://phabricator.wikimedia.org/T151326#3019093 (ops-monitoring-bot) Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts: ``` ['elastic2036.codfw.wmnet'] ```... [09:17:58] RECOVERY - puppet last run on db1068 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [09:19:38] PROBLEM - salt-minion processes on puppetmaster1001 is CRITICAL: PROCS CRITICAL: 5 processes with regex args ^/usr/bin/python /usr/bin/salt-minion [09:30:48] RECOVERY - puppet last run on puppetmaster2001 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [09:35:12] !log rebooting mw1236 to make sure that it comes up cleanly - T156610 [09:35:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:35:17] T156610: mw1236 powered down and not able to powerup - https://phabricator.wikimedia.org/T156610 [09:37:38] RECOVERY - salt-minion processes on puppetmaster1001 is OK: PROCS OK: 4 processes with regex args ^/usr/bin/python /usr/bin/salt-minion [09:40:42] Operations, CirrusSearch, Discovery, Elasticsearch, and 2 others: Upgrade cirrus / elasticsearch to Jessie - https://phabricator.wikimedia.org/T151326#3019098 
(ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['elastic2033.codfw.wmnet'] ``` Of which those **FAILED**: ``` set(['elastic203... [09:42:25] Operations, CirrusSearch, Discovery, Elasticsearch, and 2 others: Upgrade cirrus / elasticsearch to Jessie - https://phabricator.wikimedia.org/T151326#3019099 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['elastic2036.codfw.wmnet'] ``` and were **ALL** successful. [09:52:10] !log elukey@puppetmaster1001 conftool action : set/pooled=yes; selector: name=mw1236.eqiad.wmnet [09:52:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:52:38] Operations, CirrusSearch, Discovery, Elasticsearch, and 2 others: Upgrade cirrus / elasticsearch to Jessie - https://phabricator.wikimedia.org/T151326#3019103 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['elastic2035.codfw.wmnet'] ``` and were **ALL** successful. [09:53:28] Operations, CirrusSearch, Discovery, Elasticsearch, and 2 others: Upgrade cirrus / elasticsearch to Jessie - https://phabricator.wikimedia.org/T151326#3019104 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['elastic2034.codfw.wmnet'] ``` and were **ALL** successful. [09:53:33] !log mw1236 back in production (scap pull executed before pooled=yes) - T156610 [09:53:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:53:38] T156610: mw1236 powered down and not able to powerup - https://phabricator.wikimedia.org/T156610 [09:54:11] Operations, ops-eqiad: mw1236 powered down and not able to powerup - https://phabricator.wikimedia.org/T156610#3019108 (elukey) Open>Resolved Thanks @Cmjohnson!! [10:02:20] !log gehel@puppetmaster1001 conftool action : set/pooled=yes; selector: name=elastic20(33|34|35|36).codfw.wmnet [10:02:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:03:00] mw1236 looks good! 
[10:08:05] RECOVERY - Disk space on elastic2001 is OK: DISK OK [12:33:15] PROBLEM - puppet last run on rcs1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [13:02:15] RECOVERY - puppet last run on rcs1002 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [14:38:25] PROBLEM - puppet last run on db1087 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [14:55:45] PROBLEM - puppet last run on ms-be1022 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:06:25] RECOVERY - puppet last run on db1087 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [15:24:45] RECOVERY - puppet last run on ms-be1022 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [15:27:29] hoo hey [15:27:40] hi DatGuy [15:28:13] do you have any idea about https://en.wikipedia.org/w/index.php?limit=100&title=Special%3AContributions&contribs=user&target=5.142.2*&namespace=&tagfilter=&year=2017&month=-1 ? [15:28:20] hi I don't know much about all this bot stuff but there are a number of bot edits logged out right now [15:28:25] https://en.wikipedia.org/wiki/Special:Contributions/10.68.23.103 [15:28:29] figured it was worth mentioning [15:28:38] whoops [15:28:40] ignore my message [15:28:52] Chrissymad's link is what i wanted to send [15:28:53] hm ok [15:29:03] could we add assert to the login? [15:29:17] This happens to AdminStats frequently [15:29:21] the bot author(s) can do that [15:29:51] if you deem this a problem, you can (soft!!!) block the ip with a notice [15:29:53] Chrissymad: That's no good. I'd be tempted to block it [15:30:20] I guess we can even globally block [15:30:35] also a problem on commons, it seems (although a minor one, one edit only) [15:30:35] PROBLEM - puppet last run on labsdb1007 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [15:30:54] I blocked it for 48 hours on enwiki [15:31:00] cool [15:31:09] https://commons.wikimedia.org/w/index.php?title=Commons:Database_reports/Unusually_long_user_blocks&diff=prev&oldid=225530941 [15:31:19] maybe a year globally would also be a good idea? [15:31:28] Does anyone ever need to do things as IP from that IP? [15:31:33] (Like signing up, …) [15:31:44] You'd presume not [15:32:08] I'll email JamesR [15:32:48] It's supposed to be https://en.wikipedia.org/wiki/User:AdminStatsBot [15:32:59] Globally blocked for a year [15:33:01] Is there a way for you guys to tell if an IP edit (or rather a series of IP edits) are actually a bot or is that something a CU would need to do? Cause I have some suspicions about an LTA issue. [15:33:34] Reedy, that ip I linked earlier also seems to be BernsteinBot https://en.wikipedia.org/w/index.php?title=Wikipedia:Database_reports/Largely_duplicative_file_names&diff=prev&oldid=764895275 [15:33:47] Chrissymad: It's likely many bots editing [15:35:24] hoo: just block 10.0.0.0/8 ;P [15:36:28] Like that? https://qph.ec.quoracdn.net/main-qimg-ef279af035810c5317e01b5f24b8b8b9-c [15:39:24] https://en.wikipedia.org/wiki/Special:Contributions/5.142.204.95 also may want to take a look at that I just rolled back their edits to datbot [15:40:35] PROBLEM - puppet last run on db1048 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Exec[pt-heartbeat-kill] [15:42:15] PROBLEM - tools homepage -admin tool- on tools.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 20 seconds [15:46:55] RECOVERY - tools homepage -admin tool- on tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 3670 bytes in 3.085 second response time [15:58:35] RECOVERY - puppet last run on labsdb1007 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [16:08:35] RECOVERY - puppet last run on db1048 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [16:14:05] Chrissymad: Who cares? [16:14:33] hoo, Reedy: I guess you all haven't been following this part of Gerrit. [16:14:52] hoo, Reedy, Chrissymad: https://gerrit.wikimedia.org/r/#/c/324215/ [16:15:14] hoo: Blocking the IP address range doesn't actually solve whatever issue is logging these bots out. [16:15:45] Or invalidating their sessions or whatever. [16:21:02] Yvette: it does, in some sense. Instead of editing anonymously, the bot will now crash (or not work), and the issue will end up back at the maintainer [16:21:23] Yvette: and it prevents a much larger issue: sysops blocking the IP //including// edits from logged-in accounts [16:22:40] valhallasw`cloud: So because of some issue with user sessions, editors get punished? [16:22:49] The bots don't make an edit instead of making an edit that "unattributed." [16:22:58] that's * [16:23:13] I'd personally rather have the edits. And maybe someone can fix whatever session issue is clearly happening. [16:23:23] Saying you want to stop dumb admins is a lame excuse. [16:25:49] Sometimes bots don't edit because of bugs in their code. That's just how it goes. [16:26:23] And no, dumb admins are not a lame excuse. This has happened, and it of course caused larger scale issues. [16:26:52] valhallasw`cloud: Bugs in their code? These scripts have mostly been running fine for years. [16:27:02] You really think it's the scripts to blame? 
[16:27:18] And it's a terrible excuse to say "well a dumb admin could make a dumb block, so let's make the dumb block first." [16:27:30] Shrug. [16:28:07] Well, they should be checking if they're logged in [16:28:11] To what end? [16:28:21] Yvette: The ecosystem changes and sometimes bots have to be adapted. Then again, assert=user isn't exactly a new feature in the API, and logging in again when that assert fails is a pretty basic feature for a bot. [16:28:24] The bots are logging in. Somehow they're losing their session. [16:29:01] valhallasw`cloud: The ecosystem can change all it wants, but until someone can point to the script I'm using as the culprit, I'll assume the script I haven't touched in years isn't to blame. [16:29:24] It's way more likely, IMO, that Wikimedia did something stupid with session handling here. [16:29:26] But if you haven't updated it to take account of changes in the MW api etc [16:29:30] What changes? [16:29:37] "years" is vague [16:29:40] Many things have changed [16:30:02] https://en.wikipedia.org/w/index.php?title=Wikipedia:Database_reports/Largely_duplicative_file_names&action=history [16:30:04] Login processes have changed [16:30:06] session cookies are no longer passed in the api response, for example (and just as actual cookies) [16:30:08] You can easily look at that page history. [16:30:13] And see that the bot logs in. [16:30:19] session lifetime may have changed [16:30:26] It was working yesterday. [16:30:28] It intermittently fails. [16:30:35] Obviously I'm not changing anything. [16:31:02] That script has been running since June 2011. [16:31:49] Anyway, I'm not adding assert logic to fix this as I'm fine with the bot editing logged out. [16:33:04] I might be willing to help diagnose whatever dumb issue is happening. [16:33:58] https://en.wikipedia.org/wiki/Special:Contributions/10.68.23.223 <-- Which more likely: something in my scripts changed on November 4 or server-side session handling got screwed up somehow? 
[16:34:04] I'm hungry. [16:34:30] I'd presume a deploy happened [16:34:57] Hmmm. That's an interesting theory. That shouldn't invalidate most user sessions, though. [16:34:59] Yvette: I'm assuming that your cookie was no longer valid, and your bot ignored this instead of logging in again [16:35:08] I mean, if it did, presumably users would complain a lot more often. [16:35:23] valhallasw`cloud: Why would my cookie be invalid? [16:35:26] What if all sessions were invalidated for a specific reason? [16:35:35] Then you'd annoy everyone a lot? [16:35:35] You'll just carry on editing logged out because you don't care? [16:35:39] Yeah [16:35:41] Sure, why not? [16:35:44] We allow anonymous editing. [16:35:47] "Anonymous." [16:35:51] Which is why it's not done purposefully without good reason [16:35:55] It's more annoying to not have the edits, surely. [16:36:03] Depends on the edits [16:36:09] Many don't add any value [16:36:11] Like nobody is looking at that page and going "gosh, I really wish I knew who wrote this!" [16:36:17] Many edits? [16:36:22] Talk to the editors, then. [16:37:10] valhallasw`cloud: If it fails once every three months or whatever, I'm definitely not fixing it. [16:37:19] Patches welcome, tho. [16:37:35] then the bot will not edit every three months or whatever ¯\_(ツ)_/¯ [16:37:44] So how come it works after that? [16:37:48] something else obviously changes [16:37:50] After what? [16:37:56] the bot probably logs in again? [16:37:56] The next edit is logged in [16:38:03] Oh, it logs in every time. [16:38:12] It's a daily script. [16:38:15] https://en.wikipedia.org/w/index.php?title=Wikipedia:Database_reports/Largely_duplicative_file_names/Configuration [16:38:16] But then doesn't check if the login actually worked? [16:38:21] Check how? [16:38:25] assert=user [16:38:27] It logs in and edits the page every day. [16:38:33] The bot logs in and edits the page every day. [16:38:43] On some days, intermittently, it loses its session (we think). 
[16:38:45] Or maybe the login failed. [16:38:49] Probably the session, though. [16:38:52] https://en.wikipedia.org/w/api.php?action=query&meta=userinfo will tell you what the API thinks you are [16:38:58] I don't care, though. [16:39:05] I understand how the assertion code works. [16:39:18] I'm saying that this script works 88 out of 90 times. [16:39:20] gj [16:39:25] And the two times it fails are probably not its fault. [16:39:33] Since I'm not changing the script or touching any of its code. [16:39:45] Meanwhile, on the other side, people are constantly changing the code. [16:40:02] Yvette: so how does this script work? log in, do query, then edit the page with the result? [16:40:09] Yeah. [16:40:11] https://en.wikipedia.org/w/index.php?title=Wikipedia:Database_reports/Largely_duplicative_file_names/Configuration [16:40:15] It's a pretty simple script. [16:40:28] For long-running queries, I think I've sometimes moved the login code below the query execution. [16:40:29] Interestingly, this assert edit has been around at least since Nov 2007 [16:40:29] move the login to just before the edit? [16:40:39] Yeah, I could. [16:40:47] talking about simple solutions [16:40:48] But again, don't want to at all. [16:40:49] So this problem has no doubt been solved since long before the script was written [16:40:55] valhallasw`cloud: Or someone could diagnose the actual issue? [16:41:02] Reedy: What problem? [16:41:10] Finding out if you're logged in [16:41:11] You both are looking at solutions instead of the actual problem. [16:41:16] That's not the problem, bro. [16:41:20] Have you reported it? [16:41:22] Yvette: you are assuming cookie invalidation is a problem. It's not. [16:41:27] Have you provided detailed debugging details? [16:41:29] valhallasw`cloud: Why not? [16:41:34] Cookies, session info etc you think should be active? [16:41:59] Reedy: No. I'm not sure how many ways I can say that I don't care about the occasional logged-out edit [16:42:02] . 
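The check being debated in the exchange above (meta=userinfo to see who the API thinks you are, and assert=user so a dead session fails loudly instead of saving an anonymous edit) can be sketched roughly as follows. This is a minimal sketch, not the bot's actual code: the function names are hypothetical, and only the response/parameter shapes follow the MediaWiki Action API.

```python
# Hedged sketch: decide whether a MediaWiki API session is still logged in,
# and build edit parameters that refuse to save anonymously.
# Helper names here are illustrative, not from the bot under discussion.

def is_logged_in(userinfo_response):
    """Inspect an action=query&meta=userinfo response.

    An anonymous session comes back with an 'anon' flag and user id 0,
    so a named, logged-in account has neither."""
    ui = userinfo_response.get("query", {}).get("userinfo", {})
    return "anon" not in ui and ui.get("id", 0) != 0

def edit_params(title, text, token):
    """Build action=edit parameters.

    With assert=user the server rejects the edit with an
    'assertuserfailed' error instead of attributing it to the IP."""
    return {
        "action": "edit",
        "title": title,
        "text": text,
        "token": token,
        "assert": "user",
        "format": "json",
    }
```

A bot would call `is_logged_in` on the userinfo response right before editing, and treat `False` (or an `assertuserfailed` error on the edit itself) as "log in again", which is the behaviour the channel is asking for.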
[16:42:06] Other people do [16:42:08] Who? [16:42:12] Yvette: because sessions can be invalidated for many different reasons. They are an edge case a bot should take care of. [16:42:27] valhallasw`cloud: Since when? [16:42:34] Does your user session get invalidated in a browser? [16:42:39] You presumably could've prevented the logged out edits from the bot/client side in less time than this discussion has been going on [16:42:40] Like when does that ever happen? [16:42:44] Every 30 days [16:42:47] No. [16:42:49] That's not true. [16:43:06] We set the cookie for 365 days on Wikimedia wikis. And this script is logging in every day. [16:43:10] $wgCookieExpiration = 30 * 86400; [16:43:10] $wgExtendedLoginCookieExpiration = 365 * 86400; [16:43:21] Well, if it's logging in and it's not logged in, something is going wrong [16:43:28] No kidding. [16:43:29] Maybe your code. Maybe not [16:43:33] Maybe someone should investigate that. [16:43:38] It's not just this script, BTW. [16:43:39] File a bug then? [16:43:44] Report it [16:43:49] Give detailed information [16:43:58] Tell when you logged in, when the edits were made logged out [16:44:00] Yvette: you're right, it's a year. Yet I log in far more often. [16:44:06] So first I should go edit all of these scripts. [16:44:09] Rather than expecting someone to grep many logs [16:44:13] Now I should go file a detailed ticket for you. [16:44:19] Any other free work you'd like from me? [16:44:22] On a Saturday. [16:44:26] Just wondering. [16:44:27] You've got nothing better to do, right? [16:44:31] I need lunch! [16:44:32] It's my birthday for christ sakes [16:44:36] And I'm arguing about inane shit with you [16:44:38] Oh, happy birthday! [16:44:50] * Yvette hugs. [16:44:58] * Chrissymad gives Reedy a beer [16:45:23] Reedy: Might also be Wikimedia Labs. [16:45:29] Since that's a common factor here, I think. [16:45:31] Yvette: Wouldn't have happened on toolserver. [16:45:35] ikr [16:45:43] The Toolserver had a soft-block, heh.
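(Editor's note: the "move the login to just before the edit" suggestion above can be sketched as a small retry wrapper. The idea is that the edit request carries `assert=user`, so a lost session surfaces as an `assertuserfailed` API error rather than a silent logged-out edit; the bot then re-logs-in and retries. The `edit` and `login` callables are placeholders for whatever API client the bot actually uses.)

```python
class AssertUserFailed(Exception):
    """Raised when the API rejects a request with code=assertuserfailed."""

def edit_with_relogin(edit, login, retries: int = 1):
    """Attempt an edit; if the session was lost, log in again and retry.

    `edit` is assumed to send assert=user and raise AssertUserFailed
    when the server says we are no longer logged in.
    """
    for attempt in range(retries + 1):
        try:
            return edit()
        except AssertUserFailed:
            if attempt == retries:
                raise          # still failing after re-login: give up loudly
            login()            # refresh the session, then retry the edit
```

This keeps the daily script's structure intact (log in, run query, edit) while handling the rare invalidated-session case the discussion is about.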
[16:45:52] At least on the English Wikipedia. [16:46:15] Ok, then my argument magically changes to 'This is how the Toolserver did it'. Done! :P [16:46:44] https://gerrit.wikimedia.org/r/#/c/324215/ is the relevant changeset. [16:46:48] I already said my piece there. [16:47:05] I'm not really opposed to soft-blocking, especially as there's precedent, but it still doesn't solve whatever the actual issue is. [16:47:10] And probably just masks it deeper. [16:47:36] Well, pissing off people so they look into it more thoroughly seems like a way forward [16:47:39] Rather than not fixing it at all [16:48:16] Lawl. That's, uhh, quite an approach. [16:48:55] If it happened regularly, I'd be more inclined to diagnose. But it happens like once a month or so. [17:36:45] PROBLEM - puppet last run on labstore1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [18:04:45] RECOVERY - puppet last run on labstore1003 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [18:59:35] PROBLEM - puppet last run on db1049 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [19:26:49] (03PS1) 10Brion VIBBER: Bump up number of queue runners for transcodes [puppet] - 10https://gerrit.wikimedia.org/r/337230 (https://phabricator.wikimedia.org/T108234) [19:28:35] RECOVERY - puppet last run on db1049 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [20:37:15] PROBLEM - puppet last run on restbase1013 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [20:37:35] PROBLEM - puppet last run on elastic1043 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [21:05:35] RECOVERY - puppet last run on elastic1043 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [21:06:15] RECOVERY - puppet last run on restbase1013 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [22:35:25] PROBLEM - puppet last run on maps1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [22:41:25] PROBLEM - puppet last run on ms-be1016 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [23:03:25] RECOVERY - puppet last run on maps1003 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [23:09:25] RECOVERY - puppet last run on ms-be1016 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [23:50:13] (03CR) 10Reedy: [C: 031] Fix SiteConfiguration array merge syntax [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336747 (https://phabricator.wikimedia.org/T157656) (owner: 10Gergő Tisza) [23:57:25] PROBLEM - puppet last run on labnet1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues