[00:00:04] addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: Time to snap out of that daydream and deploy Evening SWAT (Max 6 patches). Get on with it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190301T0000).
[00:00:04] No GERRIT patches in the queue for this window AFAICS.
[00:00:36] yeah
[00:00:52] it might be interesting to do that check in a very general way across a lot of the fleet that's confd-controlled
[00:01:06] is everything with logging back to normal? is it fine to run a noop deploy during this window? (testing scap feature)
[00:01:50] (if a service on host X is depooled in confd, raise an icinga critical. if someone's working on the box they should've icinga-disabled it anyways, and it provides a feedback when you go check icinga at the end of your work that "oh yeah I need to repool that")
[00:02:16] we could probably write and deploy that check in a very profile-neutral way
[00:02:40] alright, updated https://wikitech.wikimedia.org/wiki/Service_restarts#Cache_proxies_%28varnish%29_%28cp%29 so I don't forget in the future
[00:03:12] bblack: yeah good point
[00:04:59] thanks!
[00:05:09] (CR) Smalyshev: "@Jforrester: I think it's ok on Beta, can we move forward with this?" [mediawiki-config] - https://gerrit.wikimedia.org/r/489598 (https://phabricator.wikimedia.org/T217276) (owner: Smalyshev)
[00:06:50] looking at scrollback, seems like everything wrt deployment was figured out
[00:06:58] * thcipriani does scap fiddling
[00:09:56] !log thcipriani@deploy1001 Synchronized README: noop sync to test opcache-manager in scap 3.9.1-1 (duration: 00m 48s)
[00:09:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:12:27] !log pre-configure asw-a3 ports on asw2-a3-eqiad - T187960
[00:12:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:12:32] T187960: Rack/cable/configure asw2-a-eqiad switch stack - https://phabricator.wikimedia.org/T187960
[00:13:06] PROBLEM - Long running screen/tmux on analytics-tool1003 is CRITICAL: CRIT: Long running SCREEN process. (user: nuria PID: 10608, 1737636s 1728000s).
[00:19:26] Operations, ops-eqiad, netops, Patch-For-Review: Rack/cable/configure asw2-a-eqiad switch stack - https://phabricator.wikimedia.org/T187960 (ayounsi)
[00:39:30] (PS7) CRusnov: Add ganeti->netbox sync script [software/netbox-deploy] - https://gerrit.wikimedia.org/r/492007 (https://phabricator.wikimedia.org/T215229)
[01:15:22] (PS8) CRusnov: Add ganeti->netbox sync script [software/netbox-deploy] - https://gerrit.wikimedia.org/r/492007 (https://phabricator.wikimedia.org/T215229)
[02:16:38] (CR) Krinkle: [C: +1] Oversample navtiming on ruwiki and eswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/493055 (https://phabricator.wikimedia.org/T187299) (owner: Gilles)
[02:54:17] (PS1) Paladox: Merge branch 'stable-2.14' into stable-2.15 [software/gerrit] (wmf/stable-2.15) - https://gerrit.wikimedia.org/r/493636
[03:15:34] (CR) Paladox: [V: +2 C: +2] "Tested locally and works with bazel 0.23" [software/gerrit] (wmf/stable-2.15) - https://gerrit.wikimedia.org/r/493636 (owner: Paladox)
[03:33:00] (Abandoned) CRusnov: Update to upstream v2.5.7 tag. [software/netbox-deploy] - https://gerrit.wikimedia.org/r/492577 (owner: CRusnov)
[03:33:14] (PS1) CRusnov: Update to upstream v2.5.7 tag. [software/netbox-deploy] - https://gerrit.wikimedia.org/r/493637
[04:00:48] (PS9) CRusnov: Add ganeti->netbox sync script [software/netbox-deploy] - https://gerrit.wikimedia.org/r/492007 (https://phabricator.wikimedia.org/T215229)
[04:01:53] (CR) CRusnov: "note that this has been successfully tested with the -i flag and a json dump from the ganeti api on the af-netbox instance." [software/netbox-deploy] - https://gerrit.wikimedia.org/r/492007 (https://phabricator.wikimedia.org/T215229) (owner: CRusnov)
[04:05:01] (PS2) CRusnov: Add dummy netbox tokens [labs/private] - https://gerrit.wikimedia.org/r/493084
[04:05:35] (CR) CRusnov: [V: +2 C: +2] Add dummy netbox tokens [labs/private] - https://gerrit.wikimedia.org/r/493084 (owner: CRusnov)
[04:51:52] PROBLEM - Long running screen/tmux on an-coord1001 is CRITICAL: CRIT: Long running SCREEN process. (user: otto PID: 26051, 3932719s 1728000s).
[05:51:22] Operations, ops-codfw, DBA: Degraded RAID on db2033 - https://phabricator.wikimedia.org/T217301 (Marostegui) Open→Resolved Thank you! It looks good now ` logicaldrive 1 (3.3 TB, RAID 1+0, OK) physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 600 GB, OK) physicaldrive 1I:1:2...
[05:54:13] (PS1) Marostegui: db-eqiad.php: Depool db1094 [mediawiki-config] - https://gerrit.wikimedia.org/r/493646
[05:57:24] Operations, ops-eqiad, DBA, Patch-For-Review: db1114 crashed (HW memory issues) - https://phabricator.wikimedia.org/T214720 (Marostegui) No problem! let's leave the loop there for a few days to see if it crashes Thank you!
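The depool check floated at 00:00:52–00:02:16 (raise an Icinga CRITICAL when a confd/conftool service on a host is depooled) could look roughly like the sketch below. This is a hypothetical illustration, not the real conftool API: it assumes the pool state has already been fetched (e.g. parsed from confctl JSON output) into a plain service → state mapping, and only shows the Nagios-plugin-convention decision logic.

```python
#!/usr/bin/env python3
"""Sketch of the depooled-service Icinga check discussed above.

Hypothetical: assumes the host's pool state is available as a
service -> state mapping (e.g. parsed from confctl JSON output);
the function name and input shape are illustrative only.
"""

# Nagios plugin exit codes
OK, CRITICAL = 0, 2

def check_depooled(states):
    """Return (exit_code, message) in Nagios plugin convention.

    states: dict mapping service name to its pooled state string,
            e.g. {"nginx": "yes", "varnish-fe": "no"}.
    """
    depooled = sorted(svc for svc, pooled in states.items() if pooled != "yes")
    if depooled:
        # Anything depooled is CRITICAL: either someone forgot to
        # repool, or they should have icinga-disabled the host.
        return CRITICAL, "CRITICAL: depooled in confd: " + ", ".join(depooled)
    return OK, "OK: all services pooled"

if __name__ == "__main__":
    import sys
    code, msg = check_depooled({"nginx": "yes", "varnish-fe": "no"})
    print(msg)
    sys.exit(code)
```

Being "profile-neutral", as suggested above, would mean deriving the service list from whatever conftool objects reference the host rather than hardcoding it per role.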
[05:57:50] (CR) Marostegui: [C: +2] db-eqiad.php: Depool db1094 [mediawiki-config] - https://gerrit.wikimedia.org/r/493646 (owner: Marostegui)
[05:58:52] (Merged) jenkins-bot: db-eqiad.php: Depool db1094 [mediawiki-config] - https://gerrit.wikimedia.org/r/493646 (owner: Marostegui)
[05:59:58] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1094 (duration: 00m 51s)
[05:59:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:03:51] Operations, ops-eqiad, DBA: dbproxy1012 power supply without power - https://phabricator.wikimedia.org/T217394 (Marostegui)
[06:04:06] Operations, ops-eqiad, DBA: dbproxy1012 power supply without power - https://phabricator.wikimedia.org/T217394 (Marostegui) p:Triage→Normal
[06:05:54] (CR) jenkins-bot: db-eqiad.php: Depool db1094 [mediawiki-config] - https://gerrit.wikimedia.org/r/493646 (owner: Marostegui)
[06:10:43] (PS1) Marostegui: install_server: Remove dbstore1002 [puppet] - https://gerrit.wikimedia.org/r/493647 (https://phabricator.wikimedia.org/T216491)
[06:11:43] (CR) Marostegui: [C: +2] install_server: Remove dbstore1002 [puppet] - https://gerrit.wikimedia.org/r/493647 (https://phabricator.wikimedia.org/T216491) (owner: Marostegui)
[06:28:32] PROBLEM - puppet last run on dbproxy1010 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/puppet-enabled]
[06:30:20] PROBLEM - puppet last run on analytics1071 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/spark2_yarn_shuffle_jar_install]
[06:31:32] PROBLEM - puppet last run on mw1289 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/hhvm-needs-restart]
[06:31:32] PROBLEM - puppet last run on mw1323 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/cgroup-mediawiki-clean]
[06:39:54] <_joe_> this is logrotate I guess
[06:40:27] <_joe_> !log upgrading php extensions on deploy* to versions compatible with php7.2
[06:40:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:43:27] !log Stop MySQL on db1094 for mysql upgrade
[06:43:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:48:34] !log Deploy schema change on s4 codfw, lag will appear on s4 codfw - T86342
[06:48:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:48:37] T86342: Dropping page.page_no_title_convert on wmf databases - https://phabricator.wikimedia.org/T86342
[06:51:31] (PS1) Marostegui: db-eqiad.php: Slowly repool db1094 [mediawiki-config] - https://gerrit.wikimedia.org/r/493648
[06:54:09] (CR) Marostegui: [C: +2] db-eqiad.php: Slowly repool db1094 [mediawiki-config] - https://gerrit.wikimedia.org/r/493648 (owner: Marostegui)
[06:55:07] (Merged) jenkins-bot: db-eqiad.php: Slowly repool db1094 [mediawiki-config] - https://gerrit.wikimedia.org/r/493648 (owner: Marostegui)
[06:56:10] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Slowly repool db1094 after mysql upgrade (duration: 00m 46s)
[06:56:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:56:20] RECOVERY - puppet last run on analytics1071 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:32] RECOVERY - puppet last run on mw1323 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[06:57:32] RECOVERY - puppet last run on mw1289 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[06:59:46] RECOVERY - puppet last run on dbproxy1010 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[07:03:34] (CR) jenkins-bot: db-eqiad.php: Slowly repool db1094 [mediawiki-config] - https://gerrit.wikimedia.org/r/493648 (owner: Marostegui)
[07:04:36] (PS1) Marostegui: db-eqiad.php: Give more traffic to db1094 [mediawiki-config] - https://gerrit.wikimedia.org/r/493649
[07:09:53] (CR) Marostegui: [C: +2] db-eqiad.php: Give more traffic to db1094 [mediawiki-config] - https://gerrit.wikimedia.org/r/493649 (owner: Marostegui)
[07:10:58] (Merged) jenkins-bot: db-eqiad.php: Give more traffic to db1094 [mediawiki-config] - https://gerrit.wikimedia.org/r/493649 (owner: Marostegui)
[07:12:00] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Increase traffic for db1094 after mysql upgrade (duration: 00m 47s)
[07:12:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:14:50] (CR) jenkins-bot: db-eqiad.php: Give more traffic to db1094 [mediawiki-config] - https://gerrit.wikimedia.org/r/493649 (owner: Marostegui)
[07:18:05] (CR) Giuseppe Lavagetto: [C: +2] scap: fix php version, add php7 admin port [puppet] - https://gerrit.wikimedia.org/r/493485 (https://phabricator.wikimedia.org/T211964) (owner: Giuseppe Lavagetto)
[07:22:03] (PS2) Giuseppe Lavagetto: scap: fix php version, add php7 admin port [puppet] - https://gerrit.wikimedia.org/r/493485 (https://phabricator.wikimedia.org/T211964)
[07:23:26] <_joe_> !log installed php 7.2 compatible packages on deploy1001,2001
[07:23:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:25:58] (PS1) Marostegui: db-eqiad.php: Increase traffic for db1094 [mediawiki-config] - https://gerrit.wikimedia.org/r/493650
[07:27:45] (CR) Marostegui: [C: +2] db-eqiad.php: Increase traffic for db1094 [mediawiki-config] - https://gerrit.wikimedia.org/r/493650 (owner: Marostegui)
[07:28:48] (Merged) jenkins-bot: db-eqiad.php: Increase traffic for db1094 [mediawiki-config] - https://gerrit.wikimedia.org/r/493650 (owner: Marostegui)
[07:29:44] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Increase traffic for db1094 after mysql upgrade (duration: 00m 47s)
[07:29:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:30:24] <_joe_> marostegui: can I do a test deploy 1 second?
[07:30:29] sure!
[07:30:31] go ahead!
[07:31:31] !log oblivian@deploy1001 Synchronized README: Test deploy for new scap configuration (duration: 00m 46s)
[07:31:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:32:03] <_joe_> uhm
[07:37:51] (CR) jenkins-bot: db-eqiad.php: Increase traffic for db1094 [mediawiki-config] - https://gerrit.wikimedia.org/r/493650 (owner: Marostegui)
[07:39:17] !log oblivian@deploy1001 Synchronized README: noop sync to test opcache-manager (duration: 00m 47s)
[07:39:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:41:44] <_joe_> marostegui: last attempt I swear
[07:44:17] !log oblivian@deploy1001 Synchronized README: Test deploy for new scap configuration (duration: 00m 48s)
[07:44:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:03:02] (PS1) Marostegui: db-eqiad.php: Fully repool db1094 [mediawiki-config] - https://gerrit.wikimedia.org/r/493652
[08:04:54] <_joe_> marostegui: I'm done btw
[08:05:00] Operations, ops-eqiad, Analytics, Patch-For-Review, User-Elukey: rack/setup/install labsdb1012.eqiad.wmnet - https://phabricator.wikimedia.org/T215231 (elukey) Via HPE CLI I tried to flip /map1/config1/oemHPE_ipmi_dcmi_overlan_enable=yes but didn't work afaics..
[08:15:23] Operations, ops-eqiad, Analytics, Patch-For-Review, User-Elukey: rack/setup/install labsdb1012.eqiad.wmnet - https://phabricator.wikimedia.org/T215231 (ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts: ` ['labsdb1012.eqiad.wmnet'] ` The log...
[08:20:07] (CR) Filippo Giunchedi: [C: +1] "LGTM, to be merged on Mon I think?" [puppet] - https://gerrit.wikimedia.org/r/493610 (https://phabricator.wikimedia.org/T200960) (owner: Herron)
[08:24:38] (CR) Filippo Giunchedi: [C: +1] "LGTM" [software/logstash/plugins] - https://gerrit.wikimedia.org/r/493460 (https://phabricator.wikimedia.org/T216993) (owner: Mathew.onipe)
[08:26:24] Operations, DBA: Decommission db1061-db1073 - https://phabricator.wikimedia.org/T217396 (Marostegui)
[08:26:39] Operations, DBA: Decommission db1061-db1073 - https://phabricator.wikimedia.org/T217396 (Marostegui)
[08:26:46] Operations, ops-eqiad, DBA, Patch-For-Review, User-Marostegui: rack/setup/install db11[26-38].eqiad.wmnet - https://phabricator.wikimedia.org/T211613 (Marostegui)
[08:27:06] Operations, DBA: Decommission db1061-db1073 - https://phabricator.wikimedia.org/T217396 (Marostegui) p:Triage→Normal
[08:28:03] (CR) Muehlenhoff: "The old repo will eventually go away; it contained PHP packages synced from an external repository, which also rebuilds/upgrades a number " [puppet] - https://gerrit.wikimedia.org/r/493451 (https://phabricator.wikimedia.org/T216712) (owner: BryanDavis)
[08:29:57] Operations, ops-eqiad, Analytics, Patch-For-Review, User-Elukey: rack/setup/install labsdb1012.eqiad.wmnet - https://phabricator.wikimedia.org/T215231 (elukey) The above setting seems to have done the trick, now wmf-auto-reimage works.. I got this: ` 08:27:57 | labsdb1012.eqiad.wmnet | WARNI...
[08:31:50] Operations, ops-eqiad, Analytics, Patch-For-Review, User-Elukey: rack/setup/install labsdb1012.eqiad.wmnet - https://phabricator.wikimedia.org/T215231 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['labsdb1012.eqiad.wmnet'] ` and were **ALL** successful.
[08:33:16] Operations, DBA, Patch-For-Review, Performance-Team (Radar): Increase parsercache keys TTL from 22 days back to 30 days - https://phabricator.wikimedia.org/T210992 (Marostegui) Open→Resolved
[08:38:31] (PS13) DCausse: [WIP] Add support for elasticsearch 6 [puppet] - https://gerrit.wikimedia.org/r/493234 (https://phabricator.wikimedia.org/T217196)
[08:41:43] Operations, ops-eqiad, Analytics, Patch-For-Review, User-Elukey: rack/setup/install labsdb1012.eqiad.wmnet - https://phabricator.wikimedia.org/T215231 (elukey) a:Cmjohnson→elukey
[08:43:02] (PS5) Muehlenhoff: Create /etc/debdeploy-autorestarts.conf which lists all automated restarts [puppet] - https://gerrit.wikimedia.org/r/493401
[08:45:47] (PS1) Elukey: [WIP] Assign role labs::db::wikireplica_analytics to labsdb1012 [puppet] - https://gerrit.wikimedia.org/r/493653 (https://phabricator.wikimedia.org/T215231)
[08:52:50] !log temporarily stop prometheus instances on prometheus2004 to take a snapshot
[08:52:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:52:57] ^ will cause some UNKNOWNs in icinga
[08:53:47] (CR) Muehlenhoff: [C: +2] Create /etc/debdeploy-autorestarts.conf which lists all automated restarts [puppet] - https://gerrit.wikimedia.org/r/493401 (owner: Muehlenhoff)
[08:58:08] (PS1) Giuseppe Lavagetto: scap: fix my typos [puppet] - https://gerrit.wikimedia.org/r/493654
[09:00:03] PROBLEM - puppet last run on db2044 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:00:11] PROBLEM - puppet last run on analytics1051 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:00:21] PROBLEM - puppet last run on es2002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:00:27] PROBLEM - puppet last run on elastic2039 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:00:30] on it
[09:00:41] PROBLEM - puppet last run on db1117 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:00:43] PROBLEM - puppet last run on cp4025 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:01:01] PROBLEM - puppet last run on mw2288 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:01:03] PROBLEM - puppet last run on mw2182 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:01:05] PROBLEM - puppet last run on elastic1018 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:01:05] PROBLEM - puppet last run on elastic2035 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:01:09] PROBLEM - puppet last run on elastic2030 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:01:13] (PS1) Muehlenhoff: Revert "Create /etc/debdeploy-autorestarts.conf which lists all automated restarts" [puppet] - https://gerrit.wikimedia.org/r/493655
[09:01:23] PROBLEM - puppet last run on an-worker1079 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:01:42] (CR) Muehlenhoff: [V: +2 C: +2] Revert "Create /etc/debdeploy-autorestarts.conf which lists all automated restarts" [puppet] - https://gerrit.wikimedia.org/r/493655 (owner: Muehlenhoff)
[09:01:49] PROBLEM - puppet last run on es2013 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:01:51] PROBLEM - puppet last run on db2042 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:01:51] PROBLEM - puppet last run on mw2223 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:02:25] PROBLEM - puppet last run on db1094 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:02:27] PROBLEM - puppet last run on restbase1013 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:02:31] PROBLEM - puppet last run on db1073 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:02:39] PROBLEM - puppet last run on mw1262 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:02:41] PROBLEM - puppet last run on elastic1026 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:02:43] PROBLEM - puppet last run on ms-be1043 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:02:45] PROBLEM - puppet last run on ms-be1018 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:02:47] PROBLEM - puppet last run on rdb1006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:02:51] PROBLEM - puppet last run on wtp2004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:02:53] PROBLEM - puppet last run on mc1028 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:02:53] PROBLEM - puppet last run on mw1309 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:03:03] PROBLEM - puppet last run on ms-fe2007 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:03:03] PROBLEM - puppet last run on puppetmaster2002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:03:03] PROBLEM - puppet last run on wtp2005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:03:05] PROBLEM - puppet last run on cp3044 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:03:15] PROBLEM - puppet last run on lvs2003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:03:15] PROBLEM - puppet last run on cp2012 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:03:15] PROBLEM - puppet last run on wtp2010 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:03:15] PROBLEM - puppet last run on pybal-test2003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:03:15] PROBLEM - puppet last run on db2081 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:03:15] PROBLEM - puppet last run on aluminium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:03:15] PROBLEM - puppet last run on db1067 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:03:16] PROBLEM - puppet last run on cp4028 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:03:16] PROBLEM - puppet last run on db1121 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:03:17] PROBLEM - puppet last run on restbase1011 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:03:17] PROBLEM - puppet last run on druid1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:03:18] PROBLEM - puppet last run on restbase1010 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:03:18] PROBLEM - puppet last run on contint2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:03:19] PROBLEM - puppet last run on gerrit2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:03:20] PROBLEM - puppet last run on mw2240 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:03:20] PROBLEM - puppet last run on es2017 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:03:21] PROBLEM - puppet last run on mw2137 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:03:21] PROBLEM - puppet last run on snapshot1009 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:03:22] PROBLEM - puppet last run on cp3041 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:03:22] PROBLEM - puppet last run on cp3049 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:03:23] PROBLEM - puppet last run on analytics1077 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:03:23] PROBLEM - puppet last run on ores1008 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:03:24] PROBLEM - puppet last run on analytics1053 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:03:24] PROBLEM - puppet last run on proton1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:03:25] PROBLEM - puppet last run on elastic2048 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:03:27] PROBLEM - puppet last run on scb2004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:03:29] PROBLEM - puppet last run on mw2162 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:03:31] PROBLEM - puppet last run on pc1010 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:03:31] PROBLEM - puppet last run on actinium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:03:31] PROBLEM - puppet last run on dbproxy1007 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:03:31] PROBLEM - puppet last run on lvs1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:03:33] PROBLEM - puppet last run on ms-be1028 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:03:43] PROBLEM - puppet last run on mw2287 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:03:47] PROBLEM - puppet last run on mw2212 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:03:49] PROBLEM - puppet last run on mw2141 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:03:49] PROBLEM - puppet last run on mw2156 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:03:49] PROBLEM - puppet last run on debmonitor1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:03:57] PROBLEM - puppet last run on boron is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:03:59] PROBLEM - puppet last run on mw2266 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:03:59] PROBLEM - puppet last run on vega is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:03:59] PROBLEM - puppet last run on rdb2005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:04:01] PROBLEM - puppet last run on db1097 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:04:01] PROBLEM - puppet last run on analytics1055 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:04:01] PROBLEM - puppet last run on db1072 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:04:01] PROBLEM - puppet last run on elastic1025 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:04:01] PROBLEM - puppet last run on ms-be2032 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:04:05] PROBLEM - puppet last run on mw1295 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:04:05] PROBLEM - puppet last run on labweb1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:04:07] <_joe_> whoa
[09:04:13] PROBLEM - puppet last run on db1082 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:04:17] PROBLEM - puppet last run on mw1275 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:04:17] PROBLEM - puppet last run on db1070 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:04:17] PROBLEM - puppet last run on mw1312 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:04:17] PROBLEM - puppet last run on cloudvirt1012 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:04:17] PROBLEM - puppet last run on analytics1065 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:04:17] PROBLEM - puppet last run on ms-be1019 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:04:19] PROBLEM - puppet last run on elastic1043 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:04:19] PROBLEM - puppet last run on elastic1017 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:04:21] PROBLEM - puppet last run on lvs4006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:04:25] PROBLEM - puppet last run on acrux is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:04:27] PROBLEM - puppet last run on elastic1048 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:04:27] PROBLEM - puppet last run on mw2274 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:04:27] PROBLEM - puppet last run on es2004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:04:28] (CR) Giuseppe Lavagetto: [C: +2] scap: fix my typos [puppet] - https://gerrit.wikimedia.org/r/493654 (owner: Giuseppe Lavagetto)
[09:04:29] it's fixed, but Icinga is a little slow to report :-)
[09:04:29] PROBLEM - puppet last run on deploy2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:04:31] PROBLEM - puppet last run on dns1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:04:31] PROBLEM - puppet last run on mw1238 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:04:31] PROBLEM - puppet last run on aqs1006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:04:31] PROBLEM - puppet last run on maps1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:04:34] <_joe_> moritzm: yeah I know
[09:04:37] PROBLEM - puppet last run on cp2005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:04:37] PROBLEM - puppet last run on scb2003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:04:37] PROBLEM - puppet last run on wtp2006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:04:39] PROBLEM - puppet last run on phab2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:04:39] PROBLEM - puppet last run on db2055 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:04:39] PROBLEM - puppet last run on mw2172 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:04:39] PROBLEM - puppet last run on mw2157 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:04:43] PROBLEM - puppet last run on pc2009 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:04:43] PROBLEM - puppet last run on sessionstore2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:04:43] PROBLEM - puppet last run on ores2007 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:04:43] PROBLEM - puppet last run on dbmonitor1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:04:43] PROBLEM - puppet last run on mw1242 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:04:46] (PS2) Giuseppe Lavagetto: scap: fix my typos [puppet] - https://gerrit.wikimedia.org/r/493654
[09:04:50] <_joe_> grr
[09:04:51] PROBLEM - puppet last run on restbase1017 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:04:53] PROBLEM - puppet last run on mw1346 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:04:53] PROBLEM - puppet last run on restbase1012 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:04:53] PROBLEM - puppet last run on cloudvirt1019 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:05:01] PROBLEM - puppet last run on restbase2016 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:05:03] PROBLEM - puppet last run on elastic2053 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:05:03] PROBLEM - puppet last run on mw1322 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:05:03] PROBLEM - puppet last run on an-worker1082 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:05:03] PROBLEM - puppet last run on ms-be2025 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:05:03] PROBLEM - puppet last run on lvs3004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:05:04] PROBLEM - puppet last run on db2039 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:05:05] <_joe_> where is icinga-wm running?
[09:05:07] PROBLEM - puppet last run on ms-be1032 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:05:07] PROBLEM - puppet last run on cp5012 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:05:09] PROBLEM - puppet last run on db2045 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:05:09] PROBLEM - puppet last run on mw2176 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:05:11] PROBLEM - puppet last run on db1118 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:05:15] PROBLEM - puppet last run on db1083 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:05:15] PROBLEM - puppet last run on an-worker1078 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:05:21] PROBLEM - puppet last run on db2035 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:05:21] PROBLEM - puppet last run on mw2144 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:05:27] PROBLEM - puppet last run on analytics1050 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:05:27] PROBLEM - puppet last run on druid1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:05:27] <_joe_> sorry lemme rephrase, where is the code for icinga-wm?
[09:05:37] PROBLEM - puppet last run on mc2019 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:05:37] PROBLEM - puppet last run on cloudvirt1028 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:05:39] PROBLEM - puppet last run on lvs1006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:05:45] PROBLEM - puppet last run on mw2290 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:05:47] PROBLEM - puppet last run on dns1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:05:53] PROBLEM - puppet last run on dbproxy1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:05:55] PROBLEM - puppet last run on mw2280 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:05:55] PROBLEM - puppet last run on mw2233 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:05:55] PROBLEM - puppet last run on logstash1010 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:05:55] PROBLEM - puppet last run on eventlog1002 is CRITICAL: CRITICAL: Catalog fetch fail.
Either compilation failed or puppetmaster has issues [09:05:57] PROBLEM - puppet last run on scb1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:05:59] PROBLEM - puppet last run on cp1075 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:05:59] PROBLEM - puppet last run on db1086 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:06:01] PROBLEM - puppet last run on lvs3003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:06:09] PROBLEM - puppet last run on analytics1064 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:06:09] PROBLEM - puppet last run on ores2006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:06:09] PROBLEM - puppet last run on mw1345 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:06:11] PROBLEM - puppet last run on kubernetes1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:06:11] PROBLEM - puppet last run on db1062 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:06:17] PROBLEM - puppet last run on ms-fe2005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:06:17] PROBLEM - puppet last run on ores2002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:06:17] PROBLEM - puppet last run on mw2256 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:06:19] PROBLEM - puppet last run on db2074 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [09:06:19] PROBLEM - puppet last run on mw2255 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:06:19] PROBLEM - puppet last run on mw2234 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:06:19] PROBLEM - puppet last run on mw2231 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:06:19] PROBLEM - puppet last run on mw2177 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:06:19] PROBLEM - puppet last run on mw2202 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:06:19] PROBLEM - puppet last run on mw2169 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:06:21] PROBLEM - puppet last run on ms-be2017 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:06:23] PROBLEM - puppet last run on proton2002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:06:23] PROBLEM - puppet last run on mw2251 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:06:23] PROBLEM - puppet last run on kubestagetcd1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:06:25] PROBLEM - puppet last run on kafka1013 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:06:27] PROBLEM - puppet last run on mw1232 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:06:33] PROBLEM - puppet last run on mw2265 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [09:06:33] PROBLEM - puppet last run on mw2277 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:06:33] PROBLEM - puppet last run on ms-be2021 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:06:39] PROBLEM - puppet last run on labsdb1010 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:06:39] PROBLEM - puppet last run on releases1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:06:39] PROBLEM - puppet last run on es1015 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:06:39] PROBLEM - puppet last run on ms-be2023 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:06:39] PROBLEM - puppet last run on ms-be1034 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:06:45] PROBLEM - puppet last run on wtp1026 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:06:51] PROBLEM - puppet last run on db1106 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:07:05] PROBLEM - puppet last run on mw2225 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:07:07] PROBLEM - puppet last run on releases2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:07:07] PROBLEM - puppet last run on mw2220 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:07:41] PROBLEM - puppet last run on restbase1016 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [09:07:41] PROBLEM - puppet last run on wtp1040 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:07:43] PROBLEM - puppet last run on cloudnet2002-dev is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:07:47] PROBLEM - puppet last run on cp5003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:07:53] PROBLEM - puppet last run on wtp1046 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:07:55] PROBLEM - puppet last run on ms-fe1006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:07:55] PROBLEM - puppet last run on maps1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:07:57] PROBLEM - puppet last run on mw2201 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:08:28] I've stopped ircecho, fixing puppet runs via cumin [09:14:27] 10Operations, 10Analytics, 10Analytics-Kanban, 10Product-Analytics, 10Patch-For-Review: dbstore1002 Mysql errors - https://phabricator.wikimedia.org/T213670 (10Marostegui) dbstore1002 just crashed: ` Thread pointer: 0x0x0 Attempting backtrace. You can use the following information to find out where mysql...
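The cleanup step mentioned above ("fixing puppet runs via cumin") amounts to forcing a Puppet agent run across the affected fleet so the "puppet last run" CRITICALs clear without waiting for the next scheduled run. A minimal sketch, with the caveat that the `A:all` alias and the `run-puppet-agent` wrapper are assumptions about the local setup, not confirmed by this log; it is shown as a dry run since the real command needs a Cumin master:

```shell
# Sketch of a fleet-wide forced Puppet run via Cumin.
# ASSUMPTIONS: the 'A:all' host-selection alias and the 'run-puppet-agent'
# wrapper script are hypothetical stand-ins for the local environment.
# Echoed rather than executed so this is safe to run anywhere:
CUMIN_CMD="sudo cumin 'A:all' 'run-puppet-agent -q'"
echo "dry-run: ${CUMIN_CMD}"
```

In practice the host selection would be narrowed to the hosts that actually failed, to avoid hammering the puppetmasters with simultaneous catalog compiles.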
[09:14:47] RECOVERY - puppet last run on lvs4006 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [09:14:54] elukey: ^ dbstore1002 knows the time is arriving [09:15:27] RECOVERY - puppet last run on lvs3004 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [09:16:13] RECOVERY - puppet last run on cp4025 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [09:16:21] RECOVERY - puppet last run on lvs3003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:16:39] RECOVERY - puppet last run on mw2255 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [09:16:39] RECOVERY - puppet last run on mw2231 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [09:16:39] RECOVERY - puppet last run on mw2177 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [09:17:27] RECOVERY - puppet last run on mw2223 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:17:29] RECOVERY - puppet last run on mw2225 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [09:18:09] RECOVERY - puppet last run on cloudnet2002-dev is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [09:18:39] RECOVERY - puppet last run on puppetmaster2002 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [09:18:39] RECOVERY - puppet last run on wtp2005 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [09:18:41] RECOVERY - puppet last run on cp3044 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [09:18:53] RECOVERY - puppet last run on ms-be2047 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [09:18:53] RECOVERY - puppet last run on gerrit2001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures 
[09:18:53] RECOVERY - puppet last run on db2036 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:18:55] RECOVERY - puppet last run on cp3049 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [09:18:55] RECOVERY - puppet last run on cp3041 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [09:18:55] RECOVERY - puppet last run on lvs3001 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [09:19:03] RECOVERY - puppet last run on mw2162 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:19:03] RECOVERY - puppet last run on mc2023 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [09:19:23] RECOVERY - puppet last run on mw2253 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [09:19:23] RECOVERY - puppet last run on mw2141 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:19:33] RECOVERY - puppet last run on rdb2005 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [09:20:01] RECOVERY - puppet last run on mw2274 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [09:20:01] RECOVERY - puppet last run on acrux is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [09:20:03] RECOVERY - puppet last run on es2004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:20:10] marostegui: poor dbstore1002 [09:20:11] RECOVERY - puppet last run on cp2005 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [09:20:11] RECOVERY - puppet last run on scb2003 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [09:20:11] RECOVERY - puppet last run on wtp2006 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [09:20:13] RECOVERY - puppet last run on phab2001 is 
OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [09:20:13] RECOVERY - puppet last run on db2055 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [09:20:13] RECOVERY - puppet last run on mw2172 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [09:20:15] RECOVERY - puppet last run on cp2011 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [09:20:17] RECOVERY - puppet last run on sessionstore2001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [09:20:33] RECOVERY - puppet last run on restbase2016 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [09:20:33] RECOVERY - puppet last run on elastic2053 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [09:20:37] RECOVERY - puppet last run on db2039 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures [09:20:41] RECOVERY - puppet last run on db2045 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [09:20:41] RECOVERY - puppet last run on mw2176 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:20:43] RECOVERY - puppet last run on db2044 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [09:20:55] RECOVERY - puppet last run on mw2144 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:21:03] RECOVERY - puppet last run on es2002 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [09:21:07] RECOVERY - puppet last run on mc2019 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [09:21:09] RECOVERY - puppet last run on elastic2039 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [09:21:15] RECOVERY - puppet last run on mw2290 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 
failures [09:21:25] RECOVERY - puppet last run on mw2280 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:21:25] RECOVERY - puppet last run on mw2233 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [09:21:41] RECOVERY - puppet last run on ores2006 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [09:21:43] RECOVERY - puppet last run on mw2288 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:21:45] RECOVERY - puppet last run on mw2182 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [09:21:49] RECOVERY - puppet last run on ms-fe2005 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [09:21:49] RECOVERY - puppet last run on mw2256 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [09:21:51] RECOVERY - puppet last run on mw2202 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [09:21:51] RECOVERY - puppet last run on db2074 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [09:21:51] RECOVERY - puppet last run on elastic2030 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [09:21:51] RECOVERY - puppet last run on mw2169 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures [09:21:57] RECOVERY - puppet last run on proton2002 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [09:22:07] RECOVERY - puppet last run on mw2265 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [09:22:07] RECOVERY - puppet last run on mw2277 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [09:22:07] RECOVERY - puppet last run on ms-be2021 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [09:22:13] RECOVERY - puppet last run on ms-be2023 is OK: OK: 
Puppet is currently enabled, last run 2 seconds ago with 0 failures [09:22:37] RECOVERY - puppet last run on es2013 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [09:22:39] RECOVERY - puppet last run on db2042 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [09:22:43] RECOVERY - puppet last run on releases2001 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [09:23:13] PROBLEM - Host elastic2038 is DOWN: PING CRITICAL - Packet loss = 100% [09:23:29] huh [09:23:31] RECOVERY - puppet last run on mw2201 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [09:23:41] RECOVERY - puppet last run on wtp2004 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures [09:23:51] RECOVERY - puppet last run on ms-fe2007 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [09:23:53] RECOVERY - puppet last run on pc2010 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:24:01] RECOVERY - puppet last run on cp2012 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:24:01] RECOVERY - puppet last run on lvs2003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:24:01] RECOVERY - puppet last run on db2081 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [09:24:01] RECOVERY - puppet last run on pybal-test2003 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [09:24:01] RECOVERY - puppet last run on wtp2010 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [09:24:03] RECOVERY - puppet last run on contint2001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [09:24:03] RECOVERY - puppet last run on es2017 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [09:24:03] RECOVERY - puppet 
last run on mw2240 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [09:24:04] RECOVERY - puppet last run on mw2137 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [09:24:13] RECOVERY - puppet last run on elastic2048 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [09:24:13] RECOVERY - puppet last run on scb2004 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures [09:24:17] RECOVERY - puppet last run on mw2252 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:24:21] RECOVERY - puppet last run on kafka2002 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [09:24:31] RECOVERY - puppet last run on mw2287 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:24:35] RECOVERY - puppet last run on mw2156 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [09:24:43] RECOVERY - puppet last run on mw2266 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures [09:24:45] RECOVERY - puppet last run on vega is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:24:49] RECOVERY - puppet last run on ms-be2032 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:25:17] RECOVERY - puppet last run on deploy2001 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [09:25:25] RECOVERY - puppet last run on mw2157 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [09:25:27] RECOVERY - puppet last run on pc2009 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [09:25:27] RECOVERY - puppet last run on ores2007 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [09:25:49] RECOVERY - puppet last run on ms-be2025 is OK: OK: Puppet is currently enabled, last run 3 minutes 
ago with 0 failures [09:26:07] RECOVERY - puppet last run on db2035 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [09:26:41] RECOVERY - puppet last run on cp1075 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [09:26:59] RECOVERY - puppet last run on elastic2035 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [09:27:01] RECOVERY - puppet last run on ores2002 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [09:27:03] RECOVERY - puppet last run on mw2234 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [09:27:03] RECOVERY - puppet last run on ms-be2017 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [09:27:07] RECOVERY - puppet last run on mw2251 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [09:27:19] RECOVERY - puppet last run on an-worker1079 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:27:31] RECOVERY - puppet last run on wtp1026 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [09:27:57] RECOVERY - puppet last run on mw2220 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures [09:28:27] RECOVERY - puppet last run on db1094 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [09:28:29] RECOVERY - puppet last run on restbase1013 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [09:28:33] RECOVERY - puppet last run on db1073 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [09:28:43] RECOVERY - puppet last run on ms-fe1006 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [09:28:43] RECOVERY - puppet last run on maps1004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:28:47] RECOVERY - puppet last run on 
ms-be1018 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [09:28:49] RECOVERY - puppet last run on rdb1006 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [09:29:13] RECOVERY - puppet last run on aluminium is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [09:29:13] RECOVERY - puppet last run on db1121 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [09:29:13] RECOVERY - puppet last run on druid1002 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [09:29:13] RECOVERY - puppet last run on db1067 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:29:13] RECOVERY - puppet last run on restbase1010 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [09:29:15] RECOVERY - puppet last run on pc1009 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [09:29:17] RECOVERY - puppet last run on labsdb1011 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [09:29:17] RECOVERY - puppet last run on snapshot1009 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [09:29:19] RECOVERY - puppet last run on analytics1077 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:29:19] RECOVERY - puppet last run on ores1008 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [09:29:23] RECOVERY - puppet last run on analytics1053 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [09:29:29] RECOVERY - puppet last run on actinium is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [09:29:31] RECOVERY - puppet last run on lvs1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:29:37] RECOVERY - puppet last run on dbproxy1011 is OK: OK: Puppet is currently 
enabled, last run 3 minutes ago with 0 failures [09:29:47] RECOVERY - puppet last run on debmonitor1001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [09:29:55] RECOVERY - puppet last run on boron is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [09:29:59] RECOVERY - puppet last run on analytics1055 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [09:29:59] RECOVERY - puppet last run on db1097 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [09:29:59] RECOVERY - puppet last run on db1072 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:29:59] RECOVERY - puppet last run on elastic1025 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [09:30:09] RECOVERY - puppet last run on analytics1067 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:30:13] RECOVERY - puppet last run on db1082 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [09:30:15] RECOVERY - puppet last run on mw1312 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:30:15] RECOVERY - puppet last run on mw1275 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [09:30:15] RECOVERY - puppet last run on cloudvirt1012 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:30:15] RECOVERY - puppet last run on analytics1065 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [09:30:15] RECOVERY - puppet last run on db1070 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [09:30:17] RECOVERY - puppet last run on ms-be1019 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [09:30:19] RECOVERY - puppet last run on elastic1043 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures 
[09:30:19] RECOVERY - puppet last run on scb1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:30:19] RECOVERY - puppet last run on ms-fe1008 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [09:30:19] RECOVERY - puppet last run on elastic1017 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [09:30:25] RECOVERY - puppet last run on elastic1048 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:30:29] RECOVERY - puppet last run on mw1238 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [09:30:31] RECOVERY - puppet last run on maps1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:30:31] RECOVERY - puppet last run on aqs1006 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:30:41] RECOVERY - puppet last run on dbmonitor1001 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [09:30:41] RECOVERY - puppet last run on mw1242 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:30:47] RECOVERY - puppet last run on restbase1017 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [09:30:51] RECOVERY - puppet last run on restbase1012 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:30:51] RECOVERY - puppet last run on mw1346 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [09:31:01] RECOVERY - puppet last run on an-worker1082 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [09:31:01] RECOVERY - puppet last run on mw1322 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [09:31:03] RECOVERY - puppet last run on ms-be1032 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [09:31:07] RECOVERY - puppet last run on db1118 
is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [09:31:11] RECOVERY - puppet last run on an-worker1078 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:31:11] RECOVERY - puppet last run on db1083 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [09:31:19] RECOVERY - puppet last run on analytics1051 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [09:31:23] RECOVERY - puppet last run on analytics1050 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [09:31:23] RECOVERY - puppet last run on druid1003 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [09:31:33] RECOVERY - puppet last run on cloudvirt1028 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [09:31:33] RECOVERY - puppet last run on lvs1006 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [09:31:45] RECOVERY - puppet last run on db1117 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [09:31:47] RECOVERY - puppet last run on dbproxy1004 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [09:31:49] RECOVERY - puppet last run on logstash1010 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:31:49] RECOVERY - puppet last run on eventlog1002 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [09:31:51] RECOVERY - puppet last run on scb1001 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures [09:31:53] RECOVERY - puppet last run on db1086 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [09:32:01] RECOVERY - puppet last run on analytics1064 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [09:32:03] RECOVERY - puppet last run on db1062 is OK: OK: Puppet is currently 
enabled, last run 1 minute ago with 0 failures [09:32:03] RECOVERY - puppet last run on mw1345 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [09:32:05] RECOVERY - puppet last run on kubernetes1004 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [09:32:09] RECOVERY - puppet last run on elastic1018 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [09:32:17] RECOVERY - puppet last run on kubestagetcd1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:32:19] RECOVERY - puppet last run on kafka1013 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:32:23] RECOVERY - puppet last run on mw1232 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:32:33] RECOVERY - puppet last run on labsdb1010 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [09:32:33] RECOVERY - puppet last run on releases1001 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [09:32:33] RECOVERY - puppet last run on es1015 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:32:33] RECOVERY - puppet last run on ms-be1034 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:32:47] RECOVERY - puppet last run on db1106 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [09:33:43] RECOVERY - puppet last run on restbase1016 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [09:33:43] RECOVERY - puppet last run on wtp1040 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [09:33:51] RECOVERY - puppet last run on mw1262 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [09:33:53] RECOVERY - puppet last run on wtp1046 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures 
[09:33:53] RECOVERY - puppet last run on elastic1026 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [09:33:57] RECOVERY - puppet last run on ms-be1043 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [09:34:05] RECOVERY - puppet last run on mc1028 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [09:34:05] RECOVERY - puppet last run on mw1309 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [09:34:27] RECOVERY - puppet last run on restbase1011 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [09:34:29] RECOVERY - puppet last run on mw1233 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:34:41] RECOVERY - puppet last run on pc1010 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [09:34:41] RECOVERY - puppet last run on dbproxy1007 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [09:34:45] RECOVERY - puppet last run on ms-be1028 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [09:34:49] RECOVERY - puppet last run on mw1287 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [09:35:17] RECOVERY - puppet last run on labsdb1009 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [09:35:17] RECOVERY - puppet last run on mw1295 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [09:35:17] RECOVERY - puppet last run on labweb1002 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [09:36:03] RECOVERY - puppet last run on cloudvirt1019 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [09:44:19] 10Operations, 10ops-codfw: elastic2038 CPU/memory errors - https://phabricator.wikimedia.org/T217398 (10MoritzMuehlenhoff) [09:45:46] 10Operations, 10ops-codfw: elastic2038 
CPU/memory errors - https://phabricator.wikimedia.org/T217398 (10Mathew.onipe) p:05Triage→03High [09:54:26] (03PS1) 10Muehlenhoff: Create /etc/debdeploy-autorestarts.conf which lists all automated restarts [puppet] - 10https://gerrit.wikimedia.org/r/493659 [09:58:18] (03PS1) 10Ema: trafficserver (8.0.2-1wm1) stretch-wikimedia; urgency=medium [debs/trafficserver] - 10https://gerrit.wikimedia.org/r/493660 [10:01:26] (03CR) 10jerkins-bot: [V: 04-1] trafficserver (8.0.2-1wm1) stretch-wikimedia; urgency=medium [debs/trafficserver] - 10https://gerrit.wikimedia.org/r/493660 (owner: 10Ema) [10:22:08] (03CR) 10DCausse: [C: 03+1] cloudelastic: Add cloudelastic configs (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/487129 (https://phabricator.wikimedia.org/T214921) (owner: 10Mathew.onipe) [10:23:45] (03PS2) 10Ema: trafficserver (8.0.2-1wm1) stretch-wikimedia; urgency=medium [debs/trafficserver] - 10https://gerrit.wikimedia.org/r/493660 [10:23:51] PROBLEM - puppet last run on prometheus2003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 seconds ago with 1 failures. 
Failed resources (up to 3 shown): File[/srv/prometheus/k8s/prometheus.yml] [10:26:20] that's me ^ [10:26:52] (03CR) 10jerkins-bot: [V: 04-1] trafficserver (8.0.2-1wm1) stretch-wikimedia; urgency=medium [debs/trafficserver] - 10https://gerrit.wikimedia.org/r/493660 (owner: 10Ema) [10:27:32] that's debian-glue timing out after 180s ^ [10:29:03] RECOVERY - puppet last run on prometheus2003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:30:49] which is strange because I did set BUILD_TIMEOUT to one hour: https://github.com/wikimedia/integration-config/blob/master/zuul/parameter_functions.py#L129 [10:42:47] (03PS15) 10Mathew.onipe: cloudelastic: Add cloudelastic configs [puppet] - 10https://gerrit.wikimedia.org/r/487129 (https://phabricator.wikimedia.org/T214921) [10:43:00] (03CR) 10Mathew.onipe: cloudelastic: Add cloudelastic configs (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/487129 (https://phabricator.wikimedia.org/T214921) (owner: 10Mathew.onipe) [10:43:20] 10Operations, 10Continuous-Integration-Infrastructure, 10Traffic, 10Release-Engineering-Team (Kanban): zuul seemingly ignoring BUILD_TIMEOUT - https://phabricator.wikimedia.org/T217403 (10ema) [10:43:31] 10Operations, 10Continuous-Integration-Infrastructure, 10Traffic, 10Release-Engineering-Team (Kanban): zuul seemingly ignoring BUILD_TIMEOUT - https://phabricator.wikimedia.org/T217403 (10ema) p:05Triage→03Normal [10:47:24] (03CR) 10DCausse: [C: 03+1] cloudelastic: Add cloudelastic configs [puppet] - 10https://gerrit.wikimedia.org/r/487129 (https://phabricator.wikimedia.org/T214921) (owner: 10Mathew.onipe) [10:59:28] (03PS1) 10Jcrespo: mariadb: Refactor dump_section.py and rename to match functionality [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/493664 (https://phabricator.wikimedia.org/T206203) [11:00:07] (03CR) 10jerkins-bot: [V: 04-1] mariadb: Refactor dump_section.py and rename to match functionality [software/wmfmariadbpy] - 
10https://gerrit.wikimedia.org/r/493664 (https://phabricator.wikimedia.org/T206203) (owner: 10Jcrespo) [11:03:25] (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Fully repool db1094 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/493652 (owner: 10Marostegui) [11:04:36] (03Merged) 10jenkins-bot: db-eqiad.php: Fully repool db1094 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/493652 (owner: 10Marostegui) [11:04:48] (03CR) 10jenkins-bot: db-eqiad.php: Fully repool db1094 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/493652 (owner: 10Marostegui) [11:05:43] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Fully repool db1094 (duration: 00m 50s) [11:05:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:07:06] RECOVERY - ensure kvm processes are running on labvirt1008 is OK: PROCS OK: 1 process with regex args /usr/bin/kvm [11:17:32] !log rebooting labstore2004.codfw.wmnet [11:17:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:19:06] (03PS1) 10Elukey: hadoop: move ssl configs rendering out of hadoop.pp [puppet/cdh] - 10https://gerrit.wikimedia.org/r/493668 [11:20:40] (03Abandoned) 10Elukey: hadoop: move ssl configs rendering out of hadoop.pp [puppet/cdh] - 10https://gerrit.wikimedia.org/r/493668 (owner: 10Elukey) [11:20:53] (03PS1) 10Alexandros Kosiaris: Add citoid specific statsd mappings [deployment-charts] - 10https://gerrit.wikimedia.org/r/493669 (https://phabricator.wikimedia.org/T213194) [11:20:57] (03PS1) 10Alexandros Kosiaris: Publish citoid 0.0.2 version [deployment-charts] - 10https://gerrit.wikimedia.org/r/493670 (https://phabricator.wikimedia.org/T213194) [11:41:39] (03CR) 10Jbond: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/493659 (owner: 10Muehlenhoff) [11:42:22] 10Operations, 10monitoring, 10Patch-For-Review: Serve >= 50% of production Prometheus systems with Prometheus v2 - https://phabricator.wikimedia.org/T187987 (10fgiunchedi) To test 
the new plan above I've started an rsync + migration of all instances of prometheus2003, starting from a snapshot of data from pr... [11:47:46] !log rebooting labsdb1005.codfw.wmnet [11:47:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:50:16] (03PS1) 10KartikMistry: WIP: Enable ExternalGuidance to all Wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/493672 (https://phabricator.wikimedia.org/T216129) [11:56:07] PROBLEM - proton endpoints health on proton1002 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Foo page from en.wp.org in letter format) timed out before a response was received: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Bar page from en.wp.org in A4 format using optimized for reading on mobile devices) timed out before a response was received [11:57:17] RECOVERY - proton endpoints health on proton1002 is OK: All endpoints are healthy [11:58:31] PROBLEM - mysqld processes on labsdb1005 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [11:58:57] PROBLEM - proton endpoints health on proton1001 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Foo page from en.wp.org in letter format) timed out before a response was received: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Bar page from en.wp.org in A4 format using optimized for reading on mobile devices) timed out before a response was received [11:58:58] expected looks like, jbond42 [11:59:09] investigating now godog [11:59:13] got the page... [11:59:19] labsdb1005 ? 
[11:59:22] uh huh [11:59:35] (03Restored) 10Elukey: hadoop: move ssl configs rendering out of hadoop.pp [puppet/cdh] - 10https://gerrit.wikimedia.org/r/493668 (owner: 10Elukey) [11:59:47] RECOVERY - mysqld processes on labsdb1005 is OK: PROCS OK: 1 process with command name mysqld [11:59:51] it was just rebooted for https://phabricator.wikimedia.org/T216802 [11:59:56] seems mysql didn't start [12:00:02] yeah that would have been it, if it wasn't downtimed [12:00:39] it was downtimed in icinga but mysql didn't start when it came back up [12:00:44] hm [12:01:17] were all services on the host downtimed as well? [12:01:30] (trying to figure out why it paged) [12:01:55] (03CR) 10Elukey: [C: 03+2] hadoop: move ssl configs rendering out of hadoop.pp [puppet/cdh] - 10https://gerrit.wikimedia.org/r/493668 (owner: 10Elukey) [12:02:00] the downtime had finished because the host came back up. the alert was valid. when the box came back up mysql and mariadb were not started. I had to start them manually [12:03:00] it seems jynus manually killed mysql on this server on 2019-02-18 according to the SAL [12:04:48] was it meant to remain not running?
[12:04:59] PROBLEM - proton endpoints health on proton1001 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Foo page from en.wp.org in letter format) is CRITICAL: Test Print the Foo page from en.wp.org in letter format returned the unexpected status 500 (expecting: 200): /{domain}/v1/pdf/{title}/{format}/{type} (Print the Bar page from en.wp.org in A4 format using optimized for reading on mobile devices) is CRITICAL: Test Pr [12:04:59] from en.wp.org in A4 format using optimized for reading on mobile devices returned the unexpected status 500 (expecting: 200) [12:05:20] (03CR) 10Mathew.onipe: "PCC is happy: https://puppet-compiler.wmflabs.org/compiler1002/14938/" [puppet] - 10https://gerrit.wikimedia.org/r/487129 (https://phabricator.wikimedia.org/T214921) (owner: 10Mathew.onipe) [12:05:27] not sure apergos, jynus should know more (or brooke, but it's the middle of the night for her) [12:05:33] ok [12:05:54] it was not running at the time of the reboot? [12:06:28] mysql doesn't start automatically on reboot [12:06:34] that is a feature, not a bug [12:06:36] ah there's the answer, thank you [12:06:40] :-) [12:06:53] if you don't like it, you can configure the class to do so [12:06:53] thanks jynus [12:06:54] so the next question is about avoiding pages on reboot, for that service [12:07:06] but a) don't set it as default b) I don't recommend it [12:07:25] but you are free to do so, I think it is the autostart=1 parameter [12:07:35] PROBLEM - proton endpoints health on proton1001 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Foo page from en.wp.org in letter format) is CRITICAL: Test Print the Foo page from en.wp.org in letter format returned the unexpected status 500 (expecting: 200): /{domain}/v1/pdf/{title}/{format}/{type} (Print the Bar page from en.wp.org in A4 format using optimized for reading on mobile devices) timed out before a r [12:07:35] ved [12:08:32] $ensure = stopped is the parameter, on the mariadb::service [12:08:52] + managed =
true [12:09:17] manage, not managed [12:09:51] PROBLEM - proton endpoints health on proton1001 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Foo page from en.wp.org in letter format) is CRITICAL: Test Print the Foo page from en.wp.org in letter format returned the unexpected status 500 (expecting: 200): /{domain}/v1/pdf/{title}/{format}/{type} (Print the Bar page from en.wp.org in A4 format using optimized for reading on mobile devices) is CRITICAL: Test Pr [12:09:51] from en.wp.org in A4 format using optimized for reading on mobile devices returned the unexpected status 500 (expecting: 200) [12:13:31] RECOVERY - proton endpoints health on proton1001 is OK: All endpoints are healthy [12:16:54] (03CR) 10Muehlenhoff: Add ability to filter out auto restarts (031 comment) [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/493463 (owner: 10Jbond) [12:18:33] PROBLEM - configured eth on proton1002 is CRITICAL: CHECK_NRPE: Error - Could not connect to 10.64.32.61: Connection reset by peer [12:19:39] RECOVERY - configured eth on proton1002 is OK: OK - interfaces up [12:19:43] PROBLEM - proton endpoints health on proton1001 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Foo page from en.wp.org in letter format) timed out before a response was received: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Bar page from en.wp.org in A4 format using optimized for reading on mobile devices) is CRITICAL: Test Print the Bar page from en.wp.org in A4 format using optimized for reading on mobi [12:19:43] ed the unexpected status 500 (expecting: 200) [12:22:17] PROBLEM - Request latencies on neon is CRITICAL: instance=10.64.0.40:6443 verb=PATCH https://grafana.wikimedia.org/dashboard/db/kubernetes-api [12:22:52] (03PS1) 10Jbond: Remove unused libraries and use collections.defaultdict [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/493675 [12:22:54] * akosiaris looking into proton [12:23:45] ps auxww |grep chromium |wc -l [12:23:45] 32 
[12:23:47] gulp [12:24:05] either someone decided to pdfize a ton of articles or there's a bug [12:24:33] PROBLEM - proton endpoints health on proton1001 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Foo page from en.wp.org in letter format) timed out before a response was received: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Bar page from en.wp.org in A4 format using optimized for reading on mobile devices) timed out before a response was received [12:24:42] proton1001 is even better... 99 chromium processes [12:24:43] RECOVERY - Request latencies on neon is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api [12:25:11] ah great... stuck from Jan 30 [12:25:13] perfect [12:26:55] RECOVERY - proton endpoints health on proton1001 is OK: All endpoints are healthy [12:31:44] !log restart proton on proton1001, counted 99 chromium processes left running since at least Jan 30 [12:31:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:32:50] !log restart proton1002, OOM showed up [12:32:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:35:24] https://grafana.wikimedia.org/d/000000563/proton?orgId=1 [12:35:33] PROBLEM - Request latencies on neon is CRITICAL: instance=10.64.0.40:6443 verb=PATCH https://grafana.wikimedia.org/dashboard/db/kubernetes-api [12:35:42] hmm indeed someone is creating a lot of pdfs [12:36:45] RECOVERY - Request latencies on neon is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api [12:37:41] (03CR) 10Muehlenhoff: [C: 03+1] "Looks good!" 
[debs/debdeploy] - 10https://gerrit.wikimedia.org/r/493675 (owner: 10Jbond) [12:38:01] 10Operations, 10Continuous-Integration-Infrastructure, 10Traffic, 10Release-Engineering-Team (Kanban): zuul seemingly ignoring BUILD_TIMEOUT - https://phabricator.wikimedia.org/T217403 (10hashar) The job has a BUILD_TIMEOUT parameter that defaults to 30 (minutes). That is configured in the job itself (as w... [12:38:47] seems like someone is trying to pdfize large parts of de.wikipedia.org [12:40:15] PROBLEM - PyBal backends health check on lvs1016 is CRITICAL: PYBAL CRITICAL - CRITICAL - proton_24766: Servers proton1002.eqiad.wmnet are marked down but pooled [12:40:27] PROBLEM - LVS HTTP IPv4 on proton.svc.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:40:44] uh [12:41:03] yeah big incoming traffic for pdfs [12:41:11] proton seems to not be able to keep up with the rate [12:41:35] RECOVERY - LVS HTTP IPv4 on proton.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 951 bytes in 0.080 second response time [12:41:35] RECOVERY - PyBal backends health check on lvs1016 is OK: PYBAL OK - All pools are healthy [12:42:07] any way to throttle them? [12:42:13] that's what I am searching [12:42:24] supposedly proton has some queue but ... 
[12:42:38] it does return queue is full but still [12:44:15] 10Operations, 10Continuous-Integration-Infrastructure, 10Traffic, 10Release-Engineering-Team (Kanban): zuul seemingly ignoring BUILD_TIMEOUT - https://phabricator.wikimedia.org/T217403 (10hashar) I have refreshed the jenkins job just in case but it still comes hardcoded to 3 minutes apparently ;-( [12:44:31] I 'll silence the paging alert just to avoid more pages for the next couple of hours [12:44:38] I 'll leave the rest of the alerts as is however [12:45:00] if it can't keep up with the queue once it's full, maybe the queue needs to be shorter (so more requests are rejected) [12:45:57] the config says it's 3 [12:46:01] whatever that 3 means [12:46:32] render_concurrency: 3 [12:46:32] render_queue_timeout: 60 [12:46:33] render_execution_timeout: 90 [12:46:33] max_render_queue_size: 50 [12:46:33] ugh [12:46:45] queue_size? maybe that? [12:46:49] but who knows [12:47:08] I can lower it, but supposedly with a render_concurrency of 3 we should not be having this problem [12:47:15] unless those things are really badly named [12:47:21] anyway, sure, I 'll drop it to 20 [12:49:58] !log lower max_render_queue_size: to 20 for proton on proton100{1,2} [12:49:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:50:28] <_joe_> maybe we can rate-limit per ip at the varnish layer? [12:50:31] <_joe_> ema: ^^ [12:50:36] we already do IIRC [12:50:45] <_joe_> but specifically for pdfs [12:50:49] <_joe_> to like 1 per second [12:50:54] although it's almost certainly more than what this endpoint can survive [12:50:56] yeah sure [12:51:21] <_joe_> [12:52:34] The maximum number of simultaneous requests the server can render successfully is `max_render_queue_size + render_concurrency`.
(from the docs) [12:52:42] guess we'll see what happens [12:53:49] PROBLEM - proton endpoints health on proton1001 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Foo page from en.wp.org in letter format) is CRITICAL: Test Print the Foo page from en.wp.org in letter format returned the unexpected status 500 (expecting: 200): /{domain}/v1/pdf/{title}/{format}/{type} (Print the Bar page from en.wp.org in A4 format using optimized for reading on mobile devices) is CRITICAL: Test Pr [12:53:49] from en.wp.org in A4 format using optimized for reading on mobile devices returned the unexpected status 500 (expecting: 200) [12:57:27] RECOVERY - proton endpoints health on proton1001 is OK: All endpoints are healthy [12:57:39] <_joe_> akosiaris: want me to look into rate-limiting in varnish? [12:58:02] _joe_: we can do that, we'll have to publish the IP though? [12:58:26] <_joe_> ema: we want to limit the pdf creation urls I guess, not a specific ip [12:58:36] I think I can get them blocked at the router level as well [12:58:49] <_joe_> if we want to just ban one ip, sure [12:58:59] _joe_: smart! :) [12:58:59] <_joe_> if we have a specific UA, even better [12:59:24] <_joe_> ema: my idea was rate-limit to something like 1 request/IP/second [12:59:56] <_joe_> but now that I think about it, what would we block? the url is localized IIRC [13:00:00] <_joe_> :/ [13:00:48] this /api/rest_v1/page/pdf/Esther_Sunday [13:00:50] <_joe_> Special:Book I mean [13:00:59] no, it's over the restbase API [13:01:13] PROBLEM - proton endpoints health on proton1001 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Bar page from en.wp.org in A4 format using optimized for reading on mobile devices) timed out before a response was received [13:01:32] <_joe_> oh ok [13:01:40] <_joe_> rb has its own ratelimiting IIRC? 
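The capacity arithmetic from those docs can be modelled directly. A minimal Python sketch (the real service is a Node.js app; the names below just mirror the config keys being discussed, and the mechanics are an assumption for illustration):

```python
import queue

# Illustrative model of the proton settings discussed above; names
# mirror the config keys, everything else is an assumption.
RENDER_CONCURRENCY = 3
MAX_RENDER_QUEUE_SIZE = 50

render_queue = queue.Queue(maxsize=MAX_RENDER_QUEUE_SIZE)
in_flight = 0  # renders currently executing

def accept(job):
    """Accept a render job, or reject it once queue + workers are full."""
    global in_flight
    if in_flight < RENDER_CONCURRENCY:
        in_flight += 1  # starts rendering immediately
        return "rendering"
    try:
        render_queue.put_nowait(job)
        return "queued"
    except queue.Full:
        return "rejected"  # the "queue is full" responses seen above

# At most max_render_queue_size + render_concurrency = 53 requests are
# held at once; everything beyond that is rejected.
results = [accept(n) for n in range(60)]
```

With the values in the paste, 3 requests render, 50 queue, and the rest get the "queue is full" rejection, which matches the docs' 53-request ceiling.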
[13:01:53] so now we are under some minor control [13:02:05] yes rb does have its own rate limiter [13:02:05] I 've lowered both settings enough to allow the box to continue existing [13:02:16] <_joe_> akosiaris: ok [13:02:20] if (vsthrottle.is_denied("rest:" + req.http.X-Client-IP, 1000, 10s)) [13:02:23] RECOVERY - proton endpoints health on proton1001 is OK: All endpoints are healthy [13:02:36] <_joe_> ema: no I meant inside the software [13:02:50] <_joe_> rb can do concurrency limits [13:02:57] <_joe_> to backends [13:04:50] ok if my back of the envelope calculations are correct there is an austrian IP having done some 36k requests to pdf restbase api since 06:27 this morning with probably the bulk of it happening since 11:00 [13:04:53] all times UTC [13:05:09] ouch! [13:05:31] (03CR) 10Mathew.onipe: "some comments" (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/493234 (https://phabricator.wikimedia.org/T217196) (owner: 10DCausse) [13:05:42] blocking them directly (a 404 with body that explains why) would be nice if it were doable [13:05:48] *403 [13:05:58] which is about 5 req/s so well without the global rate limits [13:06:14] within* [13:06:43] per https://grafana.wikimedia.org/d/000000563/proton?orgId=1&from=now-3h&to=now they 've up to at a max of 20 [13:06:43] <_joe_> A 429 maybe gentler [13:07:11] sure [13:07:17] now they are averaging on the 5req/s [13:07:33] 10Operations, 10ExternalGuidance, 10Traffic, 10MW-1.33-notes (1.33.0-wmf.18; 2019-02-19), 10Patch-For-Review: Deliver mobile-based version for automatic translations - https://phabricator.wikimedia.org/T212197 (10Pginer-WMF) >>! In T212197#4991170, @dr0ptp4kt wrote: > What's our latest read on the releas... [13:08:07] I guess it's a ua that looks like a browser? [13:08:28] good q, looking [13:08:31] no contact info inside the string or anything like that (script best practices as we recommend) [13:08:32] ? 
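As a sanity check on that estimate (assuming, as described above, that the bulk of the ~36k requests fell in the window from 11:00 to roughly 13:05 UTC):

```python
# Back-of-the-envelope check of the ~5 req/s figure above, assuming the
# bulk of the ~36k requests landed between 11:00 and ~13:05 UTC.
requests = 36_000
window_seconds = (13 * 3600 + 5 * 60) - (11 * 3600)  # 11:00 -> 13:05
rate = requests / window_seconds  # ~4.8 req/s
```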
[13:08:50] (03PS1) 10Ema: varnish: rate limit proton [puppet] - 10https://gerrit.wikimedia.org/r/493683 [13:08:53] (03CR) 10Muehlenhoff: [C: 03+2] Create /etc/debdeploy-autorestarts.conf which lists all automated restarts [puppet] - 10https://gerrit.wikimedia.org/r/493659 (owner: 10Muehlenhoff) [13:09:19] "user_agent":"-" [13:09:21] ok it's a bot [13:09:25] (03PS2) 10Ema: varnish: rate limit proton [puppet] - 10https://gerrit.wikimedia.org/r/493683 [13:09:29] bockblockblock [13:09:52] "uri_path":"/api/rest_v1/page/pdf/Liste_der_h\u00f6chsten_Bauwerke_in_Sierra_Leone","uri_query":"","content_type":"application/problem+json","referer":"-","user_agent":"-","accept_language":"-" [13:09:58] yeah definitely a bot [13:10:36] if we're interrupting work by some community member or a researcher, they can find an email or irc channel info and come ask [13:11:04] (03PS3) 10Ema: varnish: rate limit proton [puppet] - 10https://gerrit.wikimedia.org/r/493683 [13:12:00] (03CR) 10Alexandros Kosiaris: [C: 03+1] varnish: rate limit proton [puppet] - 10https://gerrit.wikimedia.org/r/493683 (owner: 10Ema) [13:12:42] ema, that's 10 req in 10 seconds? [13:12:56] yeah so 1req/s [13:12:57] apergos: yes [13:13:04] 👍 [13:13:37] right, with 10 burst [13:13:47] hm [13:14:00] how is burst calculated? [13:15:27] the idea as I understand it is that you can perform 10 req in 10s, so it's fine to perform 10 requests in 1s, but then for the remaining 9s you're rate limited [13:15:51] (03PS2) 10Jbond: Add ability to filter out auto restarts [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/493463 [13:17:03] good if it works [13:17:20] akosiaris: ok to merge or do you want to change anything?
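The burst behaviour described above matches a token bucket. A rough Python model of how vmod_vsthrottle's is_denied(key, limit, period) seems to behave (continuous refill at limit/period tokens per second is an assumption here, not taken from the vmod's source):

```python
class Bucket:
    """Token bucket: `limit` requests per `period` seconds, continuous refill."""

    def __init__(self, limit, period):
        self.capacity = float(limit)
        self.rate = limit / period  # tokens regained per second
        self.tokens = self.capacity
        self.last = 0.0

    def is_denied(self, now):
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return False
        return True

b = Bucket(limit=10, period=10)
burst = [b.is_denied(0.0) for _ in range(10)]  # a burst of 10 at t=0 passes
eleventh = b.is_denied(0.0)                    # the 11th is denied
after_1s = b.is_denied(1.0)                    # one slot reopens per second
```

This reproduces the "10 requests in 1s, then rate limited for the remaining 9s" behaviour explained above: the full burst drains the bucket, after which requests are admitted at 1/s.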
[13:17:23] I mean I don't love a burst of 10, if that could be capped too it would be better [13:17:54] ema: let's see how it works [13:18:07] cause it sounds ok in principle [13:18:19] (03CR) 10Ema: [C: 03+2] varnish: rate limit proton [puppet] - 10https://gerrit.wikimedia.org/r/493683 (owner: 10Ema) [13:18:58] https://grafana.wikimedia.org/d/000000563/proton?orgId=1&from=now-30m&to=now [13:19:04] hmm maybe they 've paused [13:20:12] (03CR) 10DCausse: [WIP] Add support for elasticsearch 6 (034 comments) [puppet] - 10https://gerrit.wikimedia.org/r/493234 (https://phabricator.wikimedia.org/T217196) (owner: 10DCausse) [13:20:32] that would be fine too [13:22:29] ema: let me know when the change has fully propagated so I can revert the configs to their original settings on the proton side [13:24:04] akosiaris: either the usual 30m or I can cumin a puppet agent run if you wish [13:24:30] !log removed sca* hosts from debmonitor database [13:24:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:24:38] akosiaris: in that case, is esams enough or was the IP non-EU? [13:24:51] esams is enough I 'd say [13:26:11] in any case things have returned to normal [13:26:43] akosiaris: maybe those settings shouldn't be reverted?
lol [13:27:14] could be, but it should be after a discussion with the team owning it [13:27:41] if the settings are such that proton can now keep up with its queue when there are a larger number of incoming requests, maybe that's what we want [13:28:08] sure, but we don't know if it was the settings change or the bot just quit [13:28:20] hm you have a point [13:28:45] guess this needs to be a task, meh [13:28:47] akosiaris: change fully applied to text_esams [13:28:53] ema: cool, thanks [13:29:03] 10Operations, 10Continuous-Integration-Infrastructure, 10Traffic, 10Release-Engineering-Team (Kanban): zuul seemingly ignoring BUILD_TIMEOUT - https://phabricator.wikimedia.org/T217403 (10hashar) I created a job from scratch in jjb: ` - job: name: build-timeout-jjb node: contint1001 parameters:... [13:29:16] I 'll monitor this for the next hour or so [13:29:24] I haven't yet reverted the settings anyway [13:30:45] apergos: yeah it needs to be a task. I 'll file one, but after lunch [13:30:57] (03PS1) 10Mforns: Add timer to delete analytics EL unsanitized events after 90d [puppet] - 10https://gerrit.wikimedia.org/r/493687 (https://phabricator.wikimedia.org/T209503) [13:31:01] enjoy! (your lunch) [13:39:35] 10Operations, 10Continuous-Integration-Infrastructure, 10Traffic, 10Release-Engineering-Team (Kanban), 10Upstream: zuul seemingly ignoring BUILD_TIMEOUT - https://phabricator.wikimedia.org/T217403 (10hashar) The root cause is JJB tries to fetch information for a plugin named `Jenkins build timeout plugi... [13:40:57] 10Operations, 10Continuous-Integration-Infrastructure, 10Traffic, 10Release-Engineering-Team (Kanban), 10Upstream: Jenkins job builder ignores BUILD_TIMEOUT - https://phabricator.wikimedia.org/T217403 (10hashar) [14:00:58] akosiaris: I do see a bunch of requests being rate-limited now [14:02:18] (03CR) 10Herron: "Thanks!
Normally I'd agree on waiting until Monday, but in this case the patch will persist the config change that was made to address the" [puppet] - 10https://gerrit.wikimedia.org/r/493610 (https://phabricator.wikimedia.org/T200960) (owner: 10Herron) [14:10:20] (03CR) 10Mathew.onipe: [WIP] Add support for elasticsearch 6 (035 comments) [puppet] - 10https://gerrit.wikimedia.org/r/493234 (https://phabricator.wikimedia.org/T217196) (owner: 10DCausse) [14:15:48] 10Operations, 10Continuous-Integration-Infrastructure, 10Traffic, 10Release-Engineering-Team (Kanban), 10Upstream: Jenkins job builder ignores BUILD_TIMEOUT - https://phabricator.wikimedia.org/T217403 (10hashar) [14:22:14] ema: ah nice! [14:22:16] thanks! [14:22:31] I do see a minor bump in requests but nothing alarming [14:26:12] (03CR) 10Mathew.onipe: [WIP] Add support for elasticsearch 6 (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/493234 (https://phabricator.wikimedia.org/T217196) (owner: 10DCausse) [14:26:47] (03CR) 10CDanis: Add ganeti->netbox sync script (032 comments) [software/netbox-deploy] - 10https://gerrit.wikimedia.org/r/492007 (https://phabricator.wikimedia.org/T215229) (owner: 10CRusnov) [14:33:39] !log Updating all debian-glue Jenkins job to properly take in account the BUILD_TIMEOUT parameter # T217403 [14:33:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:33:43] T217403: Jenkins job builder ignores BUILD_TIMEOUT - https://phabricator.wikimedia.org/T217403 [14:36:38] (03PS1) 10Elukey: hadoop: allow the configuration of ssl-(server|client).xml configs [puppet] - 10https://gerrit.wikimedia.org/r/493693 [14:38:12] 10Operations, 10Traffic: Indexing of https://www.wikidata.org in the Yandex Search Engine - https://phabricator.wikimedia.org/T217407 (10Anomie) This has nothing to do with the API itself, it's a question about use of the API. So I'm going to remove #mediawiki-api. It may be that the best way to index wikidata... 
[14:38:39] (03CR) 10Hashar: "recheck" [debs/trafficserver] - 10https://gerrit.wikimedia.org/r/493660 (owner: 10Ema) [14:39:00] hashar: thanks for working on the timeout thing :) [14:39:12] I am surprised we haven't had the issue before :/ [14:39:18] RECOVERY - nova-compute proc maximum on cloudvirt1024 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n] /usr/bin/nova-compute [14:39:46] does the majority of our software build in < 3 minutes? [14:44:21] seeing that bug.. it looks like it ;P [14:44:41] (03CR) 10Mathew.onipe: Add cookbook for elastic6 upgrade (034 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/493436 (owner: 10DCausse) [14:48:27] ema: https://integration.wikimedia.org/ci/job/debian-glue/1446/ it works :) [14:48:57] 10Operations, 10Continuous-Integration-Infrastructure, 10Traffic, 10Release-Engineering-Team (Kanban), 10Upstream: Jenkins job builder ignores BUILD_TIMEOUT - https://phabricator.wikimedia.org/T217403 (10hashar) 05Open→03Resolved It works ! :) [14:51:55] (03CR) 10Filippo Giunchedi: [C: 03+1] "> Patch Set 1:" [puppet] - 10https://gerrit.wikimedia.org/r/493610 (https://phabricator.wikimedia.org/T200960) (owner: 10Herron) [14:52:03] (03PS1) 10Muehlenhoff: Allow filtering services for restart notification (WIP) [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/493697 [14:54:06] hashar_: \o/ [14:58:55] (03CR) 10Muehlenhoff: Add ability to filter out auto restarts (033 comments) [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/493463 (owner: 10Jbond) [15:02:25] !log restore proton config values [15:02:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:07:47] (03CR) 10DCausse: Add cookbook for elastic6 upgrade (033 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/493436 (owner: 10DCausse) [15:14:05] (03CR) 10Jbond: [V: 03+2 C: 03+1] Remove unused libraries and use collections.defaultdict [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/493675 (owner: 10Jbond) [15:14:12] (03CR)
10Jbond: [V: 03+2 C: 03+2] Remove unused libraries and use collections.defaultdict [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/493675 (owner: 10Jbond) [15:14:27] (03PS2) 10Elukey: hadoop: allow the configuration of ssl-(server|client).xml configs [puppet] - 10https://gerrit.wikimedia.org/r/493693 (https://phabricator.wikimedia.org/T217412) [15:17:35] (03PS3) 10Jbond: Add ability to filter out auto restarts [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/493463 [15:17:47] (03PS3) 10Elukey: hadoop: allow the configuration of ssl-(server|client).xml configs [puppet] - 10https://gerrit.wikimedia.org/r/493693 (https://phabricator.wikimedia.org/T217412) [15:18:56] (03PS2) 10Herron: logstash: disable persisted queue [puppet] - 10https://gerrit.wikimedia.org/r/493610 (https://phabricator.wikimedia.org/T200960) [15:20:29] (03PS4) 10Elukey: hadoop: allow the configuration of ssl-(server|client).xml configs [puppet] - 10https://gerrit.wikimedia.org/r/493693 (https://phabricator.wikimedia.org/T217412) [15:20:48] (03CR) 10Herron: [C: 03+2] logstash: disable persisted queue [puppet] - 10https://gerrit.wikimedia.org/r/493610 (https://phabricator.wikimedia.org/T200960) (owner: 10Herron) [15:21:45] (03PS4) 10Jbond: Add ability to filter out auto restarts [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/493463 [15:21:55] 10Operations, 10Wikimedia-Logstash, 10Patch-For-Review: Logstash packet loss - https://phabricator.wikimedia.org/T200960 (10CDanis) FTR: Twice in two months we've seen all the logstashen in one cluster 'lock up' at around the same time: stop processing incoming events, huge backlog of socket recv-Q bytes, J... 
[15:22:24] (PS5) Elukey: hadoop: allow the configuration of ssl-(server|client).xml configs [puppet] - https://gerrit.wikimedia.org/r/493693 (https://phabricator.wikimedia.org/T217412)
[15:23:57] (CR) Ema: [C: +2] trafficserver (8.0.2-1wm1) stretch-wikimedia; urgency=medium [debs/trafficserver] - https://gerrit.wikimedia.org/r/493660 (owner: Ema)
[15:32:57] (CR) Giuseppe Lavagetto: [C: -1] "> Patch Set 4:" [puppet] - https://gerrit.wikimedia.org/r/492948 (owner: Aaron Schulz)
[15:33:46] (CR) Elukey: "https://puppet-compiler.wmflabs.org/compiler1001/14945/ seems a no-op for the current code, need to add secrets to labs_private and see ho" [puppet] - https://gerrit.wikimedia.org/r/493693 (https://phabricator.wikimedia.org/T217412) (owner: Elukey)
[15:38:48] !log trafficserver_8.0.2-1wm1 uploaded to stretch-wikimedia
[15:38:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:54:36] (CR) Paladox: [V: +2 C: +2] "Verified locally that this builds." [software/gerrit] (wmf/stable-2.16) - https://gerrit.wikimedia.org/r/493311 (owner: Paladox)
[15:56:05] (CR) Paladox: [V: +2 C: +2] "This builds upstream so is verified." [software/gerrit] (wmf/stable-2.15) - https://gerrit.wikimedia.org/r/493412 (owner: Paladox)
[15:58:50] (CR) Paladox: [C: +1] "Im seeing" [puppet] - https://gerrit.wikimedia.org/r/493317 (https://phabricator.wikimedia.org/T217287) (owner: Thcipriani)
[16:06:22] (PS4) Esanders: VE: Enable true section editing for mobile on labswiki & testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/493482 (https://phabricator.wikimedia.org/T217365)
[16:15:15] (PS2) Jcrespo: mariadb: Refactor dump_section.py and rename to match functionality [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/493664 (https://phabricator.wikimedia.org/T206203)
[16:15:54] (CR) jerkins-bot: [V: -1] mariadb: Refactor dump_section.py and rename to match functionality [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/493664 (https://phabricator.wikimedia.org/T206203) (owner: Jcrespo)
[16:27:47] (CR) Muehlenhoff: [C: +1] sudo: use validate_cmd [puppet] - https://gerrit.wikimedia.org/r/492718 (owner: Giuseppe Lavagetto)
[16:33:49] !log rolling security update of bind9 packages on jessie and trusty
[16:33:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:36:56] (PS3) Jcrespo: mariadb: Refactor dump_section.py and rename to match functionality [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/493664 (https://phabricator.wikimedia.org/T206203)
[16:38:06] (CR) jerkins-bot: [V: -1] mariadb: Refactor dump_section.py and rename to match functionality [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/493664 (https://phabricator.wikimedia.org/T206203) (owner: Jcrespo)
[16:39:53] PROBLEM - DPKG on restbase1011 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[16:42:17] RECOVERY - DPKG on restbase1011 is OK: All packages OK
[16:45:00] Operations, Traffic: Indexing of https://www.wikidata.org in the Yandex Search Engine - https://phabricator.wikimedia.org/T217407 (Reedy) >Downloading the wikidata dumps might not help in this situation as we need to crawl pages a user sees them. Noting they're wanting to crawl the user facing pages (wh...
[16:57:24] (PS1) Bstorm: wikireplicas: correct join for logging_compat [puppet] - https://gerrit.wikimedia.org/r/493718 (https://phabricator.wikimedia.org/T212972)
[16:59:58] (CR) Bstorm: [C: +2] wikireplicas: correct join for logging_compat [puppet] - https://gerrit.wikimedia.org/r/493718 (https://phabricator.wikimedia.org/T212972) (owner: Bstorm)
[17:01:51] (PS5) Jbond: Add ability to filter out auto restarts [debs/debdeploy] - https://gerrit.wikimedia.org/r/493463
[17:02:04] (CR) Alexandros Kosiaris: [C: +1] "Fine by me then" [deployment-charts] - https://gerrit.wikimedia.org/r/493444 (https://phabricator.wikimedia.org/T206785) (owner: Ottomata)
[17:06:44] (PS1) Jbond: Remove if statement as we now use defaultdict [debs/debdeploy] - https://gerrit.wikimedia.org/r/493720
[17:07:05] (CR) Bartosz Dziewoński: [C: -1] "I think ‘labswiki’ is wikitech.wikimedia.org, not Beta Cluster wikis. Not sure if this is what you want?" [mediawiki-config] - https://gerrit.wikimedia.org/r/493482 (https://phabricator.wikimedia.org/T217365) (owner: Esanders)
[17:08:04] (CR) Reedy: "Yeah, labswiki == wikitech" [mediawiki-config] - https://gerrit.wikimedia.org/r/493482 (https://phabricator.wikimedia.org/T217365) (owner: Esanders)
[17:10:47] (CR) Jbond: Add ability to filter out auto restarts (4 comments) [debs/debdeploy] - https://gerrit.wikimedia.org/r/493463 (owner: Jbond)
[17:23:03] (CR) CRusnov: "Thank you for the review!" (2 comments) [software/netbox-deploy] - https://gerrit.wikimedia.org/r/492007 (https://phabricator.wikimedia.org/T215229) (owner: CRusnov)
[17:25:37] (PS1) Marostegui: db-eqiad.php: Update pc1007 rack [mediawiki-config] - https://gerrit.wikimedia.org/r/493722
[17:31:09] Operations, ops-eqiad: Update pc1007,pc1010 status on netbox - https://phabricator.wikimedia.org/T217429 (Marostegui)
[18:21:32] Operations, ExternalGuidance, Traffic, MW-1.33-notes (1.33.0-wmf.18; 2019-02-19), Patch-For-Review: Deliver mobile-based version for automatic translations - https://phabricator.wikimedia.org/T212197 (dr0ptp4kt) Thanks @Pginer-WMF. I've put a HOLD on the calendar for March 6 to get the Varnis...
[18:32:51] PROBLEM - DPKG on notebook1003 is CRITICAL: connect to address 10.64.21.109 port 5666: Connection refused
[18:33:01] PROBLEM - Disk space on notebook1003 is CRITICAL: connect to address 10.64.21.109 port 5666: Connection refused
[18:33:45] PROBLEM - dhclient process on notebook1003 is CRITICAL: connect to address 10.64.21.109 port 5666: Connection refused
[18:33:45] PROBLEM - Check systemd state on notebook1003 is CRITICAL: connect to address 10.64.21.109 port 5666: Connection refused
[18:33:59] PROBLEM - configured eth on notebook1003 is CRITICAL: connect to address 10.64.21.109 port 5666: Connection refused
[18:34:03] PROBLEM - MD RAID on notebook1003 is CRITICAL: connect to address 10.64.21.109 port 5666: Connection refused
[18:35:25] PROBLEM - Check the NTP synchronisation status of timesyncd on notebook1003 is CRITICAL: connect to address 10.64.21.109 port 5666: Connection refused
[18:37:09] PROBLEM - puppet last run on notebook1003 is CRITICAL: connect to address 10.64.21.109 port 5666: Connection refused
[18:45:43] PROBLEM - SSH on notebook1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[18:45:52] hrmmm
[18:45:55] was that planned?
[18:46:47] RECOVERY - SSH on notebook1003 is OK: SSH OK - OpenSSH_7.4p1 Debian-10+deb9u3 (protocol 2.0)
[18:48:08] =/
[18:53:41] !log notebook1003 has unusually high load recently (23) and seemed to lag in reporting to icinga. no hardware failures, pinged about it in #wikimedia-analytics
[18:53:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:01:41] PROBLEM - IPMI Sensor Status on notebook1003 is CRITICAL: connect to address 10.64.21.109 port 5666: Connection refused
[19:12:54] Operations, SRE-Access-Requests: Requesting access to stat1007 for sukhe - https://phabricator.wikimedia.org/T217438 (ssingh)
[19:14:20] Operations, SRE-Access-Requests: Requesting access to stat1007 for sukhe - https://phabricator.wikimedia.org/T217438 (ssingh)
[19:16:03] RECOVERY - configured eth on notebook1003 is OK: OK - interfaces up
[19:16:05] RECOVERY - MD RAID on notebook1003 is OK: OK: Active: 6, Working: 6, Failed: 0, Spare: 0
[19:16:07] RECOVERY - DPKG on notebook1003 is OK: All packages OK
[19:16:17] RECOVERY - Disk space on notebook1003 is OK: DISK OK
[19:16:59] RECOVERY - dhclient process on notebook1003 is OK: PROCS OK: 0 processes with command name dhclient
[19:16:59] RECOVERY - Check systemd state on notebook1003 is OK: OK - running: The system is fully operational
[19:17:23] !log pre-configure asw-a5 ports on asw2-a5-eqiad - T187960
[19:17:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:17:27] T187960: Rack/cable/configure asw2-a-eqiad switch stack - https://phabricator.wikimedia.org/T187960
[19:18:41] RECOVERY - puppet last run on notebook1003 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[19:29:01] !log pre-configure asw-a6 ports on asw2-a6-eqiad - T187960
[19:29:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:29:04] T187960: Rack/cable/configure asw2-a-eqiad switch stack - https://phabricator.wikimedia.org/T187960
[19:31:01] (PS10) CRusnov: Add ganeti->netbox sync script [software/netbox-deploy] - https://gerrit.wikimedia.org/r/492007 (https://phabricator.wikimedia.org/T215229)
[19:31:05] (CR) Aaron Schulz: "If an actual error occurred on the DC-local mc server, then the "worst" reply would be an error code rather than NOT_STORED ( https://gith" [puppet] - https://gerrit.wikimedia.org/r/492948 (owner: Aaron Schulz)
[19:31:17] (PS4) Jcrespo: mariadb: Refactor dump_section.py and rename to match functionality [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/493664 (https://phabricator.wikimedia.org/T206203)
[19:31:25] (CR) jerkins-bot: [V: -1] mariadb: Refactor dump_section.py and rename to match functionality [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/493664 (https://phabricator.wikimedia.org/T206203) (owner: Jcrespo)
[19:31:30] (CR) Jcrespo: "Still needs more work." [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/493664 (https://phabricator.wikimedia.org/T206203) (owner: Jcrespo)
[19:31:53] RECOVERY - IPMI Sensor Status on notebook1003 is OK: Sensor Type(s) Temperature, Power_Supply Status: OK
[19:32:22] (PS5) Jcrespo: mariadb: Refactor dump_section.py and rename to match functionality [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/493664 (https://phabricator.wikimedia.org/T206203)
[19:32:24] (CR) jerkins-bot: [V: -1] mariadb: Refactor dump_section.py and rename to match functionality [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/493664 (https://phabricator.wikimedia.org/T206203) (owner: Jcrespo)
[19:32:37] !log pre-configure asw-a7 ports on asw2-a7-eqiad - T187960
[19:32:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:32:59] (PS6) Jcrespo: mariadb: Refactor dump_section.py and rename to match functionality [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/493664 (https://phabricator.wikimedia.org/T206203)
[19:33:01] (CR) jerkins-bot: [V: -1] mariadb: Refactor dump_section.py and rename to match functionality [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/493664 (https://phabricator.wikimedia.org/T206203) (owner: Jcrespo)
[19:34:12] (CR) Esanders: "is there a beta cluster group?" [mediawiki-config] - https://gerrit.wikimedia.org/r/493482 (https://phabricator.wikimedia.org/T217365) (owner: Esanders)
[19:35:49] RECOVERY - Check the NTP synchronisation status of timesyncd on notebook1003 is OK: OK: synced at Fri 2019-03-01 19:35:48 UTC.
[19:40:09] !log pre-configure asw-a8 ports on asw2-a8-eqiad - T187960
[19:40:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:40:13] T187960: Rack/cable/configure asw2-a-eqiad switch stack - https://phabricator.wikimedia.org/T187960
[19:49:23] Operations, ops-eqiad, netops, Patch-For-Review: Rack/cable/configure asw2-a-eqiad switch stack - https://phabricator.wikimedia.org/T187960 (ayounsi) a: Cmjohnson
[20:01:32] Operations, ops-eqiad, netops, Patch-For-Review: Rack/cable/configure asw2-a-eqiad switch stack - https://phabricator.wikimedia.org/T187960 (ayounsi)
[20:13:35] (PS1) Sbisson: Enable and configure the ORES goodfaith model on itwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/493749 (https://phabricator.wikimedia.org/T211032)
[20:28:28] (PS1) Tpt: Enables maplink for geocoordinate Wikibase statements display on clients [mediawiki-config] - https://gerrit.wikimedia.org/r/493753 (https://phabricator.wikimedia.org/T217442)
[20:41:11] (PS5) Esanders: VE: Enable true section editing for mobile on labs & testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/493482 (https://phabricator.wikimedia.org/T217365)
[20:42:43] (CR) Andrew Bogott: "Is the current plan that each postgres-using project will have its own postgres server? And/or is wikilabels currently the only postgres " (1 comment) [puppet] - https://gerrit.wikimedia.org/r/493608 (https://phabricator.wikimedia.org/T193264) (owner: Bstorm)
[20:45:32] (CR) Bstorm: "> Patch Set 2:" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/493608 (https://phabricator.wikimedia.org/T193264) (owner: Bstorm)
[20:46:19] (CR) Bstorm: wikilabels: stage the postgres roles for virtualizing the database (1 comment) [puppet] - https://gerrit.wikimedia.org/r/493608 (https://phabricator.wikimedia.org/T193264) (owner: Bstorm)
[20:47:07] (CR) Andrew Bogott: [C: +1] "ok :)" [puppet] - https://gerrit.wikimedia.org/r/493608 (https://phabricator.wikimedia.org/T193264) (owner: Bstorm)
[20:49:13] (CR) Aaron Schulz: [C: +1] Set expiry headers on thumbnails [software/thumbor-plugins] - https://gerrit.wikimedia.org/r/489022 (https://phabricator.wikimedia.org/T211661) (owner: Gilles)
[20:52:59] (PS3) Bstorm: wikilabels: stage the postgres roles for virtualizing the database [puppet] - https://gerrit.wikimedia.org/r/493608 (https://phabricator.wikimedia.org/T193264)
[20:55:55] (CR) Bstorm: [C: +2] wikilabels: stage the postgres roles for virtualizing the database [puppet] - https://gerrit.wikimedia.org/r/493608 (https://phabricator.wikimedia.org/T193264) (owner: Bstorm)
[21:13:28] (PS2) CDanis: partman: grub-install on all RAID{1,10} drives [puppet] - https://gerrit.wikimedia.org/r/490404 (https://phabricator.wikimedia.org/T215183)
[21:15:14] Operations, SRE-Access-Requests: Add bmansurov to archiva-deployers LDAP group - https://phabricator.wikimedia.org/T217447 (bmansurov)
[21:15:24] Operations, SRE-Access-Requests: Add bmansurov to archiva-deployers LDAP group - https://phabricator.wikimedia.org/T217447 (bmansurov)
[21:27:25] (CR) Bartosz Dziewoński: "I think we don't need the $wmgFoo variable if it's just an alias for $wgFoo. Just set 'wgFoo' directly in InitialiseSettings. I struggle t" [mediawiki-config] - https://gerrit.wikimedia.org/r/493482 (https://phabricator.wikimedia.org/T217365) (owner: Esanders)
[21:44:15] PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 239, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[21:54:55] Operations, ops-eqiad: Update pc1007,pc1010 status on netbox - https://phabricator.wikimedia.org/T217429 (ayounsi) Mentioning dbproxy1013 here in case it's a similar case https://netbox.wikimedia.org/dcim/devices/1550/
[22:16:27] Operations, ops-eqiad, netops, Patch-For-Review: Rack/cable/configure asw2-a-eqiad switch stack - https://phabricator.wikimedia.org/T187960 (ayounsi)
[22:36:27] (CR) Esanders: "can we fix that as tech debt. I'm just copying the existing style" [mediawiki-config] - https://gerrit.wikimedia.org/r/493482 (https://phabricator.wikimedia.org/T217365) (owner: Esanders)
[22:37:50] Operations, Wikimedia-Logstash, Patch-For-Review, User-herron: Replace and expand Elasticsearch/Kafka storage in eqiad and upgrade the cluster from Debian jessie to stretch - https://phabricator.wikimedia.org/T213898 (herron)
[22:37:58] Operations, Wikimedia-Logstash, Patch-For-Review, User-herron: Replace and expand Elasticsearch/Kafka storage in eqiad and upgrade the cluster from Debian jessie to stretch - https://phabricator.wikimedia.org/T213898 (herron)
[22:48:43] Operations, SRE-Access-Requests: Requesting access to stat1007 for sukhe - https://phabricator.wikimedia.org/T217438 (RobH)
[22:49:46] Operations, SRE-Access-Requests: Requesting access to stat1007 for sukhe - https://phabricator.wikimedia.org/T217438 (RobH) a: Nuria So for this to be approved, we need the approval of the analytics team (since they manage the server.) We'll also need them to tell us exactly what groups to include.
[22:51:13] Operations, LDAP-Access-Requests: Add bmansurov to archiva-deployers LDAP group - https://phabricator.wikimedia.org/T217447 (RobH)
[23:22:02] (PS1) BryanDavis: wmcs: Add profiles for oidentd proxy and client modes [puppet] - https://gerrit.wikimedia.org/r/493767 (https://phabricator.wikimedia.org/T151704)
[23:23:11] (CR) jerkins-bot: [V: -1] wmcs: Add profiles for oidentd proxy and client modes [puppet] - https://gerrit.wikimedia.org/r/493767 (https://phabricator.wikimedia.org/T151704) (owner: BryanDavis)
[23:23:57] (PS1) Bstorm: osmdb: stage the roles and profiles for virtualizing the servers [puppet] - https://gerrit.wikimedia.org/r/493769 (https://phabricator.wikimedia.org/T193264)
[23:26:10] (PS2) BryanDavis: wmcs: Add profiles for oidentd proxy and client modes [puppet] - https://gerrit.wikimedia.org/r/493767 (https://phabricator.wikimedia.org/T151704)
[23:27:00] (CR) jerkins-bot: [V: -1] wmcs: Add profiles for oidentd proxy and client modes [puppet] - https://gerrit.wikimedia.org/r/493767 (https://phabricator.wikimedia.org/T151704) (owner: BryanDavis)
[23:27:23] PROBLEM - puppet last run on doc1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[23:39:30] (PS3) BryanDavis: wmcs: Add profiles for oidentd proxy and client modes [puppet] - https://gerrit.wikimedia.org/r/493767 (https://phabricator.wikimedia.org/T151704)
[23:41:28] (CR) Bartosz Dziewoński: [C: +1] "Hm, yeah, fair enough. I hadn't looked at the file outside of the diff, but it seems like we do the same for other VisualEditor config var" [mediawiki-config] - https://gerrit.wikimedia.org/r/493482 (https://phabricator.wikimedia.org/T217365) (owner: Esanders)
[23:53:17] RECOVERY - puppet last run on doc1001 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures
[23:54:23] (CR) Catrope: [C: +1] Enable and configure the ORES goodfaith model on itwiki (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/493749 (https://phabricator.wikimedia.org/T211032) (owner: Sbisson)