[00:00:04] <jouncebot>	 twentyafterfour: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for Phabricator update. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200806T0000).
[00:00:17] <mutante>	 !log LDAP - removed demon from nda group 
[00:00:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:04:38] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[00:06:34] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[00:11:36] <wikibugs>	 (CR) Dzahn: [C: +1] "required per https://wiki.mozilla.org/Security/DOH-resolver-policy" [puppet] - https://gerrit.wikimedia.org/r/618376 (owner: Ssingh)
[00:12:46] <wikibugs>	 Operations, Fundraising-Backlog, User-Urbanecm, User-dancy, Wiki-Setup (Create): New wiki for fundraising Thank You pages with similar config as donatewiki - https://phabricator.wikimedia.org/T259002 (Ejegg) @Ladsgroup The amount of content hosted here will be very small - just a handful of f...
[00:14:08] <icinga-wm>	 PROBLEM - rsyslog in codfw is failing to deliver messages on icinga1001 is CRITICAL: action=fwd_centrallog1001.eqiad.wmnet:6514 https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=codfw+prometheus/ops
[00:14:36] <icinga-wm>	 PROBLEM - rsyslog in eqiad is failing to deliver messages on icinga1001 is CRITICAL: action=fwd_centrallog1001.eqiad.wmnet:6514 https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=eqiad+prometheus/ops
[00:17:32] <icinga-wm>	 RECOVERY - rsyslog in eqiad is failing to deliver messages on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=eqiad+prometheus/ops
[00:19:58] <icinga-wm>	 RECOVERY - rsyslog in codfw is failing to deliver messages on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=codfw+prometheus/ops
[00:34:32] <wikibugs>	 Operations, serviceops: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqiad.wmnet for hosts: ` wtp2019.codfw.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/2020...
[00:35:19] <mutante>	 !log wtp2019 - reimaging - parsoid service does not work, unlike on all other wtp*, making sure it's clean
[00:35:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:36:10] <wikibugs>	 Operations, ops-codfw, netops: (Need by:  ) codfw:rack/setup/new management switches - https://phabricator.wikimedia.org/T253154 (Papaul)
[00:48:51] <icinga-wm>	 PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:53:25] <icinga-wm>	 PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 62 probes of 569 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[00:58:39] <icinga-wm>	 RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 46 probes of 569 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[00:59:05] <icinga-wm>	 PROBLEM - rsyslog in codfw is failing to deliver messages on icinga1001 is CRITICAL: action=fwd_centrallog1001.eqiad.wmnet:6514 https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=codfw+prometheus/ops
[01:06:15] <icinga-wm>	 RECOVERY - rsyslog in codfw is failing to deliver messages on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=codfw+prometheus/ops
[01:14:45] <icinga-wm>	 PROBLEM - rsyslog in eqiad is failing to deliver messages on icinga1001 is CRITICAL: action={fwd_centrallog1001.eqiad.wmnet:6514,fwd_centrallog2001.codfw.wmnet:6514} https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=eqiad+prometheus/ops
[01:17:30] <logmsgbot>	 !log dzahn@cumin1001 START - Cookbook sre.hosts.downtime
[01:17:30] <logmsgbot>	 !log dzahn@cumin1001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
[01:17:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:17:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:19:23] <logmsgbot>	 !log dzahn@cumin1001 START - Cookbook sre.hosts.downtime
[01:19:23] <logmsgbot>	 !log dzahn@cumin1001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
[01:19:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:19:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:21:50] <wikibugs>	 (PS2) Ottomata: eventgate - use /v1/_test/events route for readinessProbe [deployment-charts] - https://gerrit.wikimedia.org/r/618624 (https://phabricator.wikimedia.org/T251935)
[01:22:51] <icinga-wm>	 PROBLEM - rsyslog in codfw is failing to deliver messages on icinga1001 is CRITICAL: action={fwd_centrallog1001.eqiad.wmnet:6514,fwd_centrallog2001.codfw.wmnet:6514} https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=codfw+prometheus/ops
[01:31:33] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[01:39:17] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[01:40:21] <icinga-wm>	 RECOVERY - rsyslog in codfw is failing to deliver messages on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=codfw+prometheus/ops
[01:40:53] <icinga-wm>	 RECOVERY - rsyslog in eqiad is failing to deliver messages on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=eqiad+prometheus/ops
[01:51:14] <icinga-wm>	 PROBLEM - DPKG on wtp2019 is CRITICAL: CHECK_NRPE: Error - Could not connect to 10.192.32.34: Connection reset by peer https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[01:51:14] <icinga-wm>	 PROBLEM - configured eth on wtp2019 is CRITICAL: CHECK_NRPE: Error - Could not connect to 10.192.32.34: Connection reset by peer https://wikitech.wikimedia.org/wiki/Monitoring/check_eth
[01:52:01] <logmsgbot>	 !log dzahn@cumin1001 START - Cookbook sre.hosts.downtime
[01:52:01] <logmsgbot>	 !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[01:52:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:52:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:52:09] <mutante>	 ACK - wtp2019
[01:58:40] <icinga-wm>	 PROBLEM - rsyslog in codfw is failing to deliver messages on icinga1001 is CRITICAL: action={fwd_centrallog1001.eqiad.wmnet:6514,fwd_centrallog2001.codfw.wmnet:6514} https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=codfw+prometheus/ops
[02:03:20] <icinga-wm>	 RECOVERY - rsyslog in codfw is failing to deliver messages on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=codfw+prometheus/ops
[02:23:21] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[02:26:18] <icinga-wm>	 PROBLEM - rsyslog in eqiad is failing to deliver messages on icinga1001 is CRITICAL: action={fwd_centrallog1001.eqiad.wmnet:6514,fwd_centrallog2001.codfw.wmnet:6514} https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=eqiad+prometheus/ops
[02:29:06] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[02:29:14] <icinga-wm>	 RECOVERY - rsyslog in eqiad is failing to deliver messages on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=eqiad+prometheus/ops
[02:57:15] <wikibugs>	 Operations, serviceops: All wtp and parse servers have a bad partition scheme. - https://phabricator.wikimedia.org/T258775 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['wtp2019.codfw.wmnet'] `  and were **ALL** successful.
[02:57:30] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[02:57:59] <wikibugs>	 Operations, Fundraising-Backlog, User-Urbanecm, User-dancy, Wiki-Setup (Create): New wiki for fundraising Thank You pages with similar config as donatewiki - https://phabricator.wikimedia.org/T259002 (DStrine) @Ladsgroup one other note. We only need a few pages here but they will take the ful...
[03:02:40] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[03:04:02] <icinga-wm>	 RECOVERY - DPKG on wtp2019 is OK: All packages OK https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[03:04:02] <icinga-wm>	 RECOVERY - configured eth on wtp2019 is OK: OK - interfaces up https://wikitech.wikimedia.org/wiki/Monitoring/check_eth
[03:04:40] <logmsgbot>	 !log dzahn@cumin1001 conftool action : set/pooled=yes; selector: name=wtp2019.codfw.wmnet
[03:04:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:07:04] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[03:11:26] <wikibugs>	 (PS1) Tim Starling: Enable fastStale mode on all wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/618646 (https://phabricator.wikimedia.org/T250248)
[03:12:42] <wikibugs>	 (CR) Tim Starling: "Please review" [mediawiki-config] - https://gerrit.wikimedia.org/r/618646 (https://phabricator.wikimedia.org/T250248) (owner: Tim Starling)
[03:35:52] <icinga-wm>	 PROBLEM - Host cloudcephosd1011.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[03:37:56] <icinga-wm>	 RECOVERY - Host cloudcephosd1011.mgmt is UP: PING OK - Packet loss = 0%, RTA = 1.07 ms
[04:02:29] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[04:08:31] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[04:11:05] <icinga-wm>	 PROBLEM - rsyslog in codfw is failing to deliver messages on icinga1001 is CRITICAL: action={fwd_centrallog1001.eqiad.wmnet:6514,fwd_centrallog2001.codfw.wmnet:6514} https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=codfw+prometheus/ops
[04:13:51] <icinga-wm>	 RECOVERY - rsyslog in codfw is failing to deliver messages on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=codfw+prometheus/ops
[04:27:05] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[04:37:58] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1079', diff saved to https://phabricator.wikimedia.org/P12179 and previous config saved to /var/cache/conftool/dbconfig/20200806-043758-marostegui.json
[04:38:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:46:09] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1079', diff saved to https://phabricator.wikimedia.org/P12180 and previous config saved to /var/cache/conftool/dbconfig/20200806-044608-marostegui.json
[04:46:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:51:07] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1079', diff saved to https://phabricator.wikimedia.org/P12181 and previous config saved to /var/cache/conftool/dbconfig/20200806-045107-marostegui.json
[04:51:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:53:34] <wikibugs>	 (CR) Marostegui: "Brooke, I am surprised that after so many hours...it looks like there are no connections going thru the "new" proxy, so I am not sure if t" [puppet] - https://gerrit.wikimedia.org/r/534577 (https://phabricator.wikimedia.org/T231520) (owner: Marostegui)
[04:56:23] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Fully repool db1079', diff saved to https://phabricator.wikimedia.org/P12182 and previous config saved to /var/cache/conftool/dbconfig/20200806-045622-marostegui.json
[04:56:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:07:43] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1127 for MCR', diff saved to https://phabricator.wikimedia.org/P12184 and previous config saved to /var/cache/conftool/dbconfig/20200806-050743-marostegui.json
[05:07:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:10:58] <wikibugs>	 (PS1) Marostegui: db1132: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/618655 (https://phabricator.wikimedia.org/T259589)
[05:11:42] <wikibugs>	 (CR) Marostegui: [C: +2] db1132: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/618655 (https://phabricator.wikimedia.org/T259589) (owner: Marostegui)
[05:15:36] <wikibugs>	 (CR) Marostegui: "> Patch Set 3:" [puppet] - https://gerrit.wikimedia.org/r/534577 (https://phabricator.wikimedia.org/T231520) (owner: Marostegui)
[05:16:11] <wikibugs>	 (CR) Marostegui: "Brooke, I am surprised that after so many hours...it looks like there are no connections going thru the "new" proxy, so I am not sure if t" [puppet] - https://gerrit.wikimedia.org/r/618283 (https://phabricator.wikimedia.org/T255408) (owner: Marostegui)
[05:18:14] <wikibugs>	 Operations, Fundraising-Backlog, User-Urbanecm, User-dancy, Wiki-Setup (Create): New wiki for fundraising Thank You pages with similar config as donatewiki - https://phabricator.wikimedia.org/T259002 (Dzahn) >>! In T259002#6364836, @DStrine wrote: >  We're talking hundreds of thousands of use...
[05:21:51] <icinga-wm>	 PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is CRITICAL: CRITICAL - failed 52 probes of 569 (alerts on 50) - https://atlas.ripe.net/measurements/1791212/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[05:30:35] <icinga-wm>	 PROBLEM - rsyslog in eqiad is failing to deliver messages on icinga1001 is CRITICAL: action=fwd_centrallog2001.codfw.wmnet:6514 https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=eqiad+prometheus/ops
[05:31:03] <icinga-wm>	 PROBLEM - rsyslog in codfw is failing to deliver messages on icinga1001 is CRITICAL: action=fwd_centrallog2001.codfw.wmnet:6514 https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=codfw+prometheus/ops
[05:33:31] <icinga-wm>	 RECOVERY - rsyslog in eqiad is failing to deliver messages on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=eqiad+prometheus/ops
[05:33:59] <icinga-wm>	 RECOVERY - rsyslog in codfw is failing to deliver messages on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=codfw+prometheus/ops
[05:39:07] <wikibugs>	 (CR) Bstorm: "> Patch Set 3:" [puppet] - https://gerrit.wikimedia.org/r/534577 (https://phabricator.wikimedia.org/T231520) (owner: Marostegui)
[05:42:43] <wikibugs>	 (PS2) KartikMistry: Update cxserver to 2020-08-05-070016-production [deployment-charts] - https://gerrit.wikimedia.org/r/618525 (https://phabricator.wikimedia.org/T258919)
[05:43:33] * kart_ updating cxserver..
[05:46:37] <wikibugs>	 (CR) Bstorm: "> Patch Set 2:" [puppet] - https://gerrit.wikimedia.org/r/618283 (https://phabricator.wikimedia.org/T255408) (owner: Marostegui)
[05:47:03] <wikibugs>	 (PS1) QChris: Add .gitreview [debs/grafana-plugins] - https://gerrit.wikimedia.org/r/618656
[05:47:05] <wikibugs>	 (CR) QChris: [V: +2 C: +2] Add .gitreview [debs/grafana-plugins] - https://gerrit.wikimedia.org/r/618656 (owner: QChris)
[05:48:36] <wikibugs>	 (CR) KartikMistry: [C: +2] Update cxserver to 2020-08-05-070016-production [deployment-charts] - https://gerrit.wikimedia.org/r/618525 (https://phabricator.wikimedia.org/T258919) (owner: KartikMistry)
[05:48:38] <wikibugs>	 (CR) Marostegui: "> Patch Set 2:" [puppet] - https://gerrit.wikimedia.org/r/618283 (https://phabricator.wikimedia.org/T255408) (owner: Marostegui)
[05:49:37] <wikibugs>	 (Merged) jenkins-bot: Update cxserver to 2020-08-05-070016-production [deployment-charts] - https://gerrit.wikimedia.org/r/618525 (https://phabricator.wikimedia.org/T258919) (owner: KartikMistry)
[05:53:22] <wikibugs>	 (CR) Bstorm: "It all matches this patch in cloud DNS from everything I'm able to check:" [puppet] - https://gerrit.wikimedia.org/r/618283 (https://phabricator.wikimedia.org/T255408) (owner: Marostegui)
[05:55:00] <wikibugs>	 (CR) Marostegui: "> Patch Set 2:" [puppet] - https://gerrit.wikimedia.org/r/618283 (https://phabricator.wikimedia.org/T255408) (owner: Marostegui)
[05:57:40] <wikibugs>	 (CR) Bstorm: "> Patch Set 2:" [puppet] - https://gerrit.wikimedia.org/r/618283 (https://phabricator.wikimedia.org/T255408) (owner: Marostegui)
[05:59:26] <wikibugs>	 (CR) Marostegui: "> Patch Set 2:" [puppet] - https://gerrit.wikimedia.org/r/618283 (https://phabricator.wikimedia.org/T255408) (owner: Marostegui)
[06:02:50] <wikibugs>	 (PS1) Marostegui: install_server: Reimage dbproxy1018 to Buster [puppet] - https://gerrit.wikimedia.org/r/618659 (https://phabricator.wikimedia.org/T255408)
[06:03:31] <wikibugs>	 (CR) Marostegui: [C: +2] install_server: Reimage dbproxy1018 to Buster [puppet] - https://gerrit.wikimedia.org/r/618659 (https://phabricator.wikimedia.org/T255408) (owner: Marostegui)
[06:05:05] <wikibugs>	 (PS1) Marostegui: install_server: Actually set dbproxy1018 to Buster [puppet] - https://gerrit.wikimedia.org/r/618660
[06:05:42] <wikibugs>	 (CR) Marostegui: [C: +2] install_server: Actually set dbproxy1018 to Buster [puppet] - https://gerrit.wikimedia.org/r/618660 (owner: Marostegui)
[06:06:45] <kart_>	 I'm getting an unusual error while running the helmfile command. Did anything change with it?
[06:08:12] <kart_>	 https://www.irccloud.com/pastebin/lrj5Zab5/
[06:10:17] <kart_>	 akosiaris: When you're around ^
[06:11:30] <icinga-wm>	 RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:16:19] <wikibugs>	 (CR) Elukey: "Can this be deployed asap? We have recurrent alerts in icinga for netbox1001's root partition filling up :)" [software/netbox-extras] - https://gerrit.wikimedia.org/r/562408 (https://phabricator.wikimedia.org/T231512) (owner: CRusnov)
[06:20:30] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime
[06:20:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:22:43] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[06:22:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:24:20] <wikibugs>	 Operations, ops-eqiad: relforge1001's mgmt IP not reachable - https://phabricator.wikimedia.org/T259777 (elukey)
[06:36:33] <logmsgbot>	 !log elukey@cumin1001 START - Cookbook sre.zookeeper.roll-restart-zookeeper
[06:36:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:37:30] <elukey>	 !log roll restart of druid clusters' zookeeper and an-conf* zookeeper for openjdk-11 upgrades
[06:37:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:41:41] <wikibugs>	 (PS1) Marostegui: Revert "wikireplica_dns.yaml: Depool dbproxy1018" [puppet] - https://gerrit.wikimedia.org/r/618570
[06:42:17] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=atlas_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[06:42:49] <wikibugs>	 (CR) Marostegui: [C: +2] Revert "wikireplica_dns.yaml: Depool dbproxy1018" [puppet] - https://gerrit.wikimedia.org/r/618570 (owner: Marostegui)
[06:43:03] <logmsgbot>	 !log elukey@cumin1001 END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
[06:43:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:43:57] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[06:47:08] <logmsgbot>	 !log elukey@cumin1001 START - Cookbook sre.zookeeper.roll-restart-zookeeper
[06:47:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:53:27] <logmsgbot>	 !log elukey@cumin1001 END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
[06:53:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:54:31] <icinga-wm>	 PROBLEM - rsyslog in eqiad is failing to deliver messages on icinga1001 is CRITICAL: action=fwd_centrallog2001.codfw.wmnet:6514 https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=eqiad+prometheus/ops
[06:54:31] <icinga-wm>	 PROBLEM - rsyslog in codfw is failing to deliver messages on icinga1001 is CRITICAL: action=fwd_centrallog2001.codfw.wmnet:6514 https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=codfw+prometheus/ops
[06:57:04] <marostegui>	 !log Truncate tables on zerowiki T227717
[06:57:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:57:07] <stashbot>	 T227717: Drop DB tables for now-deleted zerowiki from production - https://phabricator.wikimedia.org/T227717
[06:57:11] <wikibugs>	 (CR) Elukey: [C: +2] mjolnir: Increase msearch daemon parallelism to 25 [puppet] - https://gerrit.wikimedia.org/r/618538 (owner: Ebernhardson)
[06:57:24] <logmsgbot>	 !log elukey@cumin1001 START - Cookbook sre.zookeeper.roll-restart-zookeeper
[06:57:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:59:10] <icinga-wm>	 RECOVERY - rsyslog in eqiad is failing to deliver messages on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=eqiad+prometheus/ops
[06:59:10] <icinga-wm>	 RECOVERY - rsyslog in codfw is failing to deliver messages on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=codfw+prometheus/ops
[07:00:07] <wikibugs>	 (CR) Elukey: [C: +1] kafkamon: add role::kafka::monitoring_buster, assign kafkamon[12]002 [puppet] - https://gerrit.wikimedia.org/r/618359 (https://phabricator.wikimedia.org/T252773) (owner: Herron)
[07:03:46] <logmsgbot>	 !log elukey@cumin1001 END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
[07:03:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:04:57] <icinga-wm>	 RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is OK: OK - failed 50 probes of 569 (alerts on 50) - https://atlas.ripe.net/measurements/1791212/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[07:14:13] <icinga-wm>	 PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is CRITICAL: CRITICAL - failed 51 probes of 569 (alerts on 50) - https://atlas.ripe.net/measurements/1791212/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[07:19:51] <icinga-wm>	 PROBLEM - Thanos compact has not run on icinga1001 is CRITICAL: 4.435e+05 ge 24 https://wikitech.wikimedia.org/wiki/Thanos%23Alerts https://grafana.wikimedia.org/d/651943d05a8123e32867b4673963f42b/thanos-compact
[07:25:08] <wikibugs>	 Operations, ops-codfw, DBA, DC-Ops: (Need By: 2020-09-14) rack/setup/install dbprov2003.codfw.wmnet - https://phabricator.wikimedia.org/T258749 (jcrespo) Thank you Papaul very much for your work, really appreciated how fast this was completed!
[07:27:14] <librenms-wmf>	 Critical Alert for device cr2-eqdfw.wikimedia.org - Primary outbound port utilisation over 80%  #page
[07:27:26] <librenms-wmf>	 Critical Alert for device cr2-eqdfw.wikimedia.org - Primary inbound port utilisation over 80%  #page
[07:28:14] <librenms-wmf>	 C̶r̶i̶t̶i̶c̶a̶l̶ Device cr2-eqdfw.wikimedia.org recovered from Primary outbound port utilisation over 80%  #page
[07:28:18] * volans here
[07:28:26] <librenms-wmf>	 C̶r̶i̶t̶i̶c̶a̶l̶ Device cr2-eqdfw.wikimedia.org recovered from Primary inbound port utilisation over 80%  #page
[07:29:34] <icinga-wm>	 RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is OK: OK - failed 48 probes of 569 (alerts on 50) - https://atlas.ripe.net/measurements/1791212/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[07:29:37] <volans>	 cc XioNoX ^^^
[07:32:36] <icinga-wm>	 PROBLEM - rsyslog in eqiad is failing to deliver messages on icinga1001 is CRITICAL: action=fwd_centrallog2001.codfw.wmnet:6514 https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=eqiad+prometheus/ops
[07:33:37] <elukey>	 there was a jump for cr1-codfw too
[07:40:20] <icinga-wm>	 RECOVERY - rsyslog in eqiad is failing to deliver messages on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=eqiad+prometheus/ops
[07:44:16] <icinga-wm>	 RECOVERY - Thanos compact has not run on icinga1001 is OK: (C)24 ge (W)12 ge 0.01612 https://wikitech.wikimedia.org/wiki/Thanos%23Alerts https://grafana.wikimedia.org/d/651943d05a8123e32867b4673963f42b/thanos-compact
[07:46:18] <wikibugs>	 (PS1) Elukey: druid: fix monitoring configuration [puppet] - https://gerrit.wikimedia.org/r/618705 (https://phabricator.wikimedia.org/T244482)
[07:46:23] <XioNoX>	 volans: on my phone, looks like it recovered
[07:47:09] <volans>	 XioNoX: yeah was a spike in traffic actually, we're looking into it
[07:51:33] <wikibugs>	 (CR) Elukey: [C: +2] druid: fix monitoring configuration [puppet] - https://gerrit.wikimedia.org/r/618705 (https://phabricator.wikimedia.org/T244482) (owner: Elukey)
[07:55:13] <elukey>	 !log roll restart druid brokers on druid-analytics to pick up new settings
[08:02:04] <wikibugs>	 (PS1) Jcrespo: mariadb-backup: Initial setup of dbprov2003 [puppet] - https://gerrit.wikimedia.org/r/618706 (https://phabricator.wikimedia.org/T257551)
[08:03:44] <wikibugs>	 (PS6) Ema: ATS: add function profile::trafficserver_caching_rules [puppet] - https://gerrit.wikimedia.org/r/618537 (https://phabricator.wikimedia.org/T259692)
[08:04:38] <elukey>	 !log roll restart druid brokers on druid-public to pick up new settings
[08:05:41] <wikibugs>	 (CR) Ema: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/618537 (https://phabricator.wikimedia.org/T259692) (owner: Ema)
[08:05:51] <icinga-wm>	 PROBLEM - rsyslog in codfw is failing to deliver messages on icinga1001 is CRITICAL: action={fwd_centrallog1001.eqiad.wmnet:6514,fwd_centrallog2001.codfw.wmnet:6514} https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=codfw+prometheus/ops
[08:06:38] <icinga-wm>	 PROBLEM - rsyslog in eqiad is failing to deliver messages on icinga1001 is CRITICAL: action={fwd_centrallog1001.eqiad.wmnet:6514,fwd_centrallog2001.codfw.wmnet:6514} https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=eqiad+prometheus/ops
[08:07:02] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[08:08:44] <wikibugs>	 (03PS2) 10Thiemo Kreuz (WMDE): Remove deprecated setting [mediawiki-config] - 10https://gerrit.wikimedia.org/r/618482 (https://phabricator.wikimedia.org/T232542) (owner: 10Awight)
[08:11:56] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[08:14:17] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P12185 and previous config saved to /var/cache/conftool/dbconfig/20200806-081416-marostegui.json
[08:14:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:14:36] <icinga-wm>	 RECOVERY - rsyslog in eqiad is failing to deliver messages on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=eqiad+prometheus/ops
[08:16:41] <icinga-wm>	 RECOVERY - rsyslog in codfw is failing to deliver messages on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=codfw+prometheus/ops
[08:21:36] <wikibugs>	 10Operations, 10Traffic, 10Patch-For-Review: Generate ATS cache.config from software-agnostic data structures - https://phabricator.wikimedia.org/T259692 (10ema) On a text node, [[https://puppet-compiler.wmflabs.org/compiler1001/527/|applying the change]] results in the following diff:  ` --- /etc/trafficser...
[08:25:13] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] kafkamon: add role::kafka::monitoring_buster, assign kafkamon[12]002 [puppet] - 10https://gerrit.wikimedia.org/r/618359 (https://phabricator.wikimedia.org/T252773) (owner: 10Herron)
[08:30:34] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P12186 and previous config saved to /var/cache/conftool/dbconfig/20200806-083033-marostegui.json
[08:30:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:37:43] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P12187 and previous config saved to /var/cache/conftool/dbconfig/20200806-083743-marostegui.json
[08:37:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:39:36] <wikibugs>	 (03CR) 10Vgutierrez: ATS: add function profile::trafficserver_caching_rules (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/618537 (https://phabricator.wikimedia.org/T259692) (owner: 10Ema)
[08:44:07] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Fully repool db1127', diff saved to https://phabricator.wikimedia.org/P12188 and previous config saved to /var/cache/conftool/dbconfig/20200806-084406-marostegui.json
[08:44:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:45:10] <wikibugs>	 (03CR) 10Ema: ATS: add function profile::trafficserver_caching_rules (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/618537 (https://phabricator.wikimedia.org/T259692) (owner: 10Ema)
[08:45:32] <wikibugs>	 (03CR) 10Jcrespo: [C: 03+2] mariadb-backup: Initial setup of dbprov2003 [puppet] - 10https://gerrit.wikimedia.org/r/618706 (https://phabricator.wikimedia.org/T257551) (owner: 10Jcrespo)
[08:46:03] <wikibugs>	 (03PS7) 10Ema: ATS: add function profile::trafficserver_caching_rules [puppet] - 10https://gerrit.wikimedia.org/r/618537 (https://phabricator.wikimedia.org/T259692)
[08:57:00] <wikibugs>	 (03PS1) 10Mvolz: Update citoid to 37e45898 [deployment-charts] - 10https://gerrit.wikimedia.org/r/618713 (https://phabricator.wikimedia.org/T259469)
[08:57:34] <kart_>	 What could this error in deployment-charts be? https://pastebin.com/xr5iyBzN :/
[08:57:49] <wikibugs>	 (03PS2) 10Mvolz: Update citoid to 37e45898 [deployment-charts] - 10https://gerrit.wikimedia.org/r/618713 (https://phabricator.wikimedia.org/T259469)
[08:58:40] <wikibugs>	 (03PS1) 10Jcrespo: mariadb: Move x1 snapshots for the temporary backup2002 to dbprov2003 [puppet] - 10https://gerrit.wikimedia.org/r/618714 (https://phabricator.wikimedia.org/T257551)
[09:00:13] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+1] ATS: add function profile::trafficserver_caching_rules [puppet] - 10https://gerrit.wikimedia.org/r/618537 (https://phabricator.wikimedia.org/T259692) (owner: 10Ema)
[09:02:39] <icinga-wm>	 PROBLEM - rsyslog in eqiad is failing to deliver messages on icinga1001 is CRITICAL: action={fwd_centrallog1001.eqiad.wmnet:6514,fwd_centrallog2001.codfw.wmnet:6514} https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=eqiad+prometheus/ops
[09:03:33] <wikibugs>	 (03PS1) 10Mvolz: Update zotero to use buster10 images [deployment-charts] - 10https://gerrit.wikimedia.org/r/618716 (https://phabricator.wikimedia.org/T258158)
[09:05:47] <wikibugs>	 (03PS1) 10Ema: varnish: add Go-http-client to cache_upload naughty list [puppet] - 10https://gerrit.wikimedia.org/r/618717
[09:06:39] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+1] "looking good from ats-tls and TLS point of view" [puppet] - 10https://gerrit.wikimedia.org/r/615797 (https://phabricator.wikimedia.org/T238593) (owner: 10Dzahn)
[09:07:17] <icinga-wm>	 RECOVERY - rsyslog in eqiad is failing to deliver messages on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=eqiad+prometheus/ops
[09:08:00] <godog>	 FYI I'm debugging these rsyslog failures ^ see also T259780
[09:08:01] <stashbot>	 T259780: rsyslog occasional segfault on centrallog hosts - https://phabricator.wikimedia.org/T259780
[09:10:46] <wikibugs>	 (03PS1) 10Jcrespo: install: Prevent full wipe of dbprov2003 data by changing its recipe [puppet] - 10https://gerrit.wikimedia.org/r/618718 (https://phabricator.wikimedia.org/T257551)
[09:11:26] <wikibugs>	 (03PS2) 10Jcrespo: mariadb: Move x1 snapshots from the temporary backup2002 to dbprov2003 [puppet] - 10https://gerrit.wikimedia.org/r/618714 (https://phabricator.wikimedia.org/T257551)
[09:11:53] <wikibugs>	 (03PS2) 10Jcrespo: install: Prevent full wipe of dbprov2003 data by changing its recipe [puppet] - 10https://gerrit.wikimedia.org/r/618718 (https://phabricator.wikimedia.org/T257551)
[09:12:56] <wikibugs>	 (03CR) 10Jcrespo: [C: 03+2] mariadb: Move x1 snapshots from the temporary backup2002 to dbprov2003 [puppet] - 10https://gerrit.wikimedia.org/r/618714 (https://phabricator.wikimedia.org/T257551) (owner: 10Jcrespo)
[09:15:35] <wikibugs>	 (03CR) 10Jcrespo: [C: 03+2] install: Prevent full wipe of dbprov2003 data by changing its recipe [puppet] - 10https://gerrit.wikimedia.org/r/618718 (https://phabricator.wikimedia.org/T257551) (owner: 10Jcrespo)
[09:16:39] <wikibugs>	 (03PS1) 10Filippo Giunchedi: hieradata: add alert[12]001 to monitoring hosts [puppet] - 10https://gerrit.wikimedia.org/r/618719 (https://phabricator.wikimedia.org/T247966)
[09:16:42] <wikibugs>	 (03PS5) 10ZPapierski: Additional prefixes for sdoc for wcqs [puppet] - 10https://gerrit.wikimedia.org/r/618237 (https://phabricator.wikimedia.org/T258625)
[09:20:01] <wikibugs>	 (03PS1) 10Jcrespo: mariadb-backups: Reenable notifications for dbprov2003 after maintenance [puppet] - 10https://gerrit.wikimedia.org/r/618720 (https://phabricator.wikimedia.org/T138562)
[09:20:29] <icinga-wm>	 PROBLEM - rsyslog in eqiad is failing to deliver messages on icinga1001 is CRITICAL: action={fwd_centrallog1001.eqiad.wmnet:6514,fwd_centrallog2001.codfw.wmnet:6514} https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=eqiad+prometheus/ops
[09:23:17] <icinga-wm>	 PROBLEM - rsyslog in codfw is failing to deliver messages on icinga1001 is CRITICAL: action=fwd_centrallog1001.eqiad.wmnet:6514 https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=codfw+prometheus/ops
[09:26:11] <icinga-wm>	 RECOVERY - rsyslog in codfw is failing to deliver messages on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=codfw+prometheus/ops
[09:31:56] <wikibugs>	 (03CR) 10Hashar: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/607524 (owner: 10Hashar)
[09:32:09] <wikibugs>	 (03CR) 10Hashar: [C: 03+1] "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/607525 (owner: 10Hashar)
[09:32:21] <wikibugs>	 (03CR) 10Hashar: [C: 03+1] "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/611369 (https://phabricator.wikimedia.org/T149924) (owner: 10Hashar)
[09:34:06] <wikibugs>	 (03PS2) 10Ema: varnish: add Go-http-client to cache_upload naughty list [puppet] - 10https://gerrit.wikimedia.org/r/618717 (https://phabricator.wikimedia.org/T192688)
[09:38:01] <icinga-wm>	 RECOVERY - rsyslog in eqiad is failing to deliver messages on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=eqiad+prometheus/ops
[09:40:52] <wikibugs>	 (03PS1) 10Jcrespo: mariadb-backups: Move x1 and misc logical dumps to dbprov1003 [puppet] - 10https://gerrit.wikimedia.org/r/618722 (https://phabricator.wikimedia.org/T138562)
[09:47:53] <wikibugs>	 (03PS1) 10Jcrespo: BackupStatistics: Do not raise an exception if metadata cannot be sent [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/618723 (https://phabricator.wikimedia.org/T138562)
[09:49:33] <wikibugs>	 (03PS2) 10Jcrespo: BackupStatistics: Do not raise an exception if metadata cannot be sent [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/618723 (https://phabricator.wikimedia.org/T138562)
[09:50:17] <icinga-wm>	 PROBLEM - Check systemd state on stat1005 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:51:51] <volans>	 elukey: jupyter-mgerlach-singleuser.service failed ^^^
[09:51:54] <volans>	  /bin/bash: line 0: exec: jupyterhub-singleuser: not found
[09:52:02] <wikibugs>	 (03CR) 10Jcrespo: [C: 04-2] "Actually, the code is good; it just logs the exception, it doesn't raise it further." [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/618723 (https://phabricator.wikimedia.org/T138562) (owner: 10Jcrespo)
[09:53:21] <elukey>	 volans: thanks!
[09:53:30] <wikibugs>	 (03Abandoned) 10Jcrespo: BackupStatistics: Do not raise an exception if metadata cannot be sent [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/618723 (https://phabricator.wikimedia.org/T138562) (owner: 10Jcrespo)
[09:58:17] <wikibugs>	 (03PS1) 10Hashar: Fix changelog filename [software/puppet-compiler] - 10https://gerrit.wikimedia.org/r/618724
[09:58:34] <icinga-wm>	 PROBLEM - rsyslog in eqiad is failing to deliver messages on icinga1001 is CRITICAL: action={fwd_centrallog1001.eqiad.wmnet:6514,fwd_centrallog2001.codfw.wmnet:6514} https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=eqiad+prometheus/ops
[09:59:24] <wikibugs>	 (03CR) 10Hashar: "puppet catalog diff: https://puppet-compiler.wmflabs.org/compiler1001/528/contint2001.wikimedia.org/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/607524 (owner: 10Hashar)
[10:00:04] <jouncebot>	 mvolz: Time to snap out of that daydream and deploy Services – Citoid /  Zotero. Get on with it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200806T1000).
[10:00:14] <wikibugs>	 (03CR) 10Hashar: [C: 03+1] "Puppet catalog diff looks fine https://puppet-compiler.wmflabs.org/compiler1003/529/doc1001.eqiad.wmnet/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/607525 (owner: 10Hashar)
[10:01:11] <wikibugs>	 (03PS3) 10Mvolz: Update citoid to 37e45898 [deployment-charts] - 10https://gerrit.wikimedia.org/r/618713 (https://phabricator.wikimedia.org/T259469)
[10:03:01] <wikibugs>	 (03PS3) 10Hashar: ci: switch integration.wikimedia.org to scap DocumentRoot [puppet] - 10https://gerrit.wikimedia.org/r/611369 (https://phabricator.wikimedia.org/T149924)
[10:03:14] <wikibugs>	 (03PS4) 10Hnowlan: api-gateway: open parts of the admin interface internally [deployment-charts] - 10https://gerrit.wikimedia.org/r/616121 (https://phabricator.wikimedia.org/T254908)
[10:03:47] <wikibugs>	 (03PS4) 10Hashar: ci: switch integration.wikimedia.org to scap DocumentRoot [puppet] - 10https://gerrit.wikimedia.org/r/611369 (https://phabricator.wikimedia.org/T149924)
[10:03:53] <wikibugs>	 (03CR) 10Mvolz: [C: 03+2] Update citoid to 37e45898 [deployment-charts] - 10https://gerrit.wikimedia.org/r/618713 (https://phabricator.wikimedia.org/T259469) (owner: 10Mvolz)
[10:04:29] <icinga-wm>	 RECOVERY - rsyslog in eqiad is failing to deliver messages on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=eqiad+prometheus/ops
[10:04:43] <wikibugs>	 (03CR) 10Hashar: [C: 03+1] "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/611369 (https://phabricator.wikimedia.org/T149924) (owner: 10Hashar)
[10:04:58] <wikibugs>	 (03Merged) 10jenkins-bot: Update citoid to 37e45898 [deployment-charts] - 10https://gerrit.wikimedia.org/r/618713 (https://phabricator.wikimedia.org/T259469) (owner: 10Mvolz)
[10:05:34] <elukey>	 ls
[10:07:21] <icinga-wm>	 PROBLEM - rsyslog in codfw is failing to deliver messages on icinga1001 is CRITICAL: action={fwd_centrallog1001.eqiad.wmnet:6514,fwd_centrallog2001.codfw.wmnet:6514} https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=codfw+prometheus/ops
[10:07:22] <kormat>	 shame-elukey-for-mixing-up-windows.txt
[10:07:53] <elukey>	 ahahahh well deserved
[10:11:29] <logmsgbot>	 !log mvolz@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
[10:11:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:11:56] <wikibugs>	 (03CR) 10Hashar: [C: 03+1] "PS3 gives more context in the commit message" [puppet] - 10https://gerrit.wikimedia.org/r/611369 (https://phabricator.wikimedia.org/T149924) (owner: 10Hashar)
[10:12:02] <wikibugs>	 (03PS5) 10Hnowlan: Add discovery and disabled LVS components for API gateway [puppet] - 10https://gerrit.wikimedia.org/r/615512 (https://phabricator.wikimedia.org/T254908)
[10:12:29] <logmsgbot>	 !log jynus@cumin2001 START - Cookbook sre.hosts.downtime
[10:12:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:14:42] <logmsgbot>	 !log jynus@cumin2001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[10:14:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:15:56] <wikibugs>	 (03PS1) 10Elukey: Add basic Debian packaging [debs/hue] - 10https://gerrit.wikimedia.org/r/618728 (https://phabricator.wikimedia.org/T233073)
[10:16:50] <logmsgbot>	 !log mvolz@deploy1001 helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
[10:16:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:17:44] <icinga-wm>	 PROBLEM - rsyslog in eqiad is failing to deliver messages on icinga1001 is CRITICAL: action=fwd_centrallog1001.eqiad.wmnet:6514 https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=eqiad+prometheus/ops
[10:23:43] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+2] hieradata: add alert[12]001 to monitoring hosts [puppet] - 10https://gerrit.wikimedia.org/r/618719 (https://phabricator.wikimedia.org/T247966) (owner: 10Filippo Giunchedi)
[10:23:48] <logmsgbot>	 !log mvolz@deploy1001 helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
[10:23:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:29:51] <wikibugs>	 (03CR) 10Ayounsi: [C: 03+1] "+1 on the basis that this briefly saturated one of our outbound links" [puppet] - 10https://gerrit.wikimedia.org/r/618717 (https://phabricator.wikimedia.org/T192688) (owner: 10Ema)
[10:30:38] <icinga-wm>	 RECOVERY - Check systemd state on stat1005 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:32:14] <icinga-wm>	 PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is CRITICAL: CRITICAL - failed 51 probes of 569 (alerts on 50) - https://atlas.ripe.net/measurements/1791212/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[10:35:54] <icinga-wm>	 RECOVERY - rsyslog in codfw is failing to deliver messages on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=codfw+prometheus/ops
[10:36:04] <icinga-wm>	 RECOVERY - rsyslog in eqiad is failing to deliver messages on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=eqiad+prometheus/ops
[10:36:28] <wikibugs>	 (03PS3) 10Hnowlan: Add api.wikimedia.org and api.m.wikimedia.org DNS entries [dns] - 10https://gerrit.wikimedia.org/r/599273 (https://phabricator.wikimedia.org/T246945) (owner: 10Ladsgroup)
[10:37:48] <wikibugs>	 (03PS2) 10Mvolz: Update zotero to use buster10 images [deployment-charts] - 10https://gerrit.wikimedia.org/r/618716 (https://phabricator.wikimedia.org/T258158)
[10:38:30] <wikibugs>	 10Operations, 10observability: Making centrallog syslog easier and faster to work with - https://phabricator.wikimedia.org/T254605 (10fgiunchedi) Something else that occurred to me today while debugging {T259780}: sometimes it is useful to be able to access the "syslog firehose" for fleet-wide real time monito...
[10:39:11] <wikibugs>	 (03PS2) 10Alexandros Kosiaris: admin: add Edward Tadros to analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/609158 (https://phabricator.wikimedia.org/T256435) (owner: 10Ssingh)
[10:41:36] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_proton_cluster_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[10:41:52] <icinga-wm>	 PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (bad URL) timed out before a response was received https://wikitech.wikimedia.org/wiki/Citoid
[10:41:58] <icinga-wm>	 PROBLEM - wikifeeds eqiad on wikifeeds.svc.eqiad.wmnet is CRITICAL: /{domain}/v1/page/most-read/{year}/{month}/{day} (retrieve the most-read articles for January 1, 2016 (with aggregated=true)) timed out before a response was received: /{domain}/v1/page/random/title (retrieve a random article title) is CRITICAL: Test retrieve a random article title returned the unexpected status 504 (expecting: 200) https://wikitech.wikimedia.org
[10:42:20] <wikibugs>	 (03CR) 10Mvolz: [C: 03+2] Update zotero to use buster10 images [deployment-charts] - 10https://gerrit.wikimedia.org/r/618716 (https://phabricator.wikimedia.org/T258158) (owner: 10Mvolz)
[10:43:12] <wikibugs>	 (03CR) 10Volans: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/618719 (https://phabricator.wikimedia.org/T247966) (owner: 10Filippo Giunchedi)
[10:43:21] <wikibugs>	 (03Merged) 10jenkins-bot: Update zotero to use buster10 images [deployment-charts] - 10https://gerrit.wikimedia.org/r/618716 (https://phabricator.wikimedia.org/T258158) (owner: 10Mvolz)
[10:43:32] <icinga-wm>	 RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Citoid
[10:43:40] <icinga-wm>	 RECOVERY - wikifeeds eqiad on wikifeeds.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Wikifeeds
[10:45:12] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[10:45:50] <logmsgbot>	 !log mvolz@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
[10:45:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:47:13] <wikibugs>	 (03CR) 10Jcrespo: [C: 03+2] mariadb-backups: Reenable notifications for dbprov2003 after maintenance [puppet] - 10https://gerrit.wikimedia.org/r/618720 (https://phabricator.wikimedia.org/T138562) (owner: 10Jcrespo)
[10:48:27] <logmsgbot>	 !log mvolz@deploy1001 helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
[10:48:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:51:06] <wikibugs>	 (03CR) 10Hashar: [C: 04-1] "Moritz wrote:" [puppet] - 10https://gerrit.wikimedia.org/r/606286 (https://phabricator.wikimedia.org/T224591) (owner: 10Dzahn)
[10:52:45] <logmsgbot>	 !log mvolz@deploy1001 helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
[10:52:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:53:45] <wikibugs>	 (03PS1) 10Ladsgroup: Fix CachingFallbackLabelDescriptionLookup failing in edge-cases [extensions/Wikibase] (wmf/1.36.0-wmf.3) - 10https://gerrit.wikimedia.org/r/618579 (https://phabricator.wikimedia.org/T259744)
[10:54:00] <wikibugs>	 (03CR) 10Ladsgroup: [C: 03+2] Fix CachingFallbackLabelDescriptionLookup failing in edge-cases [extensions/Wikibase] (wmf/1.36.0-wmf.3) - 10https://gerrit.wikimedia.org/r/618579 (https://phabricator.wikimedia.org/T259744) (owner: 10Ladsgroup)
[10:56:32] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Fix CachingFallbackLabelDescriptionLookup failing in edge-cases [extensions/Wikibase] (wmf/1.36.0-wmf.3) - 10https://gerrit.wikimedia.org/r/618579 (https://phabricator.wikimedia.org/T259744) (owner: 10Ladsgroup)
[10:57:10] <Lucas_WMDE>	 jouncebot: refresh
[10:57:11] <jouncebot>	 I refreshed my knowledge about deployments.
[10:57:13] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_citoid_cluster_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[10:57:16] <Lucas_WMDE>	 thx :)
[10:58:51] <wikibugs>	 (03PS1) 10Lucas Werkmeister (WMDE): Pass jQuery objects into jqueryMsg [extensions/Flow] (wmf/1.36.0-wmf.3) - 10https://gerrit.wikimedia.org/r/618580
[10:59:00] <wikibugs>	 (03CR) 10Hashar: [C: 03+1] Scap: git_fat -> git_binary_manager [software/cassandra-twcs] - 10https://gerrit.wikimedia.org/r/404228 (https://phabricator.wikimedia.org/T184882) (owner: 10Thcipriani)
[10:59:21] <Lucas_WMDE>	 jouncebot: refresh
[10:59:22] <jouncebot>	 I refreshed my knowledge about deployments.
[10:59:43] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 03+2] admin: add Edward Tadros to analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/609158 (https://phabricator.wikimedia.org/T256435) (owner: 10Ssingh)
[10:59:47] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[11:00:04] <jouncebot>	 Amir1, Lucas_WMDE, awight, and Urbanecm: #bothumor I � Unicode. All rise for European mid-day backport window(Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200806T1100).
[11:00:04] <jouncebot>	 Lucas_WMDE: A patch you scheduled for European mid-day backport window(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[11:00:16] <Lucas_WMDE>	 o/
[11:00:22] <Amir1>	 o/
[11:00:41] <wikibugs>	 (03CR) 10Hashar: [C: 03+1] Scap: git_fat -> git_binary_manager [software/logstash-logback-encoder] - 10https://gerrit.wikimedia.org/r/404227 (https://phabricator.wikimedia.org/T184882) (owner: 10Thcipriani)
[11:00:59] <wikibugs>	 (03CR) 10Hashar: [C: 03+1] Scap: git_fat -> git_binary_manager [software/cassandra-metrics-collector] - 10https://gerrit.wikimedia.org/r/404226 (https://phabricator.wikimedia.org/T184882) (owner: 10Thcipriani)
[11:01:00] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Pass jQuery objects into jqueryMsg [extensions/Flow] (wmf/1.36.0-wmf.3) - 10https://gerrit.wikimedia.org/r/618580 (owner: 10Lucas Werkmeister (WMDE))
[11:01:14] <Amir1>	 Lucas_WMDE: are you deploying?
[11:01:19] <Amir1>	 or should I?
[11:01:20] <Lucas_WMDE>	 yup, just a second
[11:01:24] <Amir1>	 coolio
[11:01:26] <Amir1>	 Thanks!
[11:01:27] <Lucas_WMDE>	 I can do it, I have another backport too
[11:02:03] <wikibugs>	 (03CR) 10Hashar: [C: 03+1] Scap: git_fat -> git_binary_manager [software/prometheus_jmx_exporter] - 10https://gerrit.wikimedia.org/r/404224 (https://phabricator.wikimedia.org/T184882) (owner: 10Thcipriani)
[11:02:30] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): [V: 03+2 C: 03+2] "CI fails due to an npmjs.com outage (https://status.npmjs.org/incidents/cksjqc1w11v5). Force-merging and deploying with extra caution." [extensions/Wikibase] (wmf/1.36.0-wmf.3) - 10https://gerrit.wikimedia.org/r/618579 (https://phabricator.wikimedia.org/T259744) (owner: 10Ladsgroup)
[11:02:53] <wikibugs>	 (03CR) 10Hashar: [C: 03+1] Scap: git_fat -> git_binary_manager [software/logstash/plugins] - 10https://gerrit.wikimedia.org/r/404222 (https://phabricator.wikimedia.org/T184882) (owner: 10Thcipriani)
[11:03:28] * Amir1 sits here to be mad at npm
[11:03:46] <Lucas_WMDE>	 first seeing if I can reproduce the bug *without* deploying the fix
[11:05:13] <Lucas_WMDE>	 someone at my door, brb
[11:05:52] <Lucas_WMDE>	 back
[11:07:16] <Lucas_WMDE>	 Amir1: do you know how to reproduce this?
[11:07:38] <Amir1>	 no tbh but also isn't it rolled back?
[11:07:46] <Lucas_WMDE>	 I tried =mw.wikibase.getLabel('Q11') and =mw.wikibase.getLabel('Q11', 'de-formal') in a Lua console on test wikidata, but that doesn’t do anything
[11:07:49] <Lucas_WMDE>	 all groups?
[11:08:00] <Lucas_WMDE>	 no, group0 should still have the bug, right?
[11:08:04] <Amir1>	 hmm, I thought you're checking it on wikidata
[11:08:18] <Lucas_WMDE>	 no, test wikidata
[11:08:24] <Amir1>	 test is good it seems
[11:08:31] <wikibugs>	 10Operations, 10Analytics, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to production shell for Denny Vrandecic - https://phabricator.wikimedia.org/T259388 (10akosiaris) a:03DVrandecic
[11:08:37] <Amir1>	 does logstash have anything for test wikidata?
[11:09:06] <Lucas_WMDE>	 I’m looking at the mwdebug1002 board because I’m testing with X-Wikimedia-Debug, nothing there
[11:09:10] <Lucas_WMDE>	 I can check the mediawiki-errors board too
[11:09:57] <Lucas_WMDE>	 nothing there either afaict
[11:11:00] <wikibugs>	 10Operations, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users and nda groups for edtadros - https://phabricator.wikimedia.org/T256435 (10akosiaris) 05Open→03Resolved a:03akosiaris Change merged, user added to the NDA group as requested. @Edtadros you should be good to go. I 'll re...
[11:11:12] <wikibugs>	 10Operations, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users and nda groups for edtadros - https://phabricator.wikimedia.org/T256435 (10akosiaris)
[11:11:59] <wikibugs>	 10Operations, 10Analytics, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to production shell for Denny Vrandecic - https://phabricator.wikimedia.org/T259388 (10akosiaris) p:05Triage→03Medium
[11:13:07] <XioNoX>	 !log drain traffic away cr2-eqdfw - T259621
[11:13:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:13:43] <wikibugs>	 (03CR) 10Ema: ATS: add new backend for phabricator aphlict (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/615797 (https://phabricator.wikimedia.org/T238593) (owner: 10Dzahn)
[11:16:09] <Lucas_WMDE>	 Amir1: so what should we do about that change?
[11:16:16] <Lucas_WMDE>	 I still haven’t been able to reproduce the bug it fixes
[11:16:42] <Amir1>	 hmm, I'm not 100% sure, I'd better safe than sorry
[11:16:46] <Amir1>	 *I'd say
[11:18:08] <wikibugs>	 (03PS6) 10ZPapierski: Additional prefixes for sdoc for wcqs [puppet] - 10https://gerrit.wikimedia.org/r/618237 (https://phabricator.wikimedia.org/T258625)
[11:20:24] <Lucas_WMDE>	 and that means deploy or not deploy? ^^
[11:21:54] <Lucas_WMDE>	 aha! managed to reproduce it
[11:21:57] <Lucas_WMDE>	 finally
[11:21:59] <Lucas_WMDE>	 https://test.wikidata.org/wiki/Module_talk:T259744?uselang=%E2%A7%BCLang%E2%A7%BD
[11:22:00] <stashbot>	 T259744: Argument 3 passed to CachingFallbackLabelDescriptionLookup::buildCacheKey() must be of the type string, null given - https://phabricator.wikimedia.org/T259744
[11:22:24] <Lucas_WMDE>	 ok, let’s scap pull and see if that fixes it
[11:22:32] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 03+1] Detect kubeconfig as known argument in plugin invocations [debs/helm] - 10https://gerrit.wikimedia.org/r/618556 (https://phabricator.wikimedia.org/T258572) (owner: 10JMeybohm)
[11:22:52] <XioNoX>	 !log reboot cr2-eqdfw - T259621
[11:22:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:23:22] <Lucas_WMDE>	 change is on mwdebug1001
[11:23:32] <Lucas_WMDE>	 yup, that fixes it
[11:23:46] <Lucas_WMDE>	 testing a bit of other stuff just to see if anything else breaks
[11:24:57] <Lucas_WMDE>	 looks fine as far as I can tell
[11:25:02] <Lucas_WMDE>	 Amir1: ok if I sync?
[11:25:23] <icinga-wm>	 PROBLEM - OSPF status on cr4-ulsfo is CRITICAL: OSPFv2: 4/5 UP : OSPFv3: 4/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[11:25:28] <Amir1>	 I think so
[11:25:30] <icinga-wm>	 PROBLEM - OSPF status on cr3-knams is CRITICAL: OSPFv2: 3/4 UP : OSPFv3: 3/4 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[11:25:31] <Amir1>	 Thanks!
[11:25:49] <Amir1>	 XioNoX: ^ 
[11:25:54] <icinga-wm>	 PROBLEM - OSPF status on cr1-codfw is CRITICAL: OSPFv2: 5/6 UP : OSPFv3: 5/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[11:26:00] <XioNoX>	 yep, expected
[11:26:16] <Lucas_WMDE>	 syncing
[11:26:29] <Amir1>	 Cool. Thanks!
[11:26:29] <Lucas_WMDE>	 and +2ing my second backport to get CI going
[11:26:37] <wikibugs>	 Operations: rsyslog occasional segfault on centrallog hosts - https://phabricator.wikimedia.org/T259780 (Peachey88)
[11:26:38] <wikibugs>	 (CR) Lucas Werkmeister (WMDE): [C: +2] "backporting this" [extensions/Flow] (wmf/1.36.0-wmf.3) - https://gerrit.wikimedia.org/r/618580 (owner: Lucas Werkmeister (WMDE))
[11:26:51] <Lucas_WMDE>	 let’s see if npmjs recovered
[11:26:53] <wikibugs>	 (PS7) ZPapierski: Additional prefixes for sdoc for wcqs [puppet] - https://gerrit.wikimedia.org/r/618237 (https://phabricator.wikimedia.org/T258625)
[11:26:54] <Amir1>	 Is npm back?
[11:26:59] <wikibugs>	 Operations, observability: rsyslog occasional segfault on centrallog hosts - https://phabricator.wikimedia.org/T259780 (Peachey88)
[11:27:17] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1001 Synchronized php-1.36.0-wmf.3/extensions/Wikibase/lib/: Backport: [[gerrit:618579|Fix CachingFallbackLabelDescriptionLookup failing in edge-cases (T259744)]] (duration: 01m 10s)
[11:27:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:27:20] <stashbot>	 T259744: Argument 3 passed to CachingFallbackLabelDescriptionLookup::buildCacheKey() must be of the type string, null given - https://phabricator.wikimedia.org/T259744
[11:28:12] <wikibugs>	 (CR) jerkins-bot: [V: -1] Additional prefixes for sdoc for wcqs [puppet] - https://gerrit.wikimedia.org/r/618237 (https://phabricator.wikimedia.org/T258625) (owner: ZPapierski)
[11:28:38] <Lucas_WMDE>	 gate-and-submit hasn’t failed yet, at least
[11:29:42] <icinga-wm>	 RECOVERY - OSPF status on cr1-codfw is OK: OSPFv2: 6/6 UP : OSPFv3: 6/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[11:30:30] <icinga-wm>	 RECOVERY - OSPF status on cr4-ulsfo is OK: OSPFv2: 5/5 UP : OSPFv3: 5/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[11:30:40] <icinga-wm>	 RECOVERY - OSPF status on cr3-knams is OK: OSPFv2: 4/4 UP : OSPFv3: 4/4 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[11:34:03] <XioNoX>	 cr2-eqdfw is all back to normal
[11:34:06] <XioNoX>	 cr2-eqord soon
[11:34:07] <micgro42>	 Amir1: might not be reliably back yet: https://status.npmjs.org/
[11:37:56] <XioNoX>	 !log drain traffic away cr2-eqord - T259621
[11:37:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:38:24] <icinga-wm>	 RECOVERY - BGP status on cr2-eqord is OK: BGP OK - up: 6, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[11:41:15] <wikibugs>	 Operations, Traffic, Patch-For-Review: varnishmtail silently stops working if varnishncsa crashes - https://phabricator.wikimedia.org/T259020 (ema) Open→Stalled p:Medium→Lowest
[11:41:46] <wikibugs>	 Operations, Traffic, Patch-For-Review: varnishmtail silently stops working if varnishncsa crashes - https://phabricator.wikimedia.org/T259020 (ema) Workaround deployed, now stalling while waiting for a proper solution to be implemented in mtail.
[11:47:44] <wikibugs>	 Operations, Epic, Performance-Team (Radar), Services (watching): 2017/18 Annual Plan Program 8: Multi-datacenter support - https://phabricator.wikimedia.org/T175206 (Gilles) Open→Resolved a:Gilles
[11:47:53] <wikibugs>	 Operations, Epic, Performance-Team (Radar), Services (watching): 2017/18 Annual Plan Program 8: Multi-datacenter support, Q2 goals - https://phabricator.wikimedia.org/T175213 (Gilles) Open→Resolved a:Gilles
[11:47:56] <wikibugs>	 Operations, Epic, Performance-Team (Radar), Services (watching): 2017/18 Annual Plan Program 8: Multi-datacenter support - https://phabricator.wikimedia.org/T175206 (Gilles)
[11:50:43] <wikibugs>	 (Merged) jenkins-bot: Pass jQuery objects into jqueryMsg [extensions/Flow] (wmf/1.36.0-wmf.3) - https://gerrit.wikimedia.org/r/618580 (owner: Lucas Werkmeister (WMDE))
[11:50:58] <Lucas_WMDE>	 alright, doing that backport
[11:51:29] <wikibugs>	 (CR) Ema: [C: +2] varnish: add Go-http-client to cache_upload naughty list [puppet] - https://gerrit.wikimedia.org/r/618717 (https://phabricator.wikimedia.org/T192688) (owner: Ema)
[11:51:50] <wikibugs>	 Operations: Change of nameservers for Wikimedia.org.tr - https://phabricator.wikimedia.org/T259792 (Aklapper)
[11:52:16] <Lucas_WMDE>	 yup, works like a charm (tested on mwdebug1001)
[11:52:18] <Lucas_WMDE>	 syncing
[11:52:40] <kart_>	 Lucas_WMDE: ping me when Backport window is done, need to update cxserver.
[11:53:02] <Lucas_WMDE>	 yup
[11:53:35] <XioNoX>	 !log reboot cr2-eqord - T259621
[11:53:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:54:08] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1001 Synchronized php-1.36.0-wmf.3/extensions/Flow/: Backport: [[gerrit:618580|Pass jQuery objects into jqueryMsg]] (duration: 01m 09s)
[11:54:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:54:27] <Lucas_WMDE>	 !log EU backport window done
[11:54:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:54:32] <Lucas_WMDE>	 kart_: the floor is yours
[11:55:07] <wikibugs>	 Operations, ops-codfw, RESTBase: restbase2009 down - https://phabricator.wikimedia.org/T256863 (hnowlan) I'm looking into this today - I see that restbase2009 is up 9 days, has been configured by puppet and added to the Cassandra cluster but I don't see anything in SAL about who did it. Still investi...
[11:56:00] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_citoid_cluster_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[11:56:34] <kart_>	 Lucas_WMDE: Thanks!
[11:57:09] <logmsgbot>	 !log kartik@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
[11:57:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:57:46] <icinga-wm>	 PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 238, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[11:57:54] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[11:58:50] <icinga-wm>	 PROBLEM - Router interfaces on cr2-codfw is CRITICAL: CRITICAL: host 208.80.153.193, interfaces up: 130, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[11:59:06] <wikibugs>	 Operations, Domains, Traffic: Change of nameservers for Wikimedia.org.tr - https://phabricator.wikimedia.org/T259792 (Peachey88)
[11:59:22] <icinga-wm>	 PROBLEM - Router interfaces on cr3-ulsfo is CRITICAL: CRITICAL: host 198.35.26.192, interfaces up: 75, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[11:59:28] <logmsgbot>	 !log kartik@deploy1001 helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
[11:59:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:00:44] <icinga-wm>	 RECOVERY - Router interfaces on cr2-codfw is OK: OK: host 208.80.153.193, interfaces up: 132, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[12:01:18] <icinga-wm>	 RECOVERY - Router interfaces on cr3-ulsfo is OK: OK: host 198.35.26.192, interfaces up: 77, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[12:01:40] <icinga-wm>	 RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 240, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[12:03:15] <logmsgbot>	 !log kartik@deploy1001 helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
[12:03:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:04:00] <XioNoX>	 cr2-eqord back to normal
[12:06:34] <kart_>	 !log Updated cxserver to 2020-08-05-070016-production (T258919, T199523, T257943, T256194)
[12:06:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:06:40] <stashbot>	 T258919: Enable MT based on closely-related languages based on community input - https://phabricator.wikimedia.org/T258919
[12:06:40] <stashbot>	 T257943: Create Wikipedia Kotava - https://phabricator.wikimedia.org/T257943
[12:06:40] <stashbot>	 T199523: Expose Machine Translation services supporting Chinese to closer languages/variants - https://phabricator.wikimedia.org/T199523
[12:06:40] <stashbot>	 T256194: Provide section order information in the section suggestions API - https://phabricator.wikimedia.org/T256194
[12:16:50] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[12:21:40] <wikibugs>	 Operations, Wikimedia-Logstash: Kibana ng sending telemetry to elastic.io - https://phabricator.wikimedia.org/T259794 (Rxy)
[12:22:05] <wikibugs>	 Operations, Wikimedia-Logstash: Kibana next sending telemetry to elastic.io - https://phabricator.wikimedia.org/T259794 (Rxy)
[12:22:10] <wikibugs>	 (PS1) Ema: varnish: lower cache_upload rate limit for Facebook [puppet] - https://gerrit.wikimedia.org/r/618736 (https://phabricator.wikimedia.org/T192688)
[12:22:34] <wikibugs>	 Operations, Wikimedia-Logstash: Kibana next sending telemetry to elastic.co - https://phabricator.wikimedia.org/T259794 (Rxy)
[12:22:48] <icinga-wm>	 PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (Zotero and citoid alive) timed out before a response was received https://wikitech.wikimedia.org/wiki/Citoid
[12:23:12] <wikibugs>	 Operations, Wikimedia-Logstash, Privacy: Kibana next sending telemetry to elastic.co - https://phabricator.wikimedia.org/T259794 (Majavah)
[12:24:32] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[12:24:38] <icinga-wm>	 RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Citoid
[12:25:28] <wikibugs>	 Operations, observability: rsyslog occasional segfault on centrallog hosts - https://phabricator.wikimedia.org/T259780 (fgiunchedi) I've captured core dumps on centrallog2001 for this issue, unclear yet what the root cause is. The trigger was a big influx of firewall drop logs for NRPE (port 5666) from a...
[12:29:58] <wikibugs>	 (PS8) ZPapierski: Additional prefixes for sdoc for wcqs [puppet] - https://gerrit.wikimedia.org/r/618237 (https://phabricator.wikimedia.org/T258625)
[12:30:36] <wikibugs>	 Operations, Wikimedia-Logstash, Privacy: Kibana next sending telemetry to elastic.co - https://phabricator.wikimedia.org/T259794 (Rxy)
[12:36:20] <icinga-wm>	 PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (bad URL) timed out before a response was received: /api (Zotero and citoid alive) timed out before a response was received https://wikitech.wikimedia.org/wiki/Citoid
[12:40:04] <icinga-wm>	 RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Citoid
[12:40:38] <wikibugs>	 (PS1) JMeybohm: Add basic sre.discovery.pool and sre.discovery.depool [cookbooks] - https://gerrit.wikimedia.org/r/618738
[12:41:42] <wikibugs>	 (CR) jerkins-bot: [V: -1] Add basic sre.discovery.pool and sre.discovery.depool [cookbooks] - https://gerrit.wikimedia.org/r/618738 (owner: JMeybohm)
[12:42:03] <wikibugs>	 (CR) Elukey: [C: +1] Scap: git_fat -> git_binary_manager [software/prometheus_jmx_exporter] - https://gerrit.wikimedia.org/r/404224 (https://phabricator.wikimedia.org/T184882) (owner: Thcipriani)
[12:45:14] <wikibugs>	 (PS1) Kormat: admin: Update kormat configs [puppet] - https://gerrit.wikimedia.org/r/618739
[12:46:26] <wikibugs>	 (CR) Kormat: [C: +2] admin: Update kormat configs [puppet] - https://gerrit.wikimedia.org/r/618739 (owner: Kormat)
[12:47:19] <wikibugs>	 (PS2) JMeybohm: Add basic sre.discovery.pool and sre.discovery.depool [cookbooks] - https://gerrit.wikimedia.org/r/618738
[12:48:13] <wikibugs>	 (CR) jerkins-bot: [V: -1] Add basic sre.discovery.pool and sre.discovery.depool [cookbooks] - https://gerrit.wikimedia.org/r/618738 (owner: JMeybohm)
[12:51:25] <wikibugs>	 (PS3) JMeybohm: Add basic sre.discovery.pool and sre.discovery.depool [cookbooks] - https://gerrit.wikimedia.org/r/618738
[12:51:48] <icinga-wm>	 PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (Zotero and citoid alive) timed out before a response was received https://wikitech.wikimedia.org/wiki/Citoid
[12:52:26] <wikibugs>	 (CR) jerkins-bot: [V: -1] Add basic sre.discovery.pool and sre.discovery.depool [cookbooks] - https://gerrit.wikimedia.org/r/618738 (owner: JMeybohm)
[12:53:40] <icinga-wm>	 RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Citoid
[12:57:01] <wikibugs>	 (PS9) ZPapierski: Additional prefixes for sdoc for wcqs [puppet] - https://gerrit.wikimedia.org/r/618237 (https://phabricator.wikimedia.org/T258625)
[12:58:36] <wikibugs>	 (PS4) JMeybohm: Add basic sre.discovery.pool and sre.discovery.depool [cookbooks] - https://gerrit.wikimedia.org/r/618738
[13:00:57] <wikibugs>	 (PS5) Kormat: Split utilities into separate packages [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/618513 (https://phabricator.wikimedia.org/T259516)
[13:01:30] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_citoid_cluster_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[13:01:42] <wikibugs>	 (PS5) JMeybohm: Add basic sre.discovery.pool and sre.discovery.depool [cookbooks] - https://gerrit.wikimedia.org/r/618738
[13:02:09] <wikibugs>	 (PS10) ZPapierski: Additional prefixes for sdoc for wcqs [puppet] - https://gerrit.wikimedia.org/r/618237 (https://phabricator.wikimedia.org/T258625)
[13:02:38] <wikibugs>	 Operations, Wikimedia-Logstash, observability, Privacy: Kibana next sending telemetry to elastic.co - https://phabricator.wikimedia.org/T259794 (jcrespo)
[13:03:34] <wikibugs>	 (CR) jerkins-bot: [V: -1] Additional prefixes for sdoc for wcqs [puppet] - https://gerrit.wikimedia.org/r/618237 (https://phabricator.wikimedia.org/T258625) (owner: ZPapierski)
[13:07:18] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[13:07:32] <wikibugs>	 (PS11) ZPapierski: Additional prefixes for sdoc for wcqs [puppet] - https://gerrit.wikimedia.org/r/618237 (https://phabricator.wikimedia.org/T258625)
[13:08:46] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[13:09:31] <wikibugs>	 (CR) JMeybohm: [C: +2] Detect kubeconfig as known argument in plugin invocations [debs/helm] - https://gerrit.wikimedia.org/r/618556 (https://phabricator.wikimedia.org/T258572) (owner: JMeybohm)
[13:12:06] <wikibugs>	 (CR) ZPapierski: "PCC: https://puppet-compiler.wmflabs.org/compiler1002/24353/ - changes are related to prefixes file configuration." [puppet] - https://gerrit.wikimedia.org/r/618237 (https://phabricator.wikimedia.org/T258625) (owner: ZPapierski)
[13:12:36] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[13:15:27] <wikibugs>	 (Merged) jenkins-bot: Detect kubeconfig as known argument in plugin invocations [debs/helm] - https://gerrit.wikimedia.org/r/618556 (https://phabricator.wikimedia.org/T258572) (owner: JMeybohm)
[13:15:29] <wikibugs>	 (CR) JMeybohm: [C: +1] "LGTM" [docker-images/production-images] - https://gerrit.wikimedia.org/r/618542 (owner: Alexandros Kosiaris)
[13:18:27] <wikibugs>	 (CR) JMeybohm: [C: +2] helm: Replace repo update cronjob by systemd timer [puppet] - https://gerrit.wikimedia.org/r/618350 (owner: JMeybohm)
[13:18:29] <wikibugs>	 (CR) Gilles: [C: +1] arclamp: require python-swiftclient [puppet] - https://gerrit.wikimedia.org/r/618626 (https://phabricator.wikimedia.org/T244776) (owner: Dave Pifke)
[13:18:36] <wikibugs>	 (CR) JMeybohm: [C: +2] helm: Replace repo update cronjob by systemd timer (1 comment) [puppet] - https://gerrit.wikimedia.org/r/618350 (owner: JMeybohm)
[13:20:05] <wikibugs>	 (CR) Volans: "Thanks for adventuring into your first cookbook! Looks mostly ok, few nits inline. Ping me offline if something is not clear or you have a" (13 comments) [cookbooks] - https://gerrit.wikimedia.org/r/618738 (owner: JMeybohm)
[13:20:19] <wikibugs>	 (PS1) Kormat: Use 'native' debian format, and exclude irrlevant dirs. [software/transferpy] - https://gerrit.wikimedia.org/r/618743
[13:22:17] <wikibugs>	 (CR) Gilles: "This has been stalled for a couple of months. @Cdanis who else do you think should review this?" [puppet] - https://gerrit.wikimedia.org/r/597176 (https://phabricator.wikimedia.org/T225739) (owner: Dave Pifke)
[13:23:40] <wikibugs>	 Operations, Wikimedia-Logstash, observability, Privacy: Kibana next sending telemetry to elastic.co - https://phabricator.wikimedia.org/T259794 (Rxy) perhaps rOPUP[/modules/kibana/manifests/init.pp$39-47](https://phabricator.wikimedia.org/source/operations-puppet/browse/production/modules/kibana/...
[13:24:19] <wikibugs>	 (CR) CDanis: [C: +1] "This is LGTM from me.  I thought dpifke had +2 on the repo and could merge?  If not I'm happy to do that." [puppet] - https://gerrit.wikimedia.org/r/597176 (https://phabricator.wikimedia.org/T225739) (owner: Dave Pifke)
[13:24:33] <jayme>	 !log imported helm_2.16.9-2 and tiller_2.16.9-2 to buster-wikimedia, jessie-wikimedia and stretch-wikimedia
[13:24:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:24:47] <wikibugs>	 (PS1) Volans: dbmonitor: use default 1H TTL [dns] - https://gerrit.wikimedia.org/r/618744
[13:24:50] <wikibugs>	 (CR) Ayounsi: Configure transport links OSPF based on Netbox data (1 comment) [homer/public] - https://gerrit.wikimedia.org/r/617603 (https://phabricator.wikimedia.org/T200277) (owner: Ayounsi)
[13:26:16] <wikibugs>	 (PS2) Kormat: Use 'native' debian format, and exclude irrelevant dirs. [software/transferpy] - https://gerrit.wikimedia.org/r/618743
[13:26:28] <wikibugs>	 (PS1) Ema: cache: add type Profile::Cache::Sites [puppet] - https://gerrit.wikimedia.org/r/618745
[13:27:31] <wikibugs>	 (PS3) Kormat: Use 'native' debian format, and exclude irrelevant dirs. [software/transferpy] - https://gerrit.wikimedia.org/r/618743
[13:27:53] <wikibugs>	 (CR) jerkins-bot: [V: -1] cache: add type Profile::Cache::Sites [puppet] - https://gerrit.wikimedia.org/r/618745 (owner: Ema)
[13:28:14] <wikibugs>	 (CR) Gilles: "He doesn't, unfortunately. Is there a process for him to request getting +2 in operations/puppet?" [puppet] - https://gerrit.wikimedia.org/r/597176 (https://phabricator.wikimedia.org/T225739) (owner: Dave Pifke)
[13:30:24] <wikibugs>	 (PS2) Ema: cache: add type Profile::Cache::Sites [puppet] - https://gerrit.wikimedia.org/r/618745
[13:32:28] <jayme>	 !log updated helm to 2.16.9-2 on contint*, deploy* and chartmuseum*
[13:32:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:35:34] <wikibugs>	 (PS3) Ema: cache: add type Profile::Cache::Sites [puppet] - https://gerrit.wikimedia.org/r/618745
[13:40:45] <wikibugs>	 (PS4) Ema: cache: add type Profile::Cache::Sites [puppet] - https://gerrit.wikimedia.org/r/618745
[13:42:12] <wikibugs>	 (CR) Ema: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/618745 (owner: Ema)
[13:48:56] <wikibugs>	 (CR) Ppchelko: [C: -1] "For health checks, why not use https://www.envoyproxy.io/docs/envoy/latest/api-v2/config/filter/http/health_check/v2/health_check.proto wi" [deployment-charts] - https://gerrit.wikimedia.org/r/616121 (https://phabricator.wikimedia.org/T254908) (owner: Hnowlan)
[13:49:30] <icinga-wm>	 PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (Zotero and citoid alive) timed out before a response was received https://wikitech.wikimedia.org/wiki/Citoid
[13:49:53] <wikibugs>	 (CR) Jcrespo: [C: +2] mariadb-backups: Move x1 and misc logical dumps to dbprov1003 [puppet] - https://gerrit.wikimedia.org/r/618722 (https://phabricator.wikimedia.org/T138562) (owner: Jcrespo)
[13:50:24] <icinga-wm>	 RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Citoid
[13:51:01] <wikibugs>	 (CR) Ppchelko: [C: -1] "As for stats endpoint, is the purpose to use /stats/prometheus to enable native prometheus exporting?" [deployment-charts] - https://gerrit.wikimedia.org/r/616121 (https://phabricator.wikimedia.org/T254908) (owner: Hnowlan)
[13:51:39] <wikibugs>	 (PS1) Kormat: Ignore debuild-generated files [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/618749
[13:53:23] <wikibugs>	 (CR) Jcrespo: [C: +1] "Why not just make cumin a build depends?" [software/transferpy] - https://gerrit.wikimedia.org/r/618743 (owner: Kormat)
[13:54:14] <wikibugs>	 (CR) Jcrespo: [C: +1] Ignore debuild-generated files [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/618749 (owner: Kormat)
[13:54:54] <wikibugs>	 (CR) Jcrespo: [C: +1] dbmonitor: use default 1H TTL [dns] - https://gerrit.wikimedia.org/r/618744 (owner: Volans)
[13:55:48] <wikibugs>	 (CR) Kormat: "> Patch Set 3: Code-Review+1" [software/transferpy] - https://gerrit.wikimedia.org/r/618743 (owner: Kormat)
[13:56:10] <wikibugs>	 (PS3) Ottomata: Add eventgate service specific test.event streams [mediawiki-config] - https://gerrit.wikimedia.org/r/618550 (https://phabricator.wikimedia.org/T251935)
[13:56:15] <wikibugs>	 (CR) Kormat: [C: +2] Use 'native' debian format, and exclude irrelevant dirs. [software/transferpy] - https://gerrit.wikimedia.org/r/618743 (owner: Kormat)
[13:56:41] <wikibugs>	 (CR) Kormat: [C: +2] Ignore debuild-generated files [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/618749 (owner: Kormat)
[13:56:46] <wikibugs>	 (Merged) jenkins-bot: Use 'native' debian format, and exclude irrelevant dirs. [software/transferpy] - https://gerrit.wikimedia.org/r/618743 (owner: Kormat)
[13:57:30] <wikibugs>	 (CR) Ottomata: [C: +2] Add eventgate service specific test.event streams [mediawiki-config] - https://gerrit.wikimedia.org/r/618550 (https://phabricator.wikimedia.org/T251935) (owner: Ottomata)
[13:58:08] <wikibugs>	 Operations, ops-codfw, RESTBase: restbase2009 down - https://phabricator.wikimedia.org/T256863 (Eevans) >>! In T256863#6365459, @hnowlan wrote: > I'm looking into this today - I see that restbase2009 is up 9 days, has been configured by puppet and added to the Cassandra cluster but I don't see anythi...
[13:59:14] <wikibugs>	 (PS1) Andrew Bogott: openstack haproxy: increase server timeout to 120s [puppet] - https://gerrit.wikimedia.org/r/618753
[14:00:54] <logmsgbot>	 !log otto@deploy1001 Synchronized wmf-config/InitialiseSettings.php: EventStreamConfig - Add eventgate-* test.event streams - T251935 (duration: 01m 08s)
[14:00:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:00:58] <stashbot>	 T251935: All EventGate instances should use EventStreamConfig - https://phabricator.wikimedia.org/T251935
[14:01:36] <wikibugs>	 Operations, netops, Patch-For-Review: OSPF metrics - https://phabricator.wikimedia.org/T200277 (ayounsi) Discussed it with Faidon and created/populated the custom fields in Netbox.
[14:02:51] <wikibugs>	 Operations, ops-codfw, RESTBase: restbase2009 down - https://phabricator.wikimedia.org/T256863 (Eevans) >>! In T256863#6365705, @Eevans wrote: >>>! In T256863#6365459, @hnowlan wrote: >> I'm looking into this today - I see that restbase2009 is up 9 days, has been configured by puppet and added to the...
[14:05:36] <hashar>	 kormat: CI can build the transferpy debian package ;)
[14:06:21] <hashar>	 there is some documentation for it at https://wikitech.wikimedia.org/wiki/Debian_Glue ;)
[14:08:23] <wikibugs>	 (CR) Andrew Bogott: [C: +2] openstack haproxy: increase server timeout to 120s [puppet] - https://gerrit.wikimedia.org/r/618753 (owner: Andrew Bogott)
[14:09:29] <wikibugs>	 (CR) Marostegui: [C: +1] dbmonitor: use default 1H TTL [dns] - https://gerrit.wikimedia.org/r/618744 (owner: Volans)
[14:09:31] <kormat>	 hashar: huh, TIL
[14:10:03] <wikibugs>	 (PS3) Ottomata: eventgate - use /v1/_test/events route for readinessProbe [deployment-charts] - https://gerrit.wikimedia.org/r/618624 (https://phabricator.wikimedia.org/T251935)
[14:10:08] <wikibugs>	 (PS4) Ottomata: eventgate - use /v1/_test/events route for readinessProbe [deployment-charts] - https://gerrit.wikimedia.org/r/618624 (https://phabricator.wikimedia.org/T251935)
[14:10:28] <icinga-wm>	 PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is CRITICAL: CRITICAL - failed 58 probes of 569 (alerts on 50) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[14:10:34] <wikibugs>	 (CR) Volans: [C: +2] dbmonitor: use default 1H TTL [dns] - https://gerrit.wikimedia.org/r/618744 (owner: Volans)
[14:16:18] <icinga-wm>	 RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is OK: OK - failed 46 probes of 569 (alerts on 50) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[14:16:27] <wikibugs>	 Operations, ops-codfw, RESTBase: restbase2009 down - https://phabricator.wikimedia.org/T256863 (Papaul) @Eevans @hnowlan this is a different machine same disks. The disks were taken out of the old machine and placed into the new machine
[14:17:30] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_citoid_cluster_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[14:18:29] <wikibugs>	 (PS5) Hnowlan: api-gateway: open parts of the admin interface internally [deployment-charts] - https://gerrit.wikimedia.org/r/616121 (https://phabricator.wikimedia.org/T254908)
[14:21:10] <wikibugs>	 (PS5) CDanis: Add check_prometheus rules for navtiming [puppet] - https://gerrit.wikimedia.org/r/597176 (https://phabricator.wikimedia.org/T225739) (owner: Dave Pifke)
[14:21:13] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[14:23:00] <wikibugs>	 (CR) CDanis: [C: +2] Add check_prometheus rules for navtiming [puppet] - https://gerrit.wikimedia.org/r/597176 (https://phabricator.wikimedia.org/T225739) (owner: Dave Pifke)
[14:23:07] <wikibugs>	 (CR) Multichill: [C: -1] "Bad solution, see phabricator. Wrong approach." [mediawiki-config] - https://gerrit.wikimedia.org/r/618245 (https://phabricator.wikimedia.org/T258354) (owner: Tobias Andersson)
[14:29:13] <wikibugs>	 (CR) Alexandros Kosiaris: [V: +2 C: +2] Switch service-checker-image to python3 [docker-images/production-images] - https://gerrit.wikimedia.org/r/618542 (owner: Alexandros Kosiaris)
[14:37:00] <wikibugs>	 (PS5) Ottomata: eventgate - use /v1/_test/events route for readinessProbe [deployment-charts] - https://gerrit.wikimedia.org/r/618624 (https://phabricator.wikimedia.org/T251935)
[14:38:40] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_citoid_cluster_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[14:38:47] <wikibugs>	 (CR) Ottomata: [C: +2] eventgate - use /v1/_test/events route for readinessProbe [deployment-charts] - https://gerrit.wikimedia.org/r/618624 (https://phabricator.wikimedia.org/T251935) (owner: Ottomata)
[14:40:38] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[14:42:44] <wikibugs>	 (PS1) Ottomata: eventgate-logging-external - use api-ro.discovery.wmnet for remote stream config [deployment-charts] - https://gerrit.wikimedia.org/r/618762
[14:44:57] <wikibugs>	 Operations, Continuous-Integration-Infrastructure, Release-Engineering-Team-TODO, Release-Engineering-Team (CI & Testing services): Assess whether we should still disable seccomp in Docker for CI - https://phabricator.wikimedia.org/T249729 (hashar) Open→Resolved
[14:46:10] <wikibugs>	 (CR) Ottomata: [C: +2] eventgate-logging-external - use api-ro.discovery.wmnet for remote stream config [deployment-charts] - https://gerrit.wikimedia.org/r/618762 (owner: Ottomata)
[14:46:46] <wikibugs>	 (PS8) Cwhite: prometheus: puppetized install of prometheus-es-exporter [puppet] - https://gerrit.wikimedia.org/r/617260 (https://phabricator.wikimedia.org/T256418)
[14:50:53] <logmsgbot>	 !log fdans@deploy1001 Started deploy [analytics/refinery@97a02a3]: Regular analytics weekly train [analytics/refinery@97a02a3]
[14:50:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:51:21] <wikibugs>	 10Operations, 10DNS, 10Traffic: Verify diff.wikimedia.org ownership for Facebook - https://phabricator.wikimedia.org/T259807 (10CKoerner_WMF)
[14:55:39] <wikibugs>	 (03PS6) 10JMeybohm: Add basic sre.discovery.pool and sre.discovery.depool [cookbooks] - 10https://gerrit.wikimedia.org/r/618738
[14:56:47] <wikibugs>	 (CR) jerkins-bot: [V: -1] Add basic sre.discovery.pool and sre.discovery.depool [cookbooks] - https://gerrit.wikimedia.org/r/618738 (owner: JMeybohm)
[14:57:01] <wikibugs>	 (CR) JMeybohm: Add basic sre.discovery.pool and sre.discovery.depool (13 comments) [cookbooks] - https://gerrit.wikimedia.org/r/618738 (owner: JMeybohm)
[14:58:09] <wikibugs>	 (03PS1) 10Filippo Giunchedi: Add Debian packaging [debs/karma] - 10https://gerrit.wikimedia.org/r/618764 (https://phabricator.wikimedia.org/T258948)
[14:58:37] <wikibugs>	 10Operations, 10Keyholder: After arming a new key in keyholder, the identity file path does not show up - https://phabricator.wikimedia.org/T257329 (10hashar) - 4096 SHA256:qoe6/ybxTT1xw+RXdA1ecioQFh1AYjzGjluYt1uT25s /etc/keyholder.d/deploy_ci_docroot (RSA)  Thank you ;)
[14:58:48] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[15:02:40] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[15:02:49] <wikibugs>	 (03PS1) 10Volans: stdlib: add netmask_to_cidr parser function [puppet] - 10https://gerrit.wikimedia.org/r/618765
[15:02:51] <wikibugs>	 (03PS1) 10Volans: interface::alias: add optional is_service_ip param [puppet] - 10https://gerrit.wikimedia.org/r/618766
[15:02:53] <wikibugs>	 (03PS1) 10Volans: cassandra::instance: use real netmask for IP alias [puppet] - 10https://gerrit.wikimedia.org/r/618767
[15:03:03] <wikibugs>	 Operations, LDAP-Access-Requests: LDAP access to the 'wmf' group for Monte Hurd - https://phabricator.wikimedia.org/T259382 (akosiaris) Open→Invalid I am gonna close this as invalid. Monte has been around for a long time and is definitely in the wmf group. @Mhurd if there is some kind of access y...
[15:05:40] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_citoid_cluster_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[15:07:32] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[15:10:55] <logmsgbot>	 !log fdans@deploy1001 Finished deploy [analytics/refinery@97a02a3]: Regular analytics weekly train [analytics/refinery@97a02a3 (duration: 20m 01s)
[15:10:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:12:10] <icinga-wm>	 PROBLEM - k8s API server requests latencies on neon is CRITICAL: instance=10.64.0.40 verb=PATCH https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[15:13:20] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_citoid_cluster_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[15:13:48] <wikibugs>	 (CR) Cwhite: "Overall LGTM (haven't tried to build it though)." (1 comment) [debs/karma] - https://gerrit.wikimedia.org/r/618764 (https://phabricator.wikimedia.org/T258948) (owner: Filippo Giunchedi)
[15:14:04] <icinga-wm>	 RECOVERY - k8s API server requests latencies on neon is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[15:14:08] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[15:18:00] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[15:19:03] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[15:21:48] <wikibugs>	 (03CR) 10Volans: "Compiler results on few random hosts that use the define:" [puppet] - 10https://gerrit.wikimedia.org/r/618766 (owner: 10Volans)
[15:23:23] <wikibugs>	 (03CR) 10Volans: "Some compiler results available here:" [puppet] - 10https://gerrit.wikimedia.org/r/618767 (owner: 10Volans)
[15:24:22] <wikibugs>	 (03PS1) 10Alexandros Kosiaris: aptrepo: Update jenkins gpg release key [puppet] - 10https://gerrit.wikimedia.org/r/618771 (https://phabricator.wikimedia.org/T259116)
[15:24:40] <wikibugs>	 10Operations, 10Continuous-Integration-Infrastructure, 10Jenkins, 10Patch-For-Review: Update Jenkins gpg release key in reprepro - https://phabricator.wikimedia.org/T259116 (10akosiaris) > I could not find where we store that key in puppet :-\  That's cause we don't store it. We just use the fingerprint.
[15:29:45] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] "> Patch Set 1:" [puppet] - 10https://gerrit.wikimedia.org/r/618767 (owner: 10Volans)
[15:31:32] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[15:32:17] <wikibugs>	 (03PS1) 10Bartosz Dziewoński: Enable DiscussionTools as a beta feature on 8 more wikis ("phase 1") [mediawiki-config] - 10https://gerrit.wikimedia.org/r/618773 (https://phabricator.wikimedia.org/T259574)
[15:32:24] <wikibugs>	 (03CR) 10Ayounsi: "I don't know Ruby enough to review that." [puppet] - 10https://gerrit.wikimedia.org/r/618765 (owner: 10Volans)
[15:32:36] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_citoid_cluster_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[15:34:59] <wikibugs>	 (03PS1) 10Ottomata: eventgate - test_events cannot be templated; it is needed in values to be used in deployment.yaml [deployment-charts] - 10https://gerrit.wikimedia.org/r/618775 (https://phabricator.wikimedia.org/T251609)
[15:36:28] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[15:37:20] <wikibugs>	 (03CR) 10Ottomata: [C: 03+2] eventgate - test_events cannot be templated; it is needed in values to be used in deployment.yaml [deployment-charts] - 10https://gerrit.wikimedia.org/r/618775 (https://phabricator.wikimedia.org/T251609) (owner: 10Ottomata)
[15:37:20] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[15:37:45] <wikibugs>	 (CR) Ayounsi: "2 comments, otherwise LGTM." (2 comments) [puppet] - https://gerrit.wikimedia.org/r/618766 (owner: Volans)
[15:37:50] <wikibugs>	 (03CR) 10Ayounsi: [C: 03+1] interface::alias: add optional is_service_ip param [puppet] - 10https://gerrit.wikimedia.org/r/618766 (owner: 10Volans)
[15:40:22] <logmsgbot>	 !log otto@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
[15:40:23] <logmsgbot>	 !log otto@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
[15:40:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:40:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:42:10] <wikibugs>	 (CR) Filippo Giunchedi: [C: +1] "LGTM to my ruby-untrained eye, see inline for (quite optional) additional tests" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/618765 (owner: Volans)
[15:42:28] <wikibugs>	 (CR) Ayounsi: [C: +1] interface::alias: add optional is_service_ip param (2 comments) [puppet] - https://gerrit.wikimedia.org/r/618766 (owner: Volans)
[15:46:36] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] interface::alias: add optional is_service_ip param [puppet] - 10https://gerrit.wikimedia.org/r/618766 (owner: 10Volans)
[15:46:54] <icinga-wm>	 RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is OK: OK - failed 50 probes of 569 (alerts on 50) - https://atlas.ripe.net/measurements/1791212/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[15:46:58] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[15:47:58] <wikibugs>	 (03CR) 10Ayounsi: [C: 03+1] cassandra::instance: use real netmask for IP alias [puppet] - 10https://gerrit.wikimedia.org/r/618767 (owner: 10Volans)
[15:49:08] <wikibugs>	 (03PS1) 10Alexandros Kosiaris: Revert "Remove access for nathante" [puppet] - 10https://gerrit.wikimedia.org/r/618779 (https://phabricator.wikimedia.org/T256356)
[15:55:54] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 03+2] Revert "Remove access for nathante" [puppet] - 10https://gerrit.wikimedia.org/r/618779 (https://phabricator.wikimedia.org/T256356) (owner: 10Alexandros Kosiaris)
[15:57:24] <wikibugs>	 (03PS7) 10JMeybohm: Add basic sre.discovery.pool and sre.discovery.depool [cookbooks] - 10https://gerrit.wikimedia.org/r/618738
[16:00:04] <jouncebot>	 godog and _joe_: (Dis)respected human, time to deploy Puppet request window(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200806T1600). Please do the needful.
[16:04:38] <icinga-wm>	 PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is CRITICAL: CRITICAL - failed 52 probes of 569 (alerts on 50) - https://atlas.ripe.net/measurements/1791212/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[16:05:03] <wikibugs>	 10Puppet, 10Beta-Cluster-Infrastructure, 10VPS-Projects: Puppet failures on deployment-docker-changeprop01, deployment-docker-cpjobqueue01, deployment-push-notifications01, and deployment-docker-proton01 due to Docker version pinning - https://phabricator.wikimedia.org/T259812 (10bd808)
[16:06:21] <wikibugs>	 (03PS2) 10Filippo Giunchedi: Add Debian packaging [debs/karma] - 10https://gerrit.wikimedia.org/r/618764 (https://phabricator.wikimedia.org/T258948)
[16:06:47] <wikibugs>	 (CR) Filippo Giunchedi: "> Patch Set 1:" (1 comment) [debs/karma] - https://gerrit.wikimedia.org/r/618764 (https://phabricator.wikimedia.org/T258948) (owner: Filippo Giunchedi)
[16:09:26] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_citoid_cluster_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[16:11:23] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[16:11:23] <wikibugs>	 (03PS2) 10Volans: stdlib: add netmask_to_cidr parser function [puppet] - 10https://gerrit.wikimedia.org/r/618765
[16:11:25] <wikibugs>	 (03PS2) 10Volans: interface::alias: add optional is_service_ip param [puppet] - 10https://gerrit.wikimedia.org/r/618766
[16:11:27] <wikibugs>	 (03PS2) 10Volans: cassandra::instance: use real netmask for IP alias [puppet] - 10https://gerrit.wikimedia.org/r/618767
[16:13:57] <wikibugs>	 (CR) Volans: "reply inline" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/618765 (owner: Volans)
[16:15:14] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_citoid_cluster_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[16:15:34] <wikibugs>	 (CR) Filippo Giunchedi: [C: +1] stdlib: add netmask_to_cidr parser function (1 comment) [puppet] - https://gerrit.wikimedia.org/r/618765 (owner: Volans)
[16:18:24] <logmsgbot>	 !log chrisalbon@deploy1001 Started deploy [ores/deploy@f3c44be]: T258435
[16:18:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:18:27] <stashbot>	 T258435: ORES deployment Late July 2020 - https://phabricator.wikimedia.org/T258435
[16:18:48] <logmsgbot>	 !log dpifke@deploy1001 Started deploy [performance/arc-lamp@7838c88]: Deploying fixes for T259167
[16:18:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:18:51] <stashbot>	 T259167: Truncated ArcLamp output files - https://phabricator.wikimedia.org/T259167
[16:18:54] <logmsgbot>	 !log dpifke@deploy1001 Finished deploy [performance/arc-lamp@7838c88]: Deploying fixes for T259167 (duration: 00m 05s)
[16:18:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:19:06] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[16:19:17] <wikibugs>	 (03CR) 10Cwhite: [C: 03+1] "Thanks!" [debs/karma] - 10https://gerrit.wikimedia.org/r/618764 (https://phabricator.wikimedia.org/T258948) (owner: 10Filippo Giunchedi)
[16:19:41] <wikibugs>	 (CR) Volans: [C: +1] "Nice! LGTM, last couple of replies inline." (2 comments) [cookbooks] - https://gerrit.wikimedia.org/r/618738 (owner: JMeybohm)
[16:20:13] <wikibugs>	 (03CR) 10Cwhite: [C: 03+2] prometheus: puppetized install of prometheus-es-exporter [puppet] - 10https://gerrit.wikimedia.org/r/617260 (https://phabricator.wikimedia.org/T256418) (owner: 10Cwhite)
[16:20:55] <wikibugs>	 (CR) Cwhite: [C: +2] prometheus: puppetized install of prometheus-es-exporter (1 comment) [puppet] - https://gerrit.wikimedia.org/r/617260 (https://phabricator.wikimedia.org/T256418) (owner: Cwhite)
[16:25:43] <wikibugs>	 (03CR) 10Volans: "Updated compiler results:" [puppet] - 10https://gerrit.wikimedia.org/r/618767 (owner: 10Volans)
[16:27:29] <wikibugs>	 (03PS1) 10Brennen Bearnes: Fix array unpacking as argument list [extensions/WikibaseMediaInfo] (wmf/1.36.0-wmf.3) - 10https://gerrit.wikimedia.org/r/618582 (https://phabricator.wikimedia.org/T259745)
[16:32:36] <logmsgbot>	 !log chrisalbon@deploy1001 Finished deploy [ores/deploy@f3c44be]: T258435 (duration: 14m 12s)
[16:32:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:32:39] <stashbot>	 T258435: ORES deployment Late July 2020 - https://phabricator.wikimedia.org/T258435
[16:38:35] <wikibugs>	 (03PS1) 10Gergő Tisza: Fix "Ask mentor" help panel button styling [extensions/GrowthExperiments] (wmf/1.36.0-wmf.3) - 10https://gerrit.wikimedia.org/r/618583 (https://phabricator.wikimedia.org/T250235)
[16:39:41] <wikibugs>	 (03CR) 10Dzahn: [C: 03+1] "per https://wiki.mozilla.org/Security/DOH-resolver-policy" [puppet] - 10https://gerrit.wikimedia.org/r/618591 (https://phabricator.wikimedia.org/T252132) (owner: 10Ssingh)
[16:41:54] <wikibugs>	 10Operations, 10Fundraising-Backlog, 10User-Urbanecm, 10User-dancy, 10Wiki-Setup (Create): New wiki for fundraising Thank You pages with similar config as donatewiki - https://phabricator.wikimedia.org/T259002 (10Ladsgroup) Seconding Daniel here. In cases of peak, we had experiences of outages (after dea...
[16:44:53] <wikibugs>	 (03PS1) 10Gergő Tisza: Direct GrowthExperiments help panel questions to mentors on cswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/618786 (https://phabricator.wikimedia.org/T250235)
[16:45:37] <wikibugs>	 (03CR) 10Gergő Tisza: "To be deployed on Monday." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/618786 (https://phabricator.wikimedia.org/T250235) (owner: 10Gergő Tisza)
[16:45:39] <wikibugs>	 (03PS1) 10JMeybohm: eventgate: Fix repository URL in requirements, bump to 0.2.9 [deployment-charts] - 10https://gerrit.wikimedia.org/r/618787
[16:46:08] <wikibugs>	 (03CR) 10Ottomata: [C: 03+1] eventgate: Fix repository URL in requirements, bump to 0.2.9 [deployment-charts] - 10https://gerrit.wikimedia.org/r/618787 (owner: 10JMeybohm)
[16:46:26] <wikibugs>	 (03PS1) 10Volans: wmcs: remove unused leftover records [dns] - 10https://gerrit.wikimedia.org/r/618788
[16:46:42] <wikibugs>	 (03CR) 10JMeybohm: [C: 03+2] eventgate: Fix repository URL in requirements, bump to 0.2.9 [deployment-charts] - 10https://gerrit.wikimedia.org/r/618787 (owner: 10JMeybohm)
[16:47:44] <wikibugs>	 (03Merged) 10jenkins-bot: eventgate: Fix repository URL in requirements, bump to 0.2.9 [deployment-charts] - 10https://gerrit.wikimedia.org/r/618787 (owner: 10JMeybohm)
[16:49:51] <wikibugs>	 (03CR) 10Bstorm: [C: 03+1] wmcs: remove unused leftover records [dns] - 10https://gerrit.wikimedia.org/r/618788 (owner: 10Volans)
[16:51:15] <wikibugs>	 (03PS1) 10JMeybohm: changeprop: Fix repository URL in requirements, bump to 0.9.52 [deployment-charts] - 10https://gerrit.wikimedia.org/r/618790
[16:52:38] <wikibugs>	 10Operations, 10Fundraising-Backlog, 10User-Urbanecm, 10User-dancy, 10Wiki-Setup (Create): New wiki for fundraising Thank You pages with similar config as donatewiki - https://phabricator.wikimedia.org/T259002 (10DStrine) Fishbowl please and thanks!
[16:53:53] <wikibugs>	 (03CR) 10Hnowlan: [C: 03+1] changeprop: Fix repository URL in requirements, bump to 0.9.52 [deployment-charts] - 10https://gerrit.wikimedia.org/r/618790 (owner: 10JMeybohm)
[16:54:50] <wikibugs>	 10Operations, 10Traffic: Enable DNSSEC validation in Wikidough - https://phabricator.wikimedia.org/T259816 (10ssingh)
[16:56:08] <wikibugs>	 10Operations, 10Traffic, 10Patch-For-Review: Deploy Wikidough: Experimental DNS-over-HTTPS (DoH) public resolver - https://phabricator.wikimedia.org/T252132 (10ssingh)
[16:56:10] <wikibugs>	 10Operations, 10Traffic: Enable DNSSEC validation in Wikidough - https://phabricator.wikimedia.org/T259816 (10ssingh)
[16:59:54] <wikibugs>	 10Operations, 10ops-codfw, 10RESTBase: restbase2009 down - https://phabricator.wikimedia.org/T256863 (10WDoranWMF) @Papaul @wkandek @akosiaris @wiki_willy   Hope everyone is relatively well. I've also sent this as an email.  There is an issue as a result of 2009 coming back online. The rough chronology I hav...
[17:00:04] <jouncebot>	 halfak and accraze: #bothumor My software never has bugs. It just develops random features. Rise for Services – Graphoid / ORES. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200806T1700).
[17:03:28] <brennen>	 i'm going to get https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikibaseMediaInfo/+/618582 out and roll the train to group1 shortly - cc: dancy.
[17:04:46] <wikibugs>	 (03CR) 10Brennen Bearnes: [C: 03+2] Fix array unpacking as argument list [extensions/WikibaseMediaInfo] (wmf/1.36.0-wmf.3) - 10https://gerrit.wikimedia.org/r/618582 (https://phabricator.wikimedia.org/T259745) (owner: 10Brennen Bearnes)
[17:08:09] <wikibugs>	 10Operations, 10Cloud-VPS, 10cloud-services-team (Kanban): Move various support services for Cloud VPS currently in prod into their own instances - https://phabricator.wikimedia.org/T207536 (10Bstorm)
[17:14:31] <wikibugs>	 (03CR) 10Volans: [C: 03+2] wmcs: remove unused leftover records [dns] - 10https://gerrit.wikimedia.org/r/618788 (owner: 10Volans)
[17:15:54] <icinga-wm>	 RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is OK: OK - failed 50 probes of 569 (alerts on 50) - https://atlas.ripe.net/measurements/1791212/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[17:16:03] <wikibugs>	 10Operations, 10Cloud-VPS, 10cloud-services-team (Kanban): Move various support services for Cloud VPS currently in prod into their own instances - https://phabricator.wikimedia.org/T207536 (10Bstorm)
[17:23:41] <wikibugs>	 (03Merged) 10jenkins-bot: Fix array unpacking as argument list [extensions/WikibaseMediaInfo] (wmf/1.36.0-wmf.3) - 10https://gerrit.wikimedia.org/r/618582 (https://phabricator.wikimedia.org/T259745) (owner: 10Brennen Bearnes)
[17:25:46] <icinga-wm>	 PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is CRITICAL: CRITICAL - failed 51 probes of 568 (alerts on 50) - https://atlas.ripe.net/measurements/1791212/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[17:28:54] <wikibugs>	 10Puppet, 10Beta-Cluster-Infrastructure, 10VPS-Projects: Puppet failures on deployment-docker-changeprop01, deployment-docker-cpjobqueue01, deployment-push-notifications01, deployment-docker-mobileapps01, and deployment-docker-proton01 due to Docker version pinning - https://phabricator.wikimedia.org/T259812 (...
[17:31:06] <wikibugs>	 (03PS1) 10Dzahn: set TTL for webperf* static entries to default 1H [dns] - 10https://gerrit.wikimedia.org/r/618792
[17:31:15] <mutante>	 volans: ^ fixing that
[17:31:28] <wikibugs>	 (03PS1) 10Cwhite: prometheus: add first draft query to es_exporter [puppet] - 10https://gerrit.wikimedia.org/r/618793 (https://phabricator.wikimedia.org/T256418)
[17:33:49] <wikibugs>	 (CR) Volans: "Thanks! see one comment inline" (1 comment) [dns] - https://gerrit.wikimedia.org/r/618792 (owner: Dzahn)
[17:33:50] <volans>	 thx mutante 
[17:34:21] <volans>	 you got to it before me, still stuck in other rabbit holes
[17:35:05] <wikibugs>	 (03PS2) 10Dzahn: set TTL for webperf* static entries to default 1H [dns] - 10https://gerrit.wikimedia.org/r/618792
[17:35:13] <wikibugs>	 (03CR) 10Cwhite: [C: 03+2] prometheus: add first draft query to es_exporter [puppet] - 10https://gerrit.wikimedia.org/r/618793 (https://phabricator.wikimedia.org/T256418) (owner: 10Cwhite)
[17:35:43] <wikibugs>	 (03PS1) 10BryanDavis: ci: Bump Buster docker.io version to match apt repos [puppet] - 10https://gerrit.wikimedia.org/r/618795 (https://phabricator.wikimedia.org/T259812)
[17:35:49] <wikibugs>	 (03CR) 10Volans: [C: 03+1] "LGTM, thx" [dns] - 10https://gerrit.wikimedia.org/r/618792 (owner: 10Dzahn)
[17:36:14] <logmsgbot>	 !log brennen@deploy1001 Synchronized php-1.36.0-wmf.3/extensions/WikibaseMediaInfo/src/View/MediaInfoEntityTermsView.php: Backport: [[gerrit:618582|Fix array unpacking as argument list]] (T259745) (duration: 01m 07s)
[17:36:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:36:18] <stashbot>	 T259745: Uncaught ArgumentCountError: Too few arguments to function OOUI\Tag::appendContent(), 0 passed - https://phabricator.wikimedia.org/T259745
[17:37:52] <brennen>	 !log train 1.36.0-wmf.3: proceeding to group1
[17:37:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:38:35] <wikibugs>	 (03PS1) 10Brennen Bearnes: group1 wikis to 1.36.0-wmf.3 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/618796
[17:38:37] <wikibugs>	 (03CR) 10Brennen Bearnes: [C: 03+2] group1 wikis to 1.36.0-wmf.3 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/618796 (owner: 10Brennen Bearnes)
[17:39:11] <wikibugs>	 (03CR) 10BryanDavis: "Currently pinned version has been replaced in apt repo" [puppet] - 10https://gerrit.wikimedia.org/r/618795 (https://phabricator.wikimedia.org/T259812) (owner: 10BryanDavis)
[17:39:24] <wikibugs>	 (03Merged) 10jenkins-bot: group1 wikis to 1.36.0-wmf.3 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/618796 (owner: 10Brennen Bearnes)
[17:39:38] <wikibugs>	 (CR) Dzahn: "e" (1 comment) [dns] - https://gerrit.wikimedia.org/r/618792 (owner: Dzahn)
[17:40:32] <wikibugs>	 10Puppet, 10Beta-Cluster-Infrastructure, 10VPS-Projects, 10Patch-For-Review, and 2 others: Puppet failures on deployment-docker-changeprop01, deployment-docker-cpjobqueue01, deployment-push-notifications01, deployment-docker-mobileapps01, and deployment-docker-prot... - https://phabricator.wikimedia.org/T259812
[17:41:32] <logmsgbot>	 !log brennen@deploy1001 rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.3
[17:41:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:42:39] <logmsgbot>	 !log brennen@deploy1001 Synchronized php: group1 wikis to 1.36.0-wmf.3 (duration: 01m 06s)
[17:42:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:47:27] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] set TTL for webperf* static entries to default 1H [dns] - 10https://gerrit.wikimedia.org/r/618792 (owner: 10Dzahn)
[17:48:39] <wikibugs>	 (03PS1) 10Cwhite: profile,prometheus: create define for prometheus-es-exporter configs [puppet] - 10https://gerrit.wikimedia.org/r/618797 (https://phabricator.wikimedia.org/T256418)
[17:50:58] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] ci: Bump Buster docker.io version to match apt repos [puppet] - 10https://gerrit.wikimedia.org/r/618795 (https://phabricator.wikimedia.org/T259812) (owner: 10BryanDavis)
[17:51:07] <wikibugs>	 (03CR) 10Cwhite: [C: 03+2] "pcc checks out https://puppet-compiler.wmflabs.org/compiler1002/24360/" [puppet] - 10https://gerrit.wikimedia.org/r/618797 (https://phabricator.wikimedia.org/T256418) (owner: 10Cwhite)
[17:54:27] <wikibugs>	 (03PS1) 10Cwhite: profile:prometheus: fix typo [puppet] - 10https://gerrit.wikimedia.org/r/618798 (https://phabricator.wikimedia.org/T256418)
[17:54:46] <wikibugs>	 (03CR) 10Cwhite: [C: 03+2] profile:prometheus: fix typo [puppet] - 10https://gerrit.wikimedia.org/r/618798 (https://phabricator.wikimedia.org/T256418) (owner: 10Cwhite)
[17:56:23] <wikibugs>	 (03PS2) 10Cwhite: profile:prometheus: fix typo [puppet] - 10https://gerrit.wikimedia.org/r/618798 (https://phabricator.wikimedia.org/T256418)
[17:56:27] <wikibugs>	 (03CR) 10Cwhite: [V: 03+2 C: 03+2] profile:prometheus: fix typo [puppet] - 10https://gerrit.wikimedia.org/r/618798 (https://phabricator.wikimedia.org/T256418) (owner: 10Cwhite)
[18:00:04] <jouncebot>	 RoanKattouw, Niharika, and Urbanecm: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for Morning backport window(Max 6 patches) . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200806T1800).
[18:00:04] <jouncebot>	 Pchelolo, tgr, and MatmaRex: A patch you scheduled for Morning backport window(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[18:00:34] <MatmaRex>	 hello
[18:01:00] <Urbanecm>	 hello MatmaRex 
[18:01:03] <Urbanecm>	 happy to deploy today :)
[18:01:21] <wikibugs>	 (03CR) 10Urbanecm: [C: 03+2] "B&C" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/618773 (https://phabricator.wikimedia.org/T259574) (owner: 10Bartosz Dziewoński)
[18:02:00] <Urbanecm>	 Pchelolo: do you want me to ping you at the end to self-service?
[18:02:07] <wikibugs>	 (03Merged) 10jenkins-bot: Enable DiscussionTools as a beta feature on 8 more wikis ("phase 1") [mediawiki-config] - 10https://gerrit.wikimedia.org/r/618773 (https://phabricator.wikimedia.org/T259574) (owner: 10Bartosz Dziewoński)
[18:02:16] <wikibugs>	 (03PS1) 10Cwhite: prometheus: fix typo [puppet] - 10https://gerrit.wikimedia.org/r/618799 (https://phabricator.wikimedia.org/T256418)
[18:02:27] <Pchelolo>	 Urbanecm: if you are going to be deploying, would you mind doing mine too?
[18:02:33] <Pchelolo>	 it's trivial
[18:02:36] <tgr>	 o/
[18:02:46] <wikibugs>	 (CR) jerkins-bot: [V: -1] prometheus: fix typo [puppet] - https://gerrit.wikimedia.org/r/618799 (https://phabricator.wikimedia.org/T256418) (owner: Cwhite)
[18:03:00] <wikibugs>	 (03PS2) 10Cwhite: prometheus: fix typo [puppet] - 10https://gerrit.wikimedia.org/r/618799 (https://phabricator.wikimedia.org/T256418)
[18:03:03] <Urbanecm>	 not at all, it makes complete sense, so happy to do that too :)
[18:03:13] <Pchelolo>	 thank you!
[18:03:35] <wikibugs>	 (03CR) 10Urbanecm: [C: 03+2] Fix "Ask mentor" help panel button styling [extensions/GrowthExperiments] (wmf/1.36.0-wmf.3) - 10https://gerrit.wikimedia.org/r/618583 (https://phabricator.wikimedia.org/T250235) (owner: 10Gergő Tisza)
[18:03:37] <Urbanecm>	 no problem :)
[18:04:01] <Urbanecm>	 MatmaRex: your patch is at mwdebug1001
[18:04:41] <MatmaRex>	 Urbanecm: seems good
[18:04:49] <Urbanecm>	 thanks, syncing
[18:07:31] <wikibugs>	 (03PS9) 10Urbanecm: Remove temporary logging for mediamoderation [mediawiki-config] - 10https://gerrit.wikimedia.org/r/606239 (https://phabricator.wikimedia.org/T259742) (owner: 10Cicalese)
[18:07:33] <logmsgbot>	 !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: 9695811a30de30471a81b6ad05aa5e625f52caf1: : Enable DiscussionTools as a beta feature on 8 more wikis ("phase 1") (T259574) (duration: 01m 06s)
[18:07:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:07:38] <stashbot>	 T259574: Make config change to enable Reply Tool as Beta Feature at Phase 1 wikis - https://phabricator.wikimedia.org/T259574
[18:07:38] <wikibugs>	 (03CR) 10Urbanecm: [C: 03+2] Remove temporary logging for mediamoderation [mediawiki-config] - 10https://gerrit.wikimedia.org/r/606239 (https://phabricator.wikimedia.org/T259742) (owner: 10Cicalese)
[18:07:45] <Urbanecm>	 MatmaRex: here you go
[18:07:57] <MatmaRex>	 thanks!
[18:08:28] <wikibugs>	 (03Merged) 10jenkins-bot: Remove temporary logging for mediamoderation [mediawiki-config] - 10https://gerrit.wikimedia.org/r/606239 (https://phabricator.wikimedia.org/T259742) (owner: 10Cicalese)
[18:08:42] <Urbanecm>	 my pleasure MatmaRex :)
[18:10:53] <logmsgbot>	 !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: 9db96595695b5ec1144c078e8961b3c04e8983cf: Remove temporary logging for mediamoderation (T259742) (duration: 01m 07s)
[18:10:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:10:57] <stashbot>	 T259742: Turn off MediaModeration debug logging in Production - https://phabricator.wikimedia.org/T259742
[18:11:27] <Urbanecm>	 Pchelolo: done, labs should be done automatically
[18:11:41] <logmsgbot>	 !log otto@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
[18:11:41] <logmsgbot>	 !log otto@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
[18:11:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:11:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:11:44] <Pchelolo>	 thank you!
[18:12:44] <Urbanecm>	 tgr: as soon as CI allows us, your patch will be ready. Wanna self-service, or should I?
[18:13:04] <tgr>	 Urbanecm: please do if it's no trouble
[18:13:10] <Urbanecm>	 not at all
[18:13:18] <wikibugs>	 (03Merged) 10jenkins-bot: Fix "Ask mentor" help panel button styling [extensions/GrowthExperiments] (wmf/1.36.0-wmf.3) - 10https://gerrit.wikimedia.org/r/618583 (https://phabricator.wikimedia.org/T250235) (owner: 10Gergő Tisza)
[18:15:36] <Urbanecm>	 tgr: available at mwdebug1001
[18:17:20] <tgr>	 Urbanecm: thanks, tested
[18:17:27] <Urbanecm>	 thanks, syncing
[18:20:19] <logmsgbot>	 !log urbanecm@deploy1001 Synchronized php-1.36.0-wmf.3/extensions/GrowthExperiments/modules/: fb4a80830d7d915479e097cc82c681c5fb03d51b: Fix "Ask mentor" help panel button styling (T250235) (duration: 01m 07s)
[18:20:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:20:22] <stashbot>	 T250235: Scale: pilot help panel with mentorship - https://phabricator.wikimedia.org/T250235
[18:20:25] <Urbanecm>	 tgr: done
[18:20:38] <icinga-wm>	 PROBLEM - Ensure hosts are not performing a change on every puppet run on puppetdb1002 is CRITICAL: CRITICAL: the following (5) node(s) change every puppet run: analytics1041.eqiad.wmnet, puppetmaster2001.codfw.wmnet, deneb.codfw.wmnet, wdqs1009.eqiad.wmnet, testreduce1001.eqiad.wmnet https://wikitech.wikimedia.org/wiki/Puppet%23check_puppet_run_changes
[18:20:39] <tgr>	 thx!
[18:20:49] <Urbanecm>	 no problem
[18:21:21] <Urbanecm>	 !log Morning B&C window was completed
[18:21:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:22:03] <wikibugs>	 10Puppet, 10Beta-Cluster-Infrastructure, 10VPS-Projects: Puppet failures on deployment-docker-changeprop01, deployment-docker-cpjobqueue01, deployment-push-notifications01, deployment-docker-mobileapps01, and deployment-docker-proton01 due to Docker version pinning - https://phabricator.wikimedia.org/T259812 (...
[18:29:51] <logmsgbot>	 !log otto@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
[18:29:52] <logmsgbot>	 !log otto@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
[18:29:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:29:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:30:17] <wikibugs>	 (03CR) 10Ebernhardson: [C: 03+2] Scap: git_fat -> git_binary_manager [software/logstash/plugins] - 10https://gerrit.wikimedia.org/r/404222 (https://phabricator.wikimedia.org/T184882) (owner: 10Thcipriani)
[18:33:19] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] Add Backy2 module and profile [puppet] - 10https://gerrit.wikimedia.org/r/617841 (https://phabricator.wikimedia.org/T259192) (owner: 10Andrew Bogott)
[18:34:42] <wikibugs>	 10Operations, 10ops-codfw, 10DC-Ops: (Need By:2020-08-17) label/setup/install pki2001 - https://phabricator.wikimedia.org/T259825 (10RobH)
[18:35:03] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: (Need By:2020-08-17) label/setup/install pki1001 - https://phabricator.wikimedia.org/T259826 (10RobH)
[18:35:35] <wikibugs>	 10Operations, 10ops-codfw, 10DC-Ops: (Need By:2020-08-17) label/setup/install pki2001 - https://phabricator.wikimedia.org/T259825 (10RobH)
[18:35:53] <wikibugs>	 10Operations, 10User-jbond: OKR: Install and configure new CFSSL PKI server - https://phabricator.wikimedia.org/T259117 (10RobH)
[18:36:07] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: (Need By:2020-08-17) label/setup/install pki1001 - https://phabricator.wikimedia.org/T259826 (10RobH)
[18:36:22] <wikibugs>	 10Operations, 10User-jbond: OKR: Install and configure new CFSSL PKI server - https://phabricator.wikimedia.org/T259117 (10RobH)
[18:36:51] <wikibugs>	 (03CR) 10Ssingh: [C: 03+2] wikidough: enable QNAME minimisation for the dnsrecursor module [puppet] - 10https://gerrit.wikimedia.org/r/618591 (https://phabricator.wikimedia.org/T252132) (owner: 10Ssingh)
[18:40:46] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_citoid_cluster_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[18:42:42] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[18:55:47] <wikibugs>	 (03PS1) 10Bstorm: share storage: remove nfs-manage-binds [puppet] - 10https://gerrit.wikimedia.org/r/618804 (https://phabricator.wikimedia.org/T169570)
[18:57:21] <wikibugs>	 (03PS2) 10Bstorm: shared-storage: remove nfs-manage-binds [puppet] - 10https://gerrit.wikimedia.org/r/618804 (https://phabricator.wikimedia.org/T169570)
[18:57:59] <logmsgbot>	 !log otto@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
[18:57:59] <logmsgbot>	 !log otto@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
[18:58:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:58:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:00:05] <jouncebot>	 brennen and dancy: #bothumor My software never has bugs. It just develops random features. Rise for Mediawiki train - American Version. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200806T1900).
[19:00:25] * dancy salutes.
[19:00:56] <brennen>	 dancy: logs still looking good, running deploy-promote.
[19:01:27] <wikibugs>	 (03PS1) 10Brennen Bearnes: all wikis to 1.36.0-wmf.3 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/618805
[19:01:29] <wikibugs>	 (03CR) 10Brennen Bearnes: [C: 03+2] all wikis to 1.36.0-wmf.3 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/618805 (owner: 10Brennen Bearnes)
[19:02:14] <wikibugs>	 (03Merged) 10jenkins-bot: all wikis to 1.36.0-wmf.3 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/618805 (owner: 10Brennen Bearnes)
[19:04:23] <logmsgbot>	 !log brennen@deploy1001 rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.3
[19:04:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:05:45] <wikibugs>	 (03CR) 10Bstorm: "https://puppet-compiler.wmflabs.org/compiler1003/24361/" [puppet] - 10https://gerrit.wikimedia.org/r/618804 (https://phabricator.wikimedia.org/T169570) (owner: 10Bstorm)
[19:05:50] <wikibugs>	 (03CR) 10Bstorm: [C: 03+2] shared-storage: remove nfs-manage-binds [puppet] - 10https://gerrit.wikimedia.org/r/618804 (https://phabricator.wikimedia.org/T169570) (owner: 10Bstorm)
[19:06:07] <wikibugs>	 (03PS1) 10Catrope: WelcomeSurvey: Use autonyms for language question [extensions/GrowthExperiments] (wmf/1.36.0-wmf.3) - 10https://gerrit.wikimedia.org/r/618806 (https://phabricator.wikimedia.org/T232410)
[19:06:35] <wikibugs>	 (03PS1) 10Catrope: WelcomeSurvey: Reuse server-rendered language question field [extensions/GrowthExperiments] (wmf/1.36.0-wmf.3) - 10https://gerrit.wikimedia.org/r/618807 (https://phabricator.wikimedia.org/T232410)
[19:08:04] <wikibugs>	 (03PS1) 10Ayounsi: Netbox driven interfaces for cr1/2-eqiad [homer/public] - 10https://gerrit.wikimedia.org/r/618827
[19:09:15] <dancy>	 brennen: Lookin' good
[19:09:25] <wikibugs>	 (03PS2) 10Catrope: WelcomeSurvey: Reuse server-rendered language question field [extensions/GrowthExperiments] (wmf/1.36.0-wmf.3) - 10https://gerrit.wikimedia.org/r/618807 (https://phabricator.wikimedia.org/T232410)
[19:10:02] <wikibugs>	 (03PS1) 10Ssingh: dnsrecursor: update the location of socket-dir for 4.3.0 [puppet] - 10https://gerrit.wikimedia.org/r/618830
[19:10:13] <wikibugs>	 (03CR) 10Catrope: "This change is ready for review." [extensions/GrowthExperiments] (wmf/1.36.0-wmf.3) - 10https://gerrit.wikimedia.org/r/618807 (https://phabricator.wikimedia.org/T232410) (owner: 10Catrope)
[19:10:28] <brennen>	 dancy: yep.  i'll keep an eye on it, but smooth sailing so far.
[19:13:53] <wikibugs>	 (03CR) 10Ssingh: "Confirming no change to dns2001 and cloudservices:" [puppet] - 10https://gerrit.wikimedia.org/r/618830 (owner: 10Ssingh)
[19:18:47] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+1] dnsrecursor: update the location of socket-dir for 4.3.0 [puppet] - 10https://gerrit.wikimedia.org/r/618830 (owner: 10Ssingh)
[19:19:36] <wikibugs>	 (03CR) 10Ssingh: [C: 03+2] dnsrecursor: update the location of socket-dir for 4.3.0 [puppet] - 10https://gerrit.wikimedia.org/r/618830 (owner: 10Ssingh)
[19:22:14] <wikibugs>	 10Operations, 10Analytics-Radar, 10Traffic: Spammy events coming our way for sites such us https://ru.wikipedia.kim - https://phabricator.wikimedia.org/T190843 (10Nathan708) is there any legal caution mated for such an act; because it looks as if these guys mirror wikipedia. But they can only mirror wikipedi...
[19:26:04] <logmsgbot>	 !log otto@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
[19:26:04] <logmsgbot>	 !log otto@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
[19:26:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:26:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:33:43] <wikibugs>	 10Operations, 10Toolhub, 10Wikimedia-Mailing-lists, 10User-bd808: Create toolhub-dev@lists.wikimedia.org - https://phabricator.wikimedia.org/T259830 (10bd808)
[19:35:12] <wikibugs>	 (03PS1) 10Ottomata: wgEventStreams - fix typo in eventgate stream config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/618833 (https://phabricator.wikimedia.org/T251935)
[19:35:46] <ottomata>	 brennen: train ok?  lemme know if it is ok to sync ^
[19:36:23] <ottomata>	 or dancy  ^
[19:36:39] <dancy>	 Train looks ok.
[19:36:56] <dancy>	 No alarming log entries.
[19:37:45] <ottomata>	 ok proceeding shouldn't affect any mw stuff
[19:38:40] <wikibugs>	 (03CR) 10Ottomata: [C: 03+2] wgEventStreams - fix typo in eventgate stream config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/618833 (https://phabricator.wikimedia.org/T251935) (owner: 10Ottomata)
[19:40:13] <logmsgbot>	 !log otto@deploy1001 Synchronized wmf-config/InitialiseSettings.php: EventStreamConfig - wgEventStreams - fix typo in eventgate stream config - T251935 (duration: 00m 59s)
[19:40:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:40:17] <stashbot>	 T251935: All EventGate instances should use EventStreamConfig - https://phabricator.wikimedia.org/T251935
[19:42:32] <wikibugs>	 (03PS1) 10Ottomata: wgEventStreams - fix another typo in eventgate stream config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/618837 (https://phabricator.wikimedia.org/T251935)
[19:42:41] <subbu>	 brennen do you know why https://gerrit.wikimedia.org/r/c/mediawiki/vendor/+/618038 ( which had cherrypicked 0.13.0-a4 onto wmf-3 branch ) didn't work? It looks like it didn't ride the train .. https://www.mediawiki.org/wiki/Special:Version says that Parsoid is at 0.13.0-a3 /cc cscott 
[19:43:21] <brennen>	 subbu: was just looking at that
[19:43:23] <brennen>	 unclear
[19:43:27] <subbu>	 ok.
[19:43:49] <subbu>	 ah scott was asking in -releng. :)
[19:44:02] <wikibugs>	 (03CR) 10Ottomata: [C: 03+2] wgEventStreams - fix another typo in eventgate stream config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/618837 (https://phabricator.wikimedia.org/T251935) (owner: 10Ottomata)
[19:45:27] <logmsgbot>	 !log otto@deploy1001 Synchronized wmf-config/InitialiseSettings.php: EventStreamConfig - wgEventStreams - fix another typo in eventgate stream config - T251935 (duration: 00m 58s)
[19:45:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:45:31] <stashbot>	 T251935: All EventGate instances should use EventStreamConfig - https://phabricator.wikimedia.org/T251935
[19:47:35] <logmsgbot>	 !log otto@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
[19:47:35] <logmsgbot>	 !log otto@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
[19:48:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:48:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:49:11] <wikibugs>	 10Operations, 10Toolhub, 10Wikimedia-Mailing-lists, 10User-bd808: Create toolhub-dev@lists.wikimedia.org - https://phabricator.wikimedia.org/T259830 (10bd808)
[19:51:22] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[19:54:09] <wikibugs>	 (03PS1) 10Ottomata: eventgate-* - precache /test/event/1.0.0 schema [deployment-charts] - 10https://gerrit.wikimedia.org/r/618838 (https://phabricator.wikimedia.org/T251935)
[19:56:00] <wikibugs>	 (03CR) 10Ottomata: [C: 03+2] eventgate-* - precache /test/event/1.0.0 schema [deployment-charts] - 10https://gerrit.wikimedia.org/r/618838 (https://phabricator.wikimedia.org/T251935) (owner: 10Ottomata)
[19:57:12] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[19:58:13] <wikibugs>	 (03PS1) 10Tchanders: Enable Special:Investigate on French Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/618839 (https://phabricator.wikimedia.org/T257891)
[20:01:02] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[20:01:08] <wikibugs>	 (03PS1) 10Ottomata: eventgate - fix httpGet port [deployment-charts] - 10https://gerrit.wikimedia.org/r/618840
[20:02:13] <wikibugs>	 (03CR) 10DannyS712: [C: 03+1] Enable Special:Investigate on French Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/618839 (https://phabricator.wikimedia.org/T257891) (owner: 10Tchanders)
[20:02:20] <wikibugs>	 (03CR) 10Ottomata: [C: 03+2] eventgate - fix httpGet port [deployment-charts] - 10https://gerrit.wikimedia.org/r/618840 (owner: 10Ottomata)
[20:02:58] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[20:06:01] <wikibugs>	 (03PS1) 10Andrew Bogott: backy2: fix up some dependency issues in install [puppet] - 10https://gerrit.wikimedia.org/r/618842 (https://phabricator.wikimedia.org/T259192)
[20:06:28] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] backy2: fix up some dependency issues in install [puppet] - 10https://gerrit.wikimedia.org/r/618842 (https://phabricator.wikimedia.org/T259192) (owner: 10Andrew Bogott)
[20:07:28] <wikibugs>	 (03PS2) 10Andrew Bogott: backy2: fix up some dependency issues in install [puppet] - 10https://gerrit.wikimedia.org/r/618842 (https://phabricator.wikimedia.org/T259192)
[20:08:32] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] backy2: fix up some dependency issues in install [puppet] - 10https://gerrit.wikimedia.org/r/618842 (https://phabricator.wikimedia.org/T259192) (owner: 10Andrew Bogott)
[20:11:57] <wikibugs>	 (03PS1) 10Ottomata: eventgate - bump chart version to 0.2.10 [deployment-charts] - 10https://gerrit.wikimedia.org/r/618843
[20:13:11] <wikibugs>	 (03CR) 10Ottomata: [C: 03+2] eventgate - bump chart version to 0.2.10 [deployment-charts] - 10https://gerrit.wikimedia.org/r/618843 (owner: 10Ottomata)
[20:15:53] <logmsgbot>	 !log otto@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
[20:15:53] <logmsgbot>	 !log otto@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
[20:15:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:15:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:19:16] <brennen>	 jouncebot now
[20:19:16] <jouncebot>	 For the next 0 hour(s) and 40 minute(s): Mediawiki train - American Version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200806T1900)
[20:19:55] <brennen>	 !log manually updating the vendor submodule on 1.36.0 for T259832
[20:19:56] <logmsgbot>	 !log otto@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
[20:19:56] <logmsgbot>	 !log otto@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
[20:19:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:19:58] <stashbot>	 T259832: Deployed 1.36.0-wmf.3 does not have the 1.36.0-wmf.3 branch of mediawiki-vendor - https://phabricator.wikimedia.org/T259832
[20:19:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:20:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:20:20] <icinga-wm>	 PROBLEM - Packet loss ratio for UDP on logstash1007 is CRITICAL: 0.8342 ge 0.1 https://wikitech.wikimedia.org/wiki/Logstash https://grafana.wikimedia.org/dashboard/db/logstash
[20:20:28] <brennen>	 ottomata: am i going to step on your toes in any way with that?
[20:25:23] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_citoid_cluster_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[20:26:32] <icinga-wm>	 PROBLEM - Packet loss ratio for UDP on logstash1008 is CRITICAL: 0.7518 ge 0.1 https://wikitech.wikimedia.org/wiki/Logstash https://grafana.wikimedia.org/dashboard/db/logstash
[20:27:20] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[20:30:34] <icinga-wm>	 PROBLEM - Packet loss ratio for UDP on logstash1009 is CRITICAL: 0.788 ge 0.1 https://wikitech.wikimedia.org/wiki/Logstash https://grafana.wikimedia.org/dashboard/db/logstash
[20:35:51] <wikibugs>	 (03PS1) 10Ottomata: eventgate-* - bump image to 2020-08-06-202915-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/618847
[20:37:24] <wikibugs>	 (03CR) 10Ottomata: [C: 03+2] eventgate-* - bump image to 2020-08-06-202915-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/618847 (owner: 10Ottomata)
[20:38:09] <logmsgbot>	 !log otto@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
[20:38:09] <logmsgbot>	 !log otto@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
[20:38:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:38:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:39:15] <wikibugs>	 (03PS4) 10Dzahn: profile::gerrit::migrations: correct hiera name space [puppet] - 10https://gerrit.wikimedia.org/r/617676 (https://phabricator.wikimedia.org/T247956) (owner: 10Jbond)
[20:39:27] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1001/24364/" [puppet] - 10https://gerrit.wikimedia.org/r/617676 (https://phabricator.wikimedia.org/T247956) (owner: 10Jbond)
[20:40:21] <wikibugs>	 (03PS1) 10Andrew Bogott: Introduce role::wmcs::ceph::backup [puppet] - 10https://gerrit.wikimedia.org/r/618849 (https://phabricator.wikimedia.org/T259192)
[20:43:13] <mutante>	 andrewbogott: can i merge backy2 change?
[20:43:29] <andrewbogott>	 yes plesae
[20:43:32] <andrewbogott>	 please
[20:43:46] <mutante>	 done!
[20:44:28] <andrewbogott>	 thx
[20:46:18] <wikibugs>	 (03PS1) 10Brennen Bearnes: Update git submodules [core] (wmf/1.36.0-wmf.3) - 10https://gerrit.wikimedia.org/r/618850
[20:47:16] <wikibugs>	 (03PS1) 10Ottomata: eventgate-logging-external: use remote stream config, eventgate-analytics-external: use constraints [deployment-charts] - 10https://gerrit.wikimedia.org/r/618851 (https://phabricator.wikimedia.org/T251935)
[20:47:43] <shdubsh>	 !log restart logstash -- pipeline appears stuck
[20:47:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:48:31] <wikibugs>	 (03CR) 10Ottomata: [C: 03+2] eventgate-logging-external: use remote stream config, eventgate-analytics-external: use constraints [deployment-charts] - 10https://gerrit.wikimedia.org/r/618851 (https://phabricator.wikimedia.org/T251935) (owner: 10Ottomata)
[20:48:53] <wikibugs>	 (03CR) 10Dzahn: "noop" [puppet] - 10https://gerrit.wikimedia.org/r/617676 (https://phabricator.wikimedia.org/T247956) (owner: 10Jbond)
[20:49:01] <wikibugs>	 (03PS4) 10Dzahn: profile::gerrit::server: correct hiera name space [puppet] - 10https://gerrit.wikimedia.org/r/617683 (https://phabricator.wikimedia.org/T247956) (owner: 10Jbond)
[20:49:07] <wikibugs>	 (03CR) 10Brennen Bearnes: [C: 03+2] Update git submodules [core] (wmf/1.36.0-wmf.3) - 10https://gerrit.wikimedia.org/r/618850 (owner: 10Brennen Bearnes)
[20:49:39] <wikibugs>	 (03Merged) 10jenkins-bot: eventgate-logging-external: use remote stream config, eventgate-analytics-external: use constraints [deployment-charts] - 10https://gerrit.wikimedia.org/r/618851 (https://phabricator.wikimedia.org/T251935) (owner: 10Ottomata)
[20:49:59] <brennen>	 jouncebot next
[20:50:00] <jouncebot>	 In 2 hour(s) and 10 minute(s): Evening backport window(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200806T2300)
[20:50:32] <wikibugs>	 (03PS1) 10C. Scott Ananian: Update git submodules [core] (wmf/1.36.0-wmf.3) - 10https://gerrit.wikimedia.org/r/618852 (https://phabricator.wikimedia.org/T259832)
[20:50:57] <wikibugs>	 (03Abandoned) 10C. Scott Ananian: Update git submodules [core] (wmf/1.36.0-wmf.3) - 10https://gerrit.wikimedia.org/r/618852 (https://phabricator.wikimedia.org/T259832) (owner: 10C. Scott Ananian)
[20:51:18] <icinga-wm>	 RECOVERY - Packet loss ratio for UDP on logstash1007 is OK: (C)0.1 ge (W)0.05 ge 0.01629 https://wikitech.wikimedia.org/wiki/Logstash https://grafana.wikimedia.org/dashboard/db/logstash
[20:51:37] <logmsgbot>	 !log otto@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
[20:51:37] <logmsgbot>	 !log otto@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
[20:51:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:51:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:53:22] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[20:53:38] <icinga-wm>	 RECOVERY - Packet loss ratio for UDP on logstash1008 is OK: (C)0.1 ge (W)0.05 ge 0 https://wikitech.wikimedia.org/wiki/Logstash https://grafana.wikimedia.org/dashboard/db/logstash
[20:53:48] <icinga-wm>	 RECOVERY - Packet loss ratio for UDP on logstash1009 is OK: (C)0.1 ge (W)0.05 ge 0.007107 https://wikitech.wikimedia.org/wiki/Logstash https://grafana.wikimedia.org/dashboard/db/logstash
[20:54:06] <wikibugs>	 (03PS1) 10Andrew Bogott: Introduce role::wmcs::ceph::backup [puppet] - 10https://gerrit.wikimedia.org/r/618853 (https://phabricator.wikimedia.org/T259192)
[20:54:08] <wikibugs>	 (03PS1) 10Andrew Bogott: Retool cloudvirt1004 and cloudvirt1006 as ceph/backy2 test hosts [puppet] - 10https://gerrit.wikimedia.org/r/618854 (https://phabricator.wikimedia.org/T259192)
[20:54:19] <wikibugs>	 (03Abandoned) 10Andrew Bogott: Introduce role::wmcs::ceph::backup [puppet] - 10https://gerrit.wikimedia.org/r/618849 (https://phabricator.wikimedia.org/T259192) (owner: 10Andrew Bogott)
[20:55:20] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] Introduce role::wmcs::ceph::backup [puppet] - 10https://gerrit.wikimedia.org/r/618853 (https://phabricator.wikimedia.org/T259192) (owner: 10Andrew Bogott)
[20:55:36] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] Retool cloudvirt1004 and cloudvirt1006 as ceph/backy2 test hosts [puppet] - 10https://gerrit.wikimedia.org/r/618854 (https://phabricator.wikimedia.org/T259192) (owner: 10Andrew Bogott)
[20:59:06] <icinga-wm>	 RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is OK: OK - failed 50 probes of 568 (alerts on 50) - https://atlas.ripe.net/measurements/1791212/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[20:59:40] <wikibugs>	 (03PS5) 10Dzahn: profile::gerrit::server: correct hiera name space [puppet] - 10https://gerrit.wikimedia.org/r/617683 (https://phabricator.wikimedia.org/T247956) (owner: 10Jbond)
[21:00:11] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] "compiler says noop in prod and cloud" [puppet] - 10https://gerrit.wikimedia.org/r/617683 (https://phabricator.wikimedia.org/T247956) (owner: 10Jbond)
[21:01:06] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[21:10:07] <wikibugs>	 (03Merged) 10jenkins-bot: Update git submodules [core] (wmf/1.36.0-wmf.3) - 10https://gerrit.wikimedia.org/r/618850 (owner: 10Brennen Bearnes)
[21:10:30] <wikibugs>	 (03PS5) 10Dzahn: profile::gerrit::server: correct hiera name space [puppet] - 10https://gerrit.wikimedia.org/r/617690 (https://phabricator.wikimedia.org/T247956) (owner: 10Jbond)
[21:16:51] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] profile::gerrit::server: correct hiera name space [puppet] - 10https://gerrit.wikimedia.org/r/617690 (https://phabricator.wikimedia.org/T247956) (owner: 10Jbond)
[21:18:46] <wikibugs>	 (03PS1) 10Andrew Bogott: cloudvirt100[4-9]: use hp raid recipe [puppet] - 10https://gerrit.wikimedia.org/r/618860
[21:19:22] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] cloudvirt100[4-9]: use hp raid recipe [puppet] - 10https://gerrit.wikimedia.org/r/618860 (owner: 10Andrew Bogott)
[21:24:37] <wikibugs>	 (03CR) 10Dzahn: "still all noop in prod" [puppet] - 10https://gerrit.wikimedia.org/r/617690 (https://phabricator.wikimedia.org/T247956) (owner: 10Jbond)
[21:26:52] <wikibugs>	 (03PS1) 10Mholloway: Update wikifeeds to 2020-08-06-212118-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/618861
[21:29:02] <wikibugs>	 (03CR) 10Dzahn: "the reason for it being "gerrit::server" and not just "gerrit" is historic. there used to be a time when we did not have the role/profile " [puppet] - 10https://gerrit.wikimedia.org/r/617691 (owner: 10Jbond)
[21:29:41] <wikibugs>	 (03PS2) 10Dzahn: profile::gerrit::server: rename profile [puppet] - 10https://gerrit.wikimedia.org/r/617691 (owner: 10Jbond)
[21:29:43] <wikibugs>	 (03CR) 10Mholloway: [C: 03+2] Update wikifeeds to 2020-08-06-212118-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/618861 (owner: 10Mholloway)
[21:30:50] <wikibugs>	 (03Merged) 10jenkins-bot: Update wikifeeds to 2020-08-06-212118-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/618861 (owner: 10Mholloway)
[21:32:41] <logmsgbot>	 !log mholloway-shell@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
[21:32:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:33:20] <logmsgbot>	 !log brennen@deploy1001 Synchronized php-1.36.0-wmf.3/vendor: [[gerrit:618850|Update git submodules (vendor)]] (T259832) (duration: 01m 08s)
[21:33:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:33:23] <stashbot>	 T259832: Deployed 1.36.0-wmf.3 does not have the 1.36.0-wmf.3 branch of mediawiki-vendor - https://phabricator.wikimedia.org/T259832
[21:34:13] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[21:34:19] <Amir1>	 beta on master is broken: https://en.wikipedia.beta.wmflabs.org/wiki/Main_Page
[21:34:57] <RhinosF1>	 Amir1: it's fine mobile site if that helps
[21:35:53] <logmsgbot>	 !log mholloway-shell@deploy1001 helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
[21:35:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:39:09] <logmsgbot>	 !log andrew@cumin1001 START - Cookbook sre.hosts.downtime
[21:39:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:39:39] <logmsgbot>	 !log mholloway-shell@deploy1001 helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
[21:39:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:40:03] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] profile::gerrit::server: rename profile [puppet] - 10https://gerrit.wikimedia.org/r/617691 (owner: 10Jbond)
[21:40:11] <logmsgbot>	 !log andrew@cumin1001 START - Cookbook sre.hosts.downtime
[21:40:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:41:13] <logmsgbot>	 !log andrew@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[21:41:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:41:47] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[21:43:11] <logmsgbot>	 !log andrew@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[21:43:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:26:46] <wikibugs>	 Operations, serviceops, Patch-For-Review, User-Elukey: Reimage one memcached shard to Buster - https://phabricator.wikimedia.org/T252391 (MMiller_WMF) @kostajh -- are you asking whether we should deactivate EditorJourney in all wikis, so as to stop it from recording data anywhere?  If so, I am fi...
[22:30:55] <icinga-wm>	 RECOVERY - Too many messages in kafka logging-eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Logstash%23Kafka_consumer_lag https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?from=now-3h&to=now&orgId=1&var-datasource=thanos&var-cluster=logging-eqiad&var-topic=All&var-consumer_group=All
[22:41:58] <wikibugs>	 (PS1) Ladsgroup: Use a new page for mentor list in fawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/618866 (https://phabricator.wikimedia.org/T253291)
[23:00:04] <jouncebot>	 RoanKattouw, Niharika, and Urbanecm: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) Evening backport window(Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200806T2300).
[23:00:05] <jouncebot>	 kaldari, RoanKattouw, and Amir1: A patch you scheduled for Evening backport window(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[23:00:20] <Amir1>	 o/
[23:01:09] <RoanKattouw>	 I'll do the deployment
[23:01:32] <wikibugs>	 (CR) Catrope: [C: +2] Use a new page for mentor list in fawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/618866 (https://phabricator.wikimedia.org/T253291) (owner: Ladsgroup)
[23:01:44] <wikibugs>	 (CR) Catrope: [C: +2] WelcomeSurvey: Use autonyms for language question [extensions/GrowthExperiments] (wmf/1.36.0-wmf.3) - https://gerrit.wikimedia.org/r/618806 (https://phabricator.wikimedia.org/T232410) (owner: Catrope)
[23:01:50] <wikibugs>	 (CR) Catrope: [C: +2] WelcomeSurvey: Reuse server-rendered language question field [extensions/GrowthExperiments] (wmf/1.36.0-wmf.3) - https://gerrit.wikimedia.org/r/618807 (https://phabricator.wikimedia.org/T232410) (owner: Catrope)
[23:02:15] <wikibugs>	 (Merged) jenkins-bot: Use a new page for mentor list in fawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/618866 (https://phabricator.wikimedia.org/T253291) (owner: Ladsgroup)
[23:04:37] <logmsgbot>	 !log catrope@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Change GrowthExperiments mentor list on fawiki (T253291) (duration: 00m 59s)
[23:04:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:04:42] <stashbot>	 T253291: Deploy Growth features on Persian Wikipedia - https://phabricator.wikimedia.org/T253291
[23:06:18] <RoanKattouw>	 Amir1: Yours is deployed. It's tricky to test so I didn't put it on the debug server first
[23:06:33] <Amir1>	 thanks. I was thinking of the same
[23:06:38] <RoanKattouw>	 To test, you'd create a new account or enable the homepage on an account that's never had it enabled before, then verify that it gets a mentor assignment
[23:09:01] <Amir1>	 Done and it works just fine
[23:09:03] <Amir1>	 https://usercontent.irccloud-cdn.com/file/KQZVRAGU/image.png
[23:09:40] <wikibugs>	 (Merged) jenkins-bot: WelcomeSurvey: Use autonyms for language question [extensions/GrowthExperiments] (wmf/1.36.0-wmf.3) - https://gerrit.wikimedia.org/r/618806 (https://phabricator.wikimedia.org/T232410) (owner: Catrope)
[23:10:53] <wikibugs>	 (Merged) jenkins-bot: WelcomeSurvey: Reuse server-rendered language question field [extensions/GrowthExperiments] (wmf/1.36.0-wmf.3) - https://gerrit.wikimedia.org/r/618807 (https://phabricator.wikimedia.org/T232410) (owner: Catrope)
[23:16:27] <icinga-wm>	 PROBLEM - Check systemd state on aphlict1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[23:17:33] <wikibugs>	 (PS1) Cwhite: prometheus: add default count all query [puppet] - https://gerrit.wikimedia.org/r/618869 (https://phabricator.wikimedia.org/T256418)
[23:17:57] <wikibugs>	 (CR) jerkins-bot: [V: -1] prometheus: add default count all query [puppet] - https://gerrit.wikimedia.org/r/618869 (https://phabricator.wikimedia.org/T256418) (owner: Cwhite)
[23:19:54] <wikibugs>	 (PS1) Cwhite: profile: update mediawiki errors query to count beyond the 10k limit [puppet] - https://gerrit.wikimedia.org/r/618870 (https://phabricator.wikimedia.org/T256418)
[23:20:11] <wikibugs>	 (PS2) Cwhite: prometheus: add default count all query [puppet] - https://gerrit.wikimedia.org/r/618869 (https://phabricator.wikimedia.org/T256418)
[23:20:35] <wikibugs>	 (PS2) Cwhite: profile: update mediawiki errors query to count beyond the 10k limit [puppet] - https://gerrit.wikimedia.org/r/618870 (https://phabricator.wikimedia.org/T256418)
[23:20:37] <wikibugs>	 (CR) jerkins-bot: [V: -1] profile: update mediawiki errors query to count beyond the 10k limit [puppet] - https://gerrit.wikimedia.org/r/618870 (https://phabricator.wikimedia.org/T256418) (owner: Cwhite)
[23:21:26] <logmsgbot>	 !log catrope@deploy1001 Synchronized php-1.36.0-wmf.3/extensions/GrowthExperiments/: Fixes for WelcomeSurvey language question (T232410) (duration: 00m 59s)
[23:21:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:21:29] <stashbot>	 T232410: Newcomer tasks: add language question to welcome survey - https://phabricator.wikimedia.org/T232410
[23:21:54] <wikibugs>	 (PS3) Cwhite: prometheus: add default count all query [puppet] - https://gerrit.wikimedia.org/r/618869 (https://phabricator.wikimedia.org/T256418)
[23:29:05] <icinga-wm>	 PROBLEM - Prometheus prometheus1004/ops restarted: beware possible monitoring artifacts on prometheus1004 is CRITICAL: instance=127.0.0.1 job=prometheus https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_was_restarted https://grafana.wikimedia.org/d/000000271/prometheus-stats?var-datasource=eqiad+prometheus/ops
[23:30:41] <icinga-wm>	 PROBLEM - Prometheus prometheus1003/ops restarted: beware possible monitoring artifacts on prometheus1003 is CRITICAL: instance=127.0.0.1 job=prometheus https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_was_restarted https://grafana.wikimedia.org/d/000000271/prometheus-stats?var-datasource=eqiad+prometheus/ops
[23:42:33] <icinga-wm>	 PROBLEM - ensure kvm processes are running on cloudvirt1006 is CRITICAL: connect to address 10.64.20.24 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[23:42:41] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1006 is CRITICAL: connect to address 10.64.20.24 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[23:42:55] <icinga-wm>	 PROBLEM - nova-compute proc maximum on cloudvirt1006 is CRITICAL: connect to address 10.64.20.24 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[23:43:51] <icinga-wm>	 PROBLEM - ensure kvm processes are running on cloudvirt1004 is CRITICAL: connect to address 10.64.20.22 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[23:43:59] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1004 is CRITICAL: connect to address 10.64.20.22 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[23:44:55] <icinga-wm>	 PROBLEM - nova-compute proc maximum on cloudvirt1004 is CRITICAL: connect to address 10.64.20.22 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[23:46:23] <icinga-wm>	 PROBLEM - Check the NTP synchronisation status of timesyncd on cloudvirt1006 is CRITICAL: connect to address 10.64.20.24 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/NTP
[23:50:37] <icinga-wm>	 ACKNOWLEDGEMENT - Check the NTP synchronisation status of timesyncd on cloudvirt1004 is CRITICAL: connect to address 10.64.20.22 port 5666: Connection refused andrew bogott this is a phantom from reimaging https://wikitech.wikimedia.org/wiki/NTP
[23:50:37] <icinga-wm>	 ACKNOWLEDGEMENT - Check whether microcode mitigations for CPU vulnerabilities are applied on cloudvirt1004 is CRITICAL: connect to address 10.64.20.22 port 5666: Connection refused andrew bogott this is a phantom from reimaging https://wikitech.wikimedia.org/wiki/Microcode
[23:50:40] <icinga-wm>	 ACKNOWLEDGEMENT - ensure kvm processes are running on cloudvirt1004 is CRITICAL: connect to address 10.64.20.22 port 5666: Connection refused andrew bogott this is a phantom from reimaging https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[23:50:41] <icinga-wm>	 ACKNOWLEDGEMENT - nova-compute proc maximum on cloudvirt1004 is CRITICAL: connect to address 10.64.20.22 port 5666: Connection refused andrew bogott this is a phantom from reimaging https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[23:50:42] <icinga-wm>	 ACKNOWLEDGEMENT - nova-compute proc minimum on cloudvirt1004 is CRITICAL: connect to address 10.64.20.22 port 5666: Connection refused andrew bogott this is a phantom from reimaging https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[23:50:43] <icinga-wm>	 ACKNOWLEDGEMENT - Check the NTP synchronisation status of timesyncd on cloudvirt1006 is CRITICAL: connect to address 10.64.20.24 port 5666: Connection refused andrew bogott this is a phantom from reimaging https://wikitech.wikimedia.org/wiki/NTP
[23:50:44] <icinga-wm>	 ACKNOWLEDGEMENT - Check whether microcode mitigations for CPU vulnerabilities are applied on cloudvirt1006 is CRITICAL: connect to address 10.64.20.24 port 5666: Connection refused andrew bogott this is a phantom from reimaging https://wikitech.wikimedia.org/wiki/Microcode
[23:50:48] <icinga-wm>	 ACKNOWLEDGEMENT - ensure kvm processes are running on cloudvirt1006 is CRITICAL: connect to address 10.64.20.24 port 5666: Connection refused andrew bogott this is a phantom from reimaging https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[23:50:49] <icinga-wm>	 ACKNOWLEDGEMENT - nova-compute proc maximum on cloudvirt1006 is CRITICAL: connect to address 10.64.20.24 port 5666: Connection refused andrew bogott this is a phantom from reimaging https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[23:50:50] <icinga-wm>	 ACKNOWLEDGEMENT - nova-compute proc minimum on cloudvirt1006 is CRITICAL: connect to address 10.64.20.24 port 5666: Connection refused andrew bogott this is a phantom from reimaging https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[23:52:02] <icinga-wm>	 RECOVERY - Prometheus prometheus1003/ops restarted: beware possible monitoring artifacts on prometheus1003 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_was_restarted https://grafana.wikimedia.org/d/000000271/prometheus-stats?var-datasource=eqiad+prometheus/ops
[23:52:22] <icinga-wm>	 RECOVERY - Prometheus prometheus1004/ops restarted: beware possible monitoring artifacts on prometheus1004 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_was_restarted https://grafana.wikimedia.org/d/000000271/prometheus-stats?var-datasource=eqiad+prometheus/ops