[00:00:11] <wikibugs>	 (PS1) Dzahn: releases: use --delete when rsyncing files between servers [puppet] - https://gerrit.wikimedia.org/r/618411
[00:03:41] <wikibugs>	 (CR) Dzahn: [C: +1] "https://puppet-compiler.wmflabs.org/compiler1001/24314/releases1001.eqiad.wmnet/index.html" [puppet] - https://gerrit.wikimedia.org/r/618411 (owner: Dzahn)
[00:03:53] <wikibugs>	 (PS1) Dzahn: ATS: switch releases.wm to new buster backend servers [dns] - https://gerrit.wikimedia.org/r/618412
[00:04:01] <wikibugs>	 (CR) jerkins-bot: [V: -1] ATS: switch releases.wm to new buster backend servers [dns] - https://gerrit.wikimedia.org/r/618412 (owner: Dzahn)
[00:04:18] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[00:05:51] <wikibugs>	 (PS2) Dzahn: ATS: switch releases.wm to new buster backend servers [dns] - https://gerrit.wikimedia.org/r/618412
[00:20:03] <wikibugs>	 Operations, VPS-Projects, Wikimedia-Mailing-lists, User-Ladsgroup, and 2 others: Request for creating a DNS record for lists.wmcloud.org to 185.15.56.28 - https://phabricator.wikimedia.org/T259444 (Ladsgroup) Open→Resolved This is done, thanks!
[00:20:07] <wikibugs>	 Operations, Wikimedia-Mailing-lists, User-Ladsgroup: Setup Mailman3 in Cloud VPS - https://phabricator.wikimedia.org/T258365 (Ladsgroup)
[00:21:00] <wikibugs>	 (PS1) Dzahn: httpbb: add test file for releases.wm.org [puppet] - https://gerrit.wikimedia.org/r/618415
[00:42:00] <wikibugs>	 (PS1) Bstorm: haproxy-galera: Make a meaningful healthcheck [puppet] - https://gerrit.wikimedia.org/r/618418
[00:42:41] <wikibugs>	 (PS2) Bstorm: haproxy-galera: Make a meaningful healthcheck [puppet] - https://gerrit.wikimedia.org/r/618418
[00:43:58] <wikibugs>	 (CR) jerkins-bot: [V: -1] haproxy-galera: Make a meaningful healthcheck [puppet] - https://gerrit.wikimedia.org/r/618418 (owner: Bstorm)
[00:44:02] <wikibugs>	 (CR) Bstorm: "This is all assuming we don't use the tcp haproxy listen blocks for anything except mysql.  If that's not true, this needs a bit more work" [puppet] - https://gerrit.wikimedia.org/r/618418 (owner: Bstorm)
[00:45:44] <wikibugs>	 (PS3) Bstorm: haproxy-galera: Make a meaningful healthcheck [puppet] - https://gerrit.wikimedia.org/r/618418
[00:52:04] <wikibugs>	 (PS4) Bstorm: haproxy-galera: Make a meaningful healthcheck [puppet] - https://gerrit.wikimedia.org/r/618418
[00:55:24] <wikibugs>	 (CR) Bstorm: "On-server testing shows that it is correct for the current state of cloudcontrol1004:" [puppet] - https://gerrit.wikimedia.org/r/618418 (owner: Bstorm)
[00:58:30] <wikibugs>	 (PS6) CRusnov: rotatedump: Enhance to retain period copies [software/netbox-extras] - https://gerrit.wikimedia.org/r/562408 (https://phabricator.wikimedia.org/T231512)
[01:00:12] <icinga-wm>	 PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite1004 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [1000.0] https://wikitech.wikimedia.org/wiki/Search%23Health/Activity_Monitoring https://grafana.wikimedia.org/dashboard/db/elasticsearch-percentiles?panelId=19&fullscreen&orgId=1&var-cluster=eqiad&var-smoothing=1
[01:01:22] <icinga-wm>	 PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 51 probes of 571 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[01:02:41] <wikibugs>	 (CR) Bstorm: "I do question if I want those shell options at the start of the script. I want it to generally terminate with an http response. eqiad PCC:" [puppet] - https://gerrit.wikimedia.org/r/618418 (owner: Bstorm)
[01:13:12] <icinga-wm>	 RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 50 probes of 571 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[01:25:22] <icinga-wm>	 RECOVERY - CirrusSearch eqiad 95th percentile latency on graphite1004 is OK: OK: Less than 20.00% above the threshold [500.0] https://wikitech.wikimedia.org/wiki/Search%23Health/Activity_Monitoring https://grafana.wikimedia.org/dashboard/db/elasticsearch-percentiles?panelId=19&fullscreen&orgId=1&var-cluster=eqiad&var-smoothing=1
[01:48:36] <icinga-wm>	 PROBLEM - Prometheus prometheus1004/ops restarted: beware possible monitoring artifacts on prometheus1004 is CRITICAL: instance=127.0.0.1 job=prometheus https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_was_restarted https://grafana.wikimedia.org/d/000000271/prometheus-stats?var-datasource=eqiad+prometheus/ops
[01:48:58] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[01:53:12] <icinga-wm>	 PROBLEM - Prometheus prometheus1003/ops restarted: beware possible monitoring artifacts on prometheus1003 is CRITICAL: instance=127.0.0.1 job=prometheus https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_was_restarted https://grafana.wikimedia.org/d/000000271/prometheus-stats?var-datasource=eqiad+prometheus/ops
[02:13:48] <icinga-wm>	 RECOVERY - Prometheus prometheus1004/ops restarted: beware possible monitoring artifacts on prometheus1004 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_was_restarted https://grafana.wikimedia.org/d/000000271/prometheus-stats?var-datasource=eqiad+prometheus/ops
[02:14:28] <icinga-wm>	 RECOVERY - Prometheus prometheus1003/ops restarted: beware possible monitoring artifacts on prometheus1003 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_was_restarted https://grafana.wikimedia.org/d/000000271/prometheus-stats?var-datasource=eqiad+prometheus/ops
[03:02:42] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[03:27:50] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[03:27:57] <wikibugs>	 Operations, Parsoid, serviceops, Parsoid-Tests, Patch-For-Review: Move testreduce away from scandium to a separate Buster Ganeti VM - https://phabricator.wikimedia.org/T257906 (ssastry) >>! In T257906#6330994, @Dzahn wrote: > Merging the change above was a noop on scandium. I did not manuall...
[03:31:42] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[03:45:12] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[03:50:54] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[04:16:09] <wikibugs>	 Operations, Platform Engineering, Release Pipeline, Release-Engineering-Team-TODO, and 6 others: Kask functional testing with Cassandra via the Deployment Pipeline - https://phabricator.wikimedia.org/T224041 (jeena) Hmm, I tried to deploy again but still couldn't. I would be happy to help with up...
[04:16:12] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[04:23:58] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[04:33:46] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1136', diff saved to https://phabricator.wikimedia.org/P12163 and previous config saved to /var/cache/conftool/dbconfig/20200805-043346-marostegui.json
[04:33:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:46:02] <wikibugs>	 (PS1) Marostegui: mariadb: Reimage db1132 to Buster [puppet] - https://gerrit.wikimedia.org/r/618427 (https://phabricator.wikimedia.org/T259589)
[04:53:35] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1136', diff saved to https://phabricator.wikimedia.org/P12164 and previous config saved to /var/cache/conftool/dbconfig/20200805-045334-marostegui.json
[04:53:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:03:09] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1136', diff saved to https://phabricator.wikimedia.org/P12165 and previous config saved to /var/cache/conftool/dbconfig/20200805-050308-marostegui.json
[05:03:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:08:09] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Fully repool db1136', diff saved to https://phabricator.wikimedia.org/P12166 and previous config saved to /var/cache/conftool/dbconfig/20200805-050808-marostegui.json
[05:08:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:09:08] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1094 for MCR', diff saved to https://phabricator.wikimedia.org/P12167 and previous config saved to /var/cache/conftool/dbconfig/20200805-050907-marostegui.json
[05:09:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:23:48] <wikibugs>	 (CR) Hashar: "Can we get stretch-backports removed from the stretch base image or is this change pending something else?  I am a few changes for CI imag" [puppet] - https://gerrit.wikimedia.org/r/610050 (https://phabricator.wikimedia.org/T256877) (owner: Muehlenhoff)
[05:25:48] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[05:29:44] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[05:38:54] <wikibugs>	 (PS3) Hashar: Stop including backports in Stretch images [puppet] - https://gerrit.wikimedia.org/r/610050 (https://phabricator.wikimedia.org/T256877) (owner: Muehlenhoff)
[05:39:21] <wikibugs>	 (CR) Hashar: "Rebased to fix a trivial merge conflict with Ic2b5bfb122ad9d0fc7f4e404f639d9b71114691f" [puppet] - https://gerrit.wikimedia.org/r/610050 (https://phabricator.wikimedia.org/T256877) (owner: Muehlenhoff)
[05:44:16] <icinga-wm>	 PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 238, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[05:44:22] <icinga-wm>	 PROBLEM - OSPF status on cr2-codfw is CRITICAL: OSPFv2: 4/5 UP : OSPFv3: 4/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[05:45:16] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[05:51:08] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[05:53:16] <logmsgbot>	 !log ayounsi@cumin1001 START - Cookbook sre.network.prepare-upgrade
[05:53:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:13:42] <icinga-wm>	 RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 240, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[06:17:38] <icinga-wm>	 RECOVERY - OSPF status on cr2-codfw is OK: OSPFv2: 4/4 UP : OSPFv3: 4/4 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[06:21:22] <icinga-wm>	 PROBLEM - Router interfaces on cr2-codfw is CRITICAL: CRITICAL: host 208.80.153.193, interfaces up: 130, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[06:21:26] <icinga-wm>	 PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 238, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[06:29:06] <icinga-wm>	 RECOVERY - Router interfaces on cr2-codfw is OK: OK: host 208.80.153.193, interfaces up: 132, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[06:29:10] <icinga-wm>	 RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 240, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[06:40:39] <wikibugs>	 (CR) Marostegui: [C: +2] wikireplica_dns.yaml: Depool dbproxy1018 [puppet] - https://gerrit.wikimedia.org/r/618283 (https://phabricator.wikimedia.org/T255408) (owner: Marostegui)
[06:41:04] <icinga-wm>	 PROBLEM - Check the last execution of generate-mysqld-exporter-config on prometheus1004 is CRITICAL: connect to address 10.64.16.38 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[06:42:20] <icinga-wm>	 PROBLEM - Check systemd state on prometheus1004 is CRITICAL: connect to address 10.64.16.38 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:42:28] <icinga-wm>	 PROBLEM - Check size of conntrack table on prometheus1004 is CRITICAL: connect to address 10.64.16.38 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack
[06:43:38] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: +2] Enable the service proxy on termbox in staging [deployment-charts] - https://gerrit.wikimedia.org/r/618317 (owner: Giuseppe Lavagetto)
[06:45:08] <icinga-wm>	 PROBLEM - puppet last run on prometheus1004 is CRITICAL: connect to address 10.64.16.38 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[06:46:11] <logmsgbot>	 !log oblivian@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
[06:46:11] <logmsgbot>	 !log oblivian@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
[06:46:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:46:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:47:10] <icinga-wm>	 PROBLEM - Prometheus prometheus1004/ops restarted: beware possible monitoring artifacts on prometheus1004 is CRITICAL: instance=127.0.0.1 job=prometheus https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_was_restarted https://grafana.wikimedia.org/d/000000271/prometheus-stats?var-datasource=eqiad+prometheus/ops
[06:48:12] <icinga-wm>	 RECOVERY - Check systemd state on prometheus1004 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:48:18] <icinga-wm>	 RECOVERY - Check size of conntrack table on prometheus1004 is OK: OK: nf_conntrack is 3 % full https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack
[06:48:56] <wikibugs>	 Operations, netops: Make eqord its own AS - https://phabricator.wikimedia.org/T259593 (ayounsi)
[06:50:51] <logmsgbot>	 !log oblivian@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
[06:50:51] <logmsgbot>	 !log oblivian@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
[06:50:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:50:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:51:04] <icinga-wm>	 RECOVERY - puppet last run on prometheus1004 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[06:51:58] <icinga-wm>	 RECOVERY - Check the last execution of generate-mysqld-exporter-config on prometheus1004 is OK: OK: Status of the systemd unit generate-mysqld-exporter-config https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[06:52:25] <logmsgbot>	 !log ayounsi@cumin1001 END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
[06:52:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:59:12] <logmsgbot>	 !log oblivian@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
[06:59:12] <logmsgbot>	 !log oblivian@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
[06:59:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:59:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:01:40] <wikibugs>	 (PS1) Giuseppe Lavagetto: termbox/staging: rollback the configuration, it clearly doesn't work. [deployment-charts] - https://gerrit.wikimedia.org/r/618479
[07:01:42] <wikibugs>	 (CR) Giuseppe Lavagetto: [V: +2 C: +2] termbox/staging: rollback the configuration, it clearly doesn't work. [deployment-charts] - https://gerrit.wikimedia.org/r/618479 (owner: Giuseppe Lavagetto)
[07:04:51] <logmsgbot>	 !log oblivian@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
[07:04:51] <logmsgbot>	 !log oblivian@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
[07:04:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:04:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:07:36] <wikibugs>	 (PS1) Ayounsi: Depool ulsfo for routers upgrade [dns] - https://gerrit.wikimedia.org/r/618480 (https://phabricator.wikimedia.org/T259621)
[07:08:26] <icinga-wm>	 RECOVERY - Prometheus prometheus1004/ops restarted: beware possible monitoring artifacts on prometheus1004 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_was_restarted https://grafana.wikimedia.org/d/000000271/prometheus-stats?var-datasource=eqiad+prometheus/ops
[07:12:47] <wikibugs>	 (CR) Marostegui: [C: +2] mariadb: Reimage db1132 to Buster [puppet] - https://gerrit.wikimedia.org/r/618427 (https://phabricator.wikimedia.org/T259589) (owner: Marostegui)
[07:13:42] <logmsgbot>	 !log ayounsi@cumin1001 START - Cookbook sre.network.prepare-upgrade
[07:13:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:14:10] <logmsgbot>	 !log ayounsi@cumin1001 START - Cookbook sre.network.prepare-upgrade
[07:14:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:14:12] <wikibugs>	 (PS1) Awight: FileImporter: full default deployment [mediawiki-config] - https://gerrit.wikimedia.org/r/618481 (https://phabricator.wikimedia.org/T232542)
[07:18:22] <wikibugs>	 (CR) Elukey: "Keith one question - shouldn't we add the hiera config for monitoring_buster in this patch, to see if the new instances come up fine etc." [puppet] - https://gerrit.wikimedia.org/r/618359 (https://phabricator.wikimedia.org/T252773) (owner: Herron)
[07:20:15] <moritzm>	 !log installing libexif security updates on buster
[07:20:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:20:57] <wikibugs>	 (CR) DCausse: [C: +1] Additional prefixes for sdoc for wcqs [puppet] - https://gerrit.wikimedia.org/r/618237 (https://phabricator.wikimedia.org/T258625) (owner: ZPapierski)
[07:21:04] <wikibugs>	 (PS1) Awight: Remove deprecated setting [mediawiki-config] - https://gerrit.wikimedia.org/r/618482 (https://phabricator.wikimedia.org/T232542)
[07:26:30] <moritzm>	 !log installing perl security updates on buster
[07:26:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:27:48] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime
[07:27:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:29:58] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[07:29:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:36:46] <wikibugs>	 (CR) Filippo Giunchedi: [C: +2] prometheus: lowercase alerts annotations [puppet] - https://gerrit.wikimedia.org/r/618284 (https://phabricator.wikimedia.org/T258948) (owner: Filippo Giunchedi)
[07:36:51] <wikibugs>	 (PS2) Filippo Giunchedi: prometheus: lowercase alerts annotations [puppet] - https://gerrit.wikimedia.org/r/618284 (https://phabricator.wikimedia.org/T258948)
[07:38:30] <wikibugs>	 (CR) Filippo Giunchedi: [C: +1] "Nice! LGTM" [puppet] - https://gerrit.wikimedia.org/r/617260 (https://phabricator.wikimedia.org/T256418) (owner: Cwhite)
[07:39:26] <logmsgbot>	 !log ayounsi@cumin1001 END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
[07:39:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:40:26] <wikibugs>	 (PS4) JMeybohm: helm-diff: New upstream version 3.1.2 [debs/helm-diff] - https://gerrit.wikimedia.org/r/618314 (https://phabricator.wikimedia.org/T258572)
[07:41:40] <wikibugs>	 (CR) Filippo Giunchedi: [C: +1] "LGTM, CC'ing Alex as a FYI" [puppet] - https://gerrit.wikimedia.org/r/618388 (https://phabricator.wikimedia.org/T205870) (owner: Cwhite)
[07:42:27] <wikibugs>	 (CR) Filippo Giunchedi: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/618345 (https://phabricator.wikimedia.org/T247966) (owner: Herron)
[07:43:54] <wikibugs>	 (CR) JMeybohm: [C: +2] blubberoid: remove out-dated repositories definition [deployment-charts] - https://gerrit.wikimedia.org/r/618347 (https://phabricator.wikimedia.org/T253843) (owner: JMeybohm)
[07:45:04] <wikibugs>	 (Merged) jenkins-bot: blubberoid: remove out-dated repositories definition [deployment-charts] - https://gerrit.wikimedia.org/r/618347 (https://phabricator.wikimedia.org/T253843) (owner: JMeybohm)
[07:45:11] <logmsgbot>	 !log ayounsi@cumin1001 END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
[07:45:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:49:15] <marostegui>	 !log Stop mysql on db1117:3323 (this will generate haproxy irc alerts) T259589 
[07:49:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:49:17] <stashbot>	 T259589: Upgrade m3 to Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T259589
[07:49:34] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: +1] "Packaging is fine. A couple minor comments on the control file, but it's good as-is otherwise." (2 comments) [debs/helm-diff] - https://gerrit.wikimedia.org/r/618314 (https://phabricator.wikimedia.org/T258572) (owner: JMeybohm)
[07:49:48] <wikibugs>	 Operations, serviceops, Kubernetes, Patch-For-Review: Migrate to helm v3 - https://phabricator.wikimedia.org/T251305 (JMeybohm)
[07:54:53] <icinga-wm>	 PROBLEM - haproxy failover on dbproxy1016 is CRITICAL: CRITICAL check_failover servers up 1 down 1 https://wikitech.wikimedia.org/wiki/HAProxy
[07:55:26] <godog>	 ebernhardson: looks like prometheus is failing to scrape metrics from mjolnir, I'm assuming due to the deploy yesterday
[07:55:54] <marostegui>	 haproxy is me, as announced
[07:56:52] <godog>	 ebernhardson: or rather, prometheus is expecting to find mjolnir on all elastic instances but atm only ~2% of elastic hosts can have mjolnir metrics scraped, https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets?orgId=1
[07:57:42] <wikibugs>	 (CR) Ema: [C: +2] Revert "ATS: force cache revalidation on a few wikis" [puppet] - https://gerrit.wikimedia.org/r/618294 (https://phabricator.wikimedia.org/T256750) (owner: Ema)
[07:58:45] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: -1] "Fix the copyright, otherwise LGTM" (2 comments) [debs/helmfile] - https://gerrit.wikimedia.org/r/618273 (https://phabricator.wikimedia.org/T258572) (owner: JMeybohm)
[08:02:19] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: -1] helm: Replace repo update cronjob by systemd timer (1 comment) [puppet] - https://gerrit.wikimedia.org/r/618350 (owner: JMeybohm)
[08:02:31] <wikibugs>	 (PS5) JMeybohm: helmfile: New upstream version 0.125.2 [debs/helmfile] - https://gerrit.wikimedia.org/r/618273 (https://phabricator.wikimedia.org/T258572)
[08:02:39] <icinga-wm>	 PROBLEM - haproxy failover on dbproxy1020 is CRITICAL: CRITICAL check_failover servers up 1 down 1 https://wikitech.wikimedia.org/wiki/HAProxy
[08:03:01] <wikibugs>	 (PS5) JMeybohm: helm-diff: New upstream version 3.1.2 [debs/helm-diff] - https://gerrit.wikimedia.org/r/618314 (https://phabricator.wikimedia.org/T258572)
[08:03:08] <marostegui>	 haproxy expected
[08:03:46] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: -1] helm: Replace repo update cronjob by systemd timer (1 comment) [puppet] - https://gerrit.wikimedia.org/r/618350 (owner: JMeybohm)
[08:07:01] <wikibugs>	 (CR) JMeybohm: helm: Replace repo update cronjob by systemd timer (1 comment) [puppet] - https://gerrit.wikimedia.org/r/618350 (owner: JMeybohm)
[08:08:18] <wikibugs>	 (PS1) Alexandros Kosiaris: mobileapps: Switch conftool to kubernetes/kubesvc [puppet] - https://gerrit.wikimedia.org/r/618485 (https://phabricator.wikimedia.org/T218733)
[08:08:19] <wikibugs>	 (PS1) Alexandros Kosiaris: mobileapps: Remove from scb conftool config [puppet] - https://gerrit.wikimedia.org/r/618486 (https://phabricator.wikimedia.org/T218733)
[08:08:21] <wikibugs>	 (PS1) Alexandros Kosiaris: mobileapps: Remove mobileapps from scb [puppet] - https://gerrit.wikimedia.org/r/618487 (https://phabricator.wikimedia.org/T218733)
[08:08:25] <wikibugs>	 (PS1) Alexandros Kosiaris: mobileapps: Remove the profile and the role [puppet] - https://gerrit.wikimedia.org/r/618488 (https://phabricator.wikimedia.org/T218733)
[08:08:28] <wikibugs>	 (CR) JMeybohm: [C: +2] helm-diff: New upstream version 3.1.2 (2 comments) [debs/helm-diff] - https://gerrit.wikimedia.org/r/618314 (https://phabricator.wikimedia.org/T258572) (owner: JMeybohm)
[08:09:17] <wikibugs>	 (CR) JMeybohm: [C: +2] helmfile: New upstream version 0.125.2 (2 comments) [debs/helmfile] - https://gerrit.wikimedia.org/r/618273 (https://phabricator.wikimedia.org/T258572) (owner: JMeybohm)
[08:11:15] <wikibugs>	 (Merged) jenkins-bot: helm-diff: New upstream version 3.1.2 [debs/helm-diff] - https://gerrit.wikimedia.org/r/618314 (https://phabricator.wikimedia.org/T258572) (owner: JMeybohm)
[08:11:16] <wikibugs>	 (CR) Alexandros Kosiaris: [C: +2] mobileapps: Switch conftool to kubernetes/kubesvc [puppet] - https://gerrit.wikimedia.org/r/618485 (https://phabricator.wikimedia.org/T218733) (owner: Alexandros Kosiaris)
[08:12:20] <wikibugs>	 (Merged) jenkins-bot: helmfile: New upstream version 0.125.2 [debs/helmfile] - https://gerrit.wikimedia.org/r/618273 (https://phabricator.wikimedia.org/T258572) (owner: JMeybohm)
[08:12:38] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1094', diff saved to https://phabricator.wikimedia.org/P12169 and previous config saved to /var/cache/conftool/dbconfig/20200805-081237-marostegui.json
[08:12:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:12:53] <wikibugs>	 (CR) Vgutierrez: [C: +1] varnishmtail: check if varnishncsa is still running [puppet] - https://gerrit.wikimedia.org/r/618308 (https://phabricator.wikimedia.org/T259020) (owner: Ema)
[08:15:10] <wikibugs>	 (CR) Ema: [C: +2] varnishmtail: check if varnishncsa is still running [puppet] - https://gerrit.wikimedia.org/r/618308 (https://phabricator.wikimedia.org/T259020) (owner: Ema)
[08:21:39] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1094', diff saved to https://phabricator.wikimedia.org/P12170 and previous config saved to /var/cache/conftool/dbconfig/20200805-082138-marostegui.json
[08:21:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:26:51] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[08:29:08] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1094', diff saved to https://phabricator.wikimedia.org/P12171 and previous config saved to /var/cache/conftool/dbconfig/20200805-082908-marostegui.json
[08:29:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:31:15] <wikibugs>	 (03PS1) 10Elukey: profile::prometheus::ops: change mjolnir's target classes [puppet] - 10https://gerrit.wikimedia.org/r/618493 (https://phabricator.wikimedia.org/T258245)
[08:31:49] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[08:32:49] <wikibugs>	 (03PS8) 10Jdrewniak: Switch test wikis to new version of vector by default (3/3) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/614891 (https://phabricator.wikimedia.org/T254227) (owner: 10Jdlrobson)
[08:32:54] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/618493 (https://phabricator.wikimedia.org/T258245) (owner: 10Elukey)
[08:34:38] <wikibugs>	 (03CR) 10DCausse: [C: 03+1] profile::prometheus::ops: change mjolnir's target classes [puppet] - 10https://gerrit.wikimedia.org/r/618493 (https://phabricator.wikimedia.org/T258245) (owner: 10Elukey)
[08:34:40] <wikibugs>	 (03PS1) 10Marostegui: db1132: Set binlog format to ROW [puppet] - 10https://gerrit.wikimedia.org/r/618494 (https://phabricator.wikimedia.org/T259589)
[08:34:53] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] profile::prometheus::ops: change mjolnir's target classes [puppet] - 10https://gerrit.wikimedia.org/r/618493 (https://phabricator.wikimedia.org/T258245) (owner: 10Elukey)
[08:35:22] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] db1132: Set binlog format to ROW [puppet] - 10https://gerrit.wikimedia.org/r/618494 (https://phabricator.wikimedia.org/T259589) (owner: 10Marostegui)
[08:37:09] <icinga-wm>	 RECOVERY - haproxy failover on dbproxy1016 is OK: OK check_failover servers up 2 down 0 https://wikitech.wikimedia.org/wiki/HAProxy
[08:37:09] <icinga-wm>	 RECOVERY - haproxy failover on dbproxy1020 is OK: OK check_failover servers up 2 down 0 https://wikitech.wikimedia.org/wiki/HAProxy
[08:37:40] <wikibugs>	 10Operations, 10Parsoid, 10serviceops, 10Parsoid-Tests, 10Patch-For-Review: Move testreduce away from scandium to a separate Buster Ganeti VM - https://phabricator.wikimedia.org/T257906 (10MoritzMuehlenhoff) >>! In T257906#6361874, @ssastry wrote: > testreduce codebase is used for regular roundtrip testi...
[08:38:33] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Fully repool db1094', diff saved to https://phabricator.wikimedia.org/P12172 and previous config saved to /var/cache/conftool/dbconfig/20200805-083833-marostegui.json
[08:38:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:39:17] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1079 for MCR', diff saved to https://phabricator.wikimedia.org/P12173 and previous config saved to /var/cache/conftool/dbconfig/20200805-083916-marostegui.json
[08:39:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:39:47] <marostegui>	 !log Stop replication on db1079 for MCR, this will generate lag on s7 on labsdb
[08:39:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:39:57] <marostegui>	 !log Remove revision triggers on db1125:3317
[08:39:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:43:21] <wikibugs>	 (03PS1) 10Muehlenhoff: Add Chad to ldap_only_users now that shell access has been removed [puppet] - 10https://gerrit.wikimedia.org/r/618496
[08:45:24] <wikibugs>	 (03PS1) 10Marostegui: install_server: Do not reimage db1132 [puppet] - 10https://gerrit.wikimedia.org/r/618497
[08:46:15] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/618488 (https://phabricator.wikimedia.org/T218733) (owner: 10Alexandros Kosiaris)
[08:46:24] <wikibugs>	 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to analytics-privatedata-users and nda groups for edtadros - https://phabricator.wikimedia.org/T256435 (10akosiaris) @Jrbranaa Ping?
[08:46:37] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] install_server: Do not reimage db1132 [puppet] - 10https://gerrit.wikimedia.org/r/618497 (owner: 10Marostegui)
[08:49:02] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Add Chad to ldap_only_users now that shell access has been removed [puppet] - 10https://gerrit.wikimedia.org/r/618496 (owner: 10Muehlenhoff)
[08:51:17] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[09:01:45] <wikibugs>	 (03CR) 10Marostegui: "After merging I realised that Puppet is disabled on all the eqiad cloudcontrol hosts, how should I proceed?" [puppet] - 10https://gerrit.wikimedia.org/r/618283 (https://phabricator.wikimedia.org/T255408) (owner: 10Marostegui)
[09:05:50] <jayme>	 !log imported helmfile_0.125.2-0 to buster-wikimedia
[09:05:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:07:20] <jayme>	 !log imported helmfile_0.125.2-0 to stretch-wikimedia
[09:07:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:07:27] <jayme>	 !log imported helmfile_0.125.2-0 to jessie-wikimedia
[09:07:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:13:04] <wikibugs>	 (03PS1) 10Alexandros Kosiaris: admin: Add cdunn to ldap_only_users [puppet] - 10https://gerrit.wikimedia.org/r/618500 (https://phabricator.wikimedia.org/T259615)
[09:15:37] <wikibugs>	 (03PS1) 10JMeybohm: Add postinst to clean up after old package version [debs/helm-diff] - 10https://gerrit.wikimedia.org/r/618501 (https://phabricator.wikimedia.org/T258572)
[09:16:24] <wikibugs>	 (03CR) 10JMeybohm: "Package has not been build by now, so I did not increment the debian version." [debs/helm-diff] - 10https://gerrit.wikimedia.org/r/618501 (https://phabricator.wikimedia.org/T258572) (owner: 10JMeybohm)
[09:17:23] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 03+2] admin: Add cdunn to ldap_only_users [puppet] - 10https://gerrit.wikimedia.org/r/618500 (https://phabricator.wikimedia.org/T259615) (owner: 10Alexandros Kosiaris)
[09:18:24] <wikibugs>	 (03PS1) 10Elukey: kerberos: set ticket renew lifetime to 7d [puppet] - 10https://gerrit.wikimedia.org/r/618502
[09:19:12] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 03+1] Add postinst to clean up after old package version [debs/helm-diff] - 10https://gerrit.wikimedia.org/r/618501 (https://phabricator.wikimedia.org/T258572) (owner: 10JMeybohm)
[09:19:55] <wikibugs>	 (03CR) 10JMeybohm: [C: 03+2] Add postinst to clean up after old package version [debs/helm-diff] - 10https://gerrit.wikimedia.org/r/618501 (https://phabricator.wikimedia.org/T258572) (owner: 10JMeybohm)
[09:22:23] <wikibugs>	 (03Merged) 10jenkins-bot: Add postinst to clean up after old package version [debs/helm-diff] - 10https://gerrit.wikimedia.org/r/618501 (https://phabricator.wikimedia.org/T258572) (owner: 10JMeybohm)
[09:22:27] <wikibugs>	 (03PS1) 10Filippo Giunchedi: Init retry_count at each collection [debs/prometheus-icinga-exporter] - 10https://gerrit.wikimedia.org/r/618504 (https://phabricator.wikimedia.org/T258948)
[09:22:29] <wikibugs>	 (03PS1) 10Filippo Giunchedi: Add support for exposing Icinga problems as metrics [debs/prometheus-icinga-exporter] - 10https://gerrit.wikimedia.org/r/618505 (https://phabricator.wikimedia.org/T258948)
[09:23:40] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "Looks good!" [puppet] - 10https://gerrit.wikimedia.org/r/618502 (owner: 10Elukey)
[09:24:05] <wikibugs>	 10Operations, 10LDAP-Access-Requests, 10Patch-For-Review: Add Carol Dunn to the wmf LDAP group - https://phabricator.wikimedia.org/T259615 (10akosiaris) 05Open→03Resolved p:05Triage→03Medium a:03akosiaris
[09:24:20] <wikibugs>	 10Operations, 10LDAP-Access-Requests, 10Patch-For-Review: Add Carol Dunn to the wmf LDAP group - https://phabricator.wikimedia.org/T259615 (10akosiaris) Done. Resolving, feel free to reopen
[09:25:08] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 03+2] mobileapps: Remove from scb conftool config [puppet] - 10https://gerrit.wikimedia.org/r/618486 (https://phabricator.wikimedia.org/T218733) (owner: 10Alexandros Kosiaris)
[09:25:47] <wikibugs>	 (03PS2) 10Filippo Giunchedi: Add support for exposing Icinga problems as metrics [debs/prometheus-icinga-exporter] - 10https://gerrit.wikimedia.org/r/618505 (https://phabricator.wikimedia.org/T258948)
[09:26:46] <wikibugs>	 (03PS3) 10Jbond: profile::gerrit::migrations: correct hiera name space [puppet] - 10https://gerrit.wikimedia.org/r/617676 (https://phabricator.wikimedia.org/T247956)
[09:27:04] <wikibugs>	 (03CR) 10Jbond: "> Patch Set 2: Code-Review-1" [puppet] - 10https://gerrit.wikimedia.org/r/617676 (https://phabricator.wikimedia.org/T247956) (owner: 10Jbond)
[09:27:40] <jynus>	 https://mysqlserverteam.com/mysql-shell-dump-load-part-2-benchmarks/
[09:27:48] <jynus>	 sorry, wrong channel
[09:28:18] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] kerberos: set ticket renew lifetime to 7d [puppet] - 10https://gerrit.wikimedia.org/r/618502 (owner: 10Elukey)
[09:31:24] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 03+2] mobileapps: Remove mobileapps from scb [puppet] - 10https://gerrit.wikimedia.org/r/618487 (https://phabricator.wikimedia.org/T218733) (owner: 10Alexandros Kosiaris)
[09:32:39] <elukey>	 !log set ticket max renewable lifetime to 7d on all kerberos clients (was zero, the default)
[09:32:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:34:35] <wikibugs>	 (03CR) 10Awight: [C: 03+1] "Thanks!" [puppet] - 10https://gerrit.wikimedia.org/r/618390 (https://phabricator.wikimedia.org/T259254) (owner: 10BryanDavis)
[09:34:59] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 03+1] "\o/" [puppet] - 10https://gerrit.wikimedia.org/r/618352 (https://phabricator.wikimedia.org/T253843) (owner: 10JMeybohm)
[09:36:34] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 04-1] "> Patch Set 1: Code-Review-2" [puppet] - 10https://gerrit.wikimedia.org/r/618379 (https://phabricator.wikimedia.org/T210411) (owner: 10Dzahn)
[09:36:38] <wikibugs>	 (03CR) 10Filippo Giunchedi: "We'll need to do the same for host status as well" [debs/prometheus-icinga-exporter] - 10https://gerrit.wikimedia.org/r/618505 (https://phabricator.wikimedia.org/T258948) (owner: 10Filippo Giunchedi)
[09:42:14] <wikibugs>	 10Operations, 10Traffic: Generate ATS cache.config from software-agnostic data structures - https://phabricator.wikimedia.org/T259692 (10ema)
[09:42:20] <wikibugs>	 10Operations, 10Traffic: Generate ATS cache.config from software-agnostic data structures - https://phabricator.wikimedia.org/T259692 (10ema) p:05Triage→03Medium
[09:42:41] <wikibugs>	 (03PS1) 10Elukey: druid: puppet cleanup after upgrading all clusters to 0.19 [puppet] - 10https://gerrit.wikimedia.org/r/618506 (https://phabricator.wikimedia.org/T244482)
[09:50:41] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] druid: puppet cleanup after upgrading all clusters to 0.19 [puppet] - 10https://gerrit.wikimedia.org/r/618506 (https://phabricator.wikimedia.org/T244482) (owner: 10Elukey)
[09:52:24] <wikibugs>	 (03CR) 10Ayounsi: [C: 03+2] Depool ulsfo for routers upgrade [dns] - 10https://gerrit.wikimedia.org/r/618480 (https://phabricator.wikimedia.org/T259621) (owner: 10Ayounsi)
[09:53:11] <XioNoX>	 !log depool ulsfo - T259621
[09:53:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:56:34] <wikibugs>	 10Operations, 10observability: db1082 failed on Jul 18th and 25th, however on the 25th pages didn't go out to VO/phones - https://phabricator.wikimedia.org/T259465 (10fgiunchedi) >>! In T259465#6355942, @fgiunchedi wrote: > A reminder might work! We'll be inquiring VO about that possibility e.g. via email when...
[09:58:15] <XioNoX>	 !log drain traffic away cr4-ulsfo
[09:58:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:58:41] <wikibugs>	 (03Abandoned) 10Elukey: Remove AAAA/PTR records for db1108 [dns] - 10https://gerrit.wikimedia.org/r/617064 (https://phabricator.wikimedia.org/T234826) (owner: 10Elukey)
[10:07:07] <wikibugs>	 (03PS6) 10Elukey: Add custom ferm srange to Kafka Jumbo brokers [puppet] - 10https://gerrit.wikimedia.org/r/611168 (https://phabricator.wikimedia.org/T204957)
[10:15:20] <wikibugs>	 (03PS3) 10Filippo Giunchedi: Add support for exposing Icinga problems as metrics [debs/prometheus-icinga-exporter] - 10https://gerrit.wikimedia.org/r/618505 (https://phabricator.wikimedia.org/T258948)
[10:18:41] <XioNoX>	 !log reboot cr4-ulsfo - T259621
[10:18:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:21:07] <icinga-wm>	 PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 132, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[10:21:14] <moritzm>	 !log installing libssh security updates
[10:21:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:21:27] <icinga-wm>	 PROBLEM - OSPF status on cr2-eqdfw is CRITICAL: OSPFv2: 4/5 UP : OSPFv3: 4/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[10:21:41] <icinga-wm>	 PROBLEM - OSPF status on cr2-eqsin is CRITICAL: OSPFv2: 2/3 UP : OSPFv3: 2/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[10:22:25] <XioNoX>	 router looks back up on the console
[10:25:28] <wikibugs>	 (03PS1) 10Hnowlan: api-gateway: change deployment to production [deployment-charts] - 10https://gerrit.wikimedia.org/r/618512 (https://phabricator.wikimedia.org/T254906)
[10:25:37] <icinga-wm>	 RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 134, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[10:25:59] <icinga-wm>	 RECOVERY - OSPF status on cr2-eqdfw is OK: OSPFv2: 5/5 UP : OSPFv3: 5/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[10:26:19] <icinga-wm>	 RECOVERY - OSPF status on cr2-eqsin is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[10:27:38] <XioNoX>	 alright cr4 is all good
[10:27:41] <XioNoX>	 cr3 now
[10:28:28] <XioNoX>	 !log drain traffic away cr3-ulsfo - T259621
[10:28:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:29:26] <wikibugs>	 (03PS1) 10Kormat: Split utilities into separate packages [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/618513
[10:29:52] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Split utilities into separate packages [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/618513 (owner: 10Kormat)
[10:34:31] <wikibugs>	 (03PS2) 10Kormat: Split utilities into separate packages [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/618513
[10:36:55] <icinga-wm>	 PROBLEM - IPv4 ping to ulsfo on ripe-atlas-ulsfo is CRITICAL: CRITICAL - failed 44 probes of 655 (alerts on 35) - https://atlas.ripe.net/measurements/1791307/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[10:39:27] <XioNoX>	 !log reboot cr3-ulsfo - T259621
[10:39:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:42:33] <wikibugs>	 (03PS3) 10Kormat: Split utilities into separate packages [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/618513
[10:42:37] <icinga-wm>	 RECOVERY - IPv4 ping to ulsfo on ripe-atlas-ulsfo is OK: OK - failed 11 probes of 655 (alerts on 35) - https://atlas.ripe.net/measurements/1791307/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[10:43:05] <icinga-wm>	 PROBLEM - Router interfaces on cr2-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 52, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[10:43:37] <wikibugs>	 (03CR) 10Kormat: "Jcrespo: this is what we talked about yesterday" [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/618513 (owner: 10Kormat)
[10:45:04] <wikibugs>	 (03CR) 10Kormat: "Note: i'm not planning to do a release just yet, hence UNRELEASED in the changelog." [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/618513 (owner: 10Kormat)
[10:46:37] <icinga-wm>	 RECOVERY - Router interfaces on cr2-eqord is OK: OK: host 208.80.154.198, interfaces up: 54, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[10:53:30] <XioNoX>	 all good, removing downtimes
[10:59:38] <Lucas_WMDE>	 jouncebot: refresh
[10:59:39] <jouncebot>	 I refreshed my knowledge about deployments.
[11:00:04] <jouncebot>	 Amir1, Lucas_WMDE, awight, and Urbanecm: #bothumor I ♥ Unicode. All rise for European mid-day backport window(Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200805T1100).
[11:00:04] <jouncebot>	 Lucas_WMDE, DannyS712, awight, and jan_drewniak: A patch you scheduled for European mid-day backport window(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[11:00:12] <Lucas_WMDE>	 o/
[11:00:23] <jan_drewniak>	 o/
[11:00:50] <Lucas_WMDE>	 I’ll start with my config change
[11:01:01] <icinga-wm>	 RECOVERY - PHP opcache health on mwdebug1002 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health
[11:01:05] <wikibugs>	 (03PS6) 10Lucas Werkmeister (WMDE): Enable Data Bridge on Test Wikidata clients [mediawiki-config] - 10https://gerrit.wikimedia.org/r/595542 (https://phabricator.wikimedia.org/T232584)
[11:01:19] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): [C: 03+2] Enable Data Bridge on Test Wikidata clients [mediawiki-config] - 10https://gerrit.wikimedia.org/r/595542 (https://phabricator.wikimedia.org/T232584) (owner: 10Lucas Werkmeister (WMDE))
[11:02:04] <wikibugs>	 (03Merged) 10jenkins-bot: Enable Data Bridge on Test Wikidata clients [mediawiki-config] - 10https://gerrit.wikimedia.org/r/595542 (https://phabricator.wikimedia.org/T232584) (owner: 10Lucas Werkmeister (WMDE))
[11:02:36] <Lucas_WMDE>	 testing on mwdebug1001
[11:04:43] <Lucas_WMDE>	 hm, action=wbformatreference isn’t available on testwikidatawiki
[11:05:53] <Lucas_WMDE>	 ah, because it’s a client module ^^
[11:05:58] <Lucas_WMDE>	 it is available on testwiki, so that’s fine
[11:08:34] <Lucas_WMDE>	 data bridge works \o/
[11:08:37] <Lucas_WMDE>	 syncing
[11:09:10] <awight>	 Lucas_WMDE: I'm happy to take over for this next one-liner, or for my patch later...
[11:09:22] <Lucas_WMDE>	 sure, if you want
[11:09:32] <awight>	 :-) Enjoy the 5 minutes off
[11:09:43] <wikibugs>	 (03PS3) 10Awight: Add import sources for lijwikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/618303 (https://phabricator.wikimedia.org/T259633) (owner: 10DannyS712)
[11:10:13] <Lucas_WMDE>	 good luck with the FileImporter deploy! big change :)
[11:10:26] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:595542|Enable Data Bridge on Test Wikidata clients (T232584)]] (duration: 01m 20s)
[11:10:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:10:29] <stashbot>	 T232584: Step 1: Production deployment checklist - https://phabricator.wikimedia.org/T232584
[11:10:32] <Lucas_WMDE>	 ^ awight: go ahead
[11:10:43] <awight>	 Lucas_WMDE: ack
[11:11:19] <wikibugs>	 (03CR) 10Awight: [C: 03+2] "Bacon deploying.  Thanks!" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/618303 (https://phabricator.wikimedia.org/T259633) (owner: 10DannyS712)
[11:12:05] <wikibugs>	 (03Merged) 10jenkins-bot: Add import sources for lijwikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/618303 (https://phabricator.wikimedia.org/T259633) (owner: 10DannyS712)
[11:12:41] <wikibugs>	 (03PS9) 10Jdrewniak: Switch test wikis to new version of vector by default (3/3) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/614891 (https://phabricator.wikimedia.org/T254227) (owner: 10Jdlrobson)
[11:13:58] <logmsgbot>	 !log awight@deploy1001 sync-file aborted: Config: [[gerrit:618303|Add import sources for lijwikisource (T259633)]] (duration: 00m 13s)
[11:14:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:14:02] <stashbot>	 T259633: Add import sources for lijwikisource - https://phabricator.wikimedia.org/T259633
[11:14:25] <awight>	 DannyS712: Sorry, I decided to test this at the last moment.
[11:15:56] <awight>	 DannyS712: Okay, your patch is live on mwdebug1001.
[11:17:23] <Lucas_WMDE>	 is DannyS712 here? I haven’t seen an o/ yet
[11:17:46] <wikibugs>	 (03CR) 10Ladsgroup: Turn muswiki and mhwiktionary to read-only (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/618089 (https://phabricator.wikimedia.org/T259004) (owner: 10Urbanecm)
[11:18:13] <awight>	 I'm rusty at this, anyway.  Import requires special permissions, I'll just be satisfied that the site doesn't explode.
[11:19:46] <logmsgbot>	 !log awight@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:618303|Add import sources for lijwikisource (T259633)]] (duration: 01m 07s)
[11:19:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:19:49] <stashbot>	 T259633: Add import sources for lijwikisource - https://phabricator.wikimedia.org/T259633
[11:20:32] <wikibugs>	 (03PS2) 10Awight: FileImporter: full default deployment [mediawiki-config] - 10https://gerrit.wikimedia.org/r/618481 (https://phabricator.wikimedia.org/T232542)
[11:20:49] <wikibugs>	 (03CR) 10Awight: [C: 03+2] "Bacon deployment." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/618481 (https://phabricator.wikimedia.org/T232542) (owner: 10Awight)
[11:21:39] <wikibugs>	 (03Merged) 10jenkins-bot: FileImporter: full default deployment [mediawiki-config] - 10https://gerrit.wikimedia.org/r/618481 (https://phabricator.wikimedia.org/T232542) (owner: 10Awight)
[11:22:09] <jayme>	 !log imported helm-diff_3.1.2-0 to buster-wikimedia
[11:22:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:23:14] <jayme>	 !log imported helm-diff_3.1.2-0 to jessie-wikimedia and stretch-wikimedia
[11:23:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:26:47] <logmsgbot>	 !log awight@deploy1001 Synchronized wmf-config: Config: [[gerrit:618481|FileImporter: full default deployment (T232542)]] (duration: 01m 04s)
[11:26:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:26:50] <stashbot>	 T232542: [Deployment] FileImporter / FileExporter full default - https://phabricator.wikimedia.org/T232542
[11:28:36] <awight>	 !log EU Bacon complete
[11:28:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:28:57] <jan_drewniak>	 hey, there's still my patch!
[11:29:17] <awight>	 !log EU Bacon reopened
[11:29:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:29:25] <awight>	 jan_drewniak: So sorry, I just saw it.
[11:29:36] <jan_drewniak>	 late addition :P
[11:29:43] <awight>	 jan_drewniak: Helpful if I deploy?
[11:29:59] <jan_drewniak>	 awight: sure!
[11:30:03] <awight>	 ack
[11:30:28] <wikibugs>	 (03PS10) 10Awight: Switch test wikis to new version of vector by default (3/3) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/614891 (https://phabricator.wikimedia.org/T254227) (owner: 10Jdlrobson)
[11:30:51] <wikibugs>	 (03CR) 10Awight: [C: 03+2] "Bacon deployment." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/614891 (https://phabricator.wikimedia.org/T254227) (owner: 10Jdlrobson)
[11:31:22] <wikibugs>	 (03Merged) 10jenkins-bot: Switch test wikis to new version of vector by default (3/3) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/614891 (https://phabricator.wikimedia.org/T254227) (owner: 10Jdlrobson)
[11:32:44] <awight>	 jan_drewniak: The config is live on mwdebug1001, if you'd like to test
[11:32:59] <jan_drewniak>	 Ok I'll test it now
[11:33:59] <jan_drewniak>	 awight: alrighty, look good! 
[11:34:05] <awight>	 jan_drewniak: Thanks :-)
[11:36:02] <logmsgbot>	 !log awight@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:614891|Switch test wikis to new version of vector by default (3/3) (T254227)]] (duration: 01m 07s)
[11:36:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:36:04] <stashbot>	 T254227: Switch test wikis to new version of vector by default - https://phabricator.wikimedia.org/T254227
[11:36:21] <awight>	 !log EU Bacon reclosed
[11:36:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:36:50] <jan_drewniak>	 awight: thanks! 
[11:37:17] <awight>	 My pleasure!
[12:09:29] <wikibugs>	 (03CR) 10Ladsgroup: [C: 03+1] "LGTM" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/618090 (https://phabricator.wikimedia.org/T259004) (owner: 10Urbanecm)
[12:15:58] <wikibugs>	 (03PS1) 10Kormat: Add wikimedia.cloud domain [debs/wmf-sre-laptop] - 10https://gerrit.wikimedia.org/r/618522
[12:25:42] <wikibugs>	 (03PS1) 10JMeybohm: Fix if in postinst [debs/helm-diff] - 10https://gerrit.wikimedia.org/r/618524 (https://phabricator.wikimedia.org/T258572)
[12:25:47] <wikibugs>	 (03PS1) 10KartikMistry: Update cxserver to 2020-08-05-070016-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/618525 (https://phabricator.wikimedia.org/T258919)
[12:26:21] <icinga-wm>	 RECOVERY - puppet last run on otrs1001 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[12:28:15] <wikibugs>	 10Operations, 10DBA, 10User-Kormat: DBA python layout - https://phabricator.wikimedia.org/T259516 (10Kormat)
[12:28:38] <wikibugs>	 (03PS4) 10Kormat: Split utilities into separate packages [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/618513 (https://phabricator.wikimedia.org/T259516)
[12:33:13] <wikibugs>	 (03PS2) 10Alexandros Kosiaris: mobileapps: Remove the profile and the role [puppet] - 10https://gerrit.wikimedia.org/r/618488 (https://phabricator.wikimedia.org/T218733)
[12:33:26] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 03+2] "Thanks!" [puppet] - 10https://gerrit.wikimedia.org/r/618488 (https://phabricator.wikimedia.org/T218733) (owner: 10Alexandros Kosiaris)
[12:33:42] <moritzm>	 !log installing net-snmp security updates on icinga hosts
[12:33:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:34:56] <wikibugs>	 (03CR) 10Jcrespo: "Looks ok, but have you tried building all packages locally? I wonder if dependencies on python3-wmfmariadbpy will get duplicated as they c" [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/618513 (https://phabricator.wikimedia.org/T259516) (owner: 10Kormat)
[12:35:53] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "LGTM" [debs/helm-diff] - 10https://gerrit.wikimedia.org/r/618524 (https://phabricator.wikimedia.org/T258572) (owner: 10JMeybohm)
[12:35:56] <wikibugs>	 (03CR) 10Volans: [C: 03+1] "disclaimer: reviewing it as a new patch because too much time has passed since the last PS." (031 comment) [software/netbox-extras] - 10https://gerrit.wikimedia.org/r/562408 (https://phabricator.wikimedia.org/T231512) (owner: 10CRusnov)
[12:37:41] <wikibugs>	 (03CR) 10JMeybohm: [C: 03+2] Fix if in postinst [debs/helm-diff] - 10https://gerrit.wikimedia.org/r/618524 (https://phabricator.wikimedia.org/T258572) (owner: 10JMeybohm)
[12:40:12] <wikibugs>	 (03Merged) 10jenkins-bot: Fix if in postinst [debs/helm-diff] - 10https://gerrit.wikimedia.org/r/618524 (https://phabricator.wikimedia.org/T258572) (owner: 10JMeybohm)
[12:40:25] <wikibugs>	 (03PS1) 10DCausse: [yarn] set yarn.scheduler.minimum-allocation-mb to > 0 [puppet] - 10https://gerrit.wikimedia.org/r/618526
[12:40:59] <wikibugs>	 (03CR) 10DCausse: "ref https://github.com/apache/flink/pull/12444" [puppet] - 10https://gerrit.wikimedia.org/r/618526 (owner: 10DCausse)
[12:46:05] <wikibugs>	 (03PS2) 10JMeybohm: helm: Replace repo update cronjob by systemd timer [puppet] - 10https://gerrit.wikimedia.org/r/618350
[12:46:46] <moritzm>	 !log installing imagemagick security updates on buster
[12:46:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:47:20] <wikibugs>	 (03CR) 10JMeybohm: helm: Replace repo update cronjob by systemd timer (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/618350 (owner: 10JMeybohm)
[12:49:17] <jayme>	 !log imported helm-diff_3.1.2-1 to buster-wikimedia, jessie-wikimedia and stretch-wikimedia
[12:49:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:52:48] <XioNoX>	 !log netmon1002:/srv/deployment/librenms/librenms$ sudo -u librenms ./lnms migrate
[12:52:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:58:50] <wikibugs>	 10Operations, 10OTRS, 10serviceops, 10Patch-For-Review, 10User-notice: Update OTRS to the latest stable version (6.0.x) - https://phabricator.wikimedia.org/T187984 (10akosiaris) >>! In T187984#6351881, @eyazi wrote: > Not sure if you did, but you should also reset the Ticket::SearchIndexModule setting. C...
[13:00:20] <wikibugs>	 (03CR) 10Kormat: "> Patch Set 4:" [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/618513 (https://phabricator.wikimedia.org/T259516) (owner: 10Kormat)
[13:00:50] <moritzm>	 !log installing libjpeg-turbo security updates on stretch
[13:00:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:01:40] <wikibugs>	 (03CR) 10Jcrespo: [C: 03+1] "Cool. Let's maybe merge fast and give a final production test/review when fully ready for deployment." [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/618513 (https://phabricator.wikimedia.org/T259516) (owner: 10Kormat)
[13:01:58] <wikibugs>	 (03PS1) 10Elukey: hadoop: set yarn_scheduler_minimum_allocation_mb to 1 [puppet] - 10https://gerrit.wikimedia.org/r/618529
[13:02:34] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] hadoop: set yarn_scheduler_minimum_allocation_mb to 1 [puppet] - 10https://gerrit.wikimedia.org/r/618529 (owner: 10Elukey)
[13:02:47] <wikibugs>	 (03Abandoned) 10DCausse: [yarn] set yarn.scheduler.minimum-allocation-mb to > 0 [puppet] - 10https://gerrit.wikimedia.org/r/618526 (owner: 10DCausse)
[13:04:10] <elukey>	 !log restart yarn resource managers on an-master100[12] to pick up new Yarn settings - https://gerrit.wikimedia.org/r/c/operations/puppet/+/618529
[13:04:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:11:07] <wikibugs>	 (03CR) 10Volans: "I don't see either a clear way to simplify it given the current data, see also inline." (031 comment) [homer/public] - 10https://gerrit.wikimedia.org/r/617603 (https://phabricator.wikimedia.org/T200277) (owner: 10Ayounsi)
[13:14:47] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_cxserver_cluster_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[13:16:41] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[13:20:16] <wikibugs>	 (03PS7) 10Elukey: Add custom ferm srange to Kafka Jumbo brokers [puppet] - 10https://gerrit.wikimedia.org/r/611168 (https://phabricator.wikimedia.org/T204957)
[13:22:25] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_cxserver_cluster_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[13:24:21] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[13:24:27] <wikibugs>	 (03CR) 10Elukey: [C: 04-2] "https://puppet-compiler.wmflabs.org/compiler1001/24318/kafka-jumbo1001.eqiad.wmnet/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/611168 (https://phabricator.wikimedia.org/T204957) (owner: 10Elukey)
[13:24:41] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 03+1] helm: Replace repo update cronjob by systemd timer [puppet] - 10https://gerrit.wikimedia.org/r/618350 (owner: 10JMeybohm)
[13:24:43] <wikibugs>	 (03PS3) 10Volans: mgmt: netbox-generated data for mgmt eqiad [dns] - 10https://gerrit.wikimedia.org/r/617509 (https://phabricator.wikimedia.org/T233183)
[13:24:47] <wikibugs>	 (03CR) 10Ppchelko: [C: 03+2] api-gateway: change deployment to production [deployment-charts] - 10https://gerrit.wikimedia.org/r/618512 (https://phabricator.wikimedia.org/T254906) (owner: 10Hnowlan)
[13:24:58] <logmsgbot>	 !log volans@cumin1001 START - Cookbook sre.dns.netbox
[13:25:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:26:17] <wikibugs>	 (03Merged) 10jenkins-bot: api-gateway: change deployment to production [deployment-charts] - 10https://gerrit.wikimedia.org/r/618512 (https://phabricator.wikimedia.org/T254906) (owner: 10Hnowlan)
[13:28:54] <logmsgbot>	 !log volans@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[13:28:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:32:41] <ottomata>	 oh elukey
[13:32:43] <ottomata>	 mirror maker?
[13:32:48] <ottomata>	 hmmm
[13:32:52] <ottomata>	 ok no this is just for jumbo
[13:32:58] <ottomata>	 mirror maker pulls from elsewhere there
[13:32:58] <jayme>	 !log updated helmfile to 0.125.2-0 and helm-diff to 3.1.2-1 on contint* and deploy*
[13:32:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:33:07] <ottomata>	 oops wrong chat room back to analytics :)
[13:35:27] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 04-1] "The idea is sound, but I'd implement it the same way as it's done for the envoy sidecars for TLS termination." (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/616121 (https://phabricator.wikimedia.org/T254908) (owner: 10Hnowlan)
[13:36:34] <wikibugs>	 (03PS1) 10Muehlenhoff: update libjpeg-turbo library hint to also cover libturbojpeg [puppet] - 10https://gerrit.wikimedia.org/r/618532
[13:40:22] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] update libjpeg-turbo library hint to also cover libturbojpeg [puppet] - 10https://gerrit.wikimedia.org/r/618532 (owner: 10Muehlenhoff)
[13:49:02] <wikibugs>	 10Operations, 10serviceops, 10Platform Team Initiatives (Containerise Services): Migrate node-based services in production to node10 - https://phabricator.wikimedia.org/T210704 (10akosiaris)
[13:49:08] <wikibugs>	 10Operations, 10Release Pipeline, 10Release-Engineering-Team-TODO, 10Epic, and 2 others: Migrate production services to kubernetes using the pipeline - https://phabricator.wikimedia.org/T198901 (10akosiaris)
[13:49:48] <wikibugs>	 10Operations, 10LDAP-Access-Requests: LDAP access to the wmf group for David Rochford (Drochford) - https://phabricator.wikimedia.org/T259713 (10drochford)
[13:51:31] <moritzm>	 !log installing Linux update to 4.19.132 from buster point update (no reboots, just the package updates)
[13:51:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:03:57] <moritzm>	 !log installing node-minimist security updates
[14:03:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:05:15] <wikibugs>	 10Operations: Integrate Buster 10.5 point release - https://phabricator.wikimedia.org/T259519 (10MoritzMuehlenhoff)
[14:06:55] <wikibugs>	 10Operations, 10User-jbond: update profile::waf::apache2::administrative to use the new abuse_networks hiera key - https://phabricator.wikimedia.org/T253632 (10Aklapper) @JBond: Both patches in Gerrit have been merged. Can this task be resolved (via {nav name=Add Action... > Change Status} in the dropdown menu...
[14:08:30] <wikibugs>	 10Operations, 10Phabricator, 10Traffic: Access Forbidden to Phabricator at WikiArabia 2019 (Morocco) via Indian IP 185.174.156.75 - https://phabricator.wikimedia.org/T234598 (10Aklapper)
[14:09:38] <wikibugs>	 10Operations, 10Phabricator, 10Traffic: Access Forbidden to Phabricator at WikiArabia 2019 (Morocco) via Indian IP 185.174.156.75 - https://phabricator.wikimedia.org/T234598 (10Aklapper) See also T257507, T229575, T258059, T246923, etc. Reason might be vandalism (non-public T218589).  (The error itself is a...
[14:14:17] <moritzm>	 !log installing pillow security updates
[14:14:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:14:45] <icinga-wm>	 PROBLEM - Host mr1-eqsin IPv6 is DOWN: PING CRITICAL - Packet loss = 100%
[14:15:14] <volans>	 XioNoX: ^^^ fyi
[14:15:39] <volans>	 (hoping that you don't filter icinga-wm too :-P )
[14:15:52] <XioNoX>	 who's pinging me?
[14:16:18] <cdanis>	 someone is pinging you?
[14:16:39] <icinga-wm>	 PROBLEM - Juniper alarms on mr1-eqsin is CRITICAL: JNX_ALARMS CRITICAL - No response from remote host 103.102.166.128 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Juniper_alarm
[14:16:48] <XioNoX>	 did mr1-eqsin die?
[14:17:55] <icinga-wm>	 PROBLEM - Host mr1-eqsin.oob IPv6 is DOWN: PING CRITICAL - Packet loss = 66%, RTA = 4267.51 ms
[14:17:59] <cdanis>	 seems like it
[14:18:06] <cdanis>	 I haven't tried scs yet
[14:18:13] <XioNoX>	 cdanis: scs is via mr1 :)
[14:18:17] <cdanis>	 ahah ofc
[14:18:37] <wikibugs>	 (03PS1) 10Ottomata: eventgate-main - bump image version to get schema mediawiki/revision/create/1.1.0 [deployment-charts] - 10https://gerrit.wikimedia.org/r/618535 (https://phabricator.wikimedia.org/T216297)
[14:18:50] <XioNoX>	 `ping mr1-eqsin.oob.wikimedia.org -4` still replies
[14:19:05] <XioNoX>	 but not ssh
[14:19:19] <icinga-wm>	 PROBLEM - Host cp5006.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[14:20:04] <logmsgbot>	 !log sukhe@cumin1001 START - Cookbook sre.hosts.downtime
[14:20:04] <logmsgbot>	 !log sukhe@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[14:20:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:20:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:20:28] <volans>	 mmmh something is going on, I can login on a random cp host though
[14:20:52] <cdanis>	 guessing some weird internal crash, the data plane is still working but not other parts
[14:21:40] <XioNoX>	 v6 to bast5001 is terribly slow for me
[14:21:43] <XioNoX>	 v4 is fine
[14:22:02] <volans>	 and by ssh I meant ofc into a mgmt console
[14:22:07] <icinga-wm>	 RECOVERY - Juniper alarms on mr1-eqsin is OK: JNX_ALARMS OK - 0 red alarms, 0 yellow alarms https://wikitech.wikimedia.org/wiki/Network_monitoring%23Juniper_alarm
[14:22:37] <wikibugs>	 (03CR) 10Ottomata: [C: 03+2] eventgate-main - bump image version to get schema mediawiki/revision/create/1.1.0 [deployment-charts] - 10https://gerrit.wikimedia.org/r/618535 (https://phabricator.wikimedia.org/T216297) (owner: 10Ottomata)
[14:22:38] <volans>	 I'm on v4 indeed
[14:22:57] <XioNoX>	 I'm in via mr1-eqsin.oob.wikimedia.org -4
[14:22:59] <icinga-wm>	 RECOVERY - Host mr1-eqsin.oob IPv6 is UP: PING OK - Packet loss = 0%, RTA = 212.51 ms
[14:24:13] <logmsgbot>	 !log otto@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
[14:24:14] <logmsgbot>	 !log otto@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
[14:24:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:24:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:24:29] <icinga-wm>	 RECOVERY - Host cp5006.mgmt is UP: PING OK - Packet loss = 0%, RTA = 228.99 ms
[14:25:07] <icinga-wm>	 RECOVERY - Host mr1-eqsin IPv6 is UP: PING OK - Packet loss = 0%, RTA = 230.73 ms
[14:25:10] <moritzm>	 !log installing nmap bugfix updates from buster point release
[14:25:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:26:30] <wikibugs>	 10Operations: Integrate Buster 10.5 point release - https://phabricator.wikimedia.org/T259519 (10MoritzMuehlenhoff)
[14:27:46] <logmsgbot>	 !log otto@deploy1001 helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
[14:27:46] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+1] "This is ambitious, but I like it.  We could modify the patch to apply only to codfw1dev for testing, although since puppet is disabled on " [puppet] - 10https://gerrit.wikimedia.org/r/618418 (owner: 10Bstorm)
[14:27:46] <logmsgbot>	 !log otto@deploy1001 helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
[14:27:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:27:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:32:19] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[14:32:58] <logmsgbot>	 !log otto@deploy1001 helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
[14:32:58] <logmsgbot>	 !log otto@deploy1001 helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
[14:32:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:33:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:33:48] <wikibugs>	 10Operations: Integrate Buster 10.5 point release - https://phabricator.wikimedia.org/T259519 (10MoritzMuehlenhoff)
[14:35:38] <wikibugs>	 10Operations, 10ops-codfw, 10RESTBase: restbase2009 down - https://phabricator.wikimedia.org/T256863 (10Papaul) @Eevans @hnowlan any update on this?
[14:36:00] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "Looks good, thanks." [debs/wmf-sre-laptop] - 10https://gerrit.wikimedia.org/r/618522 (owner: 10Kormat)
[14:36:10] <wikibugs>	 (03CR) 10Muehlenhoff: [V: 03+2 C: 03+2] Add wikimedia.cloud domain [debs/wmf-sre-laptop] - 10https://gerrit.wikimedia.org/r/618522 (owner: 10Kormat)
[14:38:09] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[14:40:07] <XioNoX>	 volans: thanks for the ping :)
[14:41:16] <volans>	 anytime :)
[14:41:27] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[14:43:09] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[14:45:04] <wikibugs>	 (03CR) 10Ottomata: Add eventgate-logging-external streams, and add destination_event_service to all stream configs (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/618394 (https://phabricator.wikimedia.org/T251935) (owner: 10Ottomata)
[14:46:36] <wikibugs>	 (03PS1) 10Ema: ATS: add function profile::trafficserver_caching_rules [puppet] - 10https://gerrit.wikimedia.org/r/618537 (https://phabricator.wikimedia.org/T259692)
[14:47:10] <wikibugs>	 (03PS1) 10Ebernhardson: mjolnir: Increase msearch daemon parallelism to 25 [puppet] - 10https://gerrit.wikimedia.org/r/618538
[14:47:44] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 03+1] Add a TestCase field for POST form data. [software/httpbb] - 10https://gerrit.wikimedia.org/r/615570 (owner: 10RLazarus)
[14:47:55] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] ATS: add function profile::trafficserver_caching_rules [puppet] - 10https://gerrit.wikimedia.org/r/618537 (https://phabricator.wikimedia.org/T259692) (owner: 10Ema)
[14:47:58] <wikibugs>	 (03CR) 10RLazarus: [C: 03+2] Add a TestCase field for POST form data. [software/httpbb] - 10https://gerrit.wikimedia.org/r/615570 (owner: 10RLazarus)
[14:48:37] <elukey>	 !log reboot stat1008 for unexpected maintenance (GPU stuck)
[14:48:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:49:15] <wikibugs>	 (03Merged) 10jenkins-bot: Add a TestCase field for POST form data. [software/httpbb] - 10https://gerrit.wikimedia.org/r/615570 (owner: 10RLazarus)
[14:49:45] <wikibugs>	 10Operations, 10ops-codfw, 10DBA, 10DC-Ops: (Need By: 2020-09-14) rack/setup/install dbprov2003.codfw.wmnet - https://phabricator.wikimedia.org/T258749 (10Papaul)
[14:51:42] <wikibugs>	 (03PS1) 10Elukey: hadoop set yarn_scheduler_minimum_allocation_vcores to 1 [puppet] - 10https://gerrit.wikimedia.org/r/618539
[14:52:18] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] hadoop set yarn_scheduler_minimum_allocation_vcores to 1 [puppet] - 10https://gerrit.wikimedia.org/r/618539 (owner: 10Elukey)
[14:52:20] <wikibugs>	 (03PS2) 10Ema: ATS: add function profile::trafficserver_caching_rules [puppet] - 10https://gerrit.wikimedia.org/r/618537 (https://phabricator.wikimedia.org/T259692)
[14:53:07] <icinga-wm>	 PROBLEM - Host stat1008 is DOWN: PING CRITICAL - Packet loss = 100%
[14:53:38] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] ATS: add function profile::trafficserver_caching_rules [puppet] - 10https://gerrit.wikimedia.org/r/618537 (https://phabricator.wikimedia.org/T259692) (owner: 10Ema)
[14:53:46] <elukey>	 stat1008 is me
[14:53:51] <elukey>	 probably stuck in booting sigh
[14:57:22] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: Check Netbox/dns/reality inconsistencies - https://phabricator.wikimedia.org/T259283 (10wiki_willy) a:03Cmjohnson
[14:58:01] <icinga-wm>	 PROBLEM - Packet loss ratio for UDP on logstash1008 is CRITICAL: 0.82 ge 0.1 https://wikitech.wikimedia.org/wiki/Logstash https://grafana.wikimedia.org/dashboard/db/logstash
[14:59:51] <icinga-wm>	 RECOVERY - Host stat1008 is UP: PING OK - Packet loss = 0%, RTA = 0.32 ms
[15:00:47] <icinga-wm>	 PROBLEM - Packet loss ratio for UDP on logstash1009 is CRITICAL: 0.7685 ge 0.1 https://wikitech.wikimedia.org/wiki/Logstash https://grafana.wikimedia.org/dashboard/db/logstash
[15:01:53] <wikibugs>	 (03PS3) 10Ema: ATS: add function profile::trafficserver_caching_rules [puppet] - 10https://gerrit.wikimedia.org/r/618537 (https://phabricator.wikimedia.org/T259692)
[15:03:08] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] ATS: add function profile::trafficserver_caching_rules [puppet] - 10https://gerrit.wikimedia.org/r/618537 (https://phabricator.wikimedia.org/T259692) (owner: 10Ema)
[15:03:09] <icinga-wm>	 PROBLEM - Packet loss ratio for UDP on logstash1007 is CRITICAL: 0.7494 ge 0.1 https://wikitech.wikimedia.org/wiki/Logstash https://grafana.wikimedia.org/dashboard/db/logstash
[15:04:35] <wikibugs>	 (03CR) 10Ppchelko: [C: 03+1] Add eventgate-logging-external streams, and add destination_event_service to all stream configs (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/618394 (https://phabricator.wikimedia.org/T251935) (owner: 10Ottomata)
[15:05:05] <godog>	 uh oh, logstash is unhappy, I'm taking a look
[15:05:11] <logmsgbot>	 !log pt1979@cumin2001 START - Cookbook sre.dns.netbox
[15:05:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:06:44] <wikibugs>	 (03PS4) 10Ottomata: Add eventgate-logging-external streams, and add destination_event_service to all stream configs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/618394 (https://phabricator.wikimedia.org/T251935)
[15:07:44] <wikibugs>	 (03PS4) 10Ema: ATS: add function profile::trafficserver_caching_rules [puppet] - 10https://gerrit.wikimedia.org/r/618537 (https://phabricator.wikimedia.org/T259692)
[15:07:57] <godog>	 not sure exactly what's up, I'll bounce logstash
[15:08:29] <godog>	 !log bounce logstash on logstash100[789] - udp loss reported
[15:08:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:09:47] <wikibugs>	 10Operations, 10Parsoid, 10serviceops, 10Parsoid-Tests, 10Patch-For-Review: Move testreduce away from scandium to a separate Buster Ganeti VM - https://phabricator.wikimedia.org/T257906 (10cscott) @ssastry one minor wrinkle to keep in mind is that to start an rt test run you need to update files on both...
[15:11:54] <logmsgbot>	 !log pt1979@cumin2001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[15:11:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:12:39] <wikibugs>	 (03PS5) 10Ema: ATS: add function profile::trafficserver_caching_rules [puppet] - 10https://gerrit.wikimedia.org/r/618537 (https://phabricator.wikimedia.org/T259692)
[15:12:45] <icinga-wm>	 RECOVERY - Packet loss ratio for UDP on logstash1007 is OK: (C)0.1 ge (W)0.05 ge 0 https://wikitech.wikimedia.org/wiki/Logstash https://grafana.wikimedia.org/dashboard/db/logstash
[15:13:37] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[15:14:11] <icinga-wm>	 RECOVERY - Packet loss ratio for UDP on logstash1009 is OK: (C)0.1 ge (W)0.05 ge 0.007086 https://wikitech.wikimedia.org/wiki/Logstash https://grafana.wikimedia.org/dashboard/db/logstash
[15:15:19] <icinga-wm>	 RECOVERY - Packet loss ratio for UDP on logstash1008 is OK: (C)0.1 ge (W)0.05 ge 0 https://wikitech.wikimedia.org/wiki/Logstash https://grafana.wikimedia.org/dashboard/db/logstash
[15:16:37] <godog>	 the last exceptions alert might be a false positive while logstash catches up btw
[15:16:37] <wikibugs>	 10Operations, 10ops-codfw, 10DBA, 10DC-Ops: (Need By: 2020-09-14) rack/setup/install dbprov2003.codfw.wmnet - https://phabricator.wikimedia.org/T258749 (10Papaul)
[15:16:43] <wikibugs>	 (03CR) 10Ema: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/618537 (https://phabricator.wikimedia.org/T259692) (owner: 10Ema)
[15:22:53] <wikibugs>	 (03CR) 10Mholloway: "In case it's useful, Mateus from Product Infrastructure has a project (https://github.com/thesocialdev/mediawiki-services-profiler) based " [deployment-charts] - 10https://gerrit.wikimedia.org/r/602527 (https://phabricator.wikimedia.org/T241230) (owner: 10Bmansurov)
[15:25:47] <wikibugs>	 (03PS1) 10Kormat: mariadb: Use correct binlog format for db_inventory [puppet] - 10https://gerrit.wikimedia.org/r/618540
[15:28:46] <wikibugs>	 (03CR) 10Kormat: "PCC run looks good: https://puppet-compiler.wmflabs.org/compiler1002/24323/" [puppet] - 10https://gerrit.wikimedia.org/r/618540 (owner: 10Kormat)
[15:29:15] <icinga-wm>	 PROBLEM - Too many messages in kafka logging-eqiad on icinga1001 is CRITICAL: cluster=misc exported_cluster=logging-eqiad group=logstash instance=kafkamon1001 job=burrow partition={0,1,2,3,4,5} prometheus=ops site=eqiad topic={udp_localhost-info,udp_localhost-warning} https://wikitech.wikimedia.org/wiki/Logstash%23Kafka_consumer_lag https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?from=now-3h&to=now&orgId=1&var-dataso
[15:29:15] <icinga-wm>	 luster=logging-eqiad&var-topic=All&var-consumer_group=All
[15:29:24] <wikibugs>	 (03CR) 10Marostegui: [C: 03+1] mariadb: Use correct binlog format for db_inventory [puppet] - 10https://gerrit.wikimedia.org/r/618540 (owner: 10Kormat)
[15:29:26] <wikibugs>	 (03CR) 10Jcrespo: "I am 100% ok with the change, but it is not a universal rule for all hosts. I believe most misc servers use ROW on the master, and some ot" [puppet] - 10https://gerrit.wikimedia.org/r/618540 (owner: 10Kormat)
[15:29:35] <wikibugs>	 10Operations, 10ops-codfw, 10DBA, 10DC-Ops: (Need By: 2020-09-14) rack/setup/install dbprov2003.codfw.wmnet - https://phabricator.wikimedia.org/T258749 (10Papaul) ` Interface       Admin Link Description xe-4/0/20       up    up   dbprov2003  Logical          Vlan          TAG     MAC         STP         L...
[15:30:03] <wikibugs>	 10Operations, 10ops-codfw, 10DBA, 10DC-Ops: (Need By: 2020-09-14) rack/setup/install dbprov2003.codfw.wmnet - https://phabricator.wikimedia.org/T258749 (10Papaul)
[15:30:06] <wikibugs>	 (03CR) 10Marostegui: [C: 03+1] "> Patch Set 1:" [puppet] - 10https://gerrit.wikimedia.org/r/618540 (owner: 10Kormat)
[15:30:07] <icinga-wm>	 PROBLEM - Packet loss ratio for UDP on logstash1007 is CRITICAL: 0.7679 ge 0.1 https://wikitech.wikimedia.org/wiki/Logstash https://grafana.wikimedia.org/dashboard/db/logstash
[15:34:31] <wikibugs>	 (03CR) 10Herron: [C: 03+2] alerting_host: assign alert[12]001 role::alerting_host [puppet] - 10https://gerrit.wikimedia.org/r/618345 (https://phabricator.wikimedia.org/T247966) (owner: 10Herron)
[15:34:39] <wikibugs>	 (03PS3) 10Herron: alerting_host: assign alert[12]001 role::alerting_host [puppet] - 10https://gerrit.wikimedia.org/r/618345 (https://phabricator.wikimedia.org/T247966)
[15:35:02] <herron>	 hmm looking at udp loss
[15:36:09] <godog>	 thanks, I'm looking too but not sure yet what/why is happening (bounced logstash a little while ago for the same reason)
[15:39:37] <icinga-wm>	 RECOVERY - Packet loss ratio for UDP on logstash1007 is OK: (C)0.1 ge (W)0.05 ge 0.0007004 https://wikitech.wikimedia.org/wiki/Logstash https://grafana.wikimedia.org/dashboard/db/logstash
[15:39:48] <wikibugs>	 (03PS2) 10Kormat: mariadb: Use correct binlog format for db_inventory [puppet] - 10https://gerrit.wikimedia.org/r/618540
[15:40:05] <wikibugs>	 (03CR) 10Kormat: "Updated the commit message slightly to be clearer." [puppet] - 10https://gerrit.wikimedia.org/r/618540 (owner: 10Kormat)
[15:40:29] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[15:41:02] <wikibugs>	 (03CR) 10Cwhite: [V: 03+2 C: 03+2] Init retry_count at each collection [debs/prometheus-icinga-exporter] - 10https://gerrit.wikimedia.org/r/618504 (https://phabricator.wikimedia.org/T258948) (owner: 10Filippo Giunchedi)
[15:41:40] <wikibugs>	 (03CR) 10Cwhite: [V: 03+2 C: 03+2] Add support for exposing Icinga problems as metrics [debs/prometheus-icinga-exporter] - 10https://gerrit.wikimedia.org/r/618505 (https://phabricator.wikimedia.org/T258948) (owner: 10Filippo Giunchedi)
[15:42:18] <wikibugs>	 (03CR) 10Kormat: [C: 03+2] mariadb: Use correct binlog format for db_inventory [puppet] - 10https://gerrit.wikimedia.org/r/618540 (owner: 10Kormat)
[15:43:49] <logmsgbot>	 !log hnowlan@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
[15:44:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:48:40] <wikibugs>	 (03PS1) 10Alexandros Kosiaris: Switch service-checker-image to python3 [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/618542
[15:48:41] <wikibugs>	 10Operations, 10ops-codfw, 10DBA, 10DC-Ops: (Need By: 2020-09-14) rack/setup/install dbprov2003.codfw.wmnet - https://phabricator.wikimedia.org/T258749 (10jcrespo)
[15:48:44] <wikibugs>	 (03CR) 10Hashar: [C: 04-1] "There is a similar change for the Jenkins user and I expressed concerns about it on https://gerrit.wikimedia.org/r/c/operations/puppet/+/6" [puppet] - 10https://gerrit.wikimedia.org/r/607853 (https://phabricator.wikimedia.org/T224591) (owner: 10Dzahn)
[15:50:13] <logmsgbot>	 !log hnowlan@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
[15:50:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:51:05] <wikibugs>	 (03CR) 10Ottomata: [C: 03+2] Add eventgate-logging-external streams, and add destination_event_service to all stream configs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/618394 (https://phabricator.wikimedia.org/T251935) (owner: 10Ottomata)
[15:53:46] <wikibugs>	 10Operations, 10ops-eqiad, 10DBA, 10DC-Ops: (2020-09-14) rack/setup/install dbprov1003.eqiad.wmnet - https://phabricator.wikimedia.org/T258750 (10jcrespo)
[15:55:07] <wikibugs>	 10Operations, 10ops-codfw, 10DBA, 10DC-Ops: (Need By: 2020-09-14) rack/setup/install dbprov2003.codfw.wmnet - https://phabricator.wikimedia.org/T258749 (10jcrespo)
[15:56:22] <logmsgbot>	 !log otto@deploy1001 Synchronized wmf-config/InitialiseSettings.php: EventStreamConfig - Add eventgate-logging-external streams and destination_event_service settings - T251935 (duration: 01m 05s)
[15:56:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:56:24] <stashbot>	 T251935: All EventGate instances should use EventStreamConfig - https://phabricator.wikimedia.org/T251935
[15:56:32] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "What Daniel wrote: Having the central GIDs recorded in data.yaml is the only sane way to prevent duplicated use." [puppet] - 10https://gerrit.wikimedia.org/r/606286 (https://phabricator.wikimedia.org/T224591) (owner: 10Dzahn)
[15:59:14] <wikibugs>	 10Operations, 10ops-codfw, 10DBA, 10DC-Ops: (Need By: 2020-09-14) rack/setup/install dbprov2003.codfw.wmnet - https://phabricator.wikimedia.org/T258749 (10Papaul)    Status  Name  State  Layout  Size  Media Type  Read Policy  Write Policy  Stripe Size  Secured  Remaining Redundancy    Virtual Disk 0  Onlin...
[15:59:25] <wikibugs>	 10Operations, 10Parsoid, 10serviceops, 10Parsoid-Tests, 10Patch-For-Review: Move testreduce away from scandium to a separate Buster Ganeti VM - https://phabricator.wikimedia.org/T257906 (10ssastry) >>! In T257906#6362985, @cscott wrote: > @ssastry one minor wrinkle to keep in mind is that to start an rt...
[16:00:55] <wikibugs>	 (03CR) 10Ottomata: [C: 03+1] Move oozie server to an-scheduler1001 [puppet] - 10https://gerrit.wikimedia.org/r/618339 (https://phabricator.wikimedia.org/T257412) (owner: 10Elukey)
[16:01:45] <wikibugs>	 (03CR) 10Bstorm: [C: 03+2] haproxy-galera: Make a meaningful healthcheck [puppet] - 10https://gerrit.wikimedia.org/r/618418 (owner: 10Bstorm)
[16:02:27] <wikibugs>	 (03PS2) 10Ottomata: eventgate-logging-external - Use MW EventStreamConfig API to get static stream configs [deployment-charts] - 10https://gerrit.wikimedia.org/r/618395 (https://phabricator.wikimedia.org/T251935)
[16:04:03] <wikibugs>	 (03PS1) 10Herron: acme_cheif: permit alert[12]001 to fetch icinga cert [puppet] - 10https://gerrit.wikimedia.org/r/618545 (https://phabricator.wikimedia.org/T247966)
[16:04:18] <wikibugs>	 (03CR) 10Ottomata: [C: 03+2] eventgate-logging-external - Use MW EventStreamConfig API to get static stream configs [deployment-charts] - 10https://gerrit.wikimedia.org/r/618395 (https://phabricator.wikimedia.org/T251935) (owner: 10Ottomata)
[16:04:39] <wikibugs>	 (03PS1) 10Papaul: DNS: Add production DNS for dbprov2003 [dns] - 10https://gerrit.wikimedia.org/r/618546
[16:05:59] <wikibugs>	 (03CR) 10Papaul: [C: 03+2] DNS: Add production DNS for dbprov2003 [dns] - 10https://gerrit.wikimedia.org/r/618546 (owner: 10Papaul)
[16:06:41] <wikibugs>	 (03CR) 10Hashar: [C: 04-1] "My review comment was unclear, I apologize.  My concerns are:" (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/606286 (https://phabricator.wikimedia.org/T224591) (owner: 10Dzahn)
[16:07:00] <wikibugs>	 10Operations, 10ops-codfw, 10DBA, 10DC-Ops, 10Patch-For-Review: (Need By: 2020-09-14) rack/setup/install dbprov2003.codfw.wmnet - https://phabricator.wikimedia.org/T258749 (10Papaul)
[16:12:12] <wikibugs>	 (03PS1) 10Papaul: DHCP: Add MAC address for dbprov2003 [puppet] - 10https://gerrit.wikimedia.org/r/618548 (https://phabricator.wikimedia.org/T258749)
[16:13:29] <wikibugs>	 (03CR) 10Papaul: [C: 03+2] DHCP: Add MAC address for dbprov2003 [puppet] - 10https://gerrit.wikimedia.org/r/618548 (https://phabricator.wikimedia.org/T258749) (owner: 10Papaul)
[16:15:47] <wikibugs>	 10Operations: Integrate Buster 10.5 point release - https://phabricator.wikimedia.org/T259519 (10MoritzMuehlenhoff)
[16:16:50] <wikibugs>	 10Operations: Integrate Buster 10.5 point release - https://phabricator.wikimedia.org/T259519 (10MoritzMuehlenhoff)
[16:16:54] <wikibugs>	 (03PS4) 10Herron: kafkamon: add role::kafka::monitoring_buster, assign kafkamon[12]002 [puppet] - 10https://gerrit.wikimedia.org/r/618359 (https://phabricator.wikimedia.org/T252773)
[16:17:24] <wikibugs>	 (03CR) 10Herron: "> Patch Set 3:" [puppet] - 10https://gerrit.wikimedia.org/r/618359 (https://phabricator.wikimedia.org/T252773) (owner: 10Herron)
[16:18:05] <wikibugs>	 10Operations, 10ops-codfw, 10DBA, 10DC-Ops, 10Patch-For-Review: (Need By: 2020-09-14) rack/setup/install dbprov2003.codfw.wmnet - https://phabricator.wikimedia.org/T258749 (10jcrespo) They use the `custom/db.cfg` recipe, but only on first install, after that they are moved to `custom/reuse-dbprov.cfg`.
[16:21:50] <wikibugs>	 (03PS1) 10Papaul: Add dbprov2003 to site.pp [puppet] - 10https://gerrit.wikimedia.org/r/618549 (https://phabricator.wikimedia.org/T258749)
[16:22:29] <wikibugs>	 (03CR) 10Papaul: [C: 03+2] Add dbprov2003 to site.pp [puppet] - 10https://gerrit.wikimedia.org/r/618549 (https://phabricator.wikimedia.org/T258749) (owner: 10Papaul)
[16:26:20] <icinga-wm>	 RECOVERY - nova instance creation test on cloudcontrol1003 is OK: PROCS OK: 1 process with command name python, args nova-fullstack https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[16:28:33] <wikibugs>	 (03PS1) 10Ottomata: Add eventgate service specific test.event streams [mediawiki-config] - 10https://gerrit.wikimedia.org/r/618550 (https://phabricator.wikimedia.org/T251935)
[16:32:33] <wikibugs>	 (03CR) 10Ppchelko: [C: 04-1] Add eventgate service specific test.event streams (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/618550 (https://phabricator.wikimedia.org/T251935) (owner: 10Ottomata)
[16:33:39] <wikibugs>	 (03PS2) 10Ottomata: Add eventgate service specific test.event streams [mediawiki-config] - 10https://gerrit.wikimedia.org/r/618550 (https://phabricator.wikimedia.org/T251935)
[16:33:41] <wikibugs>	 (03PS1) 10Andrew Bogott: mwopenstackclients: fix ensure_recordset [puppet] - 10https://gerrit.wikimedia.org/r/618553
[16:34:40] <wikibugs>	 (03PS1) 10RLazarus: httpbb: Move test files into subdirectories by host type. [puppet] - 10https://gerrit.wikimedia.org/r/618554 (https://phabricator.wikimedia.org/T259665)
[16:35:12] <wikibugs>	 (03CR) 10Dzahn: [C: 03+1] "but do the new hosts also need new SNIs to be added?" [puppet] - 10https://gerrit.wikimedia.org/r/618545 (https://phabricator.wikimedia.org/T247966) (owner: 10Herron)
[16:35:18] <wikibugs>	 (03CR) 10Ppchelko: [C: 03+1] Add eventgate service specific test.event streams [mediawiki-config] - 10https://gerrit.wikimedia.org/r/618550 (https://phabricator.wikimedia.org/T251935) (owner: 10Ottomata)
[16:35:59] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] httpbb: Move test files into subdirectories by host type. [puppet] - 10https://gerrit.wikimedia.org/r/618554 (https://phabricator.wikimedia.org/T259665) (owner: 10RLazarus)
[16:37:09] <wikibugs>	 (03PS2) 10RLazarus: httpbb: Move test files into subdirectories by host type. [puppet] - 10https://gerrit.wikimedia.org/r/618554 (https://phabricator.wikimedia.org/T259665)
[16:38:40] <wikibugs>	 (03PS1) 10Jcrespo: mariadb-backups: Add dbprov2003 to the db.cfg partman recipe list [puppet] - 10https://gerrit.wikimedia.org/r/618555 (https://phabricator.wikimedia.org/T258749)
[16:39:09] <wikibugs>	 (03CR) 10RLazarus: "PCC: https://puppet-compiler.wmflabs.org/compiler1003/24325/cumin1001.eqiad.wmnet/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/618554 (https://phabricator.wikimedia.org/T259665) (owner: 10RLazarus)
[16:39:12] <wikibugs>	 (03CR) 10Bstorm: [C: 03+1] mwopenstackclients: fix ensure_recordset [puppet] - 10https://gerrit.wikimedia.org/r/618553 (owner: 10Andrew Bogott)
[16:39:44] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+2] mwopenstackclients: fix ensure_recordset [puppet] - 10https://gerrit.wikimedia.org/r/618553 (owner: 10Andrew Bogott)
[16:40:19] <wikibugs>	 (03PS2) 10Herron: acme_cheif: add alert[12]001 SNI and permit to fetch icinga cert [puppet] - 10https://gerrit.wikimedia.org/r/618545 (https://phabricator.wikimedia.org/T247966)
[16:40:37] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 03+1] httpbb: Move test files into subdirectories by host type. [puppet] - 10https://gerrit.wikimedia.org/r/618554 (https://phabricator.wikimedia.org/T259665) (owner: 10RLazarus)
[16:40:57] <wikibugs>	 (03CR) 10Herron: "> Patch Set 1: Code-Review+1" [puppet] - 10https://gerrit.wikimedia.org/r/618545 (https://phabricator.wikimedia.org/T247966) (owner: 10Herron)
[16:41:21] <wikibugs>	 (03PS2) 10Jcrespo: mariadb-backups: Add dbprov[12]003 to the db.cfg partman recipe list [puppet] - 10https://gerrit.wikimedia.org/r/618555 (https://phabricator.wikimedia.org/T258749)
[16:41:53] <wikibugs>	 (03CR) 10RLazarus: [C: 03+2] httpbb: Move test files into subdirectories by host type. [puppet] - 10https://gerrit.wikimedia.org/r/618554 (https://phabricator.wikimedia.org/T259665) (owner: 10RLazarus)
[16:42:24] <wikibugs>	 (03CR) 10Jcrespo: [C: 03+2] mariadb-backups: Add dbprov[12]003 to the db.cfg partman recipe list [puppet] - 10https://gerrit.wikimedia.org/r/618555 (https://phabricator.wikimedia.org/T258749) (owner: 10Jcrespo)
[16:42:50] <jynus>	 rzl: deploy?
[16:42:55] <rzl>	 yes please!
[16:45:11] <wikibugs>	 (03PS1) 10JMeybohm: Detect kubeconfig as known argument in plugin invocations [debs/helm] - 10https://gerrit.wikimedia.org/r/618556 (https://phabricator.wikimedia.org/T258572)
[16:46:08] <wikibugs>	 10Operations, 10Fundraising-Backlog, 10User-Urbanecm, 10Wiki-Setup (Create): New wiki for fundraising Thank You pages with similar config as donatewiki - https://phabricator.wikimedia.org/T259002 (10DStrine)
[16:48:41] <wikibugs>	 10Operations, 10Fundraising-Backlog, 10User-Urbanecm, 10Wiki-Setup (Create): New wiki for fundraising Thank You pages with similar config as donatewiki - https://phabricator.wikimedia.org/T259002 (10DStrine) @Ladsgroup thanks for the info. I have rewritten this task in the format you have recommended. Let...
[16:50:46] <elukey>	 !log powercycle stat1005 after GPU issue
[16:50:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:53:28] <icinga-wm>	 PROBLEM - Host stat1005 is DOWN: PING CRITICAL - Packet loss = 100%
[16:54:53] <wikibugs>	 (03PS1) 10Ahmon Dancy: zuul_error_log.mtail: Settle on initial counters [puppet] - 10https://gerrit.wikimedia.org/r/618557 (https://phabricator.wikimedia.org/T258821)
[16:55:04] <icinga-wm>	 RECOVERY - Host stat1005 is UP: PING OK - Packet loss = 0%, RTA = 0.25 ms
[16:56:19] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] zuul_error_log.mtail: Settle on initial counters [puppet] - 10https://gerrit.wikimedia.org/r/618557 (https://phabricator.wikimedia.org/T258821) (owner: 10Ahmon Dancy)
[16:57:51] <wikibugs>	 (03PS2) 10Ahmon Dancy: zuul_error_log.mtail: Settle on initial counters [puppet] - 10https://gerrit.wikimedia.org/r/618557 (https://phabricator.wikimedia.org/T258821)
[17:03:55] <wikibugs>	 (03PS3) 10Dzahn: ATS: switch releases.wm to new buster backend servers [dns] - 10https://gerrit.wikimedia.org/r/618412 (https://phabricator.wikimedia.org/T247652)
[17:03:59] <rzl>	 I broke puppet on cumin* and deployment*, working on it
[17:04:02] <wikibugs>	 (03PS2) 10Dzahn: releases: use --delete when rsyncing files between servers [puppet] - 10https://gerrit.wikimedia.org/r/618411 (https://phabricator.wikimedia.org/T247652)
[17:04:10] <wikibugs>	 (03PS2) 10Dzahn: httpbb: add test file for releases.wm.org [puppet] - 10https://gerrit.wikimedia.org/r/618415 (https://phabricator.wikimedia.org/T247652)
[17:04:34] <volans>	 rzl: once fixed https://wikitech.wikimedia.org/wiki/Cumin#Run_Puppet_only_if_last_run_failed is your friend :D
[17:04:40] <rzl>	 ack, thanks
[17:05:17] <wikibugs>	 (03PS1) 10Dzahn: releases: open firewall hole for http from deployment_server [puppet] - 10https://gerrit.wikimedia.org/r/618559 (https://phabricator.wikimedia.org/T247652)
[17:06:10] <wikibugs>	 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to analytics-privatedata-users and nda groups for edtadros - https://phabricator.wikimedia.org/T256435 (10Jrbranaa) Sorry, being out of the office and changing my IRC usage (missing channels :-/) I didn't see this.  The contract termi...
[17:06:33] <wikibugs>	 (03PS2) 10Dzahn: releases: open firewall hole for http from deployment_server [puppet] - 10https://gerrit.wikimedia.org/r/618559 (https://phabricator.wikimedia.org/T247652)
[17:07:44] <wikibugs>	 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to analytics-privatedata-users and nda groups for edtadros - https://phabricator.wikimedia.org/T256435 (10Dzahn) 05Stalled→03Open a:05Jrbranaa→03None
[17:08:49] <wikibugs>	 (03CR) 10Dzahn: "a contract end date has been provided at https://phabricator.wikimedia.org/T256435#6363455  this can now be amended with that and the tick" [puppet] - 10https://gerrit.wikimedia.org/r/609158 (https://phabricator.wikimedia.org/T256435) (owner: 10Ssingh)
[17:09:56] <icinga-wm>	 PROBLEM - rsyslog in codfw is failing to deliver messages on icinga1001 is CRITICAL: action={fwd_centrallog1001.eqiad.wmnet:6514,fwd_centrallog2001.codfw.wmnet:6514} https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=codfw+prometheus/ops
[17:10:07] <wikibugs>	 (03PS1) 10RLazarus: httpbb: Fix breakage caused by 618554, create dir before files [puppet] - 10https://gerrit.wikimedia.org/r/618562 (https://phabricator.wikimedia.org/T259665)
[17:10:10] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] releases: open firewall hole for http from deployment_server [puppet] - 10https://gerrit.wikimedia.org/r/618559 (https://phabricator.wikimedia.org/T247652) (owner: 10Dzahn)
[17:10:45] <rzl>	 mutante: fyi puppet's broken on the deployment hosts right now ^
[17:10:54] <rzl>	 I'll send you the fix for review as soon as pcc finishes
[17:10:54] <icinga-wm>	 PROBLEM - rsyslog in eqiad is failing to deliver messages on icinga1001 is CRITICAL: action={fwd_centrallog1001.eqiad.wmnet:6514,fwd_centrallog2001.codfw.wmnet:6514} https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=eqiad+prometheus/ops
[17:11:08] <mutante>	 rzl: thanks! it's alright, i just need it on releases* right now
[17:11:13] <rzl>	 ahh okay
[17:11:34] <rzl>	 well, I'll send you the fix for review anyway if you don't mind :D
[17:12:08] <wikibugs>	 (03CR) 10Dzahn: [C: 03+1] "warning, i was about to upload another test file that is neither for miscweb nor for appservers :p" [puppet] - 10https://gerrit.wikimedia.org/r/618562 (https://phabricator.wikimedia.org/T259665) (owner: 10RLazarus)
[17:12:37] <mutante>	 rzl: i made another test file for releases* , heh
[17:12:51] <mutante>	 i am opening the firewall there to be able to use it :)
[17:13:02] <rzl>	 cool! go ahead and add a third subdir in that case
[17:13:10] <mutante>	 ok
[17:13:11] <rzl>	 at some point I might clean up the way those are defined, but
[17:13:21] <wikibugs>	 (03CR) 10RLazarus: [C: 03+2] "PCC: https://puppet-compiler.wmflabs.org/compiler1001/24326/cumin1001.eqiad.wmnet/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/618562 (https://phabricator.wikimedia.org/T259665) (owner: 10RLazarus)
[17:15:47] <volans>	 rzl: and to be clear, if unsure where it's broken, it's totally ok to run that cumin command across the whole fleet, granted you use a sane batch size like in the example
[17:15:56] <rzl>	 nod
[17:16:17] <rzl>	 in this case it's only four hosts and I know what they are, so I just did a plain old sudo cumin run-puppet-agent
[17:16:25] <rzl>	 but glad to be reminded that recipe exists for the more interesting cases
[17:17:07] <rzl>	 fixed!
[17:17:30] <volans>	 :)
[17:17:43] <mutante>	 so the other day i kept missing the $http_proxy setup on some hosts.. so i think it's a good idea to add it to my global .bash_profile.  One day later... i am using httpbb and wonder why it times out.. now gotta think about removing the http_proxy :p
[17:17:50] <rzl>	 I *really* wish pcc could catch "parent directory doesn't exist"
[17:18:38] <volans>	 it could exist outside of puppet though
[17:19:01] <rzl>	 yeah it's true, I don't think it's actually possible
[17:19:08] <rzl>	 but life would be nice if it were
[17:20:51] <wikibugs>	 (03PS1) 10RLazarus: httpbb: Remove temporary ensure-absents for moved files [puppet] - 10https://gerrit.wikimedia.org/r/618563 (https://phabricator.wikimedia.org/T259665)
[17:21:45] <wikibugs>	 (03PS1) 10Lucas Werkmeister (WMDE): Pass jQuery objects into jqueryMsg [extensions/ContentTranslation] (wmf/1.36.0-wmf.3) - 10https://gerrit.wikimedia.org/r/618566
[17:22:07] <wikibugs>	 (03CR) 10Dzahn: [C: 03+1] httpbb: Remove temporary ensure-absents for moved files [puppet] - 10https://gerrit.wikimedia.org/r/618563 (https://phabricator.wikimedia.org/T259665) (owner: 10RLazarus)
[17:25:19] <wikibugs>	 (03CR) 10RLazarus: [C: 03+2] httpbb: Remove temporary ensure-absents for moved files [puppet] - 10https://gerrit.wikimedia.org/r/618563 (https://phabricator.wikimedia.org/T259665) (owner: 10RLazarus)
[17:30:17] <shdubsh>	 !log test prometheus-icinga-exporter upgrade on icinga2001
[17:30:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:31:12] <wikibugs>	 (03CR) 10Bstorm: [C: 03+2] wmcs: alphabetize labstore NFS mounts [puppet] - 10https://gerrit.wikimedia.org/r/618389 (owner: 10BryanDavis)
[17:36:12] <wikibugs>	 (03PS4) 10Bstorm: wmcs: Add project NFS for wmde-templates-alpha [puppet] - 10https://gerrit.wikimedia.org/r/618390 (https://phabricator.wikimedia.org/T259254) (owner: 10BryanDavis)
[17:36:34] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[17:37:33] <wikibugs>	 (03CR) 10Bstorm: [C: 03+2] wmcs: Add project NFS for wmde-templates-alpha [puppet] - 10https://gerrit.wikimedia.org/r/618390 (https://phabricator.wikimedia.org/T259254) (owner: 10BryanDavis)
[17:37:38] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_citoid_cluster_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[17:39:00] <icinga-wm>	 RECOVERY - rsyslog in codfw is failing to deliver messages on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=codfw+prometheus/ops
[17:39:32] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[17:39:58] <icinga-wm>	 RECOVERY - rsyslog in eqiad is failing to deliver messages on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=eqiad+prometheus/ops
[17:40:22] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[17:40:27] <wikibugs>	 (03PS3) 10Ahmon Dancy: zuul_error_log.mtail: Settle on initial counters [puppet] - 10https://gerrit.wikimedia.org/r/618557 (https://phabricator.wikimedia.org/T258821)
[17:40:58] <wikibugs>	 (03CR) 10Bstorm: [C: 03+1] "Looks great to me." [puppet] - 10https://gerrit.wikimedia.org/r/617995 (owner: 10Muehlenhoff)
[17:42:15] <wikibugs>	 10Operations, 10Fundraising-Backlog, 10User-Urbanecm, 10User-dancy, 10Wiki-Setup (Create): New wiki for fundraising Thank You pages with similar config as donatewiki - https://phabricator.wikimedia.org/T259002 (10dancy)
[17:43:18] <mutante>	 rzl: currently confused. i add more assertions to my local test file but httpbb says "4 requests sent" before and after. I am like "that should be more than 4 now". 
[17:46:47] <mutante>	 well, i have 6 URLs in the file but only 4 are unique, the other 2 differ in path
[17:46:51] <wikibugs>	 (03CR) 10Bstorm: "Won't this need to auth to the ceph cluster as well? Maybe that would be on the backup server profile and not this anyway, though. I'm jus" [puppet] - 10https://gerrit.wikimedia.org/r/617841 (https://phabricator.wikimedia.org/T259192) (owner: 10Andrew Bogott)
[17:48:44] <mutante>	 ok, got it. i did it wrong
[17:50:09] <wikibugs>	 (03CR) 10Ahmon Dancy: "Followup to https://gerrit.wikimedia.org/r/c/operations/puppet/+/617271" [puppet] - 10https://gerrit.wikimedia.org/r/618557 (https://phabricator.wikimedia.org/T258821) (owner: 10Ahmon Dancy)
[17:50:26] <rzl>	 mutante: cool, happy to look if it comes up again
[17:51:04] <wikibugs>	 (03CR) 10BryanDavis: [C: 03+1] toolforge: Remove jessie conditionals [puppet] - 10https://gerrit.wikimedia.org/r/617995 (owner: 10Muehlenhoff)
[17:52:58] <wikibugs>	 (03PS3) 10Dzahn: releases: use --delete when rsyncing files between servers [puppet] - 10https://gerrit.wikimedia.org/r/618411
[17:53:00] <wikibugs>	 (03PS3) 10Dzahn: httpbb: add test file for releases.wm.org [puppet] - 10https://gerrit.wikimedia.org/r/618415 (https://phabricator.wikimedia.org/T247652)
[17:55:13] <wikibugs>	 (03PS3) 10Nray: Re-enable growth study quick survey [mediawiki-config] - 10https://gerrit.wikimedia.org/r/618343 (https://phabricator.wikimedia.org/T257015)
[17:56:24] <wikibugs>	 (03PS4) 10Dzahn: httpbb: add test file for releases.wm.org [puppet] - 10https://gerrit.wikimedia.org/r/618415 (https://phabricator.wikimedia.org/T247652)
[17:57:58] <wikibugs>	 (03PS5) 10Dzahn: httpbb: add directory and test file for releases.wm.org [puppet] - 10https://gerrit.wikimedia.org/r/618415 (https://phabricator.wikimedia.org/T247652)
[17:58:11] <Lucas_WMDE>	 jouncebot: refresh
[17:58:12] <jouncebot>	 I refreshed my knowledge about deployments.
[17:59:10] <Lucas_WMDE>	 thx
[17:59:58] <mutante>	 rzl:  ^ like that?  i kept adding both http and https even though i don't get the redirects when testing internally
[18:00:04] <jouncebot>	 brennen and dancy: Time to snap out of that daydream and deploy Train log triage with CPT. Get on with it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200805T1800).
[18:00:05] <jouncebot>	 RoanKattouw, Niharika, and Urbanecm: It is that lovely time of the day again! You are hereby commanded to deploy Morning backport window(Max 6 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200805T1800).
[18:00:05] <jouncebot>	 nray and Lucas_WMDE: A patch you scheduled for Morning backport window(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[18:00:13] <Lucas_WMDE>	 o/
[18:00:16] <nray>	 o/ here
[18:00:27] <mutante>	 and just 2 random files i check from the actual releases .. good enough for me
[18:01:11] <Lucas_WMDE>	 nray: do you want to deploy your change yourself?
[18:01:40] <nray>	 I don't have deploy rights
[18:01:59] <Lucas_WMDE>	 ok, then I can do it
[18:02:00] <wikibugs>	 (03CR) 10BryanDavis: "> Patch Set 2:" [puppet] - 10https://gerrit.wikimedia.org/r/618283 (https://phabricator.wikimedia.org/T255408) (owner: 10Marostegui)
[18:02:13] <wikibugs>	 (03CR) 10Herron: [C: 03+2] acme_cheif: add alert[12]001 SNI and permit to fetch icinga cert [puppet] - 10https://gerrit.wikimedia.org/r/618545 (https://phabricator.wikimedia.org/T247966) (owner: 10Herron)
[18:02:26] <Lucas_WMDE>	 I’ll also +2 my backport so the CI starts already
[18:02:32] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): [C: 03+2] Pass jQuery objects into jqueryMsg [extensions/ContentTranslation] (wmf/1.36.0-wmf.3) - 10https://gerrit.wikimedia.org/r/618566 (owner: 10Lucas Werkmeister (WMDE))
[18:03:00] <nray>	 Lucas_WMDE:  thank you!
[18:03:33] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): Re-enable growth study quick survey (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/618343 (https://phabricator.wikimedia.org/T257015) (owner: 10Nray)
[18:04:03] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): [C: 03+1] "(lgtm otherwise)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/618343 (https://phabricator.wikimedia.org/T257015) (owner: 10Nray)
[18:05:14] <icinga-wm>	 PROBLEM - rsyslog in codfw is failing to deliver messages on icinga1001 is CRITICAL: action={fwd_centrallog1001.eqiad.wmnet:6514,fwd_centrallog2001.codfw.wmnet:6514} https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=codfw+prometheus/ops
[18:05:20] <wikibugs>	 (03PS4) 10Nray: Re-enable growth study quick survey [mediawiki-config] - 10https://gerrit.wikimedia.org/r/618343 (https://phabricator.wikimedia.org/T257015)
[18:05:23] <wikibugs>	 (03PS4) 10Dzahn: releases: use --delete when rsyncing files between servers [puppet] - 10https://gerrit.wikimedia.org/r/618411
[18:05:52] <wikibugs>	 (03CR) 10Lucas Werkmeister (WMDE): [C: 03+2] Re-enable growth study quick survey [mediawiki-config] - 10https://gerrit.wikimedia.org/r/618343 (https://phabricator.wikimedia.org/T257015) (owner: 10Nray)
[18:06:14] <icinga-wm>	 PROBLEM - rsyslog in eqiad is failing to deliver messages on icinga1001 is CRITICAL: action=fwd_centrallog2001.codfw.wmnet:6514 https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=eqiad+prometheus/ops
[18:06:43] <wikibugs>	 (03Merged) 10jenkins-bot: Re-enable growth study quick survey [mediawiki-config] - 10https://gerrit.wikimedia.org/r/618343 (https://phabricator.wikimedia.org/T257015) (owner: 10Nray)
[18:07:06] <Lucas_WMDE>	 nray: this sounds like a change that can’t really be tested on mwdebug, right?
[18:07:15] <Lucas_WMDE>	 other than quickly checking that the wiki doesn’t explode
[18:07:33] <nray>	 I think it can be tested. There is a query param that makes the survey show
[18:07:39] <Lucas_WMDE>	 ah, ok
[18:07:46] <Lucas_WMDE>	 in that case it’s on mwdebug1001 now, please test :)
[18:07:51] <nray>	 cool thanks
[18:09:00] <nray>	 Lucas_WMDE:  Tested and lgtm!
[18:09:03] <Lucas_WMDE>	 ok!
[18:09:38] <Lucas_WMDE>	 syncing
[18:10:36] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] releases: use --delete when rsyncing files between servers [puppet] - 10https://gerrit.wikimedia.org/r/618411 (owner: 10Dzahn)
[18:10:46] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:618343|Re-enable growth study quick survey (T257015)]] (duration: 01m 12s)
[18:11:22] <Lucas_WMDE>	 stashbot: u there?
[18:11:40] <mutante>	 !log test !log
[18:11:44] <Lucas_WMDE>	 well, it’s in the sal tool at least
[18:11:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:11:53] <Lucas_WMDE>	 ok there we go
[18:12:27] <Lucas_WMDE>	 (but the test !log is still waiting)
[18:12:53] <Lucas_WMDE>	 next on the deployment calendar for this window is a backport, which is currently going through CI, btw
[18:12:55] * Lucas_WMDE waits
[18:12:57] <mutante>	 surprisingly slow...
[18:14:44] <Lucas_WMDE>	 that missing test !log is concerning
[18:14:46] <stashbot>	 T257015: Redeploy quicksurvey on enwiki (for a Growth study) - https://phabricator.wikimedia.org/T257015
[18:14:46] <stashbot>	 See https://wikitech.wikimedia.org/wiki/Tool:Stashbot for help.
[18:14:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:15:03] <Lucas_WMDE>	 ok now it went through
[18:15:12] <mutante>	 ACK, seeing it
[18:15:19] <nray>	 sweet, thank you for your help @Lucas_WMDE  
[18:15:20] <Lucas_WMDE>	 perhaps the phabricator API was slow and it blocked on getting the task label…?
[18:15:22] <Lucas_WMDE>	 random guess
[18:15:26] <Lucas_WMDE>	 np nray :)
[18:15:30] <Lucas_WMDE>	 good luck with the survey
[18:15:56] <mutante>	 i don't think phab, my test did not include a ticket number
[18:16:33] <mutante>	 maybe toollabs is very busy
[18:16:40] <Lucas_WMDE>	 but maybe processing that !log was blocked on processing the task number from the scap !log
[18:16:54] <Lucas_WMDE>	 (and apparently “see … for help” is its response to my “u there”)
[18:17:32] <mutante>	 Lucas_WMDE: heh, yea, you might be right there
[18:18:15] <Lucas_WMDE>	 (to clarify – on my end, the messages are T257015, then see X for help, then logged the message. looks like the wm-bot log has a different order so maybe I’m completely wrong after all)
[18:20:02] <wikibugs>	 (03CR) 10Andrew Bogott: "> Patch Set 4:" [puppet] - 10https://gerrit.wikimedia.org/r/617841 (https://phabricator.wikimedia.org/T259192) (owner: 10Andrew Bogott)
[18:21:05] <wikibugs>	 (03Merged) 10jenkins-bot: Pass jQuery objects into jqueryMsg [extensions/ContentTranslation] (wmf/1.36.0-wmf.3) - 10https://gerrit.wikimedia.org/r/618566 (owner: 10Lucas Werkmeister (WMDE))
[18:22:04] <wikibugs>	 10Operations, 10ops-codfw, 10DBA, 10DC-Ops: (Need By: 2020-09-14) rack/setup/install dbprov2003.codfw.wmnet - https://phabricator.wikimedia.org/T258749 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` dbprov2003.codfw.wmnet ` The log can be found i...
[18:22:34] <Lucas_WMDE>	 testing the backport on mwdebug1001
[18:23:46] <Lucas_WMDE>	 seems to work fine, syncing
[18:25:50] <logmsgbot>	 !log lucaswerkmeister-wmde@deploy1001 Synchronized php-1.36.0-wmf.3/extensions/ContentTranslation/: Backport: [[gerrit:618566|Pass jQuery objects into jqueryMsg]] (duration: 01m 11s)
[18:25:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:26:56] <Lucas_WMDE>	 !log Morning backport window done
[18:26:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:34:24] <icinga-wm>	 RECOVERY - rsyslog in codfw is failing to deliver messages on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=codfw+prometheus/ops
[18:35:18] <wikibugs>	 10Operations, 10Fundraising-Backlog, 10User-Urbanecm, 10User-dancy, 10Wiki-Setup (Create): New wiki for fundraising Thank You pages with similar config as donatewiki - https://phabricator.wikimedia.org/T259002 (10Ejegg)
[18:35:20] <icinga-wm>	 RECOVERY - rsyslog in eqiad is failing to deliver messages on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=eqiad+prometheus/ops
[18:37:21] <wikibugs>	 (03PS1) 10Kaldari: Switching to updated license definition [mediawiki-config] - 10https://gerrit.wikimedia.org/r/618586
[18:39:15] <wikibugs>	 10Operations, 10ops-codfw, 10DBA, 10DC-Ops: (Need By: 2020-09-14) rack/setup/install dbprov2003.codfw.wmnet - https://phabricator.wikimedia.org/T258749 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['dbprov2003.codfw.wmnet'] `  Of which those **FAILED**: ` ['dbprov2003.codfw.wmnet'] `
[18:39:46] <wikibugs>	 (03CR) 10Bstorm: [C: 03+1] "/me looks. Oh yeah, that'd totally take care of it." [puppet] - 10https://gerrit.wikimedia.org/r/617841 (https://phabricator.wikimedia.org/T259192) (owner: 10Andrew Bogott)
[18:42:31] <wikibugs>	 (03CR) 10Bstorm: [C: 03+1] jessie-ssd: Fetch base image from docker-registry.tools.wmflabs.org [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/617288 (owner: 10BryanDavis)
[18:52:38] <rzl>	 mutante: sorry, back from lunch and looking now
[18:54:00] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+1] dnsrecursor: allow installation of pdns-recursor from component [puppet] - 10https://gerrit.wikimedia.org/r/618376 (owner: 10Ssingh)
[18:54:51] <wikibugs>	 10Operations, 10ops-codfw, 10DBA, 10DC-Ops: (Need By: 2020-09-14) rack/setup/install dbprov2003.codfw.wmnet - https://phabricator.wikimedia.org/T258749 (10jcrespo) This was the only issue we had the last time with the same hw and recipe: T218336#5068836
[18:57:37] <wikibugs>	 (03PS6) 10Cicalese: DO NOT MERGE Remove temporary logging for mediamoderation [mediawiki-config] - 10https://gerrit.wikimedia.org/r/606239 (https://phabricator.wikimedia.org/T245595)
[18:58:30] <wikibugs>	 (03CR) 10Ssingh: [C: 03+2] dnsrecursor: allow installation of pdns-recursor from component [puppet] - 10https://gerrit.wikimedia.org/r/618376 (owner: 10Ssingh)
[18:58:39] <wikibugs>	 (03PS7) 10Cicalese: DO NOT MERGE Remove temporary logging for mediamoderation [mediawiki-config] - 10https://gerrit.wikimedia.org/r/606239 (https://phabricator.wikimedia.org/T259742)
[18:59:38] <wikibugs>	 (03PS8) 10Cicalese: Remove temporary logging for mediamoderation [mediawiki-config] - 10https://gerrit.wikimedia.org/r/606239 (https://phabricator.wikimedia.org/T259742)
[19:00:04] <jouncebot>	 brennen and dancy: Your horoscope predicts another unfortunate Mediawiki train - American Version deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200805T1900).
[19:00:40] <brennen>	 horrorscope.
[19:00:43] <dancy>	 I've been enjoying the automated nags so far.
[19:01:53] <dancy>	 Did anything interesting come out of the triage meeting?
[19:02:06] <brennen>	 dancy (or any random persons who want to watch me swear at a train): https://meet.google.com/qxk-kkjc-meo
[19:03:20] <brennen>	 pretty quiet triage meeting;  we should have a clean dashboard for this one.
[19:04:28] <icinga-wm>	 PROBLEM - rsyslog in eqiad is failing to deliver messages on icinga1001 is CRITICAL: action=fwd_centrallog1001.eqiad.wmnet:6514 https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=eqiad+prometheus/ops
[19:05:50] <wikibugs>	 (03PS1) 10Ssingh: wikidough: enable QNAME minimisation for the dnsrecursor module [puppet] - 10https://gerrit.wikimedia.org/r/618591 (https://phabricator.wikimedia.org/T252132)
[19:06:44] <wikibugs>	 (03PS1) 10Brennen Bearnes: group1 wikis to 1.36.0-wmf.3 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/618592
[19:06:46] <wikibugs>	 (03CR) 10Brennen Bearnes: [C: 03+2] group1 wikis to 1.36.0-wmf.3 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/618592 (owner: 10Brennen Bearnes)
[19:07:25] <wikibugs>	 (03CR) 10Ppchelko: [C: 03+1] Remove temporary logging for mediamoderation [mediawiki-config] - 10https://gerrit.wikimedia.org/r/606239 (https://phabricator.wikimedia.org/T259742) (owner: 10Cicalese)
[19:07:30] <wikibugs>	 (03Merged) 10jenkins-bot: group1 wikis to 1.36.0-wmf.3 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/618592 (owner: 10Brennen Bearnes)
[19:10:16] <icinga-wm>	 RECOVERY - rsyslog in eqiad is failing to deliver messages on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=eqiad+prometheus/ops
[19:11:55] <logmsgbot>	 !log brennen@deploy1001 rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.3
[19:11:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:12:07] <wikibugs>	 (03CR) 10RLazarus: httpbb: add directory and test file for releases.wm.org (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/618415 (https://phabricator.wikimedia.org/T247652) (owner: 10Dzahn)
[19:12:39] <wikibugs>	 (03PS1) 10Papaul: DHCP: Fix MAC address for dbprov2003 [puppet] - 10https://gerrit.wikimedia.org/r/618593 (https://phabricator.wikimedia.org/T258749)
[19:13:15] <wikibugs>	 (03CR) 10Papaul: [C: 03+2] DHCP: Fix MAC address for dbprov2003 [puppet] - 10https://gerrit.wikimedia.org/r/618593 (https://phabricator.wikimedia.org/T258749) (owner: 10Papaul)
[19:13:39] <logmsgbot>	 !log brennen@deploy1001 Synchronized php: group1 wikis to 1.36.0-wmf.3 (duration: 01m 44s)
[19:13:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:17:43] <wikibugs>	 (PS1) Ssingh: dnsrecursor: use the correct option name in commit e250327 [puppet] - https://gerrit.wikimedia.org/r/618594
[19:21:10] <wikibugs>	 Operations, ops-codfw, DBA, DC-Ops, Patch-For-Review: (Need By: 2020-09-14) rack/setup/install dbprov2003.codfw.wmnet - https://phabricator.wikimedia.org/T258749 (ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` dbprov2003.codfw.wmnet `...
[19:22:28] <wikibugs>	 (CR) Ssingh: "Merging this based on the review from I2173f99. No Puppet code change; only the template was updated which does not affect the other DNS r" [puppet] - https://gerrit.wikimedia.org/r/618594 (owner: Ssingh)
[19:23:16] <wikibugs>	 (CR) Ssingh: [C: +2] dnsrecursor: use the correct option name in commit e250327 [puppet] - https://gerrit.wikimedia.org/r/618594 (owner: Ssingh)
[19:26:31] <wikibugs>	 (CR) Ssingh: "https://puppet-compiler.wmflabs.org/compiler1001/24330/malmok.wikimedia.org/index.html" [puppet] - https://gerrit.wikimedia.org/r/618591 (https://phabricator.wikimedia.org/T252132) (owner: Ssingh)
[19:29:17] <wikibugs>	 (CR) Dzahn: "all releases servers (and deploy1001 when it comes to /srv/patches) are now _actual_ mirrors of each other and are not accumulating old fi" [puppet] - https://gerrit.wikimedia.org/r/618411 (owner: Dzahn)
[19:37:11] <logmsgbot>	 !log pt1979@cumin2001 START - Cookbook sre.hosts.downtime
[19:37:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:39:12] <logmsgbot>	 !log pt1979@cumin2001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
[19:39:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:41:55] <logmsgbot>	 !log brennen@deploy1001 rebuilt and synchronized wikiversions files: Revert group1 wikis to 1.36.0-wmf.2
[19:41:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:42:44] <wikibugs>	 (PS1) Brennen Bearnes: Revert "group1 wikis to 1.36.0-wmf.3" [mediawiki-config] - https://gerrit.wikimedia.org/r/618595
[19:42:46] <wikibugs>	 (CR) Brennen Bearnes: [C: +2] Revert "group1 wikis to 1.36.0-wmf.3" [mediawiki-config] - https://gerrit.wikimedia.org/r/618595 (owner: Brennen Bearnes)
[19:43:29] <wikibugs>	 (Merged) jenkins-bot: Revert "group1 wikis to 1.36.0-wmf.3" [mediawiki-config] - https://gerrit.wikimedia.org/r/618595 (owner: Brennen Bearnes)
[19:46:03] <wikibugs>	 Operations, ops-codfw, DBA, DC-Ops, Patch-For-Review: (Need By: 2020-09-14) rack/setup/install dbprov2003.codfw.wmnet - https://phabricator.wikimedia.org/T258749 (Papaul)
[19:50:10] <icinga-wm>	 PROBLEM - nova instance creation test on cloudcontrol1003 is CRITICAL: PROCS CRITICAL: 0 processes with command name python, args nova-fullstack https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[19:59:40] <icinga-wm>	 PROBLEM - rsyslog in codfw is failing to deliver messages on icinga1001 is CRITICAL: action={fwd_centrallog1001.eqiad.wmnet:6514,fwd_centrallog2001.codfw.wmnet:6514} https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=codfw+prometheus/ops
[19:59:47] <wikibugs>	 Operations, ops-codfw, DBA, DC-Ops, Patch-For-Review: (Need By: 2020-09-14) rack/setup/install dbprov2003.codfw.wmnet - https://phabricator.wikimedia.org/T258749 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['dbprov2003.codfw.wmnet'] `  and were **ALL** successful.
[20:00:04] <jouncebot>	 halfak and accraze: How many deployers does it take to do Services – Graphoid / ORES deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200805T2000).
[20:00:42] <icinga-wm>	 PROBLEM - rsyslog in eqiad is failing to deliver messages on icinga1001 is CRITICAL: action={fwd_centrallog1001.eqiad.wmnet:6514,fwd_centrallog2001.codfw.wmnet:6514} https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=eqiad+prometheus/ops
[20:04:31] <wikibugs>	 (PS1) RLazarus: web_testing: Remove the apache-fast-test placeholder [puppet] - https://gerrit.wikimedia.org/r/618602
[20:04:33] <wikibugs>	 (PS1) RLazarus: web_testing: Clean up the old class used for apache-fast-test. [puppet] - https://gerrit.wikimedia.org/r/618603
[20:08:06] <icinga-wm>	 RECOVERY - rsyslog in eqiad is failing to deliver messages on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=eqiad+prometheus/ops
[20:09:51] <wikibugs>	 (CR) Dzahn: [C: +2] zuul_error_log.mtail: Settle on initial counters [puppet] - https://gerrit.wikimedia.org/r/618557 (https://phabricator.wikimedia.org/T258821) (owner: Ahmon Dancy)
[20:12:43] <wikibugs>	 Operations, ops-codfw, DBA, DC-Ops: (Need By: 2020-09-14) rack/setup/install dbprov2003.codfw.wmnet - https://phabricator.wikimedia.org/T258749 (Papaul)
[20:12:58] <wikibugs>	 Operations, ops-codfw, DBA, DC-Ops: (Need By: 2020-09-14) rack/setup/install dbprov2003.codfw.wmnet - https://phabricator.wikimedia.org/T258749 (Papaul) Open→Resolved This is done
[20:20:31] <icinga-wm>	 PROBLEM - rsyslog in eqiad is failing to deliver messages on icinga1001 is CRITICAL: action=fwd_centrallog1001.eqiad.wmnet:6514 https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=eqiad+prometheus/ops
[20:20:48] <wikibugs>	 (CR) Herron: [C: +1] prometheus: puppetized install of prometheus-es-exporter (1 comment) [puppet] - https://gerrit.wikimedia.org/r/617260 (https://phabricator.wikimedia.org/T256418) (owner: Cwhite)
[20:20:52] <wikibugs>	 Operations, serviceops: httpbb: Mapping between tests and hosts - https://phabricator.wikimedia.org/T259665 (RLazarus) Open→Resolved The simple version of this is done. We might eventually want to do something more elaborate -- the advantage would be that httpbb could be run without explicitly pa...
[20:26:24] <wikibugs>	 (PS5) Herron: kafkamon: add role::kafka::monitoring_buster, assign kafkamon[12]002 [puppet] - https://gerrit.wikimedia.org/r/618359 (https://phabricator.wikimedia.org/T252773)
[20:36:22] <icinga-wm>	 RECOVERY - rsyslog in eqiad is failing to deliver messages on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=eqiad+prometheus/ops
[20:38:36] <wikibugs>	 (PS1) RLazarus: cumin: Update wmf_auto_reimage_lib for the new httpbb test layout [puppet] - https://gerrit.wikimedia.org/r/618618 (https://phabricator.wikimedia.org/T259665)
[20:39:40] <wikibugs>	 (CR) Herron: "PCC https://puppet-compiler.wmflabs.org/compiler1003/24332/" [puppet] - https://gerrit.wikimedia.org/r/618359 (https://phabricator.wikimedia.org/T252773) (owner: Herron)
[20:41:00] <wikibugs>	 (CR) Volans: [C: +1] "LGTM, thanks!" [puppet] - https://gerrit.wikimedia.org/r/618618 (https://phabricator.wikimedia.org/T259665) (owner: RLazarus)
[20:42:36] <wikibugs>	 (CR) RLazarus: [C: +2] cumin: Update wmf_auto_reimage_lib for the new httpbb test layout [puppet] - https://gerrit.wikimedia.org/r/618618 (https://phabricator.wikimedia.org/T259665) (owner: RLazarus)
[20:42:39] <wikibugs>	 (CR) Dzahn: httpbb: add directory and test file for releases.wm.org (2 comments) [puppet] - https://gerrit.wikimedia.org/r/618415 (https://phabricator.wikimedia.org/T247652) (owner: Dzahn)
[20:42:42] <wikibugs>	 (PS6) Dzahn: httpbb: add directory and test file for releases.wm.org [puppet] - https://gerrit.wikimedia.org/r/618415 (https://phabricator.wikimedia.org/T247652)
[20:44:46] <wikibugs>	 (CR) RLazarus: [C: +1] "Looks good modulo the nit inline, thanks!" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/618415 (https://phabricator.wikimedia.org/T247652) (owner: Dzahn)
[20:46:13] <wikibugs>	 (PS1) Andrew Bogott: wmcs galera: change our HA approach to primary/backups for db access [puppet] - https://gerrit.wikimedia.org/r/618619
[20:46:40] <wikibugs>	 (PS1) Dzahn: hiera: switch releases server to releases1001, remove 1001/2001 [puppet] - https://gerrit.wikimedia.org/r/618621 (https://phabricator.wikimedia.org/T247652)
[20:46:41] <icinga-wm>	 RECOVERY - rsyslog in codfw is failing to deliver messages on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=codfw+prometheus/ops
[20:47:01] <wikibugs>	 (PS2) Andrew Bogott: wmcs galera: change our HA approach to primary/backups for db access [puppet] - https://gerrit.wikimedia.org/r/618619
[20:48:48] <wikibugs>	 (PS3) Andrew Bogott: wmcs galera: change our HA approach to primary/backups for db access [puppet] - https://gerrit.wikimedia.org/r/618619
[20:52:54] <wikibugs>	 (PS4) Andrew Bogott: wmcs galera: change our HA approach to primary/backups for db access [puppet] - https://gerrit.wikimedia.org/r/618619
[20:56:57] <wikibugs>	 (PS5) Andrew Bogott: wmcs galera: change our HA approach to primary/backups for db access [puppet] - https://gerrit.wikimedia.org/r/618619
[21:00:27] <wikibugs>	 (PS6) Andrew Bogott: wmcs galera: change our HA approach to primary/backups for db access [puppet] - https://gerrit.wikimedia.org/r/618619
[21:01:15] <icinga-wm>	 PROBLEM - rsyslog in codfw is failing to deliver messages on icinga1001 is CRITICAL: action={fwd_centrallog1001.eqiad.wmnet:6514,fwd_centrallog2001.codfw.wmnet:6514} https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=codfw+prometheus/ops
[21:01:45] <icinga-wm>	 PROBLEM - rsyslog in eqiad is failing to deliver messages on icinga1001 is CRITICAL: action={fwd_centrallog1001.eqiad.wmnet:6514,fwd_centrallog2001.codfw.wmnet:6514} https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=eqiad+prometheus/ops
[21:01:46] <wikibugs>	 (CR) jerkins-bot: [V: -1] wmcs galera: change our HA approach to primary/backups for db access [puppet] - https://gerrit.wikimedia.org/r/618619 (owner: Andrew Bogott)
[21:03:07] <wikibugs>	 (PS7) Andrew Bogott: wmcs galera: change our HA approach to primary/backups for db access [puppet] - https://gerrit.wikimedia.org/r/618619
[21:04:24] <wikibugs>	 (CR) jerkins-bot: [V: -1] wmcs galera: change our HA approach to primary/backups for db access [puppet] - https://gerrit.wikimedia.org/r/618619 (owner: Andrew Bogott)
[21:05:27] <wikibugs>	 (PS8) Andrew Bogott: wmcs galera: change our HA approach to primary/backups for db access [puppet] - https://gerrit.wikimedia.org/r/618619
[21:06:18] <wikibugs>	 (PS1) Ottomata: eventgate - use /v1/_test/events route for readinessProbe [deployment-charts] - https://gerrit.wikimedia.org/r/618624 (https://phabricator.wikimedia.org/T251935)
[21:08:06] <wikibugs>	 (PS9) Andrew Bogott: wmcs galera: change our HA approach to primary/backups for db access [puppet] - https://gerrit.wikimedia.org/r/618619
[21:09:46] <wikibugs>	 (PS2) Dzahn: hiera: switch releases server to releases1001, remove 1001/2001 [puppet] - https://gerrit.wikimedia.org/r/618621 (https://phabricator.wikimedia.org/T247652)
[21:12:55] <icinga-wm>	 RECOVERY - rsyslog in codfw is failing to deliver messages on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=codfw+prometheus/ops
[21:14:19] <wikibugs>	 (PS10) Andrew Bogott: wmcs galera: change our HA approach to primary/backups for db access [puppet] - https://gerrit.wikimedia.org/r/618619
[21:15:57] <wikibugs>	 Operations, Fundraising-Backlog, User-Urbanecm, User-dancy, Wiki-Setup (Create): New wiki for fundraising Thank You pages with similar config as donatewiki - https://phabricator.wikimedia.org/T259002 (Ladsgroup) >>! In T259002#6363413, @DStrine wrote: > @Ladsgroup thanks for the info. I have...
[21:17:20] <wikibugs>	 (CR) Andrew Bogott: "https://puppet-compiler.wmflabs.org/compiler1001/24340/" [puppet] - https://gerrit.wikimedia.org/r/618619 (owner: Andrew Bogott)
[21:17:32] <wikibugs>	 (PS1) Dave Pifke: arclamp: require python-swiftclient [puppet] - https://gerrit.wikimedia.org/r/618626 (https://phabricator.wikimedia.org/T244776)
[21:23:06] <wikibugs>	 Operations, ops-eqiad, DC-Ops: scs-c1-eqiad CPU usage over 85% - https://phabricator.wikimedia.org/T238036 (wiki_willy) a:Cmjohnson
[21:24:32] <wikibugs>	 (PS11) Andrew Bogott: wmcs galera: change our HA approach to primary/backups for db access [puppet] - https://gerrit.wikimedia.org/r/618619
[21:27:32] <icinga-wm>	 PROBLEM - rsyslog in codfw is failing to deliver messages on icinga1001 is CRITICAL: action={fwd_centrallog1001.eqiad.wmnet:6514,fwd_centrallog2001.codfw.wmnet:6514} https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=codfw+prometheus/ops
[21:28:20] <wikibugs>	 (CR) Andrew Bogott: [C: +2] wmcs galera: change our HA approach to primary/backups for db access [puppet] - https://gerrit.wikimedia.org/r/618619 (owner: Andrew Bogott)
[21:30:55] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1030 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[21:31:15] <icinga-wm>	 PROBLEM - nova-compute proc minimum on cloudvirt1006 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[21:31:59] <icinga-wm>	 PROBLEM - nova-compute proc maximum on cloudvirt1030 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[21:32:02] <icinga-wm>	 PROBLEM - nova-compute proc maximum on cloudvirt1006 is CRITICAL: PROCS CRITICAL: 0 processes with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[21:32:49] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1030 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[21:33:53] <icinga-wm>	 RECOVERY - nova-compute proc maximum on cloudvirt1030 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[21:36:19] <icinga-wm>	 RECOVERY - rsyslog in codfw is failing to deliver messages on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=codfw+prometheus/ops
[21:36:43] <icinga-wm>	 RECOVERY - rsyslog in eqiad is failing to deliver messages on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=eqiad+prometheus/ops
[21:44:01] <icinga-wm>	 RECOVERY - nova instance creation test on cloudcontrol1003 is OK: PROCS OK: 1 process with command name python, args nova-fullstack https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[21:52:22] <icinga-wm>	 RECOVERY - nova-compute proc minimum on cloudvirt1006 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[21:53:12] <icinga-wm>	 RECOVERY - nova-compute proc maximum on cloudvirt1006 is OK: PROCS OK: 1 process with PPID = 1, regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
[21:53:13] <wikibugs>	 Operations, ops-eqiad, DC-Ops: (Need By: DC Failover) eqiad: Upgrades of Management Switches - https://phabricator.wikimedia.org/T259758 (wiki_willy)
[21:53:49] <icinga-wm>	 PROBLEM - rsyslog in codfw is failing to deliver messages on icinga1001 is CRITICAL: action={fwd_centrallog1001.eqiad.wmnet:6514,fwd_centrallog2001.codfw.wmnet:6514} https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=codfw+prometheus/ops
[21:53:56] <wikibugs>	 Operations, ops-eqiad, DC-Ops: (Need By: DC Failover) eqiad: Upgrades of Management Switches - https://phabricator.wikimedia.org/T259758 (wiki_willy)
[21:55:15] <wikibugs>	 Operations, ops-eqiad, DC-Ops: (Need By: DC Failover) eqiad: Upgrades of Management Switches - https://phabricator.wikimedia.org/T259758 (wiki_willy)
[22:00:53] <wikibugs>	 Operations, ops-eqiad, DC-Ops: (Need By: Q2) eqiad: Upgrades of Management Switches - https://phabricator.wikimedia.org/T259758 (wiki_willy)
[22:02:32] <icinga-wm>	 RECOVERY - rsyslog in codfw is failing to deliver messages on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=codfw+prometheus/ops
[22:02:59] <icinga-wm>	 PROBLEM - rsyslog in eqiad is failing to deliver messages on icinga1001 is CRITICAL: action=fwd_centrallog1001.eqiad.wmnet:6514 https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=eqiad+prometheus/ops
[22:03:41] <wikibugs>	 Operations, ops-eqiad, DC-Ops, cloud-services-team (Hardware): (Need by: 2020-04-01) rack/setup/install cloudcontrol1005 - https://phabricator.wikimedia.org/T247471 (bd808) Netbox is showing this host as "staged" rather than "active": https://netbox.wikimedia.org/dcim/devices/2613/
[22:05:16] <wikibugs>	 (PS1) Bstorm: galera: Ease up on replication restrictions since there is one primary [puppet] - https://gerrit.wikimedia.org/r/618633
[22:06:43] <wikibugs>	 (CR) Andrew Bogott: [C: +1] "lgtm!" [puppet] - https://gerrit.wikimedia.org/r/618633 (owner: Bstorm)
[22:08:09] <wikibugs>	 (CR) Bstorm: [C: +2] galera: Ease up on replication restrictions since there is one primary [puppet] - https://gerrit.wikimedia.org/r/618633 (owner: Bstorm)
[22:17:05] <icinga-wm>	 PROBLEM - rsyslog in codfw is failing to deliver messages on icinga1001 is CRITICAL: action=fwd_centrallog2001.codfw.wmnet:6514 https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=codfw+prometheus/ops
[22:24:57] <icinga-wm>	 PROBLEM - Host relforge1001.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[22:34:39] <icinga-wm>	 RECOVERY - rsyslog in codfw is failing to deliver messages on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=codfw+prometheus/ops
[22:35:09] <icinga-wm>	 RECOVERY - rsyslog in eqiad is failing to deliver messages on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=eqiad+prometheus/ops
[22:39:00] <wikibugs>	 (PS3) Dzahn: hiera: switch releases server to releases1001, remove 1001/2001 [puppet] - https://gerrit.wikimedia.org/r/618621 (https://phabricator.wikimedia.org/T247652)
[22:55:38] <wikibugs>	 (PS7) Cwhite: prometheus: puppetized install of prometheus-es-exporter [puppet] - https://gerrit.wikimedia.org/r/617260 (https://phabricator.wikimedia.org/T256418)
[22:58:01] <icinga-wm>	 PROBLEM - rsyslog in codfw is failing to deliver messages on icinga1001 is CRITICAL: action={fwd_centrallog1001.eqiad.wmnet:6514,fwd_centrallog2001.codfw.wmnet:6514} https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=codfw+prometheus/ops
[22:58:03] <wikibugs>	 (CR) Cwhite: "fixed hiera lookup issue" [puppet] - https://gerrit.wikimedia.org/r/617260 (https://phabricator.wikimedia.org/T256418) (owner: Cwhite)
[22:58:23] <icinga-wm>	 PROBLEM - rsyslog in eqiad is failing to deliver messages on icinga1001 is CRITICAL: action={fwd_centrallog1001.eqiad.wmnet:6514,fwd_centrallog2001.codfw.wmnet:6514} https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=eqiad+prometheus/ops
[23:00:04] <jouncebot>	 RoanKattouw, Niharika, and Urbanecm: How many deployers does it take to do Evening backport window(Max 6 patches) deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200805T2300).
[23:02:13] <shdubsh>	 !log logstash in codfw looks stuck -- restarting
[23:02:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:05:34] <wikibugs>	 Operations, MediaWiki-extensions-Score, Security-Team, Wikimedia-General-or-Unknown, and 3 others: Extension:Score / Lilypond is disabled on all wikis - https://phabricator.wikimedia.org/T257066 (tstarling) We still can't announce anything since we're waiting for vendor security releases. Third p...
[23:13:52] <wikibugs>	 (CR) Ppchelko: [C: +1] eventgate - use /v1/_test/events route for readinessProbe [deployment-charts] - https://gerrit.wikimedia.org/r/618624 (https://phabricator.wikimedia.org/T251935) (owner: Ottomata)
[23:20:54] <wikibugs>	 (PS1) Bstorm: Disable the mdadm check cron for cloudcontrol servers [puppet] - https://gerrit.wikimedia.org/r/618638
[23:21:56] <wikibugs>	 (PS7) Dzahn: httpbb: add directory and test file for releases.wm.org [puppet] - https://gerrit.wikimedia.org/r/618415 (https://phabricator.wikimedia.org/T247652)
[23:22:00] <wikibugs>	 (CR) Dzahn: httpbb: add directory and test file for releases.wm.org (1 comment) [puppet] - https://gerrit.wikimedia.org/r/618415 (https://phabricator.wikimedia.org/T247652) (owner: Dzahn)
[23:24:19] <icinga-wm>	 RECOVERY - rsyslog in codfw is failing to deliver messages on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=codfw+prometheus/ops
[23:24:25] <wikibugs>	 (CR) Dzahn: [C: +2] httpbb: add directory and test file for releases.wm.org [puppet] - https://gerrit.wikimedia.org/r/618415 (https://phabricator.wikimedia.org/T247652) (owner: Dzahn)
[23:26:40] <wikibugs>	 (CR) Bstorm: "PCC: https://puppet-compiler.wmflabs.org/compiler1001/24345/" [puppet] - https://gerrit.wikimedia.org/r/618638 (owner: Bstorm)
[23:29:02] <wikibugs>	 (CR) Bstorm: [C: +2] Disable the mdadm check cron for cloudcontrol servers [puppet] - https://gerrit.wikimedia.org/r/618638 (owner: Bstorm)
[23:32:28] <wikibugs>	 (CR) Andrew Bogott: "retrospective +1" [puppet] - https://gerrit.wikimedia.org/r/618638 (owner: Bstorm)
[23:36:27] <icinga-wm>	 RECOVERY - rsyslog in eqiad is failing to deliver messages on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=eqiad+prometheus/ops
[23:52:31] <wikibugs>	 (CR) Dzahn: "after this we are getting warnings "demon present in privileged LDAP group (nda),but not present in data.yaml".  Are there any web UIs tha" [puppet] - https://gerrit.wikimedia.org/r/617749 (owner: Chad)
[23:52:53] <icinga-wm>	 PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 53 probes of 569 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[23:58:47] <icinga-wm>	 RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 46 probes of 569 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas