[00:37:48] <icinga-wm>	 PROBLEM - puppet last run on stat1004 is CRITICAL: CRITICAL: Puppet last ran 6 hours ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[00:38:57] <wikibugs>	 Operations, MW-on-K8s, TechCom-RFC, serviceops, Patch-For-Review: RFC: PHP microservice for containerized shell execution - https://phabricator.wikimedia.org/T260330 (tstarling) >>! In T260330#6468248, @Legoktm wrote: > I didn't see any shell pipelines in your caller survey and can't think of...
[00:45:04] <icinga-wm>	 RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:49:21] <wikibugs>	 Operations, Wikimedia-Mailing-lists: Mailing list for local development discussion - https://phabricator.wikimedia.org/T263216 (jeena)
[00:50:50] <icinga-wm>	 PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:02:12] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=atlas_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[01:06:04] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[01:09:30] <icinga-wm>	 PROBLEM - OSPF status on cr2-esams is CRITICAL: OSPFv2: 3/4 UP : OSPFv3: 3/3 UP : 4 v2 P2P interfaces vs. 3 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[01:11:24] <icinga-wm>	 RECOVERY - OSPF status on cr2-esams is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[01:22:40] <icinga-wm>	 PROBLEM - Router interfaces on cr2-esams is CRITICAL: CRITICAL: host 91.198.174.244, interfaces up: 90, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[01:24:36] <icinga-wm>	 RECOVERY - Router interfaces on cr2-esams is OK: OK: host 91.198.174.244, interfaces up: 92, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[01:26:30] <wikibugs>	 (PS1) Ryan Kemper: cloudelastic: use envoy to mitigate tls latency [puppet] - https://gerrit.wikimedia.org/r/628243 (https://phabricator.wikimedia.org/T263073)
[01:26:46] <icinga-wm>	 PROBLEM - OSPF status on cr2-esams is CRITICAL: OSPFv2: 3/4 UP : OSPFv3: 3/3 UP : 4 v2 P2P interfaces vs. 3 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[01:28:40] <icinga-wm>	 RECOVERY - OSPF status on cr2-esams is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[01:29:35] <wikibugs>	 (CR) Ryan Kemper: "OPEN QUESTIONS" [puppet] - https://gerrit.wikimedia.org/r/628243 (https://phabricator.wikimedia.org/T263073) (owner: Ryan Kemper)
[01:30:22] <icinga-wm>	 PROBLEM - Router interfaces on cr2-esams is CRITICAL: CRITICAL: host 91.198.174.244, interfaces up: 90, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[01:31:11] <wikibugs>	 (CR) Ryan Kemper: "@Giuseppe:" [puppet] - https://gerrit.wikimedia.org/r/628243 (https://phabricator.wikimedia.org/T263073) (owner: Ryan Kemper)
[01:32:18] <icinga-wm>	 RECOVERY - Router interfaces on cr2-esams is OK: OK: host 91.198.174.244, interfaces up: 92, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[01:34:28] <icinga-wm>	 PROBLEM - OSPF status on cr2-esams is CRITICAL: OSPFv2: 3/4 UP : OSPFv3: 3/3 UP : 4 v2 P2P interfaces vs. 3 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[01:36:26] <icinga-wm>	 RECOVERY - OSPF status on cr2-esams is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[01:46:06] <icinga-wm>	 PROBLEM - OSPF status on cr2-esams is CRITICAL: OSPFv2: 3/4 UP : OSPFv3: 3/3 UP : 4 v2 P2P interfaces vs. 3 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[01:47:48] <icinga-wm>	 PROBLEM - Router interfaces on cr2-esams is CRITICAL: CRITICAL: host 91.198.174.244, interfaces up: 90, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[01:49:44] <icinga-wm>	 RECOVERY - Router interfaces on cr2-esams is OK: OK: host 91.198.174.244, interfaces up: 92, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[01:51:02] <icinga-wm>	 PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 69 probes of 569 (alerts on 65) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[01:51:50] <icinga-wm>	 RECOVERY - OSPF status on cr2-esams is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[01:56:58] <icinga-wm>	 RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 65 probes of 569 (alerts on 65) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[02:12:34] <icinga-wm>	 PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 75 probes of 569 (alerts on 65) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[02:15:20] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=atlas_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[02:16:50] <icinga-wm>	 PROBLEM - Router interfaces on cr2-esams is CRITICAL: CRITICAL: host 91.198.174.244, interfaces up: 90, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[02:19:12] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[02:24:36] <icinga-wm>	 RECOVERY - Router interfaces on cr2-esams is OK: OK: host 91.198.174.244, interfaces up: 92, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[02:36:12] <icinga-wm>	 PROBLEM - Router interfaces on cr2-esams is CRITICAL: CRITICAL: host 91.198.174.244, interfaces up: 90, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[02:36:22] <icinga-wm>	 RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 65 probes of 569 (alerts on 65) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[02:36:36] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=atlas_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[02:38:34] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[02:42:00] <icinga-wm>	 RECOVERY - Router interfaces on cr2-esams is OK: OK: host 91.198.174.244, interfaces up: 92, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[02:46:04] <icinga-wm>	 PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 67 probes of 569 (alerts on 65) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[02:57:54] <icinga-wm>	 RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 64 probes of 569 (alerts on 65) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[03:05:34] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=atlas_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[03:07:32] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[03:18:04] <icinga-wm>	 PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 66 probes of 569 (alerts on 65) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[03:19:08] <icinga-wm>	 PROBLEM - OSPF status on cr2-esams is CRITICAL: OSPFv2: 3/4 UP : OSPFv3: 3/3 UP : 4 v2 P2P interfaces vs. 3 v3 P2P interfaces https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[03:21:04] <icinga-wm>	 RECOVERY - OSPF status on cr2-esams is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[03:21:36] <icinga-wm>	 PROBLEM - BGP status on cr2-esams is CRITICAL: BGP CRITICAL - No response from remote host 91.198.174.244 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[03:22:50] <icinga-wm>	 PROBLEM - Router interfaces on cr2-esams is CRITICAL: CRITICAL: host 91.198.174.244, interfaces up: 90, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[03:28:38] <icinga-wm>	 RECOVERY - Router interfaces on cr2-esams is OK: OK: host 91.198.174.244, interfaces up: 92, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[03:36:34] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=atlas_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[03:42:20] <icinga-wm>	 PROBLEM - Router interfaces on cr2-esams is CRITICAL: CRITICAL: host 91.198.174.244, interfaces up: 90, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[03:42:28] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[03:46:16] <icinga-wm>	 RECOVERY - Router interfaces on cr2-esams is OK: OK: host 91.198.174.244, interfaces up: 92, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[03:47:06] <icinga-wm>	 RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 62 probes of 568 (alerts on 65) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[05:08:19] <wikibugs>	 (PS1) Marostegui: db1131: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/628250 (https://phabricator.wikimedia.org/T262901)
[05:15:52] <marostegui>	 !log Restart wikibugs
[05:15:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:26:11] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool es2018, es2012 after cloning es2029 and es2030 T261717', diff saved to https://phabricator.wikimedia.org/P12641 and previous config saved to /var/cache/conftool/dbconfig/20200918-052608-marostegui.json
[05:26:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:26:17] <stashbot>	 T261717: Productionize es20[26-34] and es10[26-34] - https://phabricator.wikimedia.org/T261717
[05:28:13] <wikibugs>	 (CR) Marostegui: [C: +2] db1131: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/628250 (https://phabricator.wikimedia.org/T262901) (owner: Marostegui)
[05:29:01] <wikibugs>	 (PS1) Marostegui: instances.yaml: Add es2029 and es2030 [puppet] - https://gerrit.wikimedia.org/r/628251 (https://phabricator.wikimedia.org/T261717)
[05:30:30] <wikibugs>	 (CR) Marostegui: [C: +2] instances.yaml: Add es2029 and es2030 [puppet] - https://gerrit.wikimedia.org/r/628251 (https://phabricator.wikimedia.org/T261717) (owner: Marostegui)
[05:36:04] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Add es2029 and es2030 to dbctl depooled - T261717', diff saved to https://phabricator.wikimedia.org/P12642 and previous config saved to /var/cache/conftool/dbconfig/20200918-053604-marostegui.json
[05:36:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:36:09] <stashbot>	 T261717: Productionize es20[26-34] and es10[26-34] - https://phabricator.wikimedia.org/T261717
[05:36:46] <wikibugs>	 (CR) Effie Mouzeli: "Thank you!" [puppet] - https://gerrit.wikimedia.org/r/628153 (owner: Dzahn)
[05:37:59] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool es2018, es2012 after cloning es2029 and es2030 T261717', diff saved to https://phabricator.wikimedia.org/P12643 and previous config saved to /var/cache/conftool/dbconfig/20200918-053758-marostegui.json
[05:38:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:39:17] <wikibugs>	 (PS1) Elukey: profile::hue: add new alarms for Hue 4 [puppet] - https://gerrit.wikimedia.org/r/628252
[05:40:24] <wikibugs>	 (CR) Elukey: [C: +2] profile::hue: add new alarms for Hue 4 [puppet] - https://gerrit.wikimedia.org/r/628252 (owner: Elukey)
[05:55:26] <wikibugs>	 (PS1) Elukey: profile::hadoop::common: create the ssl directory if not present [puppet] - https://gerrit.wikimedia.org/r/628253
[05:56:28] <wikibugs>	 (CR) jerkins-bot: [V: -1] profile::hadoop::common: create the ssl directory if not present [puppet] - https://gerrit.wikimedia.org/r/628253 (owner: Elukey)
[05:57:17] <wikibugs>	 (PS2) Elukey: profile::hadoop::common: create the ssl directory if not present [puppet] - https://gerrit.wikimedia.org/r/628253
[05:57:50] <wikibugs>	 (PS6) Rosalie Perside (WMDE): Remove $wgExtraLanguageNames from Wikidata and Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/620050 (https://phabricator.wikimedia.org/T260118) (owner: Guergana Tzatchkova)
[05:58:46] <wikibugs>	 (CR) Elukey: [C: +2] profile::hadoop::common: create the ssl directory if not present [puppet] - https://gerrit.wikimedia.org/r/628253 (owner: Elukey)
[06:01:03] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool es2018, es2012 after cloning es2029 and es2030 T261717', diff saved to https://phabricator.wikimedia.org/P12644 and previous config saved to /var/cache/conftool/dbconfig/20200918-060103-marostegui.json
[06:01:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:01:13] <stashbot>	 T261717: Productionize es20[26-34] and es10[26-34] - https://phabricator.wikimedia.org/T261717
[06:03:34] <wikibugs>	 Operations, ops-codfw, DBA, Patch-For-Review, User-Kormat: db2125 crashed - mgmt iface also not available - https://phabricator.wikimedia.org/T260670 (Marostegui) Open→Resolved Closing per the internal email thread. If this happens again we'll reopen and contact Dell again.
[06:03:40] <wikibugs>	 Operations, ops-codfw, DBA: db2127 memory errors - https://phabricator.wikimedia.org/T262247 (Marostegui) >>! In T262247#6442725, @Papaul wrote: > The log on says "It has been corrected by h/w and requires no further action" so i don't think this will be enough to replace the memory because it is not...
[06:06:33] <wikibugs>	 (PS1) Elukey: role::analytics_test_cluster::hadoop::ui: add missing hiera param [puppet] - https://gerrit.wikimedia.org/r/628254
[06:07:24] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db1131 after rack move', diff saved to https://phabricator.wikimedia.org/P12645 and previous config saved to /var/cache/conftool/dbconfig/20200918-060724-marostegui.json
[06:07:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:07:38] <wikibugs>	 (CR) Elukey: [C: +2] role::analytics_test_cluster::hadoop::ui: add missing hiera param [puppet] - https://gerrit.wikimedia.org/r/628254 (owner: Elukey)
[06:08:15] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db1106 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12646 and previous config saved to /var/cache/conftool/dbconfig/20200918-060815-marostegui.json
[06:08:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:21:27] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Fully repool es2018, es2012 after cloning es2029 and es2030 T261717', diff saved to https://phabricator.wikimedia.org/P12647 and previous config saved to /var/cache/conftool/dbconfig/20200918-062127-marostegui.json
[06:21:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:21:32] <stashbot>	 T261717: Productionize es20[26-34] and es10[26-34] - https://phabricator.wikimedia.org/T261717
[06:39:24] <icinga-wm>	 PROBLEM - kubelet operational latencies on kubernetes2010 is CRITICAL: instance=kubernetes2010.codfw.wmnet https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[06:42:44] <icinga-wm>	 RECOVERY - kubelet operational latencies on kubernetes2010 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[06:48:32] <wikibugs>	 (PS1) Elukey: profile::hue: adjust settings for the new python3.7 alerts (hue 4) [puppet] - https://gerrit.wikimedia.org/r/628290
[06:50:08] <icinga-wm>	 PROBLEM - BFD status on cr3-knams is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[06:50:28] <icinga-wm>	 PROBLEM - OSPF status on cr2-eqdfw is CRITICAL: OSPFv2: 4/5 UP : OSPFv3: 4/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[06:50:48] <icinga-wm>	 PROBLEM - OSPF status on cr3-knams is CRITICAL: OSPFv2: 3/4 UP : OSPFv3: 3/4 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[06:51:08] <icinga-wm>	 PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 237, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[06:52:08] <wikibugs>	 (CR) Muehlenhoff: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/628290 (owner: Elukey)
[06:52:30] <elukey>	 so the above seems related to the GTT transport from eqiad to knams 
[06:53:04] <icinga-wm>	 RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 241, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[06:53:13] <elukey>	 but I don't see maintenance scheduled
[06:56:10] <icinga-wm>	 PROBLEM - OSPF status on cr1-eqiad is CRITICAL: OSPFv2: 3/5 UP : OSPFv3: 3/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[06:57:52] <icinga-wm>	 RECOVERY - BFD status on cr3-knams is OK: OK: UP: 8 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[06:58:08] <icinga-wm>	 RECOVERY - OSPF status on cr1-eqiad is OK: OSPFv2: 5/5 UP : OSPFv3: 5/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[06:58:10] <icinga-wm>	 RECOVERY - OSPF status on cr2-eqdfw is OK: OSPFv2: 5/5 UP : OSPFv3: 5/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[06:58:32] <icinga-wm>	 RECOVERY - OSPF status on cr3-knams is OK: OSPFv2: 4/4 UP : OSPFv3: 4/4 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[07:00:04] <jouncebot>	 Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200918T0700)
[07:00:41] <elukey>	 (discussion about the links down are in #sre)
[07:05:26] <wikibugs>	 Operations, Wikimedia-Mailing-lists: Wikimedia-RU mailing list page has wrong encoding - https://phabricator.wikimedia.org/T135226 (ssr) Web interface for reading mails. E. g. we take the letter with Cyrillic letters in URL: https://lists.wikimedia.org/pipermail/wikimedia-ru/2020-September/005396.html —...
[07:05:55] <wikibugs>	 (PS2) Muehlenhoff: Add grafana-rw to cache config [puppet] - https://gerrit.wikimedia.org/r/627772 (https://phabricator.wikimedia.org/T262512)
[07:07:14] <wikibugs>	 (CR) Muehlenhoff: "The grafana.discovery.wmnet cert was extended with the grafana-rw.w.o altname yesterday." [puppet] - https://gerrit.wikimedia.org/r/627772 (https://phabricator.wikimedia.org/T262512) (owner: Muehlenhoff)
[07:12:08] <jayme>	 !log draining kubestage1001 for kernel upgrade - T262527
[07:12:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:12:13] <stashbot>	 T262527: Update to kernel 4.19 on kubernetes nodes - https://phabricator.wikimedia.org/T262527
[07:14:53] <XioNoX>	 !log push pfw policies - T263168
[07:14:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:17:21] <moritzm>	 !log installing xdg-utils security updates
[07:17:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:21:59] <wikibugs>	 (CR) Elukey: [C: +2] profile::hue: adjust settings for the new python3.7 alerts (hue 4) [puppet] - https://gerrit.wikimedia.org/r/628290 (owner: Elukey)
[07:36:04] <wikibugs>	 (PS1) Elukey: profile::hadoop::common: add missing config to local puppet ssl [puppet] - https://gerrit.wikimedia.org/r/628294
[07:38:07] <wikibugs>	 (CR) Elukey: [C: +2] profile::hadoop::common: add missing config to local puppet ssl [puppet] - https://gerrit.wikimedia.org/r/628294 (owner: Elukey)
[07:38:40] <wikibugs>	 Operations, netops: Set the same OSPF weight on eqiad/codfw wavelenghts - https://phabricator.wikimedia.org/T263230 (ayounsi) p:Triage→High
[07:39:05] <wikibugs>	 (PS4) Gehel: Extracting obvious reporting code to a Reporter class. [software/cumin] - https://gerrit.wikimedia.org/r/626660 (https://phabricator.wikimedia.org/T212783)
[07:39:12] <wikibugs>	 Operations, netops: Set the same OSPF weight on eqiad/codfw wavelenghts - https://phabricator.wikimedia.org/T263230 (ayounsi)
[07:39:14] <wikibugs>	 (CR) Gehel: Extracting obvious reporting code to a Reporter class. (2 comments) [software/cumin] - https://gerrit.wikimedia.org/r/626660 (https://phabricator.wikimedia.org/T212783) (owner: Gehel)
[07:39:28] <wikibugs>	 (CR) Gehel: Extracting obvious reporting code to a Reporter class. (1 comment) [software/cumin] - https://gerrit.wikimedia.org/r/626660 (https://phabricator.wikimedia.org/T212783) (owner: Gehel)
[07:41:28] <wikibugs>	 (PS1) Elukey: profile::hadoop::common: remove duplicate variable definition [puppet] - https://gerrit.wikimedia.org/r/628295
[07:46:05] <wikibugs>	 (PS2) Elukey: profile::hadoop::common: remove duplicate variable definition [puppet] - https://gerrit.wikimedia.org/r/628295
[07:46:15] <wikibugs>	 (CR) Elukey: [C: +2] profile::hadoop::common: remove duplicate variable definition [puppet] - https://gerrit.wikimedia.org/r/628295 (owner: Elukey)
[07:49:39] <wikibugs>	 Operations, User-MoritzMuehlenhoff: Review lists of config/sysctl recommendations by "kernel self-protection project" - https://phabricator.wikimedia.org/T142984 (Aklapper)
[07:53:08] <wikibugs>	 Operations, Citoid, Wikimedia-Logstash, serviceops, Platform Engineering (Icebox): Citoid is logging all request / response headers as separate fields - https://phabricator.wikimedia.org/T239713 (Aklapper)
[07:53:11] <wikibugs>	 Operations, ops-codfw, decommission-hardware: decommission wmf6412 - https://phabricator.wikimedia.org/T261968 (Aklapper)
[07:55:57] <wikibugs>	 (PS2) Jcrespo: cli: Make /etc/wmfbackups the config dir for the main backup scripts [software/wmfbackups] - https://gerrit.wikimedia.org/r/628168 (https://phabricator.wikimedia.org/T138562)
[07:57:34] <wikibugs>	 Operations, ops-codfw, DC-Ops, User-jijiki: Put rdb200[78] into service - https://phabricator.wikimedia.org/T255681 (Aklapper)
[07:58:12] <wikibugs>	 (PS3) Jcrespo: cli: Make /etc/wmfbackups the config dir for the main backup scripts [software/wmfbackups] - https://gerrit.wikimedia.org/r/628168 (https://phabricator.wikimedia.org/T138562)
[08:16:36] <klausman>	 !log reinstalling stat1004 with Buster
[08:16:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:22:23] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'db2124 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12648 and previous config saved to /var/cache/conftool/dbconfig/20200918-082223-kormat.json
[08:22:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:22:29] <stashbot>	 T259831: Schema change to make change_tag.ct_rc_id unsigned - https://phabricator.wikimedia.org/T259831
[08:22:42] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_citoid_cluster_codfw site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[08:24:36] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[08:25:28] <jayme>	 !log reboot kubestage1001 for clean state testing - T262527
[08:25:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:25:32] <stashbot>	 T262527: Update to kernel 4.19 on kubernetes nodes - https://phabricator.wikimedia.org/T262527
[08:25:33] <logmsgbot>	 !log jayme@cumin1001 START - Cookbook sre.hosts.reboot-single
[08:25:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:30:26] <logmsgbot>	 !log jayme@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
[08:30:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:31:16] <icinga-wm>	 PROBLEM - Thanos query has high gRPC client errors on icinga1001 is CRITICAL: job=thanos-query https://wikitech.wikimedia.org/wiki/Thanos%23Alerts https://grafana.wikimedia.org/d/af36c91291a603f1d9fbdabdd127ac4a/thanos-query
[08:40:15] <wikibugs>	 (CR) JMeybohm: [C: +2] Use Kernel 4.19 on staging cluster nodes [puppet] - https://gerrit.wikimedia.org/r/627868 (https://phabricator.wikimedia.org/T262527) (owner: JMeybohm)
[08:43:20] <jayme>	 !log reboot kubestage1001 for kernel upgrade - T262527
[08:43:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:43:25] <stashbot>	 T262527: Update to kernel 4.19 on kubernetes nodes - https://phabricator.wikimedia.org/T262527
[08:43:26] <logmsgbot>	 !log jayme@cumin1001 START - Cookbook sre.hosts.reboot-single
[08:43:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:45:55] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'db2124 (re)pooling @ 20%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12650 and previous config saved to /var/cache/conftool/dbconfig/20200918-084554-kormat.json
[08:45:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:45:59] <stashbot>	 T259831: Schema change to make change_tag.ct_rc_id unsigned - https://phabricator.wikimedia.org/T259831
[08:47:54] <logmsgbot>	 !log jayme@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
[08:47:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:48:22] <wikibugs>	 (PS1) Elukey: Allow port 443 in term apt for analytics-in4 and in6 [homer/public] - https://gerrit.wikimedia.org/r/628300
[08:48:36] <elukey>	 XioNoX: --^ if you have a sec
[08:49:22] <XioNoX>	 elukey: what are you pointing to? :)
[08:49:34] <elukey>	 ahhh right you don't see it, I always forget
[08:49:35] <elukey>	 sorry
[08:49:39] <elukey>	 https://gerrit.wikimedia.org/r/628300
[08:49:41] <elukey>	 :)
[08:50:12] <wikibugs>	 (CR) Muehlenhoff: [C: +1] "LGTM" [homer/public] - https://gerrit.wikimedia.org/r/628300 (owner: Elukey)
[08:51:33] <elukey>	 ok I guess I can proceed, there seems no maintenance on the routers so I am also going to commit 
[08:52:03] <elukey>	 XioNoX: green light? (sorry we are in the middle of a reimage that is stuck :( )
[08:52:24] <XioNoX>	 elukey: one sec
[08:52:47] <XioNoX>	 elukey: +1
[08:53:09] <wikibugs>	 (CR) Ayounsi: [C: +1] Allow port 443 in term apt for analytics-in4 and in6 [homer/public] - https://gerrit.wikimedia.org/r/628300 (owner: Elukey)
[08:53:11] <elukey>	 XioNoX: <3
[08:53:20] <wikibugs>	 (CR) Elukey: [C: +2] Allow port 443 in term apt for analytics-in4 and in6 [homer/public] - https://gerrit.wikimedia.org/r/628300 (owner: Elukey)
[08:54:18] <elukey>	 !log change analytics-in4/in6 filters on cr1/cr2 after https://gerrit.wikimedia.org/r/628300
[08:54:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:54:53] <wikibugs>	 (PS4) Arturo Borrero Gonzalez: openstack: rocky/buster: use more modern netfilter components [puppet] - https://gerrit.wikimedia.org/r/627773 (https://phabricator.wikimedia.org/T262979)
[08:55:27] <icinga-wm>	 PROBLEM - Thanos query has high gRPC client errors on icinga1001 is CRITICAL: job=thanos-query https://wikitech.wikimedia.org/wiki/Thanos%23Alerts https://grafana.wikimedia.org/d/af36c91291a603f1d9fbdabdd127ac4a/thanos-query
[08:56:14] <jayme>	 !log reboot kubestage1001 for clean state - T262527
[08:56:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:56:19] <stashbot>	 T262527: Update to kernel 4.19 on kubernetes nodes - https://phabricator.wikimedia.org/T262527
[08:56:21] <logmsgbot>	 !log jayme@cumin1001 START - Cookbook sre.hosts.reboot-single
[08:56:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:57:24] <XioNoX>	 godog: ^
[08:59:56] <wikibugs>	 (PS3) Kormat: bsection: Script for binary-searching log files. [puppet] - https://gerrit.wikimedia.org/r/627841
[09:00:47] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C: +2] "PCC: https://puppet-compiler.wmflabs.org/compiler1002/25184/" [puppet] - https://gerrit.wikimedia.org/r/627773 (https://phabricator.wikimedia.org/T262979) (owner: Arturo Borrero Gonzalez)
[09:00:48] <logmsgbot>	 !log jayme@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
[09:00:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:00:58] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'db2124 (re)pooling @ 40%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12651 and previous config saved to /var/cache/conftool/dbconfig/20200918-090058-kormat.json
[09:01:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:01:02] <stashbot>	 T259831: Schema change to make change_tag.ct_rc_id unsigned - https://phabricator.wikimedia.org/T259831
[09:02:33] <icinga-wm>	 PROBLEM - HTTPS-planet on en.planet.wikimedia.org is CRITICAL: SSL CRITICAL - Certificate *.wikipedia.org valid until 2020-10-18 09:02:07 +0000 (expires in 29 days) https://wikitech.wikimedia.org/wiki/Planet.wikimedia.org
[09:03:25] <icinga-wm>	 PROBLEM - HTTPS-wmfusercontent on phab.wmfusercontent.org is CRITICAL: SSL CRITICAL - Certificate *.wikipedia.org valid until 2020-10-18 09:02:07 +0000 (expires in 29 days) https://phabricator.wikimedia.org/tag/phabricator/
[09:12:57] <wikibugs>	 (PS1) Arturo Borrero Gonzalez: openstack: rocky/buster: also pin other related packages required by modern iptables [puppet] - https://gerrit.wikimedia.org/r/628302 (https://phabricator.wikimedia.org/T262979)
[09:13:35] <wikibugs>	 (CR) jerkins-bot: [V: -1] openstack: rocky/buster: also pin other related packages required by modern iptables [puppet] - https://gerrit.wikimedia.org/r/628302 (https://phabricator.wikimedia.org/T262979) (owner: Arturo Borrero Gonzalez)
[09:14:42] <wikibugs>	 (PS2) Arturo Borrero Gonzalez: openstack: rocky/buster: also pin other related packages required by iptables [puppet] - https://gerrit.wikimedia.org/r/628302 (https://phabricator.wikimedia.org/T262979)
[09:16:02] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'db2124 (re)pooling @ 60%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12652 and previous config saved to /var/cache/conftool/dbconfig/20200918-091601-kormat.json
[09:16:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:16:07] <stashbot>	 T259831: Schema change to make change_tag.ct_rc_id unsigned - https://phabricator.wikimedia.org/T259831
[09:18:55] <wikibugs>	 (PS3) Arturo Borrero Gonzalez: openstack: rocky/buster: also pin other related packages required by iptables [puppet] - https://gerrit.wikimedia.org/r/628302 (https://phabricator.wikimedia.org/T262979)
[09:22:20] <logmsgbot>	 !log klausman@cumin1001 START - Cookbook sre.hosts.downtime
[09:22:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:24:30] <logmsgbot>	 !log klausman@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[09:24:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:31:05] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'db2124 (re)pooling @ 80%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12653 and previous config saved to /var/cache/conftool/dbconfig/20200918-093105-kormat.json
[09:31:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:31:10] <stashbot>	 T259831: Schema change to make change_tag.ct_rc_id unsigned - https://phabricator.wikimedia.org/T259831
[09:34:43] <wikibugs>	 (PS4) Arturo Borrero Gonzalez: openstack: rocky/buster: fixes for iptables updates [puppet] - https://gerrit.wikimedia.org/r/628302 (https://phabricator.wikimedia.org/T262979)
[09:43:47] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C: +2] "PCC as expected: https://puppet-compiler.wmflabs.org/compiler1001/25189/" [puppet] - https://gerrit.wikimedia.org/r/628302 (https://phabricator.wikimedia.org/T262979) (owner: Arturo Borrero Gonzalez)
[09:45:58] <wikibugs>	 (PS1) Effie Mouzeli: push-notifications: deploy to production environment [deployment-charts] - https://gerrit.wikimedia.org/r/628304 (https://phabricator.wikimedia.org/T256973)
[09:46:08] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'db2124 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12654 and previous config saved to /var/cache/conftool/dbconfig/20200918-094608-kormat.json
[09:46:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:46:14] <stashbot>	 T259831: Schema change to make change_tag.ct_rc_id unsigned - https://phabricator.wikimedia.org/T259831
[09:46:57] <jayme>	 !log uncordoned kubestage1001 - T262527
[09:47:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:47:00] <stashbot>	 T262527: Update to kernel 4.19 on kubernetes nodes - https://phabricator.wikimedia.org/T262527
[09:47:08] <jayme>	 !log deleting some random pods in kubernetes staging to rebalance load back on kubestage1001 - T262527
[09:47:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:47:46] <twentyafterfour>	 !log deployed hotfix for T263063 to phab1001
[09:47:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:47:50] <stashbot>	 T263063: Phabricator global search: "Cannot use object of type PhutilSafeHTML as array" error for certain strings - https://phabricator.wikimedia.org/T263063
[09:48:34] <wikibugs>	 (CR) jerkins-bot: [V: -1] push-notifications: deploy to production environment [deployment-charts] - https://gerrit.wikimedia.org/r/628304 (https://phabricator.wikimedia.org/T256973) (owner: Effie Mouzeli)
[09:50:46] <wikibugs>	 (CR) Alexandros Kosiaris: [C: +1] urldownloader: convert A record to CNAME (1 comment) [dns] - https://gerrit.wikimedia.org/r/628102 (https://phabricator.wikimedia.org/T244153) (owner: Volans)
[09:53:51] <wikibugs>	 (PS2) Effie Mouzeli: push-notifications: deploy to production environment [deployment-charts] - https://gerrit.wikimedia.org/r/628304 (https://phabricator.wikimedia.org/T256973)
[09:55:29] <logmsgbot>	 !log aborrero@cumin1001 START - Cookbook sre.hosts.downtime
[09:55:30] <logmsgbot>	 !log aborrero@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[09:55:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:55:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:55:55] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'db2087:3316 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12655 and previous config saved to /var/cache/conftool/dbconfig/20200918-095554-kormat.json
[09:55:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:56:00] <stashbot>	 T259831: Schema change to make change_tag.ct_rc_id unsigned - https://phabricator.wikimedia.org/T259831
[09:57:06] <wikibugs>	 (CR) Hnowlan: [C: +2] api-gateway: allow mwdebug hosts in calico [deployment-charts] - https://gerrit.wikimedia.org/r/628127 (https://phabricator.wikimedia.org/T262396) (owner: Hnowlan)
[09:59:10] <wikibugs>	 (Merged) jenkins-bot: api-gateway: allow mwdebug hosts in calico [deployment-charts] - https://gerrit.wikimedia.org/r/628127 (https://phabricator.wikimedia.org/T262396) (owner: Hnowlan)
[10:03:07] <wikibugs>	 (CR) Effie Mouzeli: [V: +2] "We have a bug that sometimes causes build failures, it built fine here: https://integration.wikimedia.org/ci/job/helm-lint/2574/console" [deployment-charts] - https://gerrit.wikimedia.org/r/628304 (https://phabricator.wikimedia.org/T256973) (owner: Effie Mouzeli)
[10:10:13] <wikibugs>	 (PS1) Kormat: Exclude /cover dir from debian source tarball. [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/628306
[10:11:33] <wikibugs>	 (CR) JMeybohm: [C: +1] "LGTM" [deployment-charts] - https://gerrit.wikimedia.org/r/628304 (https://phabricator.wikimedia.org/T256973) (owner: Effie Mouzeli)
[10:11:35] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'db2087:3316 (re)pooling @ 25%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12656 and previous config saved to /var/cache/conftool/dbconfig/20200918-101135-kormat.json
[10:11:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:11:40] <stashbot>	 T259831: Schema change to make change_tag.ct_rc_id unsigned - https://phabricator.wikimedia.org/T259831
[10:12:05] <wikibugs>	 (CR) Kormat: [C: +2] Exclude /cover dir from debian source tarball. [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/628306 (owner: Kormat)
[10:13:06] <wikibugs>	 (Merged) jenkins-bot: Exclude /cover dir from debian source tarball. [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/628306 (owner: Kormat)
[10:13:24] <wikibugs>	 Operations, Domains, Traffic: Change of nameservers for Wikimedia.org.tr - https://phabricator.wikimedia.org/T259792 (Ijon) Given the answer above, can progress be made on this? @croslof, @akosiaris
[10:18:16] <wikibugs>	 Operations, Domains, Traffic: Change of nameservers for Wikimedia.org.tr - https://phabricator.wikimedia.org/T259792 (jcrespo) For the SRE side, regarding DNS, @bblack is probably the best person to be notified here.
[10:24:19] <wikibugs>	 (PS1) Kormat: Prepare for 0.5 release. [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/628307
[10:25:52] <nemo-yiannis>	 Hey effie! Regarding https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/628304/ can we also bump the version of the image after a couple of patches we merged?
[10:26:36] <effie>	 whatever works for you, we only want the -production for the version that will end up in production 
[10:26:39] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'db2087:3316 (re)pooling @ 50%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12657 and previous config saved to /var/cache/conftool/dbconfig/20200918-102638-kormat.json
[10:26:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:26:47] <stashbot>	 T259831: Schema change to make change_tag.ct_rc_id unsigned - https://phabricator.wikimedia.org/T259831
[10:27:16] <nemo-yiannis>	 cool
[10:27:18] <effie>	 nemo-yiannis: since there is too much bot traffic here, mind if we move this to #mediawiki-serviceops?
[10:27:32] <effie>	 it will be easier to offer future support there as well 
[10:27:40] * kormat hugs the bots. please don't mind effie
[10:27:46] <nemo-yiannis>	 sure
[10:27:51] <effie>	 tx tx 
[10:28:15] <logmsgbot>	 !log hnowlan@deploy1001 helmfile [codfw] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
[10:28:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:29:09] <icinga-wm>	 PROBLEM - Thanos query has high gRPC client errors on icinga1001 is CRITICAL: job=thanos-query https://wikitech.wikimedia.org/wiki/Thanos%23Alerts https://grafana.wikimedia.org/d/af36c91291a603f1d9fbdabdd127ac4a/thanos-query
[10:30:59] <wikibugs>	 (CR) Effie Mouzeli: [V: +2 C: +2] push-notifications: deploy to production environment [deployment-charts] - https://gerrit.wikimedia.org/r/628304 (https://phabricator.wikimedia.org/T256973) (owner: Effie Mouzeli)
[10:31:57] <logmsgbot>	 !log hnowlan@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
[10:32:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:33:25] <wikibugs>	 (PS1) Jgiannelos: Bump push-notifications image to version 2020-09-17-171128-publish [deployment-charts] - https://gerrit.wikimedia.org/r/628309
[10:33:31] <wikibugs>	 (Merged) jenkins-bot: push-notifications: deploy to production environment [deployment-charts] - https://gerrit.wikimedia.org/r/628304 (https://phabricator.wikimedia.org/T256973) (owner: Effie Mouzeli)
[10:34:23] <logmsgbot>	 !log hnowlan@deploy1001 helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
[10:34:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:34:26] <wikibugs>	 (PS2) Kormat: Prepare for 0.5 release. [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/628307
[10:35:51] <logmsgbot>	 !log jiji@deploy1001 helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
[10:35:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:38:52] <wikibugs>	 (CR) Kormat: [C: +2] Prepare for 0.5 release. [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/628307 (owner: Kormat)
[10:39:48] <wikibugs>	 (Merged) jenkins-bot: Prepare for 0.5 release. [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/628307 (owner: Kormat)
[10:39:58] <wikibugs>	 (CR) Jgiannelos: [C: -1] Bump push-notifications image to version 2020-09-17-171128-publish [deployment-charts] - https://gerrit.wikimedia.org/r/628309 (owner: Jgiannelos)
[10:41:42] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'db2087:3316 (re)pooling @ 75%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12658 and previous config saved to /var/cache/conftool/dbconfig/20200918-104141-kormat.json
[10:41:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:41:47] <stashbot>	 T259831: Schema change to make change_tag.ct_rc_id unsigned - https://phabricator.wikimedia.org/T259831
[10:44:54] <wikibugs>	 (CR) Jcrespo: "yay!!! :-) Thank you!" [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/628307 (owner: Kormat)
[10:45:12] <logmsgbot>	 !log jiji@deploy1001 helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
[10:45:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:48:41] <wikibugs>	 (PS2) Jgiannelos: Bump push-notifications image to version 2020-09-17-171128-publish [deployment-charts] - https://gerrit.wikimedia.org/r/628309
[10:50:27] <wikibugs>	 (PS3) Jgiannelos: Bump push-notifications image to version 2020-09-18-103454-publish [deployment-charts] - https://gerrit.wikimedia.org/r/628309
[10:56:45] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'db2087:3316 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12659 and previous config saved to /var/cache/conftool/dbconfig/20200918-105645-kormat.json
[10:56:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:56:52] <stashbot>	 T259831: Schema change to make change_tag.ct_rc_id unsigned - https://phabricator.wikimedia.org/T259831
[11:12:40] <wikibugs>	 (CR) Jgiannelos: [C: +2] Bump push-notifications image to version 2020-09-18-103454-publish [deployment-charts] - https://gerrit.wikimedia.org/r/628309 (owner: Jgiannelos)
[11:12:44] <icinga-wm>	 PROBLEM - Thanos query has high gRPC client errors on icinga1001 is CRITICAL: job=thanos-query https://wikitech.wikimedia.org/wiki/Thanos%23Alerts https://grafana.wikimedia.org/d/af36c91291a603f1d9fbdabdd127ac4a/thanos-query
[11:15:00] <wikibugs>	 (Merged) jenkins-bot: Bump push-notifications image to version 2020-09-18-103454-publish [deployment-charts] - https://gerrit.wikimedia.org/r/628309 (owner: Jgiannelos)
[11:15:29] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'db2089:3316 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12660 and previous config saved to /var/cache/conftool/dbconfig/20200918-111529-kormat.json
[11:15:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:15:34] <stashbot>	 T259831: Schema change to make change_tag.ct_rc_id unsigned - https://phabricator.wikimedia.org/T259831
[11:33:51] <icinga-wm>	 PROBLEM - Thanos query has high gRPC client errors on icinga1001 is CRITICAL: job=thanos-query https://wikitech.wikimedia.org/wiki/Thanos%23Alerts https://grafana.wikimedia.org/d/af36c91291a603f1d9fbdabdd127ac4a/thanos-query
[11:35:09] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db2125', diff saved to https://phabricator.wikimedia.org/P12661 and previous config saved to /var/cache/conftool/dbconfig/20200918-113509-marostegui.json
[11:35:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:38:24] <icinga-wm>	 RECOVERY - OSPF status on cr2-codfw is OK: OSPFv2: 5/5 UP : OSPFv3: 5/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[11:39:51] <icinga-wm>	 RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 240, down: 0, dormant: 0, excluded: 1, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[11:47:16] <wikibugs>	 (PS1) Marostegui: db2125: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/628312 (https://phabricator.wikimedia.org/T263244)
[11:48:00] <wikibugs>	 (CR) Marostegui: [C: +2] db2125: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/628312 (https://phabricator.wikimedia.org/T263244) (owner: Marostegui)
[11:54:37] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'db2089:3316 (re)pooling @ 25%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12662 and previous config saved to /var/cache/conftool/dbconfig/20200918-115437-kormat.json
[11:54:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:54:42] <stashbot>	 T259831: Schema change to make change_tag.ct_rc_id unsigned - https://phabricator.wikimedia.org/T259831
[11:56:49] <icinga-wm>	 PROBLEM - Thanos query has high gRPC client errors on icinga1001 is CRITICAL: job=thanos-query https://wikitech.wikimedia.org/wiki/Thanos%23Alerts https://grafana.wikimedia.org/d/af36c91291a603f1d9fbdabdd127ac4a/thanos-query
[11:57:14] <librenms-wmf>	 Critical Alert for device cr2-codfw.wikimedia.org - Primary outbound port utilisation over 80%  #page
[11:58:43] <wikibugs>	 Operations, netops: Upgrade Fastnetmon to 1.1.7 - https://phabricator.wikimedia.org/T257035 (MoritzMuehlenhoff) I have upgraded the existing 1.1.4 package to 1.1.7, I needed to fix up all patches for 1.1.7 (except one for luajit, which was obsoleted by upstream dropping support for luajit in 1.1.5 "Disab...
[12:01:30] <icinga-wm>	 PROBLEM - LibreNMS has a critical alert #page on icinga1001 is CRITICAL: Primary outbound port utilisation over 80% #page (cr2-codfw.wikimedia.org) https://bit.ly/wmf-librenms
[12:03:14] <librenms-wmf>	 C̶r̶i̶t̶i̶c̶a̶l̶ Device cr2-codfw.wikimedia.org recovered from Primary outbound port utilisation over 80%  #page
[12:03:26] <icinga-wm>	 RECOVERY - LibreNMS has a critical alert #page on icinga1001 is OK: OK: zero critical LibreNMS alerts https://bit.ly/wmf-librenms
[12:05:51] <wikibugs>	 (PS1) Gehel: Enable cumin EventHandler to disable output. [software/cumin] - https://gerrit.wikimedia.org/r/628315 (https://phabricator.wikimedia.org/T212783)
[12:08:24] <wikibugs>	 (CR) jerkins-bot: [V: -1] Enable cumin EventHandler to disable output. [software/cumin] - https://gerrit.wikimedia.org/r/628315 (https://phabricator.wikimedia.org/T212783) (owner: Gehel)
[12:08:25] <wikibugs>	 Operations, Beta-Cluster-Infrastructure, Traffic, Patch-For-Review, and 2 others: Beta cluster is down - https://phabricator.wikimedia.org/T178841 (hashar)
[12:09:22] <wikibugs>	 Operations, Beta-Cluster-Infrastructure, Release-Engineering-Team-TODO, Traffic, Release-Engineering-Team (Other / Uncategorized): Investigate what caused the unattended varnish upgrade in Beta Cluster - https://phabricator.wikimedia.org/T179197 (hashar) Open→Declined Not much we can...
[12:09:41] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'db2089:3316 (re)pooling @ 50%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12663 and previous config saved to /var/cache/conftool/dbconfig/20200918-120940-kormat.json
[12:09:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:09:46] <stashbot>	 T259831: Schema change to make change_tag.ct_rc_id unsigned - https://phabricator.wikimedia.org/T259831
[12:13:28] <wikibugs>	 (CR) Ema: [C: +1] "Excellent!" [puppet] - https://gerrit.wikimedia.org/r/627772 (https://phabricator.wikimedia.org/T262512) (owner: Muehlenhoff)
[12:20:27] <wikibugs>	 (PS2) Gehel: Enable cumin EventHandler to disable output. [software/cumin] - https://gerrit.wikimedia.org/r/628315 (https://phabricator.wikimedia.org/T212783)
[12:23:13] <librenms-wmf>	 Critical Alert for device cr2-eqiad.wikimedia.org - Primary inbound port utilisation over 80%  #page
[12:24:44] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'db2089:3316 (re)pooling @ 75%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12664 and previous config saved to /var/cache/conftool/dbconfig/20200918-122444-kormat.json
[12:24:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:24:50] <stashbot>	 T259831: Schema change to make change_tag.ct_rc_id unsigned - https://phabricator.wikimedia.org/T259831
[12:26:46] <icinga-wm>	 PROBLEM - LibreNMS has a critical alert #page on icinga1001 is CRITICAL: Primary inbound port utilisation over 80% #page (cr2-eqiad.wikimedia.org) https://bit.ly/wmf-librenms
[12:26:59] <icinga-wm>	 PROBLEM - Thanos query has high gRPC client errors on icinga1001 is CRITICAL: job=thanos-query https://wikitech.wikimedia.org/wiki/Thanos%23Alerts https://grafana.wikimedia.org/d/af36c91291a603f1d9fbdabdd127ac4a/thanos-query
[12:28:14] <librenms-wmf>	 Critical Alert for device cr2-codfw.wikimedia.org - Primary outbound port utilisation over 80%  #page
[12:28:48] <wikibugs>	 Operations, Wikimedia-Mailing-lists, I18n: Mojibake on Mailman - https://phabricator.wikimedia.org/T263248 (jhsoby)
[12:28:51] <wikibugs>	 Operations, Wikimedia-Mailing-lists, I18n: Mojibake on Mailman - https://phabricator.wikimedia.org/T263248 (jhsoby) @Ladsgroup I see in T52864 that you're involved in upgrading the lists. Do you have any idea what's causing this?
[12:39:48] <logmsgbot>	 !log kormat@cumin1001 dbctl commit (dc=all): 'db2089:3316 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12665 and previous config saved to /var/cache/conftool/dbconfig/20200918-123947-kormat.json
[12:39:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:39:53] <stashbot>	 T259831: Schema change to make change_tag.ct_rc_id unsigned - https://phabricator.wikimedia.org/T259831
[12:41:08] <kormat>	 !log reimaging db2125 T263244
[12:41:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:41:12] <stashbot>	 T263244: Reimage and reclone db2125 - https://phabricator.wikimedia.org/T263244
[12:45:39] <XioNoX>	 I'm disabling the inbound/outbound port utilization alert
[12:45:46] <XioNoX>	 the circuit is flirting with the 8Gbps
[12:46:11] <cdanis>	 XioNoX: I'm about to take several Gbps of traffic off the link
[12:46:14] <icinga-wm>	 RECOVERY - LibreNMS has a critical alert #page on icinga1001 is OK: OK: zero critical LibreNMS alerts https://bit.ly/wmf-librenms
[12:47:27] <XioNoX>	 cdanis: re-enabling Swift in eqiad?
[12:47:46] <ema>	 XioNoX: yes
[12:48:59] <logmsgbot>	 !log cdanis@cumin1001 conftool action : set/pooled=true; selector: dnsdisc=swift,name=eqiad
[12:49:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:00:11] <logmsgbot>	 !log kormat@cumin2001 START - Cookbook sre.hosts.downtime
[13:00:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:02:27] <logmsgbot>	 !log kormat@cumin2001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[13:02:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:07:48] <wikibugs>	 (CR) Jcrespo: [C: +2] remote_backup: Instead of using a preassigned port, autoselect one [software/wmfbackups] - https://gerrit.wikimedia.org/r/626172 (https://phabricator.wikimedia.org/T138562) (owner: Jcrespo)
[13:08:07] <wikibugs>	 (CR) Jcrespo: "This change is ready for review." [software/wmfbackups] - https://gerrit.wikimedia.org/r/623756 (https://phabricator.wikimedia.org/T165358) (owner: Jcrespo)
[13:08:24] <wikibugs>	 (CR) Jcrespo: [C: +2] Add WMFBackup package creation [software/wmfbackups] - https://gerrit.wikimedia.org/r/623756 (https://phabricator.wikimedia.org/T165358) (owner: Jcrespo)
[13:08:35] <wikibugs>	 (CR) Jcrespo: [C: +2] cli: Make /etc/wmfbackups the config dir for the main backup scripts [software/wmfbackups] - https://gerrit.wikimedia.org/r/628168 (https://phabricator.wikimedia.org/T138562) (owner: Jcrespo)
[13:12:11] <icinga-wm>	 PROBLEM - Thanos query has high gRPC client errors on icinga1001 is CRITICAL: job=thanos-query https://wikitech.wikimedia.org/wiki/Thanos%23Alerts https://grafana.wikimedia.org/d/af36c91291a603f1d9fbdabdd127ac4a/thanos-query
[13:15:46] <icinga-wm>	 PROBLEM - k8s API server requests latencies on argon is CRITICAL: instance=10.64.32.133 verb=LIST https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[13:17:09] <icinga-wm>	 RECOVERY - k8s API server requests latencies on argon is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[13:24:32] <wikibugs>	 Operations: Integrate Stretch 9.13 point update - https://phabricator.wikimedia.org/T258407 (MoritzMuehlenhoff)
[13:33:13] <icinga-wm>	 PROBLEM - Thanos query has high gRPC client errors on icinga1001 is CRITICAL: job=thanos-query https://wikitech.wikimedia.org/wiki/Thanos%23Alerts https://grafana.wikimedia.org/d/af36c91291a603f1d9fbdabdd127ac4a/thanos-query
[13:47:57] <wikibugs>	 (PS1) Elukey: profile::hadoop::spark2: use the package resource instead of require_package() [puppet] - https://gerrit.wikimedia.org/r/628330 (https://phabricator.wikimedia.org/T255028)
[13:50:43] <icinga-wm>	 PROBLEM - Thanos query has high gRPC client errors on icinga1001 is CRITICAL: job=thanos-query https://wikitech.wikimedia.org/wiki/Thanos%23Alerts https://grafana.wikimedia.org/d/af36c91291a603f1d9fbdabdd127ac4a/thanos-query
[13:55:42] <wikibugs>	 Operations, Community-Tech, MediaWiki-extensions-PageAssessments, Performance Issue: Issues with purgeUnusedProjects.php cron job on mwmaint1002  (Fri Oct 26, 2018) - https://phabricator.wikimedia.org/T208231 (Aklapper)
[13:56:15] <icinga-wm>	 ACKNOWLEDGEMENT - Thanos query has high gRPC client errors on icinga1001 is CRITICAL: job=thanos-query Herron prometheus5001 being prepped for cutover (not yet in production) https://wikitech.wikimedia.org/wiki/Thanos%23Alerts https://grafana.wikimedia.org/d/af36c91291a603f1d9fbdabdd127ac4a/thanos-query
[13:56:15] <icinga-wm>	 ACKNOWLEDGEMENT - Thanos sidecar cannot connect to Prometheus on icinga1001 is CRITICAL: cluster=prometheus instance=prometheus5001 job=thanos-sidecar prometheus=ops site=eqsin Herron prometheus5001 being prepped for cutover (not yet in production) https://wikitech.wikimedia.org/wiki/Thanos%23Alerts https://grafana.wikimedia.org/d/b19644bfbf0ec1e108027cce268d99f7/thanos-sidecar
[14:01:55] <wikibugs>	 (PS1) Effie Mouzeli: push-notifications: enable egress [deployment-charts] - https://gerrit.wikimedia.org/r/628336 (https://phabricator.wikimedia.org/T256973)
[14:02:13] <icinga-wm>	 PROBLEM - Prometheus prometheus5001/ops restarted: beware possible monitoring artifacts on prometheus5001 is CRITICAL: instance=127.0.0.1 job=prometheus https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_was_restarted https://grafana.wikimedia.org/d/000000271/prometheus-stats?var-datasource=eqsin+prometheus/ops
[14:03:13] <wikibugs>	 (PS1) Ladsgroup: labs: Turn on termbox v2 on wikidatawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/628337 (https://phabricator.wikimedia.org/T261488)
[14:03:29] <icinga-wm>	 RECOVERY - Thanos sidecar cannot connect to Prometheus on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Thanos%23Alerts https://grafana.wikimedia.org/d/b19644bfbf0ec1e108027cce268d99f7/thanos-sidecar
[14:03:32] <wikibugs>	 (CR) Elukey: [C: +2] "https://puppet-compiler.wmflabs.org/compiler1002/25191/" [puppet] - https://gerrit.wikimedia.org/r/628330 (https://phabricator.wikimedia.org/T255028) (owner: Elukey)
[14:04:07] <icinga-wm>	 RECOVERY - Thanos query has high gRPC client errors on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Thanos%23Alerts https://grafana.wikimedia.org/d/af36c91291a603f1d9fbdabdd127ac4a/thanos-query
[14:04:10] <wikibugs>	 (CR) JMeybohm: [C: +1] push-notifications: enable egress [deployment-charts] - https://gerrit.wikimedia.org/r/628336 (https://phabricator.wikimedia.org/T256973) (owner: Effie Mouzeli)
[14:05:15] <wikibugs>	 Operations, Product-Infrastructure-Team-Backlog, Push-Notification-Service, serviceops, and 2 others: Deploy push-notifications service to Kubernetes - https://phabricator.wikimedia.org/T256973 (Mholloway) What will be the internal URL for this service?  I am guessing `https://push-notifications....
[14:05:53] <wikibugs>	 (CR) Ladsgroup: [C: +2] "noop for production" [mediawiki-config] - https://gerrit.wikimedia.org/r/628337 (https://phabricator.wikimedia.org/T261488) (owner: Ladsgroup)
[14:06:38] <wikibugs>	 (Merged) jenkins-bot: labs: Turn on termbox v2 on wikidatawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/628337 (https://phabricator.wikimedia.org/T261488) (owner: Ladsgroup)
[14:07:32] <wikibugs>	 (PS1) Hashar: gerrit: dump heap on out of memory error [puppet] - https://gerrit.wikimedia.org/r/628338 (https://phabricator.wikimedia.org/T263008)
[14:08:43] <Amir1>	 ^ rebased on deploy1001
[14:10:50] <wikibugs>	 (PS5) Muehlenhoff: Manage /etc/apt/sources.list via Puppet [puppet] - https://gerrit.wikimedia.org/r/626693 (https://phabricator.wikimedia.org/T156562)
[14:10:52] <wikibugs>	 (PS6) Muehlenhoff: Manage /etc/apt/sources.list via Puppet [puppet] - https://gerrit.wikimedia.org/r/626693 (https://phabricator.wikimedia.org/T156562)
[14:12:00] <wikibugs>	 (PS2) Hashar: gerrit: dump heap on out of memory error [puppet] - https://gerrit.wikimedia.org/r/628338 (https://phabricator.wikimedia.org/T263008)
[14:13:14] <logmsgbot>	 !log ladsgroup@deploy1001 Synchronized wmf-config/InitialiseSettings.php: labs: Turn on termbox v2 on desktop for wikidatawiki -- noop for production, sanity sync (T261488) (duration: 01m 00s)
[14:13:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:13:19] <stashbot>	 T261488: trial new termbox on desktop on a test system - https://phabricator.wikimedia.org/T261488
[14:13:27] <wikibugs>	 Operations, netops: cr1-codfw<->cr1-eqiad link saturation - https://phabricator.wikimedia.org/T263206 (CDanis) for posterity: repooling swift@eqiad took 3.5Gbit/s off of the codfw->eqiad path.  there's a much longer discussion (recorded in #wikimedia-sre logs) about discussing edge-egress-to-backhaul byt...
[14:13:30] <wikibugs>	 (CR) Hashar: "Note: needs the service to be restarted after puppet ran." [puppet] - https://gerrit.wikimedia.org/r/628338 (https://phabricator.wikimedia.org/T263008) (owner: Hashar)
[14:13:36] <cdanis>	 wikibugs is like
[14:13:39] <cdanis>	 10 minutes behind the times??
[14:13:47] <cdanis>	 12
[14:15:04] <wikibugs>	 (PS7) Hnowlan: api-gateway: migrate to new helmfile format [deployment-charts] - https://gerrit.wikimedia.org/r/627250
[14:15:37] <logmsgbot>	 !log ladsgroup@deploy1001 Synchronized wmf-config/Wikibase.php: labs: Turn on termbox v2 on desktop for wikidatawiki -- noop for production, sanity sync (T261488) (duration: 00m 56s)
[14:15:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:15:56] <marostegui>	 cdanis: yeah, it has been misbehaving lately, I had to restart it yesterday and today
[14:19:46] <wikibugs>	 (PS1) MSantos: push-notifications: change version tag to -production [deployment-charts] - https://gerrit.wikimedia.org/r/628340 (https://phabricator.wikimedia.org/T256973)
[14:20:51] <wikibugs>	 (CR) Hnowlan: [C: +2] api-gateway: migrate to new helmfile format [deployment-charts] - https://gerrit.wikimedia.org/r/627250 (owner: Hnowlan)
[14:22:17] <wikibugs>	 (CR) Papaul: [C: +2] maps: add partman configuration for newer maps servers. [puppet] - https://gerrit.wikimedia.org/r/628089 (https://phabricator.wikimedia.org/T260271) (owner: Gehel)
[14:22:31] <wikibugs>	 (CR) Ppchelko: "woohooo! :)" [deployment-charts] - https://gerrit.wikimedia.org/r/627250 (owner: Hnowlan)
[14:22:34] <wikibugs>	 (PS2) Papaul: maps: add partman configuration for newer maps servers. [puppet] - https://gerrit.wikimedia.org/r/628089 (https://phabricator.wikimedia.org/T260271) (owner: Gehel)
[14:22:41] <wikibugs>	 (CR) Papaul: [V: +2 C: +2] maps: add partman configuration for newer maps servers. [puppet] - https://gerrit.wikimedia.org/r/628089 (https://phabricator.wikimedia.org/T260271) (owner: Gehel)
[14:23:17] <wikibugs>	 (Merged) jenkins-bot: api-gateway: migrate to new helmfile format [deployment-charts] - https://gerrit.wikimedia.org/r/627250 (owner: Hnowlan)
[14:25:38] <wikibugs>	 Operations, ops-codfw, serviceops: mw2256 went down with thermal issues / fail-safe voltage is out of range - https://phabricator.wikimedia.org/T263022 (Papaul) Open→Declined Declined since it is a duplicate.
[14:25:42] <wikibugs>	 Operations, ops-codfw: ps1-a8-codfw WebUI unresponsive - https://phabricator.wikimedia.org/T263001 (Papaul) working on getting a local FTP to upload the firmware to the PDU. It will be one sometimes next week
[14:27:35] <wikibugs>	 (PS1) Mholloway: Echo: Set up the push notifier type [mediawiki-config] - https://gerrit.wikimedia.org/r/628341 (https://phabricator.wikimedia.org/T262936)
[14:27:37] <wikibugs>	 (PS1) Mholloway: Echo: Enable push on testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/628342 (https://phabricator.wikimedia.org/T262936)
[14:27:39] <wikibugs>	 (PS1) Mholloway: Echo: Enable push on all Wikipedias [mediawiki-config] - https://gerrit.wikimedia.org/r/628343 (https://phabricator.wikimedia.org/T262936)
[14:28:27] <wikibugs>	 (CR) jerkins-bot: [V: -1] Echo: Enable push on testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/628342 (https://phabricator.wikimedia.org/T262936) (owner: Mholloway)
[14:28:30] <wikibugs>	 (CR) Mholloway: [C: -2] "Hold for scheduled deployment window" [mediawiki-config] - https://gerrit.wikimedia.org/r/628341 (https://phabricator.wikimedia.org/T262936) (owner: Mholloway)
[14:28:32] <wikibugs>	 (CR) jerkins-bot: [V: -1] Echo: Enable push on all Wikipedias [mediawiki-config] - https://gerrit.wikimedia.org/r/628343 (https://phabricator.wikimedia.org/T262936) (owner: Mholloway)
[14:28:36] <wikibugs>	 Operations, Product-Infrastructure-Team-Backlog, Push-Notification-Service, serviceops, and 2 others: Deploy push-notifications service to Kubernetes - https://phabricator.wikimedia.org/T256973 (jijiki) @Mholloway it will be accessible on Monday after we deploy the LVS/DNS patches. Meanwhile you...
[14:28:38] <wikibugs>	 Operations, ops-codfw, DBA: db2127 memory errors - https://phabricator.wikimedia.org/T262247 (Papaul) @Marostegui any day that works for you works for me as well
[14:28:46] <wikibugs>	 (CR) Mholloway: [C: -2] "Hold for scheduled deployment window" [mediawiki-config] - https://gerrit.wikimedia.org/r/628342 (https://phabricator.wikimedia.org/T262936) (owner: Mholloway)
[14:28:53] <wikibugs>	 (CR) Mholloway: [C: -2] "Hold for scheduled deployment window" [mediawiki-config] - https://gerrit.wikimedia.org/r/628343 (https://phabricator.wikimedia.org/T262936) (owner: Mholloway)
[14:28:58] <wikibugs>	 (CR) Muehlenhoff: "PCC: https://puppet-compiler.wmflabs.org/compiler1001/25190/" [puppet] - https://gerrit.wikimedia.org/r/626693 (https://phabricator.wikimedia.org/T156562) (owner: Muehlenhoff)
[14:29:59] <wikibugs>	 (PS7) Muehlenhoff: Manage /etc/apt/sources.list via Puppet [puppet] - https://gerrit.wikimedia.org/r/626693 (https://phabricator.wikimedia.org/T156562)
[14:30:53] <icinga-wm>	 RECOVERY - Prometheus prometheus5001/ops restarted: beware possible monitoring artifacts on prometheus5001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_was_restarted https://grafana.wikimedia.org/d/000000271/prometheus-stats?var-datasource=eqsin+prometheus/ops
[14:31:52] <wikibugs>	 Operations, ops-codfw, DBA: db2127 memory errors - https://phabricator.wikimedia.org/T262247 (Marostegui) Thank you @Papaul - I will have it ready by Monday
[14:34:48] <wikibugs>	 Operations, ops-codfw, DC-Ops, Maps, Patch-For-Review: (Need By: TBD) rack/setup/install maps20[05-10].codfw.wmnet - https://phabricator.wikimedia.org/T260271 (ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` maps2005.codfw.wmnet ` The l...
[14:38:08] <wikibugs>	 (PS2) Mholloway: Echo: Set up common push settings [mediawiki-config] - https://gerrit.wikimedia.org/r/628341 (https://phabricator.wikimedia.org/T262936)
[14:38:10] <wikibugs>	 (PS2) Mholloway: Echo: Enable push on testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/628342 (https://phabricator.wikimedia.org/T262936)
[14:38:12] <wikibugs>	 (PS2) Mholloway: Echo: Enable push on all Wikipedias [mediawiki-config] - https://gerrit.wikimedia.org/r/628343 (https://phabricator.wikimedia.org/T262936)
[14:38:41] <wikibugs>	 Operations, Product-Infrastructure-Team-Backlog, Push-Notification-Service, serviceops, and 2 others: Deploy push-notifications service to Kubernetes - https://phabricator.wikimedia.org/T256973 (MSantos)
[14:42:17] <wikibugs>	 (CR) Mholloway: [C: -2] Echo: Set up common push settings (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/628341 (https://phabricator.wikimedia.org/T262936) (owner: Mholloway)
[14:44:51] <wikibugs>	 Operations, ops-eqiad: Check jumbo1008.eqiad.wmnet PSU redundancy reported as critical - https://phabricator.wikimedia.org/T263262 (klausman)
[14:45:49] <wikibugs>	 Operations, ops-eqiad: Check jumbo1008.eqiad.wmnet PSU redundancy reported as critical - https://phabricator.wikimedia.org/T263262 (klausman)
[14:47:05] <wikibugs>	 Operations, Product-Infrastructure-Team-Backlog, Push-Notification-Service, serviceops, and 2 others: Deploy push-notifications service to Kubernetes - https://phabricator.wikimedia.org/T256973 (jijiki) On Monday (EU) morning, @JMeybohm and I will push the LVS/DNS patches, so everything will be r...
[14:52:37] <icinga-wm>	 PROBLEM - Check whether microcode mitigations for CPU vulnerabilities are applied on stat1004 is CRITICAL: CRITICAL - Server is missing the following CPU flags: {md_clear, flush_l1d} https://wikitech.wikimedia.org/wiki/Microcode
[14:53:31] <wikibugs>	 (CR) Muehlenhoff: "Obsolete since an-tool1009 has been moved to CAS instead." [puppet] - https://gerrit.wikimedia.org/r/617385 (owner: Muehlenhoff)
[14:53:43] <wikibugs>	 (Abandoned) Muehlenhoff: Enable CAS for Hue [puppet] - https://gerrit.wikimedia.org/r/617385 (owner: Muehlenhoff)
[14:53:56] <wikibugs>	 Operations, ops-codfw, DC-Ops, Maps, Patch-For-Review: (Need By: TBD) rack/setup/install maps20[05-10].codfw.wmnet - https://phabricator.wikimedia.org/T260271 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['maps2005.codfw.wmnet'] `  Of which those **FAILED**: ` ['maps2005.codfw.wmne...
[14:54:14] <wikibugs>	 (PS2) Muehlenhoff: Retire stub firejail code in service::uwsgi [puppet] - https://gerrit.wikimedia.org/r/622350
[14:57:29] <wikibugs>	 (PS1) Gehel: maps: fix typo in glob exrpession for maps netboot.cfg [puppet] - https://gerrit.wikimedia.org/r/628348 (https://phabricator.wikimedia.org/T260271)
[14:57:46] <wikibugs>	 (CR) Dzahn: [C: +2] gerrit: dump heap on out of memory error [puppet] - https://gerrit.wikimedia.org/r/628338 (https://phabricator.wikimedia.org/T263008) (owner: Hashar)
[14:58:12] <wikibugs>	 Operations, Wikimedia-Mailing-lists, I18n: Mojibake on Mailman - https://phabricator.wikimedia.org/T263248 (Aklapper)
[14:58:15] <wikibugs>	 Operations, Wikimedia-Mailing-lists, I18n: Several unreadable mailing list descriptions (Mojibake) due to wrong charset encodings, should be Unicode - https://phabricator.wikimedia.org/T261031 (Aklapper)
[14:59:26] <wikibugs>	 (CR) Muehlenhoff: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/628348 (https://phabricator.wikimedia.org/T260271) (owner: Gehel)
[15:00:28] <wikibugs>	 Operations, ops-codfw, DC-Ops, Maps, Patch-For-Review: (Need By: TBD) rack/setup/install maps20[05-10].codfw.wmnet - https://phabricator.wikimedia.org/T260271 (Papaul) Still having partman recipe problem maybe because of this line  ` maps[12]00[1-4]*) echo partman/standard.cfg partman/raid10-...
[15:01:13] <wikibugs>	 Operations, Wikimedia-Mailing-lists, I18n: Several unreadable mailing list descriptions (Mojibake) due to wrong charset encodings, should be Unicode - https://phabricator.wikimedia.org/T261031 (Aklapper)
[15:02:38] <mutante>	 jouncebot: next
[15:02:38] <jouncebot>	 In 15 hour(s) and 57 minute(s): No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200919T0700)
[15:05:41] <icinga-wm>	 RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:08:44] <wikibugs>	 Operations, OTRS, serviceops, Patch-For-Review, User-notice: Update OTRS to the latest stable version (6.0.x) - https://phabricator.wikimedia.org/T187984 (JGHowes) Please restore the color highlighting as was in the previous OTRS version. It's removal from OTRS 6.0 makes it more difficult to...
[15:09:07] <mutante>	 !log restarting gerrit service to apply gerrit::628338 to make it dump heap if out of memory (T263008)
[15:09:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:09:13] <stashbot>	 T263008: Gerrit out of heap - https://phabricator.wikimedia.org/T263008
[15:10:21] <mutante>	 gerrit back
[15:10:53] <mutante>	 should now dump heap in /srv/gerrit if it runs out of memory again
[15:13:01] <wikibugs>	 (CR) Dzahn: "> Patch Set 2:" [puppet] - https://gerrit.wikimedia.org/r/628338 (https://phabricator.wikimedia.org/T263008) (owner: Hashar)
[15:21:09] <wikibugs>	 (CR) Elukey: "Answering to comments and sending the first set of fixes, will also check the other ones pointed out in the previous comment from Riccardo" (8 comments) [software/pywmflib] - https://gerrit.wikimedia.org/r/626380 (https://phabricator.wikimedia.org/T257905) (owner: Elukey)
[15:21:35] <wikibugs>	 (PS2) Elukey: Add basic debian packaging [software/pywmflib] - https://gerrit.wikimedia.org/r/626380 (https://phabricator.wikimedia.org/T257905)
[15:22:05] <wikibugs>	 (PS2) Gehel: maps: fix typo in glob exrpession for maps netboot.cfg [puppet] - https://gerrit.wikimedia.org/r/628348 (https://phabricator.wikimedia.org/T260271)
[15:22:49] <wikibugs>	 (CR) Gehel: [C: +2] maps: fix typo in glob exrpession for maps netboot.cfg [puppet] - https://gerrit.wikimedia.org/r/628348 (https://phabricator.wikimedia.org/T260271) (owner: Gehel)
[15:26:01] <wikibugs>	 Operations, Wikimedia-Mailing-lists, I18n: Mojibake on Mailman - https://phabricator.wikimedia.org/T263248 (Dzahn) >  the problem arose some time between May 2019 and August 2020 – I wish I could be more specific.  This would match the upgrading of the mailman server to the newer Debian distro versio...
[15:28:00] <wikibugs>	 Operations, Wikimedia-Mailing-lists, I18n: Several unreadable mailing list descriptions (Mojibake) due to wrong charset encodings, should be Unicode - https://phabricator.wikimedia.org/T261031 (Dzahn) imported comment from T263248:  >  the problem arose some time between May 2019 and August 2020 – I...
[15:31:23] <wikibugs>	 (CR) Effie Mouzeli: [C: -2] "Let's wait and make sure we need this before merging" [deployment-charts] - https://gerrit.wikimedia.org/r/628336 (https://phabricator.wikimedia.org/T256973) (owner: Effie Mouzeli)
[15:32:11] <wikibugs>	 Operations, OTRS, serviceops, Patch-For-Review, User-notice: Update OTRS to the latest stable version (6.0.x) - https://phabricator.wikimedia.org/T187984 (NoFWDaddress) @JGHowes please sse T263243.  Note that the new version was in test for all before the upgrade and that this issue could hav...
[15:42:19] <wikibugs>	 Operations, Wikimedia-Mailing-lists, I18n: Several unreadable mailing list descriptions (Mojibake) due to wrong charset encodings, should be Unicode - https://phabricator.wikimedia.org/T261031 (Aklapper) Wondering if backporting https://gitlab.com/mailman/mailman/-/commit/761c268bb7c7c7b91d3f962e5ca4...
[15:50:46] <wikibugs>	 Operations, OTRS, serviceops, Patch-For-Review, User-notice: Update OTRS to the latest stable version (6.0.x) - https://phabricator.wikimedia.org/T187984 (akosiaris) >>! In T187984#6474618, @JGHowes wrote: > Please restore the color highlighting as was in the previous OTRS version. It's remov...
[16:12:22] <wikibugs>	 Operations, OTRS, serviceops, Patch-For-Review, User-notice: Update OTRS to the latest stable version (6.0.x) - https://phabricator.wikimedia.org/T187984 (jcrespo) > that will probably not be developed  As a small correction, instead of "that will probably not be developed" something more lik...
[16:16:36] <wikibugs>	 Operations, ops-codfw, DC-Ops, Maps: (Need By: TBD) rack/setup/install maps20[05-10].codfw.wmnet - https://phabricator.wikimedia.org/T260271 (ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` maps2005.codfw.wmnet ` The log can be found in `/v...
[16:21:30] <logmsgbot>	 !log pt1979@cumin2001 START - Cookbook sre.hosts.downtime
[16:21:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:22:46] <wikibugs>	 Operations, OTRS, serviceops, Patch-For-Review, User-notice: Update OTRS to the latest stable version (6.0.x) - https://phabricator.wikimedia.org/T187984 (NoFWDaddress) @jcrespo : By experience, those kind of features will not see light in our era for OTRS since more urgent "features" (like T...
[16:22:49] <wikibugs>	 Operations, OTRS, serviceops, Patch-For-Review, User-notice: Update OTRS to the latest stable version (6.0.x) - https://phabricator.wikimedia.org/T187984 (jcrespo) Yeah, not disagreeing, in fact supporting that ticket. My stress was on that it was not Alex's decision to remove it. :-) Cheers.
[16:23:34] <logmsgbot>	 !log pt1979@cumin2001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
[16:23:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:32:02] <wikibugs>	 (Abandoned) Hnowlan: api-gateway: Make JWT issuer configurable. [deployment-charts] - https://gerrit.wikimedia.org/r/622799 (https://phabricator.wikimedia.org/T235277) (owner: Hnowlan)
[16:45:41] <wikibugs>	 (PS1) Hnowlan: api-gateway: remove straggler config. [deployment-charts] - https://gerrit.wikimedia.org/r/628403
[16:50:47] <wikibugs>	 (CR) Hnowlan: [C: +2] api-gateway: remove straggler config. [deployment-charts] - https://gerrit.wikimedia.org/r/628403 (owner: Hnowlan)
[16:52:55] <wikibugs>	 (Merged) jenkins-bot: api-gateway: remove straggler config. [deployment-charts] - https://gerrit.wikimedia.org/r/628403 (owner: Hnowlan)
[16:56:48] <wikibugs>	 (PS14) Dzahn: prometheus: replace remaining hiera() with lookup() [puppet] - https://gerrit.wikimedia.org/r/623666
[16:58:33] <wikibugs>	 (PS2) Dzahn: openstack: replace hiera() with lookup() [puppet] - https://gerrit.wikimedia.org/r/627966 (https://phabricator.wikimedia.org/T209953)
[17:00:52] <wikibugs>	 (CR) Elukey: "So without the dh_auto_clean override I get:" [software/pywmflib] - https://gerrit.wikimedia.org/r/626380 (https://phabricator.wikimedia.org/T257905) (owner: Elukey)
[17:03:03] <rzl>	 👋 just got a librenms page, I guess it's the one from yesterday and the ack expired
[17:07:02] <wikibugs>	 Operations, SRE-Access-Requests, Patch-For-Review: Allow Nicholas Skaggs to issue icinga commands - https://phabricator.wikimedia.org/T263191 (Dzahn) This is usually done without separate access request for all users who have root shell. The difference here would just be "prod root" vs. "wmcs / cloud...
[17:09:51] <wikibugs>	 Operations, ops-codfw, DC-Ops, Maps: (Need By: TBD) rack/setup/install maps20[05-10].codfw.wmnet - https://phabricator.wikimedia.org/T260271 (ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` maps2005.codfw.wmnet ` The log can be found in `/v...
[17:11:55] <wikibugs>	 Operations, Wikimedia-Mailing-lists: Mailing list for local development discussion - https://phabricator.wikimedia.org/T263216 (Dzahn) //You have successfully created the mailing list local-dev and notification has been sent to the list owner jhuneidi@wikimedia.org. You can now://  [[ https://lists.wikim...
[17:13:52] <wikibugs>	 Operations, ops-codfw, DC-Ops, Maps: (Need By: TBD) rack/setup/install maps20[05-10].codfw.wmnet - https://phabricator.wikimedia.org/T260271 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['maps2005.codfw.wmnet'] `  and were **ALL** successful.
[17:15:41] <mutante>	 !log lists1001 - apt-get install pwgen to generate passwords (this was installed on previous list server but apparently not puppetized, puppet patch coming up)
[17:15:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:21:58] <wikibugs>	 (PS1) Hnowlan: api-gateway: add routing for static and other components [deployment-charts] - https://gerrit.wikimedia.org/r/628408 (https://phabricator.wikimedia.org/T263045)
[17:22:27] <wikibugs>	 Operations, Wikimedia-Mailing-lists: Mailing list for local development discussion - https://phabricator.wikimedia.org/T263216 (Dzahn) Hi @jeena @bbearnes,  the new list has been created. The thing is that at list creation you can only enter a single initial admin (Jeena).  So what I did was:  - create n...
[17:23:43] <wikibugs>	 Operations, Wikimedia-Mailing-lists: Mailing list for local development discussion - https://phabricator.wikimedia.org/T263216 (Dzahn) Open→Resolved a:Dzahn
[17:24:54] <wikibugs>	 Operations, Wikimedia-Mailing-lists: Create arbcom-ru@wikimedia.org - https://phabricator.wikimedia.org/T262525 (Dzahn) a:Adamant.pwn
[17:30:56] <wikibugs>	 Operations, Wikidata, Wikimedia-Mailing-lists: Stop archiving the wikidata-bugs mailinglist in pipermail - https://phabricator.wikimedia.org/T262773 (Dzahn) a:Lydia_Pintscher assigning to Lydia as she is the list admin and can change it per T262773#6464825  I think that's all that is needed to re...
[17:36:48] <wikibugs>	 (CR) Ppchelko: "that's what you get for having the portal site and apis on the same host." [deployment-charts] - https://gerrit.wikimedia.org/r/628408 (https://phabricator.wikimedia.org/T263045) (owner: Hnowlan)
[17:37:15] <wikibugs>	 (CR) Ppchelko: [C: +1] api-gateway: add routing for static and other components [deployment-charts] - https://gerrit.wikimedia.org/r/628408 (https://phabricator.wikimedia.org/T263045) (owner: Hnowlan)
[17:42:19] <wikibugs>	 (PS1) Dzahn: mailman: require package pwgen to create random passwords [puppet] - https://gerrit.wikimedia.org/r/628412
[17:43:24] <wikibugs>	 (PS2) Dzahn: mailman: require package pwgen to create random passwords [puppet] - https://gerrit.wikimedia.org/r/628412
[17:43:55] <wikibugs>	 (CR) Herron: [C: +1] "LGTM!" [puppet] - https://gerrit.wikimedia.org/r/628412 (owner: Dzahn)
[17:44:23] <wikibugs>	 (CR) RLazarus: [C: +1] mailman: require package pwgen to create random passwords [puppet] - https://gerrit.wikimedia.org/r/628412 (owner: Dzahn)
[17:44:50] <wikibugs>	 (CR) Dzahn: [C: +2] "fast reviews like that are awesome 😊" [puppet] - https://gerrit.wikimedia.org/r/628412 (owner: Dzahn)
[17:46:40] <wikibugs>	 (CR) Dzahn: "now installed on lists1001 by puppet" [puppet] - https://gerrit.wikimedia.org/r/628412 (owner: Dzahn)
[17:49:30] <wikibugs>	 Operations, Traffic, netops, Epic: Capacity planning for (& optimization of) transport backhaul vs edge egress - https://phabricator.wikimedia.org/T263275 (CDanis) p:Triage→Medium
[17:50:33] <wikibugs>	 Operations, netops: cr1-codfw<->cr1-eqiad link saturation - https://phabricator.wikimedia.org/T263206 (CDanis) This particular issue is resolved for now, and the action items and other ideas spawned in the discussion of it will be tracked as sub-tasks of {T263275}
[17:50:50] <wikibugs>	 Operations, netops: cr1-codfw<->cr1-eqiad link saturation - https://phabricator.wikimedia.org/T263206 (CDanis) Open→Resolved a:CDanis
[18:00:02] <wikibugs>	 Operations, ops-eqiad: Check jumbo1008.eqiad.wmnet PSU redundancy reported as critical - https://phabricator.wikimedia.org/T263262 (wiki_willy) a:Cmjohnson
[18:04:37] <wikibugs>	 (CR) Dzahn: [V: -1 C: -1] "previous issue fixed but still more here: https://puppet-compiler.wmflabs.org/compiler1003/25192/bast3004.wikimedia.org/change.bast3004.wi" [puppet] - https://gerrit.wikimedia.org/r/623666 (owner: Dzahn)
[18:07:10] <wikibugs>	 Operations, Traffic, netops: Collect netflow data for internal traffic - https://phabricator.wikimedia.org/T263277 (CDanis)
[18:08:20] <wikibugs>	 Operations, Traffic, netops: Collect netflow data for internal traffic - https://phabricator.wikimedia.org/T263277 (CDanis) p:Triage→Medium
[18:10:29] <wikibugs>	 (PS1) Herron: graphite-carbon: disable internal log rotation and use logrotate [puppet] - https://gerrit.wikimedia.org/r/628423 (https://phabricator.wikimedia.org/T263103)
[18:10:56] <ryankemper>	 !log Removed stale `wikidatardf-dumps` crontab entry from `dumpsgen@snapshot1008`, stored backup of previous state of crontab in the (admittedly verbose) `/tmp/dumpsgen_crontab_before_removing_stale_wikidata_dump_entry_see_gerrit_puppet_patch_622342`
[18:10:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:25:40] <wikibugs>	 Operations, Domains, Traffic: Change of nameservers for Wikimedia.org.tr - https://phabricator.wikimedia.org/T259792 (Ijon) Thanks, @jcrespo!
[18:26:16] <wikibugs>	 (CR) Herron: "currently fails like this https://puppet-compiler.wmflabs.org/compiler1003/25194/graphite1004.eqiad.wmnet/change.graphite1004.eqiad.wmnet." (1 comment) [puppet] - https://gerrit.wikimedia.org/r/628423 (https://phabricator.wikimedia.org/T263103) (owner: Herron)
[18:26:38] <wikibugs>	 Operations, ops-codfw, decommission-hardware: decommission wmf6412 - https://phabricator.wikimedia.org/T261968 (wiki_willy) a:Papaul
[18:37:22] <wikibugs>	 Operations, ops-codfw, DC-Ops, Maps: (Need By: TBD) rack/setup/install maps20[05-10].codfw.wmnet - https://phabricator.wikimedia.org/T260271 (Papaul) hey guys please look at the output below and see if the all looks good on maps2005 so i can resume the install on the other nodes on Monday.  Thank...
[18:38:36] <ryankemper>	 !log `sudo kill 126121 126122 126124 126128 249520 249521 254016 254027` on `snapshot1008` to terminate wikidata dump jobs that are in a bad state
[18:38:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:41:55] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: 3.37e+04 gt 100 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[18:43:49] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: (C)100 gt (W)50 gt 9 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[18:44:42] <ryankemper>	 !log `sudo kill 254017 254018 254028 254029` to kill some dangling serdi / gzip processes, all the wikidata cleanup should be complete
[18:44:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:46:28] <logmsgbot>	 !log pt1979@cumin2001 START - Cookbook sre.dns.netbox
[18:46:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:52:28] <logmsgbot>	 !log pt1979@cumin2001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[18:52:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:52:41] <wikibugs>	 Operations, ops-codfw, decommission-hardware: decommission wmf6412 - https://phabricator.wikimedia.org/T261968 (Papaul)
[18:53:21] <wikibugs>	 Operations, ops-codfw, decommission-hardware: decommission wmf6412 - https://phabricator.wikimedia.org/T261968 (Papaul) Open→Resolved
[20:17:13] <wikibugs>	 (CR) Ppchelko: [C: +1] "heh, I was thinking about https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/listener/v3/listener_components.proto#config-listener-" [deployment-charts] - https://gerrit.wikimedia.org/r/628408 (https://phabricator.wikimedia.org/T263045) (owner: Hnowlan)
[20:20:53] <wikibugs>	 (CR) CRusnov: "This change is ready for review." [puppet] - https://gerrit.wikimedia.org/r/628436 (https://phabricator.wikimedia.org/T247364) (owner: CRusnov)
[20:26:07] <wikibugs>	 Operations, Traffic, netops: experiment with reënabling compression between applayer's TLS terminators and edge caches - https://phabricator.wikimedia.org/T263288 (CDanis)
[20:28:52] <wikibugs>	 (PS2) CRusnov: base/check_systemd_state.py: Switch header to Python 3 [puppet] - https://gerrit.wikimedia.org/r/624733 (https://phabricator.wikimedia.org/T247364)
[20:32:37] <wikibugs>	 (PS3) CRusnov: modules/service/files/logstash_checker.py: Move to Python3 [puppet] - https://gerrit.wikimedia.org/r/624116 (https://phabricator.wikimedia.org/T247364)
[20:33:47] <wikibugs>	 (Abandoned) DannyS712: abusefilter.php: Remove settings that duplicate defaults, and clean up [mediawiki-config] - https://gerrit.wikimedia.org/r/552610 (https://phabricator.wikimedia.org/T238965) (owner: DannyS712)
[20:35:36] <wikibugs>	 (PS3) CRusnov: modules/admin/data/nda_audit.py: Port to Python3 [puppet] - https://gerrit.wikimedia.org/r/624112 (https://phabricator.wikimedia.org/T247364)
[20:45:07] <wikibugs>	 Operations, netops: Set the same OSPF weight on eqiad/codfw wavelenghts - https://phabricator.wikimedia.org/T263230 (CDanis)
[20:45:09] <wikibugs>	 Operations, Traffic, netops, Epic: Capacity planning for (& optimization of) transport backhaul vs edge egress - https://phabricator.wikimedia.org/T263275 (CDanis)
[20:45:12] <wikibugs>	 Operations, netops: Consider balancing VRRP primaries to cr1/cr2 - https://phabricator.wikimedia.org/T263212 (CDanis)
[20:52:16] <wikibugs>	 Operations, Analytics, Traffic, netops: Turnilo: per-second rates for wmf_netflow bytes + packets - https://phabricator.wikimedia.org/T263290 (CDanis)
[21:00:39] <wikibugs>	 (CR) Dzahn: [C: +2] "I compiled this on everything. NOOP on *.  https://puppet-compiler.wmflabs.org/compiler1002/25193/" [puppet] - https://gerrit.wikimedia.org/r/627966 (https://phabricator.wikimedia.org/T209953) (owner: Dzahn)
[21:06:41] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: 138 gt 100 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[21:06:41] <icinga-wm>	 PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp1083 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[21:08:37] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: (C)100 gt (W)50 gt 7 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[21:08:53] <wikibugs>	 Operations, Traffic: experiment with a "unified" ATS-BE pool - https://phabricator.wikimedia.org/T263291 (CDanis)
[21:14:27] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: 1222 gt 100 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[21:16:23] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: (C)100 gt (W)50 gt 2 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[21:18:11] <icinga-wm>	 RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp1083 is OK: HTTP OK: HTTP/1.0 200 OK - 23597 bytes in 0.006 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[21:38:35] <wikibugs>	 (PS9) CDanis: WIP: serve NEL headers on group0 [puppet] - https://gerrit.wikimedia.org/r/627629
[21:39:56] <wikibugs>	 (CR) Bstorm: [C: +1] ""is_primary_server" is likely to confuse someone in the future who thinks that means the other server is standby (which is not true, it's " [puppet] - https://gerrit.wikimedia.org/r/624328 (owner: Dzahn)
[21:41:19] <wikibugs>	 (PS10) CDanis: WIP: serve NEL headers on group0 [puppet] - https://gerrit.wikimedia.org/r/627629
[21:42:16] <wikibugs>	 (PS7) Dzahn: dumps: rename the do_acme parameter and lookup [puppet] - https://gerrit.wikimedia.org/r/624328
[21:43:21] <wikibugs>	 (CR) Dzahn: [C: +2] "thank you very much, also for updating docs. i'll merge it then and confirm on the hosts" [puppet] - https://gerrit.wikimedia.org/r/624328 (owner: Dzahn)
[21:45:08] <wikibugs>	 (PS11) CDanis: Serve Network Error Logging headers on group0 [puppet] - https://gerrit.wikimedia.org/r/627629 (https://phabricator.wikimedia.org/T257527)
[21:45:20] <wikibugs>	 (CR) Bstorm: [C: +1] "I think this looks good now. It's honestly a really scary one because a change in the NFS mounts that doesn't go well will cause all NFS m" [puppet] - https://gerrit.wikimedia.org/r/622666 (owner: Dzahn)
[21:45:39] <wikibugs>	 (CR) CDanis: "This is ready for review!  I'd like to deploy Monday." (1 comment) [puppet] - https://gerrit.wikimedia.org/r/627629 (https://phabricator.wikimedia.org/T257527) (owner: CDanis)
[21:45:57] <wikibugs>	 (CR) Dzahn: "noop confirmed on labstore1006/1007" [puppet] - https://gerrit.wikimedia.org/r/624328 (owner: Dzahn)
[21:46:30] <wikibugs>	 Operations, Product-Infrastructure-Data, Epic, Goal, Patch-For-Review: automatically collect network error reports from users' browsers (Network Error Logging API) - https://phabricator.wikimedia.org/T257527 (CDanis)
[21:47:10] <wikibugs>	 (CR) Dzahn: "ACK, confirmed. this will not merge on a Friday. thank you" [puppet] - https://gerrit.wikimedia.org/r/622666 (owner: Dzahn)
[21:48:32] <tzatziki>	 !log changed password for Millennium bug@ptwiki
[21:48:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:51:07] <wikibugs>	 (CR) Hashar: "Danke Schon ;)" [puppet] - https://gerrit.wikimedia.org/r/628338 (https://phabricator.wikimedia.org/T263008) (owner: Hashar)
[22:13:32] <wikibugs>	 (CR) Dzahn: "reduced the number of hiera() lines across the whole repo by 32%" [puppet] - https://gerrit.wikimedia.org/r/627966 (https://phabricator.wikimedia.org/T209953) (owner: Dzahn)
[22:14:38] <wikibugs>	 Operations, Patch-For-Review, cloud-services-team (Kanban): Use lookup() instead of hiera() in Puppet - https://phabricator.wikimedia.org/T209953 (Dzahn) The patch above reduced the number of hiera() lines across the whole puppet repo by 32%.
[22:26:29] <wikibugs>	 (PS1) Dzahn: wmcs::postgres: hiera->lookup and add data types [puppet] - https://gerrit.wikimedia.org/r/628459
[22:30:54] <wikibugs>	 Operations, Analytics, Analytics-Kanban, Event-Platform, Wikimedia-production-error: Could not enqueue jobs from stream mediawiki.job.cirrusSearchIncomingLinkCount - https://phabricator.wikimedia.org/T263132 (jeena) Various jobenqueue errors happened today in the past 6 hours with spikes of 1...
[22:33:12] <wikibugs>	 (PS1) Dzahn: cache::ssl::unified: hiera->lookup, add data types [puppet] - https://gerrit.wikimedia.org/r/628460
[22:34:11] <wikibugs>	 (CR) jerkins-bot: [V: -1] cache::ssl::unified: hiera->lookup, add data types [puppet] - https://gerrit.wikimedia.org/r/628460 (owner: Dzahn)
[22:34:33] <wikibugs>	 Operations, Analytics, Analytics-Kanban, Event-Platform, Wikimedia-production-error: Could not enqueue jobs from stream mediawiki.job.cirrusSearchIncomingLinkCount - https://phabricator.wikimedia.org/T263132 (thcipriani) p:High→Unbreak! >>! In T263132#6475784, @jeena wrote: > Various...
[22:39:48] <wikibugs>	 (PS1) Dzahn: nutcracker: hiera-lookup, add data types [puppet] - https://gerrit.wikimedia.org/r/628461
[22:48:08] <wikibugs>	 (PS1) Dzahn: yubiauth: hiera->lookup, add data types [puppet] - https://gerrit.wikimedia.org/r/628462
[23:07:15] <wikibugs>	 (PS1) Dzahn: phabricator: add mysql port to user check script [puppet] - https://gerrit.wikimedia.org/r/628464
[23:08:23] <wikibugs>	 (CR) Dzahn: [C: +2] "it's just a script run by humans, not influencing prod phabricator in any way" [puppet] - https://gerrit.wikimedia.org/r/628464 (owner: Dzahn)
[23:12:42] <wikibugs>	 (CR) Cwhite: [C: +1] "LGTM!" [puppet] - https://gerrit.wikimedia.org/r/624116 (https://phabricator.wikimedia.org/T247364) (owner: CRusnov)
[23:14:49] <icinga-wm>	 PROBLEM - Too many messages in kafka logging-eqiad #o11y on icinga1001 is CRITICAL: cluster=misc exported_cluster=logging-eqiad group=logstash-codfw instance=kafkamon1002 job=burrow partition={2,4} prometheus=ops site=eqiad topic=rsyslog-notice https://wikitech.wikimedia.org/wiki/Logstash%23Kafka_consumer_lag https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?from=now-3h&to=now&orgId=1&var-datasource=thanos&var-cluster=
[23:14:49] <icinga-wm>	 -topic=All&var-consumer_group=All
[23:20:03] <wikibugs>	 (CR) Krinkle: "Ping :)" [mediawiki-config] - https://gerrit.wikimedia.org/r/622863 (https://phabricator.wikimedia.org/T249745) (owner: Ppchelko)
[23:21:03] <wikibugs>	 (CR) Cwhite: [C: +1] "LGTM!" [puppet] - https://gerrit.wikimedia.org/r/628423 (https://phabricator.wikimedia.org/T263103) (owner: Herron)
[23:51:45] <icinga-wm>	 RECOVERY - Too many messages in kafka logging-eqiad #o11y on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Logstash%23Kafka_consumer_lag https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?from=now-3h&to=now&orgId=1&var-datasource=thanos&var-cluster=logging-eqiad&var-topic=All&var-consumer_group=All