[00:01:41] <icinga-wm>	 RECOVERY - Check systemd state on lists1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:05:13] <wikibugs>	 SRE, Traffic, HTTPS, Performance-Team (Radar): Enable QUIC support on Wikimedia servers - https://phabricator.wikimedia.org/T238034 (Bugreporter)
[00:05:46] <wikibugs>	 SRE, Traffic, HTTPS, Performance-Team (Radar): Enable HTTP/3 (QUIC) support on Wikimedia servers - https://phabricator.wikimedia.org/T238034 (Bugreporter)
[00:09:15] <icinga-wm>	 PROBLEM - Check systemd state on lists1001 is CRITICAL: CRITICAL - degraded: The following units failed: check_exclude_backups.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:20:03] <icinga-wm>	 RECOVERY - Check systemd state on elastic2046 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:25:45] <icinga-wm>	 PROBLEM - Work requests waiting in Zuul Gearman server on contint2001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [150.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[00:45:41] <icinga-wm>	 PROBLEM - Work requests waiting in Zuul Gearman server on contint2001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [150.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[00:47:59] <icinga-wm>	 PROBLEM - Check systemd state on logstash1026 is CRITICAL: CRITICAL - degraded: The following units failed: curator_actions_cluster_wide.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:43:26] <wikibugs>	 (PS1) Andrew Bogott: Trove: add policy.yaml override [puppet] - https://gerrit.wikimedia.org/r/684136 (https://phabricator.wikimedia.org/T281655)
[01:44:02] <wikibugs>	 (CR) jerkins-bot: [V: -1] Trove: add policy.yaml override [puppet] - https://gerrit.wikimedia.org/r/684136 (https://phabricator.wikimedia.org/T281655) (owner: Andrew Bogott)
[01:45:23] <wikibugs>	 (PS2) Andrew Bogott: Trove: add policy.yaml override [puppet] - https://gerrit.wikimedia.org/r/684136 (https://phabricator.wikimedia.org/T281655)
[01:46:05] <icinga-wm>	 PROBLEM - Work requests waiting in Zuul Gearman server on contint2001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [150.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[01:47:30] <wikibugs>	 (CR) Andrew Bogott: [C: +2] Trove: add policy.yaml override [puppet] - https://gerrit.wikimedia.org/r/684136 (https://phabricator.wikimedia.org/T281655) (owner: Andrew Bogott)
[01:54:15] <icinga-wm>	 PROBLEM - SSH on phab2001.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[02:05:16] <wikibugs>	 (PS1) Andrew Bogott: Horizon: install trove policy file for trove-dashboard [puppet] - https://gerrit.wikimedia.org/r/684137 (https://phabricator.wikimedia.org/T281655)
[02:10:49] <wikibugs>	 (CR) Andrew Bogott: [C: +2] Horizon: install trove policy file for trove-dashboard [puppet] - https://gerrit.wikimedia.org/r/684137 (https://phabricator.wikimedia.org/T281655) (owner: Andrew Bogott)
[02:49:37] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_citoid_cluster_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[02:52:35] <icinga-wm>	 PROBLEM - Work requests waiting in Zuul Gearman server on contint2001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [150.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[02:54:31] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[02:55:33] <icinga-wm>	 RECOVERY - SSH on phab2001.mgmt is OK: SSH OK - OpenSSH_6.6 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[03:03:49] <icinga-wm>	 PROBLEM - Host wdqs1013 is DOWN: PING CRITICAL - Packet loss = 100%
[03:04:35] <icinga-wm>	 RECOVERY - Host wdqs1013 is UP: PING OK - Packet loss = 0%, RTA = 0.28 ms
[03:24:51] <icinga-wm>	 PROBLEM - WDQS SPARQL on wdqs1006 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[03:51:41] <icinga-wm>	 PROBLEM - Work requests waiting in Zuul Gearman server on contint2001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [150.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[03:54:15] <icinga-wm>	 RECOVERY - WDQS SPARQL on wdqs1006 is OK: HTTP OK: HTTP/1.1 200 OK - 689 bytes in 1.065 second response time https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[04:44:29] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_citoid_cluster_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[04:49:21] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[04:54:49] <icinga-wm>	 PROBLEM - Work requests waiting in Zuul Gearman server on contint2001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [150.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[05:32:09] <icinga-wm>	 PROBLEM - Postgres Replication Lag on puppetdb2002 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB puppetdb (host:localhost) 37817808 and 11 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[05:37:05] <icinga-wm>	 RECOVERY - Postgres Replication Lag on puppetdb2002 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB puppetdb (host:localhost) 610800 and 0 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[06:42:19] <icinga-wm>	 RECOVERY - Check systemd state on sodium is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:44:51] <wikibugs>	 (CR) Elukey: [C: +1] "This can be merged anytime, so all the install servers will update way before our maintenance :)" [puppet] - https://gerrit.wikimedia.org/r/682785 (https://phabricator.wikimedia.org/T278423) (owner: Razzi)
[06:47:15] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=netbox_device_statistics site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[06:49:37] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[07:20:57] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: -1] "I'm not sure I get the point of this PS." [puppet] - https://gerrit.wikimedia.org/r/683837 (owner: Jbond)
[07:29:45] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: +2] Change owner of /srv/patches to mwdeploy (from root) [puppet] - https://gerrit.wikimedia.org/r/683989 (https://phabricator.wikimedia.org/T245184) (owner: Ahmon Dancy)
[07:31:43] <moritzm>	 !log installing libimage-exiftool-perl security updates
[07:31:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:43:19] <wikibugs>	 (CR) Majavah: [C: +1] "> Patch Set 2: Code-Review-1" [puppet] - https://gerrit.wikimedia.org/r/683837 (owner: Jbond)
[07:50:04] <wikibugs>	 (CR) Giuseppe Lavagetto: safe-service-restart: Only verify in scope services (1 comment) [puppet] - https://gerrit.wikimedia.org/r/682619 (https://phabricator.wikimedia.org/T279100) (owner: Alexandros Kosiaris)
[07:58:23] <wikibugs>	 SRE, ops-codfw, Discovery, Discovery-Search (Current work): elastic2033 without bootable devices available - https://phabricator.wikimedia.org/T281621 (Gehel) Note that elastic2033 is using software RAID. The data should be on RAID0, but the root partition on RAID1.
[07:59:02] <wikibugs>	 SRE, Wikimedia-Planet: Find a replacement for RSS aggregator for planet.wikimedia.org - https://phabricator.wikimedia.org/T281219 (MoritzMuehlenhoff) https://github.com/rubys/venus/issues/37 points to https://github.com/feedreader/pluto which is written in Ruby and still seems to be actively maintained.
[08:01:19] <moritzm>	 !log installing edk2 security updates
[08:01:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:12:33] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=netbox_device_statistics site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[08:13:05] <wikibugs>	 SRE, ops-codfw, Discovery, Discovery-Search (Current work): elastic2033 without bootable devices available - https://phabricator.wikimedia.org/T281621 (elukey) The other thing that may happen is that the mbr was installed only on one of the two disks of the RAID1, so now nothing boots. IIRC PXE w...
[08:15:03] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[08:37:55] <wikibugs>	 (PS1) Gergő Tisza: Handle DB readonly errors [extensions/GrowthExperiments] (wmf/1.37.0-wmf.3) - https://gerrit.wikimedia.org/r/684078 (https://phabricator.wikimedia.org/T281382)
[08:46:24] <wikibugs>	 (PS1) WMDE-Fisch: [beta] Enable new search feature for the template dialog [mediawiki-config] - https://gerrit.wikimedia.org/r/684284 (https://phabricator.wikimedia.org/T271802)
[08:51:29] <wikibugs>	 (PS1) Giuseppe Lavagetto: safe-service-restart: only verify pooled services [puppet] - https://gerrit.wikimedia.org/r/684287 (https://phabricator.wikimedia.org/T279100)
[08:51:41] <wikibugs>	 (CR) Lucas Werkmeister (WMDE): [C: +1] wikidata: post edit constraint jobs on 70% of edits [mediawiki-config] - https://gerrit.wikimedia.org/r/682608 (https://phabricator.wikimedia.org/T204031) (owner: Tonina Zhelyazkova)
[08:52:53] <logmsgbot>	 !log joal@deploy1002 Started deploy [analytics/refinery@584ed6a]: Hotfix analytics deploy (monthly sqoop) [analytics/refinery@584ed6a]
[08:53:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:53:00] <wikibugs>	 (CR) jerkins-bot: [V: -1] safe-service-restart: only verify pooled services [puppet] - https://gerrit.wikimedia.org/r/684287 (https://phabricator.wikimedia.org/T279100) (owner: Giuseppe Lavagetto)
[08:53:02] <wikibugs>	 (CR) WMDE-Fisch: "This change is ready for review." [extensions/Popups] (wmf/1.37.0-wmf.3) - https://gerrit.wikimedia.org/r/684079 (https://phabricator.wikimedia.org/T281352) (owner: WMDE-Fisch)
[08:56:24] <wikibugs>	 (PS2) WMDE-Fisch: [beta] Enable new search features for the template dialog [mediawiki-config] - https://gerrit.wikimedia.org/r/684284 (https://phabricator.wikimedia.org/T271802)
[08:57:35] <wikibugs>	 (PS2) Giuseppe Lavagetto: safe-service-restart: only verify pooled services [puppet] - https://gerrit.wikimedia.org/r/684287 (https://phabricator.wikimedia.org/T279100)
[08:58:06] <wikibugs>	 (PS3) Tonina Zhelyazkova: wikidata: post edit constraint jobs on 70% of edits [mediawiki-config] - https://gerrit.wikimedia.org/r/682608 (https://phabricator.wikimedia.org/T204031)
[08:58:36] <wikibugs>	 (CR) Giuseppe Lavagetto: [V: +1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/29345/console" [puppet] - https://gerrit.wikimedia.org/r/684287 (https://phabricator.wikimedia.org/T279100) (owner: Giuseppe Lavagetto)
[08:59:01] <wikibugs>	 (CR) jerkins-bot: [V: -1] safe-service-restart: only verify pooled services [puppet] - https://gerrit.wikimedia.org/r/684287 (https://phabricator.wikimedia.org/T279100) (owner: Giuseppe Lavagetto)
[08:59:12] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C: +2] wikimediacloud.org: add more FQDNs in prepartion for the cloudgw migration [dns] - https://gerrit.wikimedia.org/r/683855 (owner: Arturo Borrero Gonzalez)
[09:02:55] <wikibugs>	 (CR) Giuseppe Lavagetto: [V: +1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/29346/console" [puppet] - https://gerrit.wikimedia.org/r/684287 (https://phabricator.wikimedia.org/T279100) (owner: Giuseppe Lavagetto)
[09:05:19] <wikibugs>	 (Abandoned) Muehlenhoff: Include grub::defaults unconditionally [puppet] - https://gerrit.wikimedia.org/r/505869 (https://phabricator.wikimedia.org/T140100) (owner: Muehlenhoff)
[09:07:05] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=netbox_device_statistics site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[09:08:57] <wikibugs>	 (PS1) Volans: doc: fix sphinx warning in docstring [software/cumin] - https://gerrit.wikimedia.org/r/684295
[09:09:00] <logmsgbot>	 !log joal@deploy1002 Finished deploy [analytics/refinery@584ed6a]: Hotfix analytics deploy (monthly sqoop) [analytics/refinery@584ed6a] (duration: 16m 06s)
[09:09:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:09:29] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[09:10:12] <wikibugs>	 (CR) Jbond: [V: +1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/29347/console" [puppet] - https://gerrit.wikimedia.org/r/683997 (owner: Jbond)
[09:10:16] <logmsgbot>	 !log joal@deploy1002 Started deploy [analytics/refinery@584ed6a] (thin): Hotfix analytics deploy (monthly sqoop) THIN [analytics/refinery@584ed6a]
[09:10:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:10:22] <wikibugs>	 (CR) Arturo Borrero Gonzalez: P::toolforge::mailrelay: support multiple domains (1 comment) [puppet] - https://gerrit.wikimedia.org/r/684032 (https://phabricator.wikimedia.org/T278109) (owner: Majavah)
[09:10:23] <logmsgbot>	 !log joal@deploy1002 Finished deploy [analytics/refinery@584ed6a] (thin): Hotfix analytics deploy (monthly sqoop) THIN [analytics/refinery@584ed6a] (duration: 00m 07s)
[09:10:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:10:46] <wikibugs>	 (CR) Jbond: [V: +1 C: +2] P:pki::multirootca: use hardcoded sources for pki certs [puppet] - https://gerrit.wikimedia.org/r/683997 (owner: Jbond)
[09:12:28] <logmsgbot>	 !log joal@deploy1002 Started deploy [analytics/refinery@584ed6a] (hadoop-test): Hotfix analytics deploy (monthly sqoop) HADOOP-TEST [analytics/refinery@584ed6a]
[09:12:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:14:02] <wikibugs>	 (CR) Filippo Giunchedi: "> Patch Set 1:" [puppet] (sandbox/filippo/pontoon-o11y) - https://gerrit.wikimedia.org/r/676383 (owner: Filippo Giunchedi)
[09:15:25] <wikibugs>	 (PS1) Jbond: P:pki::multirootca: drop ca bundle file as its not used [puppet] - https://gerrit.wikimedia.org/r/684297
[09:16:04] <wikibugs>	 (PS29) Jcrespo: mediabackup: Initial setup for the media backup worker hosts [puppet] - https://gerrit.wikimedia.org/r/668380 (https://phabricator.wikimedia.org/T276442)
[09:16:06] <wikibugs>	 (PS1) Jcrespo: backups: Fix typo on fileset name, resulting on no backups scheduled [puppet] - https://gerrit.wikimedia.org/r/684298 (https://phabricator.wikimedia.org/T281369)
[09:16:17] <wikibugs>	 (PS2) Jcrespo: backups: Fix typo on fileset name, resulting on no backups scheduled [puppet] - https://gerrit.wikimedia.org/r/684298 (https://phabricator.wikimedia.org/T281369)
[09:16:19] <wikibugs>	 (CR) Jbond: [V: +1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/29348/console" [puppet] - https://gerrit.wikimedia.org/r/684297 (owner: Jbond)
[09:17:53] <wikibugs>	 (CR) jerkins-bot: [V: -1] backups: Fix typo on fileset name, resulting on no backups scheduled [puppet] - https://gerrit.wikimedia.org/r/684298 (https://phabricator.wikimedia.org/T281369) (owner: Jcrespo)
[09:17:59] <wikibugs>	 (CR) Jbond: [V: +1 C: +2] P:pki::multirootca: drop ca bundle file as its not used [puppet] - https://gerrit.wikimedia.org/r/684297 (owner: Jbond)
[09:18:01] <wikibugs>	 (CR) Jcrespo: [C: -1] "This has a typo, plus some grep impact" [puppet] - https://gerrit.wikimedia.org/r/683676 (owner: Jbond)
[09:18:14] <wikibugs>	 (CR) jerkins-bot: [V: -1] backups: Fix typo on fileset name, resulting on no backups scheduled [puppet] - https://gerrit.wikimedia.org/r/684298 (https://phabricator.wikimedia.org/T281369) (owner: Jcrespo)
[09:19:27] <wikibugs>	 (PS1) Jcrespo: backups: Fix typo on fileset name, resulting on no backups scheduled [puppet] - https://gerrit.wikimedia.org/r/684300 (https://phabricator.wikimedia.org/T281369)
[09:19:37] <wikibugs>	 (PS2) Jcrespo: backups: Fix typo on fileset name, resulting on no backups scheduled [puppet] - https://gerrit.wikimedia.org/r/684300 (https://phabricator.wikimedia.org/T281369)
[09:21:02] <wikibugs>	 (CR) jerkins-bot: [V: -1] backups: Fix typo on fileset name, resulting on no backups scheduled [puppet] - https://gerrit.wikimedia.org/r/684300 (https://phabricator.wikimedia.org/T281369) (owner: Jcrespo)
[09:21:34] <wikibugs>	 (PS1) Filippo Giunchedi: pontoon: add hosts_for_role function [puppet] - https://gerrit.wikimedia.org/r/684301
[09:22:17] <wikibugs>	 (CR) Filippo Giunchedi: [C: +2] "Merging right away since this is only a cherry-pick in production" [puppet] - https://gerrit.wikimedia.org/r/684301 (owner: Filippo Giunchedi)
[09:22:37] <wikibugs>	 (PS3) Jcrespo: backups: Fix typo on fileset name, resulting on no backups scheduled [puppet] - https://gerrit.wikimedia.org/r/684298 (https://phabricator.wikimedia.org/T281369)
[09:23:04] <wikibugs>	 (CR) Volans: [C: +2] doc: fix sphinx warning in docstring [software/cumin] - https://gerrit.wikimedia.org/r/684295 (owner: Volans)
[09:24:08] <wikibugs>	 (PS3) Jbond: P::envoy: allow users to run tlsproxy without service proxy [puppet] - https://gerrit.wikimedia.org/r/683837 (https://phabricator.wikimedia.org/T277990)
[09:25:26] <wikibugs>	 (PS3) Majavah: P::toolforge::mailrelay: support multiple domains [puppet] - https://gerrit.wikimedia.org/r/684032 (https://phabricator.wikimedia.org/T278109)
[09:25:52] <wikibugs>	 (PS2) Filippo Giunchedi: hieradata: introduce 'public_domain' variable [puppet] (sandbox/filippo/pontoon-o11y) - https://gerrit.wikimedia.org/r/676383
[09:25:54] <wikibugs>	 (PS4) Filippo Giunchedi: wmflib: add role/public_endpoint to wmflib::service [puppet] (sandbox/filippo/pontoon-o11y) - https://gerrit.wikimedia.org/r/676385
[09:25:56] <wikibugs>	 (PS4) Filippo Giunchedi: pontoon: enable sso for alerts in cloud [puppet] (sandbox/filippo/pontoon-o11y) - https://gerrit.wikimedia.org/r/676386
[09:25:58] <wikibugs>	 (PS4) Filippo Giunchedi: pontoon: use public_domain for alerts/icinga [puppet] (sandbox/filippo/pontoon-o11y) - https://gerrit.wikimedia.org/r/676387
[09:26:00] <wikibugs>	 (PS5) Filippo Giunchedi: pontoon: introduce public_certs [puppet] (sandbox/filippo/pontoon-o11y) - https://gerrit.wikimedia.org/r/676388
[09:26:02] <wikibugs>	 (PS5) Filippo Giunchedi: pontoon: add public LB class [puppet] (sandbox/filippo/pontoon-o11y) - https://gerrit.wikimedia.org/r/676389
[09:26:04] <wikibugs>	 (PS8) Filippo Giunchedi: role: add pontoon::frontend role/profile [puppet] (sandbox/filippo/pontoon-o11y) - https://gerrit.wikimedia.org/r/676390
[09:26:05] <godog>	 sorry for the spam ^
[09:26:06] <wikibugs>	 (PS8) Filippo Giunchedi: hieradata: add public o11y services to service::catalog [puppet] (sandbox/filippo/pontoon-o11y) - https://gerrit.wikimedia.org/r/676391
[09:26:08] <wikibugs>	 (CR) Majavah: P::toolforge::mailrelay: support multiple domains (1 comment) [puppet] - https://gerrit.wikimedia.org/r/684032 (https://phabricator.wikimedia.org/T278109) (owner: Majavah)
[09:26:24] <wikibugs>	 (CR) Jbond: "> Patch Set 2: Code-Review-1" [puppet] - https://gerrit.wikimedia.org/r/683837 (https://phabricator.wikimedia.org/T277990) (owner: Jbond)
[09:28:42] <wikibugs>	 (Merged) jenkins-bot: doc: fix sphinx warning in docstring [software/cumin] - https://gerrit.wikimedia.org/r/684295 (owner: Volans)
[09:32:50] <wikibugs>	 (CR) Jcrespo: [C: -1] "This would need a deeper bacula classes refactor." [puppet] - https://gerrit.wikimedia.org/r/683675 (owner: Jbond)
[09:35:55] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C: +1] hieradata: introduce 'public_domain' variable [puppet] (sandbox/filippo/pontoon-o11y) - https://gerrit.wikimedia.org/r/676383 (owner: Filippo Giunchedi)
[09:36:32] <wikibugs>	 SRE: Integrate Buster 10.9 point update - https://phabricator.wikimedia.org/T279054 (MoritzMuehlenhoff)
[09:36:44] <wikibugs>	 (CR) Jbond: [C: +1] "lgtm thanks" (1 comment) [puppet] (sandbox/filippo/pontoon-o11y) - https://gerrit.wikimedia.org/r/676383 (owner: Filippo Giunchedi)
[09:38:39] <wikibugs>	 (CR) Jbond: "> Patch Set 1: Code-Review-1" [puppet] - https://gerrit.wikimedia.org/r/683675 (owner: Jbond)
[09:38:53] <wikibugs>	 (Abandoned) Jbond: P:backup::host: add sets parameter [puppet] - https://gerrit.wikimedia.org/r/683675 (owner: Jbond)
[09:39:07] <wikibugs>	 (Abandoned) Jbond: O:pki::root: move backup sets to hiera [puppet] - https://gerrit.wikimedia.org/r/683676 (owner: Jbond)
[09:40:18] <wikibugs>	 (CR) Jbond: [C: +1] "lgtm thanks" [puppet] - https://gerrit.wikimedia.org/r/684298 (https://phabricator.wikimedia.org/T281369) (owner: Jcrespo)
[09:41:29] <wikibugs>	 (CR) Filippo Giunchedi: hieradata: introduce 'public_domain' variable (1 comment) [puppet] (sandbox/filippo/pontoon-o11y) - https://gerrit.wikimedia.org/r/676383 (owner: Filippo Giunchedi)
[09:41:52] <logmsgbot>	 !log joal@deploy1002 Finished deploy [analytics/refinery@584ed6a] (hadoop-test): Hotfix analytics deploy (monthly sqoop) HADOOP-TEST [analytics/refinery@584ed6a] (duration: 29m 24s)
[09:41:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:42:50] <moritzm>	 !log installing python3.7 security updates
[09:42:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:43:21] <wikibugs>	 (PS3) Filippo Giunchedi: hieradata: introduce 'public_domain' variable [puppet] (sandbox/filippo/pontoon-o11y) - https://gerrit.wikimedia.org/r/676383
[09:43:23] <wikibugs>	 (PS5) Filippo Giunchedi: wmflib: add role/public_endpoint to wmflib::service [puppet] (sandbox/filippo/pontoon-o11y) - https://gerrit.wikimedia.org/r/676385
[09:44:40] <wikibugs>	 (PS1) Filippo Giunchedi: hieradata: introduce 'public_domain' variable [puppet] - https://gerrit.wikimedia.org/r/684309
[09:45:51] <wikibugs>	 (CR) Filippo Giunchedi: [C: +2] hieradata: introduce 'public_domain' variable [puppet] (sandbox/filippo/pontoon-o11y) - https://gerrit.wikimedia.org/r/676383 (owner: Filippo Giunchedi)
[09:48:57] <wikibugs>	 (CR) Jcrespo: "Jbond: one thing that could be done now, and it is "expected/documented" is to put the "include profile::backup::host" on the role (althou" [puppet] - https://gerrit.wikimedia.org/r/684298 (https://phabricator.wikimedia.org/T281369) (owner: Jcrespo)
[09:49:16] <wikibugs>	 (CR) Jcrespo: [C: +2] backups: Fix typo on fileset name, resulting on no backups scheduled [puppet] - https://gerrit.wikimedia.org/r/684298 (https://phabricator.wikimedia.org/T281369) (owner: Jcrespo)
[09:51:26] <wikibugs>	 (CR) Filippo Giunchedi: [C: +2] "Merging right away since this only a cherry pick in production" [puppet] - https://gerrit.wikimedia.org/r/684309 (owner: Filippo Giunchedi)
[09:51:28] <wikibugs>	 (CR) Arturo Borrero Gonzalez: wmcs.drain_hypervisor: skip all VMs in the canary project (1 comment) [puppet] - https://gerrit.wikimedia.org/r/683857 (https://phabricator.wikimedia.org/T280641) (owner: David Caro)
[09:54:21] <wikibugs>	 (CR) David Caro: wmcs.drain_hypervisor: skip all VMs in the canary project (1 comment) [puppet] - https://gerrit.wikimedia.org/r/683857 (https://phabricator.wikimedia.org/T280641) (owner: David Caro)
[09:54:41] <wikibugs>	 (CR) Awight: [C: +1] "Looks right.  I'd be scared to trim it down any further." [extensions/Popups] (wmf/1.37.0-wmf.3) - https://gerrit.wikimedia.org/r/684079 (https://phabricator.wikimedia.org/T281352) (owner: WMDE-Fisch)
[09:57:53] <icinga-wm>	 RECOVERY - Backup freshness on backup1001 is OK: Fresh: 102 jobs https://wikitech.wikimedia.org/wiki/Bacula%23Monitoring
[09:58:07] <wikibugs>	 (CR) Jbond: "> Patch Set 3:" [puppet] - https://gerrit.wikimedia.org/r/684298 (https://phabricator.wikimedia.org/T281369) (owner: Jcrespo)
[09:59:54] <jynus>	 ^ jbond42 that's the pki backups running!! :-)
[10:00:35] <jbond42>	 jynus: great thanks
[10:13:29] <wikibugs>	 (PS1) JMeybohm: Rename configcluster_stretch to configcluster in hiera [puppet] - https://gerrit.wikimedia.org/r/684315 (https://phabricator.wikimedia.org/T271573)
[10:13:31] <wikibugs>	 (PS1) JMeybohm: Remove unused profile::etcd and related classes [puppet] - https://gerrit.wikimedia.org/r/684316 (https://phabricator.wikimedia.org/T271573)
[10:14:51] <wikibugs>	 (PS2) JMeybohm: Rename configcluster_stretch to configcluster in hiera [puppet] - https://gerrit.wikimedia.org/r/684315 (https://phabricator.wikimedia.org/T271573)
[10:14:53] <wikibugs>	 (PS2) JMeybohm: Remove unused profile::etcd and related classes [puppet] - https://gerrit.wikimedia.org/r/684316 (https://phabricator.wikimedia.org/T271573)
[10:15:07] <icinga-wm>	 RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:19:27] <wikibugs>	 SRE, ops-codfw, DC-Ops, decommission-hardware, serviceops: decommission conf200[1-3].codfw.wmnet - https://phabricator.wikimedia.org/T281374 (JMeybohm) a:JMeybohm→Papaul
[10:25:11] <wikibugs>	 (CR) Giuseppe Lavagetto: [V: +1] "PCC SUCCESS (DIFF 3): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/29349/console" [puppet] - https://gerrit.wikimedia.org/r/684287 (https://phabricator.wikimedia.org/T279100) (owner: Giuseppe Lavagetto)
[10:26:19] <wikibugs>	 (PS1) Arturo Borrero Gonzalez: Drop 56.15.185.in-addr.arpa zone [dns] - https://gerrit.wikimedia.org/r/684320
[10:26:47] <wikibugs>	 (PS3) Giuseppe Lavagetto: safe-service-restart: only verify pooled services [puppet] - https://gerrit.wikimedia.org/r/684287 (https://phabricator.wikimedia.org/T279100)
[10:27:03] <wikibugs>	 (PS2) Majavah: Add grafana-cloud.{wm.o,d.wmnet} to replace labs [dns] - https://gerrit.wikimedia.org/r/684099
[10:28:07] <wikibugs>	 (CR) jerkins-bot: [V: -1] safe-service-restart: only verify pooled services [puppet] - https://gerrit.wikimedia.org/r/684287 (https://phabricator.wikimedia.org/T279100) (owner: Giuseppe Lavagetto)
[10:28:39] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C: +2] Drop 56.15.185.in-addr.arpa zone [dns] - https://gerrit.wikimedia.org/r/684320 (owner: Arturo Borrero Gonzalez)
[10:30:05] <jouncebot>	 jan_drewniak: (Dis)respected human, time to deploy Wikimedia Portals Update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210503T1030). Please do the needful.
[10:31:54] <wikibugs>	 (PS1) Jbond: (WIP): add function to test if we are doing the initial puppet run [puppet] - https://gerrit.wikimedia.org/r/684321
[10:34:47] <wikibugs>	 (CR) Majavah: [C: -1] "The current implementation does not work for most Cloud VPS projects where there is no per-project puppetmaster involved. I imagine most p" [puppet] - https://gerrit.wikimedia.org/r/684321 (owner: Jbond)
[10:34:56] <wikibugs>	 (PS1) Jbond: hiera - cloud: add cert to test build process [puppet] - https://gerrit.wikimedia.org/r/684322
[10:35:25] <wikibugs>	 (PS1) Jdrewniak: Bumping portals to master [mediawiki-config] - https://gerrit.wikimedia.org/r/684323 (https://phabricator.wikimedia.org/T128546)
[10:38:56] <wikibugs>	 SRE, Commons, Tools, Wikimedia-Mailing-lists: daily-image-l stopped sending on 2020-10-11 - https://phabricator.wikimedia.org/T265568 (jcrespo) Open→Resolved I am going to assume this is resolved, due to old age. Reopen if this is still happening.
[10:38:58] <wikibugs>	 (CR) Jbond: "> Patch Set 1: Code-Review-1" [puppet] - https://gerrit.wikimedia.org/r/684321 (owner: Jbond)
[10:39:24] <wikibugs>	 (CR) Jbond: [C: +2] hiera - cloud: add cert to test build process [puppet] - https://gerrit.wikimedia.org/r/684322 (owner: Jbond)
[10:39:41] <wikibugs>	 (CR) Jdrewniak: [C: +2] Bumping portals to master [mediawiki-config] - https://gerrit.wikimedia.org/r/684323 (https://phabricator.wikimedia.org/T128546) (owner: Jdrewniak)
[10:40:57] <wikibugs>	 (Merged) jenkins-bot: Bumping portals to master [mediawiki-config] - https://gerrit.wikimedia.org/r/684323 (https://phabricator.wikimedia.org/T128546) (owner: Jdrewniak)
[10:42:24] <wikibugs>	 (PS1) Gergő Tisza: GrowthExperiments: enable link recommendations backend on cswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/684327 (https://phabricator.wikimedia.org/T278710)
[10:46:29] <logmsgbot>	 !log jdrewniak@deploy1002 Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:684302| Bumping portals to master (T128546)]] (duration: 00m 58s)
[10:46:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:46:38] <stashbot>	 T128546: [Recurring Task] Update Wikipedia and sister projects portals statistics - https://phabricator.wikimedia.org/T128546
[10:46:40] <wikibugs>	 (03PS1) 10Gergő Tisza: GrowthExperiments: enable link recommendations frontend on cswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/684331 (https://phabricator.wikimedia.org/T278710)
[10:47:27] <logmsgbot>	 !log jdrewniak@deploy1002 Synchronized portals: Wikimedia Portals Update: [[gerrit:684302| Bumping portals to master (T128546)]] (duration: 00m 57s)
[10:47:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:48:45] <wikibugs>	 (03PS1) 10Volans: CHANGELOG: add changelogs for release v4.1.0 [software/cumin] - 10https://gerrit.wikimedia.org/r/684332
[10:54:23] <wikibugs>	 (03CR) 10Volans: [C: 03+2] CHANGELOG: add changelogs for release v4.1.0 [software/cumin] - 10https://gerrit.wikimedia.org/r/684332 (owner: 10Volans)
[10:55:56] <wikibugs>	 (03PS1) 10Hashar: [WMF] register our plugins as submodules [software/gerrit] (wmf/stable-3.2) - 10https://gerrit.wikimedia.org/r/684336
[10:56:55] <wikibugs>	 (03CR) 10Urbanecm: [C: 03+2] "backport window" [extensions/GrowthExperiments] (wmf/1.37.0-wmf.3) - 10https://gerrit.wikimedia.org/r/684078 (https://phabricator.wikimedia.org/T281382) (owner: 10Gergő Tisza)
[10:57:17] <wikibugs>	 (03CR) 10Urbanecm: [C: 03+2] "backport window" [extensions/GrowthExperiments] (wmf/1.37.0-wmf.3) - 10https://gerrit.wikimedia.org/r/684080 (owner: 10Gergő Tisza)
[10:59:17] <moritzm>	 !log installing avahi security updates on buster
[10:59:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:00:04] <jouncebot>	 Amir1, Lucas_WMDE, awight, and Urbanecm: I, the Bot under the Fountain, allow thee, The Deployer, to do European mid-day backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210503T1100).
[11:00:04] <jouncebot>	 Tonina_WMDE, CFisch_WMDE, and tgr: A patch you scheduled for European mid-day backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[11:00:14] <Urbanecm>	 i can deploy today
[11:00:17] <Lucas_WMDE>	 o/
[11:00:26] <tgr_>	 o/ I'll self-serve but only be around in the second half of the window.
[11:00:48] <Tonina_WMDE>	 o/
[11:00:50] <CFisch_WMDE>	 o/
[11:00:54] <Urbanecm>	 tgr_: ack. 
[11:01:04] <wikibugs>	 (03CR) 10Urbanecm: [C: 03+2] wikidata: post edit constraint jobs on 70% of edits [mediawiki-config] - 10https://gerrit.wikimedia.org/r/682608 (https://phabricator.wikimedia.org/T204031) (owner: 10Tonina Zhelyazkova)
[11:01:08] <wikibugs>	 (03CR) 10Thiemo Kreuz (WMDE): [C: 03+1] Fix settings dialog offering ReferencePreviews when unavailable [extensions/Popups] (wmf/1.37.0-wmf.3) - 10https://gerrit.wikimedia.org/r/684079 (https://phabricator.wikimedia.org/T281352) (owner: 10WMDE-Fisch)
[11:01:21] <wikibugs>	 (03CR) 10Urbanecm: [C: 03+2] Fix settings dialog offering ReferencePreviews when unavailable [extensions/Popups] (wmf/1.37.0-wmf.3) - 10https://gerrit.wikimedia.org/r/684079 (https://phabricator.wikimedia.org/T281352) (owner: 10WMDE-Fisch)
[11:01:28] <wikibugs>	 (03Merged) 10jenkins-bot: CHANGELOG: add changelogs for release v4.1.0 [software/cumin] - 10https://gerrit.wikimedia.org/r/684332 (owner: 10Volans)
[11:01:51] <wikibugs>	 (03Merged) 10jenkins-bot: wikidata: post edit constraint jobs on 70% of edits [mediawiki-config] - 10https://gerrit.wikimedia.org/r/682608 (https://phabricator.wikimedia.org/T204031) (owner: 10Tonina Zhelyazkova)
[11:02:53] <Urbanecm>	 Tonina_WMDE: I assume your patch cannot be actually tested, right?
[11:03:12] <Tonina_WMDE>	 no, I don't think it can
[11:03:13] <Urbanecm>	 (it's on mwdebug1001 anyway)
[11:03:17] <Urbanecm>	 okay, syncing :)
[11:04:41] <logmsgbot>	 !log urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: f1a5ef0116c77b86b1abfb7bfa7d4ed363c69f61: wikidata: post edit constraint jobs on 70% of edits (T204031) (duration: 00m 57s)
[11:04:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:04:49] <stashbot>	 T204031: Deploy regular running of wikidata constraint checks using the job queue - https://phabricator.wikimedia.org/T204031
[11:05:00] <Urbanecm>	 Tonina_WMDE: should be live :)
[11:05:20] <wikibugs>	 10SRE, 10Commons, 10Tools, 10Wikimedia-Mailing-lists: daily-image-l stopped sending on 2020-10-11 - https://phabricator.wikimedia.org/T265568 (10RhinosF1) >>! In T265568#7052296, @jcrespo wrote: > I am going to assume this is resolved, due to old age. Reopen if this is still happening. https://lists.wikime...
[11:05:21] <wikibugs>	 (03PS2) 10Urbanecm: Set wgGEMentorshipMigrationStage to SCHEMA_COMPAT_NEW everywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/683430 (https://phabricator.wikimedia.org/T279853)
[11:05:32] <wikibugs>	 (03CR) 10Urbanecm: [C: 03+2] Set wgGEMentorshipMigrationStage to SCHEMA_COMPAT_NEW everywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/683430 (https://phabricator.wikimedia.org/T279853) (owner: 10Urbanecm)
[11:06:30] <wikibugs>	 (03Merged) 10jenkins-bot: Set wgGEMentorshipMigrationStage to SCHEMA_COMPAT_NEW everywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/683430 (https://phabricator.wikimedia.org/T279853) (owner: 10Urbanecm)
[11:06:35] <wikibugs>	 (03PS1) 10Muehlenhoff: Add library hint for avahi [puppet] - 10https://gerrit.wikimedia.org/r/684337
[11:06:45] <Tonina_WMDE>	 thanks Urbanecm :)
[11:06:51] <Urbanecm>	 any time :)
[11:08:25] <wikibugs>	 (03Merged) 10jenkins-bot: Handle DB readonly errors [extensions/GrowthExperiments] (wmf/1.37.0-wmf.3) - 10https://gerrit.wikimedia.org/r/684078 (https://phabricator.wikimedia.org/T281382) (owner: 10Gergő Tisza)
[11:08:28] <wikibugs>	 (03Merged) 10jenkins-bot: refreshLinkRecommendations.php: Use per-wiki locks [extensions/GrowthExperiments] (wmf/1.37.0-wmf.3) - 10https://gerrit.wikimedia.org/r/684080 (owner: 10Gergő Tisza)
[11:08:31] <wikibugs>	 (03Merged) 10jenkins-bot: Fix settings dialog offering ReferencePreviews when unavailable [extensions/Popups] (wmf/1.37.0-wmf.3) - 10https://gerrit.wikimedia.org/r/684079 (https://phabricator.wikimedia.org/T281352) (owner: 10WMDE-Fisch)
[11:09:26] <wikibugs>	 (03PS1) 10Volans: Upstream release v4.1.0 [software/cumin] (debian) - 10https://gerrit.wikimedia.org/r/684340
[11:10:05] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Add library hint for avahi [puppet] - 10https://gerrit.wikimedia.org/r/684337 (owner: 10Muehlenhoff)
[11:11:10] <logmsgbot>	 !log urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: c5a7c67b4daf33e0f9aaabec3f35ab6d4184894b: Set wgGEMentorshipMigrationStage to SCHEMA_COMPAT_NEW everywhere (T279853) (duration: 00m 57s)
[11:11:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:11:19] <stashbot>	 T279853: Migrate mentor/mentee relationship to a separate database table on Wikimedia wikis - https://phabricator.wikimedia.org/T279853
[11:11:54] <Urbanecm>	 CFisch_WMDE: your patch is on mwdebug1001
[11:12:08] * CFisch_WMDE testing
[11:13:12] <CFisch_WMDE>	 Urbanecm: All good, go on!
[11:13:16] <Urbanecm>	 syncing
[11:15:00] <logmsgbot>	 !log urbanecm@deploy1002 Synchronized php-1.37.0-wmf.3/extensions/Popups/: a438b641c81fa16faba287407012beaff8b1f3ba: Fix settings dialog offering ReferencePreviews when unavailable (T281352) (duration: 00m 58s)
[11:15:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:15:08] <stashbot>	 T281352: Broken settings dialogue for reference previews when in conflict with a skin/gadget - https://phabricator.wikimedia.org/T281352
[11:15:09] <Urbanecm>	 should be live CFisch_WMDE 
[11:16:04] <Urbanecm>	 so, unless someone has anything else, i think tgr|away can deploy his patches when available.
[11:16:18] <CFisch_WMDE>	 All good. Thanks again Urbanecm! :-)
[11:16:24] <Urbanecm>	 any time :)
[11:18:34] <wikibugs>	 (03CR) 10Volans: [C: 03+2] Upstream release v4.1.0 [software/cumin] (debian) - 10https://gerrit.wikimedia.org/r/684340 (owner: 10Volans)
[11:23:44] <addshore>	 It's a Tonina_WMDE =o
[11:24:41] <wikibugs>	 (03Merged) 10jenkins-bot: Upstream release v4.1.0 [software/cumin] (debian) - 10https://gerrit.wikimedia.org/r/684340 (owner: 10Volans)
[11:34:22] <wikibugs>	 (03PS4) 10Giuseppe Lavagetto: safe-service-restart: only verify pooled services [puppet] - 10https://gerrit.wikimedia.org/r/684287 (https://phabricator.wikimedia.org/T279100)
[11:35:27] <wikibugs>	 10SRE: Integrate Buster 10.9 point update - https://phabricator.wikimedia.org/T279054 (10MoritzMuehlenhoff)
[11:35:42] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] safe-service-restart: only verify pooled services [puppet] - 10https://gerrit.wikimedia.org/r/684287 (https://phabricator.wikimedia.org/T279100) (owner: 10Giuseppe Lavagetto)
[11:35:58] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [V: 03+1] "PCC SUCCESS (DIFF 3): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/29351/console" [puppet] - 10https://gerrit.wikimedia.org/r/684287 (https://phabricator.wikimedia.org/T279100) (owner: 10Giuseppe Lavagetto)
[11:44:40] <wikibugs>	 (03PS1) 10Arturo Borrero Gonzalez: wikimediacloud.org: add cloudsw addresses in vlan 1120 [dns] - 10https://gerrit.wikimedia.org/r/684353
[11:46:46] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "Looks good" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/683551 (https://phabricator.wikimedia.org/T271573) (owner: 10JMeybohm)
[11:51:33] <wikibugs>	 (03CR) 10Thiemo Kreuz (WMDE): [C: 03+1] [beta] Enable new search features for the template dialog [mediawiki-config] - 10https://gerrit.wikimedia.org/r/684284 (https://phabricator.wikimedia.org/T271802) (owner: 10WMDE-Fisch)
[11:56:45] <logmsgbot>	 !log kharlan@deploy1002 Synchronized php-1.37.0-wmf.3/extensions/GrowthExperiments: Backport: [[gerrit:684080|refreshLinkRecommendations.php: Use per-wiki locks]] [[gerrit:684078|Handle DB readonly errors (T281382)]] (duration: 00m 58s)
[11:56:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:56:54] <stashbot>	 T281382: Make sure all GrowthExperiments DB writes handle readonly mode well - https://phabricator.wikimedia.org/T281382
[12:02:31] <wikibugs>	 (03PS2) 10Kosta Harlan: GrowthExperiments: enable link recommendations backend on cswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/684327 (https://phabricator.wikimedia.org/T278710) (owner: 10Gergő Tisza)
[12:03:00] <wikibugs>	 (03CR) 10Kosta Harlan: [C: 03+2] GrowthExperiments: enable link recommendations backend on cswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/684327 (https://phabricator.wikimedia.org/T278710) (owner: 10Gergő Tisza)
[12:03:43] <wikibugs>	 (03Merged) 10jenkins-bot: GrowthExperiments: enable link recommendations backend on cswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/684327 (https://phabricator.wikimedia.org/T278710) (owner: 10Gergő Tisza)
[12:07:40] <logmsgbot>	 !log kharlan@deploy1002 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:684327|GrowthExperiments: enable link recommendations backend on cswiki (T278710)]] (duration: 00m 57s)
[12:07:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:07:48] <stashbot>	 T278710: Add a link: production deployment - https://phabricator.wikimedia.org/T278710
[12:08:27] <icinga-wm>	 PROBLEM - SSH on phab2001.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[12:09:03] <wikibugs>	 (03PS2) 10Kosta Harlan: GrowthExperiments: enable link recommendations frontend on cswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/684331 (https://phabricator.wikimedia.org/T278710) (owner: 10Gergő Tisza)
[12:09:17] <wikibugs>	 (03CR) 10Kosta Harlan: [C: 03+2] GrowthExperiments: enable link recommendations frontend on cswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/684331 (https://phabricator.wikimedia.org/T278710) (owner: 10Gergő Tisza)
[12:10:19] <wikibugs>	 (03Merged) 10jenkins-bot: GrowthExperiments: enable link recommendations frontend on cswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/684331 (https://phabricator.wikimedia.org/T278710) (owner: 10Gergő Tisza)
[12:19:08] <wikibugs>	 (03PS1) 10Gergő Tisza: GrowthExperiments: Set default variant [mediawiki-config] - 10https://gerrit.wikimedia.org/r/684378
[12:19:42] <wikibugs>	 (03PS2) 10Gergő Tisza: GrowthExperiments: Set default variant [mediawiki-config] - 10https://gerrit.wikimedia.org/r/684378 (https://phabricator.wikimedia.org/T278123)
[12:19:53] <tgr_>	 the deploy window is running over a bit
[12:22:41] <wikibugs>	 (03CR) 10Kosta Harlan: [C: 03+2] GrowthExperiments: Set default variant [mediawiki-config] - 10https://gerrit.wikimedia.org/r/684378 (https://phabricator.wikimedia.org/T278123) (owner: 10Gergő Tisza)
[12:23:31] <wikibugs>	 (03Merged) 10jenkins-bot: GrowthExperiments: Set default variant [mediawiki-config] - 10https://gerrit.wikimedia.org/r/684378 (https://phabricator.wikimedia.org/T278123) (owner: 10Gergő Tisza)
[12:33:21] <logmsgbot>	 !log kharlan@deploy1002 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:684378|GrowthExperiments: Set default variant (T278123)]] [[gerrit:684331|GrowthExperiments: enable link recommendations frontend on cswiki (T278710)]] (duration: 00m 57s)
[12:33:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:33:31] <stashbot>	 T278710: Add a link: production deployment - https://phabricator.wikimedia.org/T278710
[12:33:31] <stashbot>	 T278123: Provide capability for A/B testing task types - https://phabricator.wikimedia.org/T278123
[12:35:41] <kostajh>	 tgr|away and I are done with backporting for now
[12:36:08] <kostajh>	 !log Backport window done
[12:36:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:38:00] <wikibugs>	 (03PS1) 10Majavah: beta: Use upload.wikimedia.beta.wmflabs.o for uploads [mediawiki-config] - 10https://gerrit.wikimedia.org/r/684381 (https://phabricator.wikimedia.org/T281650)
[12:43:57] <wikibugs>	 (03PS3) 10Majavah: Add grafana-cloud.{wm.o,d.wmnet} to replace labs [dns] - 10https://gerrit.wikimedia.org/r/684099
[12:45:49] <wikibugs>	 (03PS1) 10Majavah: beta: Switch to deployment-urldownloader03 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/684384
[12:47:51] <wikibugs>	 (03PS9) 10Majavah: etcd: Use cfssl for peer-to-peer communication [puppet] - 10https://gerrit.wikimedia.org/r/674077
[12:53:42] <wikibugs>	 (03PS16) 10DCausse: rdf-streaming-updater: create helmfile.d structure [deployment-charts] - 10https://gerrit.wikimedia.org/r/671204 (https://phabricator.wikimedia.org/T264006) (owner: 10Mstyles)
[12:53:44] <wikibugs>	 (03PS7) 10DCausse: rdf-streaming-updater: enable HA capability [deployment-charts] - 10https://gerrit.wikimedia.org/r/679519 (https://phabricator.wikimedia.org/T273098) (owner: 10Mstyles)
[12:53:46] <wikibugs>	 (03PS6) 10DCausse: rdf-streaming-updater: use session mode [deployment-charts] - 10https://gerrit.wikimedia.org/r/681497 (https://phabricator.wikimedia.org/T280166) (owner: 10Mstyles)
[12:54:36] <wikibugs>	 10SRE, 10ops-codfw, 10DC-Ops, 10observability, 10User-fgiunchedi: codfw: Testing Out Sample PDUs - https://phabricator.wikimedia.org/T265435 (10fgiunchedi) I did some work on this last week, there's temporary patches on netmon1002 to get things going at least minimally and collect voltage/current/power/e...
[13:10:47] <Urbanecm>	 !log Run `User::newSystemUser( 'Maintenance script', [ 'steal' => true ] )` on cswiki to make the user a proper system user (T281703)
[13:10:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:10:55] <stashbot>	 T281703: TypeError: Argument 1 passed to GrowthExperiments\NewcomerTasks\TaskSuggester\CacheDecorator::suggest() must implement interface MediaWiki\User\UserIdentity, null given, called in /srv/mediawiki/php-1.37.0-wmf.3/extensions/GrowthExperiments/maintenance/refreshLinkRecommendations.php on line 170 - https://phabricator.wikimedia.org/T281703
[13:14:54] <wikibugs>	 (03PS1) 10Jdrewniak: Hotfix: loadRelatedArticles should consider existence of container element [extensions/RelatedArticles] (wmf/1.37.0-wmf.3) - 10https://gerrit.wikimedia.org/r/684393 (https://phabricator.wikimedia.org/T281547)
[13:20:07] <wikibugs>	 (03PS1) 10Jbond: wmflib: add new fact to puppet_config [puppet] - 10https://gerrit.wikimedia.org/r/684394
[13:20:59] <wikibugs>	 (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/29352/console" [puppet] - 10https://gerrit.wikimedia.org/r/684394 (owner: 10Jbond)
[13:21:24] <wikibugs>	 (03PS2) 10Jbond: (WIP): add function to test if we are doing the initial puppet run [puppet] - 10https://gerrit.wikimedia.org/r/684321
[13:22:02] <wikibugs>	 (03PS3) 10Jbond: (WIP): add function to test if we are doing the initial puppet run [puppet] - 10https://gerrit.wikimedia.org/r/684321
[13:23:23] <wikibugs>	 (03CR) 10Jbond: [V: 03+1 C: 03+2] wmflib: add new fact to puppet_config [puppet] - 10https://gerrit.wikimedia.org/r/684394 (owner: 10Jbond)
[13:39:42] <wikibugs>	 (03PS1) 10Esanders: Make DT's source mode toolbar available as beta on all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/684404 (https://phabricator.wikimedia.org/T279124)
[13:43:21] <volans>	 !log uploaded cumin_4.1.0 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
[13:43:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:44:15] <wikibugs>	 (03PS1) 10Jbond: hiera - cloud: add none existing class to test rebuild [puppet] - 10https://gerrit.wikimedia.org/r/684406
[13:45:26] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] hiera - cloud: add none existing class to test rebuild [puppet] - 10https://gerrit.wikimedia.org/r/684406 (owner: 10Jbond)
[13:49:23] <wikibugs>	 (03CR) 10JMeybohm: "Thanks for your review!" [puppet] - 10https://gerrit.wikimedia.org/r/677922 (https://phabricator.wikimedia.org/T270063) (owner: 10JMeybohm)
[13:50:24] <wikibugs>	 (03PS3) 10WMDE-Fisch: [beta] Enable new search features for the template dialog [mediawiki-config] - 10https://gerrit.wikimedia.org/r/684284 (https://phabricator.wikimedia.org/T271802)
[13:51:38] <wikibugs>	 (03CR) 10Jbond: [C: 04-1] "> Patch Set 1:" [puppet] - 10https://gerrit.wikimedia.org/r/684321 (owner: 10Jbond)
[13:52:11] <CFisch_WMDE>	 FYI: Merging a labs only config patch.
[13:52:30] <wikibugs>	 (03CR) 10WMDE-Fisch: [C: 03+2] [beta] Enable new search features for the template dialog [mediawiki-config] - 10https://gerrit.wikimedia.org/r/684284 (https://phabricator.wikimedia.org/T271802) (owner: 10WMDE-Fisch)
[13:53:13] <wikibugs>	 (03Merged) 10jenkins-bot: [beta] Enable new search features for the template dialog [mediawiki-config] - 10https://gerrit.wikimedia.org/r/684284 (https://phabricator.wikimedia.org/T271802) (owner: 10WMDE-Fisch)
[13:57:33] <wikibugs>	 (03PS4) 10JMeybohm: kube-apiserver: Update admission controller config [puppet] - 10https://gerrit.wikimedia.org/r/677922 (https://phabricator.wikimedia.org/T270063)
[13:59:49] <wikibugs>	 (03PS1) 10Hashar: [WMF] script to build our plugins [software/gerrit] (wmf/stable-3.2) - 10https://gerrit.wikimedia.org/r/684411
[14:01:14] <wikibugs>	 (03CR) 10JMeybohm: [V: 03+1] "PCC SUCCESS (DIFF 10): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/29353/console" [puppet] - 10https://gerrit.wikimedia.org/r/677922 (https://phabricator.wikimedia.org/T270063) (owner: 10JMeybohm)
[14:05:26] <wikibugs>	 (03CR) 10Muehlenhoff: "Looks fine, but better fold his into 683551 from the start? If e.g. a revert is needed, then it all happens in one go and it's also cleare" [puppet] - 10https://gerrit.wikimedia.org/r/684315 (https://phabricator.wikimedia.org/T271573) (owner: 10JMeybohm)
[14:06:33] <wikibugs>	 (03CR) 10Ottomata: [V: 03+1] "PCC SUCCESS (NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/29354/console" [puppet] - 10https://gerrit.wikimedia.org/r/683916 (https://phabricator.wikimedia.org/T262847) (owner: 10Ottomata)
[14:08:13] <wikibugs>	 (03CR) 10Ottomata: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/29356/console" [puppet] - 10https://gerrit.wikimedia.org/r/683916 (https://phabricator.wikimedia.org/T262847) (owner: 10Ottomata)
[14:09:13] <wikibugs>	 (03CR) 10JMeybohm: [V: 03+1] "PCC SUCCESS (DIFF 10): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/29355/console" [puppet] - 10https://gerrit.wikimedia.org/r/677922 (https://phabricator.wikimedia.org/T270063) (owner: 10JMeybohm)
[14:09:27] <wikibugs>	 (03CR) 10JMeybohm: "> Patch Set 2:" [puppet] - 10https://gerrit.wikimedia.org/r/684315 (https://phabricator.wikimedia.org/T271573) (owner: 10JMeybohm)
[14:10:03] <wikibugs>	 (03PS2) 10Ottomata: Remove SWAP / virtualenv based jupyterhub [puppet] - 10https://gerrit.wikimedia.org/r/683916 (https://phabricator.wikimedia.org/T262847)
[14:10:06] <wikibugs>	 (03PS5) 10JMeybohm: kube-apiserver: Update admission controller config [puppet] - 10https://gerrit.wikimedia.org/r/677922 (https://phabricator.wikimedia.org/T270063)
[14:10:12] <wikibugs>	 (03CR) 10Muehlenhoff: Remove unused profile::etcd and related classes (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/684316 (https://phabricator.wikimedia.org/T271573) (owner: 10JMeybohm)
[14:11:34] <wikibugs>	 (03PS3) 10JMeybohm: Rename role configcluster_stretch to configcluster [puppet] - 10https://gerrit.wikimedia.org/r/683551 (https://phabricator.wikimedia.org/T271573)
[14:11:36] <wikibugs>	 (03PS3) 10JMeybohm: Remove unused profile::etcd and related classes [puppet] - 10https://gerrit.wikimedia.org/r/684316 (https://phabricator.wikimedia.org/T271573)
[14:12:25] <wikibugs>	 (03PS2) 10Hashar: [WMF] register our plugins as submodules [software/gerrit] (wmf/stable-3.2) - 10https://gerrit.wikimedia.org/r/684336
[14:12:27] <wikibugs>	 (03PS2) 10Hashar: [WMF] script to build our plugins [software/gerrit] (wmf/stable-3.2) - 10https://gerrit.wikimedia.org/r/684411
[14:12:49] <wikibugs>	 (03Abandoned) 10JMeybohm: Rename configcluster_stretch to configcluster in hiera [puppet] - 10https://gerrit.wikimedia.org/r/684315 (https://phabricator.wikimedia.org/T271573) (owner: 10JMeybohm)
[14:17:36] <wikibugs>	 (03PS3) 10Hashar: [WMF] script to build our plugins [software/gerrit] (wmf/stable-3.2) - 10https://gerrit.wikimedia.org/r/684411
[14:18:58] <wikibugs>	 (03PS6) 10JMeybohm: kube-apiserver: Update admission controller config [puppet] - 10https://gerrit.wikimedia.org/r/677922 (https://phabricator.wikimedia.org/T270063)
[14:23:13] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=atlas_exporter site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[14:25:37] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[14:27:21] <volans>	 !log uploaded conftool_1.3.1 to apt.wikimedia.org bullseye-wikimedia
[14:27:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:27:43] <wikibugs>	 (03CR) 10Ottomata: [C: 03+2] Remove SWAP / virtualenv based jupyterhub [puppet] - 10https://gerrit.wikimedia.org/r/683916 (https://phabricator.wikimedia.org/T262847) (owner: 10Ottomata)
[14:29:06] <wikibugs>	 (03PS5) 10Giuseppe Lavagetto: safe-service-restart: only verify pooled services [puppet] - 10https://gerrit.wikimedia.org/r/684287 (https://phabricator.wikimedia.org/T279100)
[14:29:36] <wikibugs>	 (03PS1) 10Jbond: P:gitlab: install gitlab-ce [puppet] - 10https://gerrit.wikimedia.org/r/684418
[14:29:39] <icinga-wm>	 PROBLEM - Check systemd state on stat1004 is CRITICAL: CRITICAL - degraded: The following units failed: jupyter-awight-singleuser.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:30:38] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] safe-service-restart: only verify pooled services [puppet] - 10https://gerrit.wikimedia.org/r/684287 (https://phabricator.wikimedia.org/T279100) (owner: 10Giuseppe Lavagetto)
[14:31:56] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] P:gitlab: install gitlab-ce [puppet] - 10https://gerrit.wikimedia.org/r/684418 (owner: 10Jbond)
[14:31:56] <wikibugs>	 (03PS2) 10Jbond: P:gitlab: install gitlab-ce [puppet] - 10https://gerrit.wikimedia.org/r/684418 (https://phabricator.wikimedia.org/T279545)
[14:33:17] <icinga-wm>	 PROBLEM - Check systemd state on stat1006 is CRITICAL: CRITICAL - degraded: The following units failed: jupyter-dsaez-singleuser.service,jupyter-ebernhardson-singleuser.service,jupyter-mneisler-singleuser.service,jupyter-neilpquinn-wmf-singleuser.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:33:17] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 04-1] modules::conftool add safe-service-restart scap option (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/682141 (https://phabricator.wikimedia.org/T266055) (owner: 10Effie Mouzeli)
[14:34:19] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "Looks good" [puppet] - 10https://gerrit.wikimedia.org/r/683551 (https://phabricator.wikimedia.org/T271573) (owner: 10JMeybohm)
[14:34:25] <icinga-wm>	 PROBLEM - Check systemd state on stat1007 is CRITICAL: CRITICAL - degraded: The following units failed: jupyter-dsaez-singleuser.service,jupyter-ebernhardson-singleuser.service,jupyter-zpapierski-singleuser.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:34:33] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] P:gitlab: install gitlab-ce [puppet] - 10https://gerrit.wikimedia.org/r/684418 (https://phabricator.wikimedia.org/T279545) (owner: 10Jbond)
[14:34:41] <icinga-wm>	 PROBLEM - Check systemd state on stat1005 is CRITICAL: CRITICAL - degraded: The following units failed: jupyter-aarora-singleuser.service,jupyter-dcausse-singleuser.service,jupyter-fdans-singleuser.service,jupyter-mmiller-singleuser.service,jupyter-mstyles-singleuser.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:35:09] <elukey>	 ottomata: --^
[14:35:23] <icinga-wm>	 PROBLEM - Check systemd state on stat1008 is CRITICAL: CRITICAL - degraded: The following units failed: jupyter-christinedk-singleuser.service,jupyter-joal-singleuser.service,jupyter-piccardi-singleuser.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:35:24] <ottomata>	 strange, i stopped them.
[14:35:30] <ottomata>	 why is that degraded?
[14:35:35] <ottomata>	 they are ephemeral units
[14:35:37] <ottomata>	 anyway in progress, sorry
[14:35:44] <ottomata>	 am about to remove them too, but stopping each one took a while...
[14:36:22] <elukey>	 ottomata: they are all listed as failed, I can do a quick pass and reset-fail them
[14:36:42] <ottomata>	 i'm in there now, will do as soon as these other 3 finish stopping
[14:37:59] <ottomata>	 ok all reset-failed
[14:38:01] <icinga-wm>	 RECOVERY - Check systemd state on stat1006 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:39:07] <icinga-wm>	 RECOVERY - Check systemd state on stat1004 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:39:09] <icinga-wm>	 RECOVERY - Check systemd state on stat1007 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:39:25] <icinga-wm>	 RECOVERY - Check systemd state on stat1005 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:40:09] <icinga-wm>	 RECOVERY - Check systemd state on stat1008 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:41:31] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 03+1] Rename role configcluster_stretch to configcluster [puppet] - 10https://gerrit.wikimedia.org/r/683551 (https://phabricator.wikimedia.org/T271573) (owner: 10JMeybohm)
[14:42:13] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 03+1] Remove unused profile::etcd and related classes [puppet] - 10https://gerrit.wikimedia.org/r/684316 (https://phabricator.wikimedia.org/T271573) (owner: 10JMeybohm)
[14:53:43] <wikibugs>	 (03PS7) 10JMeybohm: kube-apiserver: Update admission controller config [puppet] - 10https://gerrit.wikimedia.org/r/677922 (https://phabricator.wikimedia.org/T270063)
[14:55:33] <wikibugs>	 (03PS1) 10Mforns: analytics:refinery:job:test:data_purge: remove -skipTrash from drop_event [puppet] - 10https://gerrit.wikimedia.org/r/684427 (https://phabricator.wikimedia.org/T273789)
[14:56:02] <wikibugs>	 (03PS8) 10JMeybohm: kube-apiserver: Update admission controller config [puppet] - 10https://gerrit.wikimedia.org/r/677922 (https://phabricator.wikimedia.org/T270063)
[14:57:42] <wikibugs>	 (03CR) 10Ahmon Dancy: "Thanks Dzahn and Joe!" [puppet] - 10https://gerrit.wikimedia.org/r/683989 (https://phabricator.wikimedia.org/T245184) (owner: 10Ahmon Dancy)
[15:00:58] <wikibugs>	 (03CR) 10Ottomata: [C: 03+2] analytics:refinery:job:test:data_purge: remove -skipTrash from drop_event [puppet] - 10https://gerrit.wikimedia.org/r/684427 (https://phabricator.wikimedia.org/T273789) (owner: 10Mforns)
[15:01:21] <wikibugs>	 10SRE, 10ops-eqiad, 10cloud-services-team (Kanban): cloudvirt1040 primary NIC disconnected - https://phabricator.wikimedia.org/T281399 (10RobH)
[15:09:31] <wikibugs>	 10SRE, 10ops-codfw, 10ops-eqiad, 10DC-Ops: Dc-Ops Commands for Cumin - https://phabricator.wikimedia.org/T279721 (10RobH)
[15:12:33] <icinga-wm>	 RECOVERY - SSH on phab2001.mgmt is OK: SSH OK - OpenSSH_6.6 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[15:14:31] <Amir1>	 I'm upgrading a couple of mailing lists now
[15:15:09] <icinga-wm>	 PROBLEM - wikimedia-client-errors-alerts grafana alert on alert1001 is CRITICAL: CRITICAL: Overview ( https://grafana.wikimedia.org/d/000000566/overview ) is alerting: Client error alert. https://logstash.wikimedia.org/app/kibana%23/dashboard/AXDBY8Qhh3Uj6x1zCF56 https://grafana.wikimedia.org/d/000000566/
[15:27:26] <Amir1>	 !log upgrade group A to mailman3 (T280322)
[15:27:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:27:34] <stashbot>	 T280322: Upgrade mailing lists from mailman2 to 3 in batches - https://phabricator.wikimedia.org/T280322
[15:44:07] <logmsgbot>	 !log pt1979@cumin2001 START - Cookbook sre.dns.netbox
[15:44:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:47:03] <wikibugs>	 (03PS1) 10Jbond: gitlab_sshd_macs: Fix type [gitlab-ansible] - 10https://gerrit.wikimedia.org/r/684434
[15:48:25] <logmsgbot>	 !log pt1979@cumin2001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[15:48:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:49:36] <wikibugs>	 (03PS2) 10Jbond: gitlab_sshd_macs: Fix type [gitlab-ansible] - 10https://gerrit.wikimedia.org/r/684434
[15:55:58] <wikibugs>	 (03PS1) 10Ryan Kemper: wdqs: shift 1 codfw internal host to codfw public [puppet] - 10https://gerrit.wikimedia.org/r/684435 (https://phabricator.wikimedia.org/T281498)
[15:58:28] <icinga-wm>	 PROBLEM - MariaDB memory on clouddb1013 is CRITICAL: CRIT Memory 95% used. Largest process: mysqld (6246) = 66.4% https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting
[15:59:00] <jynus>	 ^ andrewbogott bstorm 
[15:59:18] <bstorm>	 interesting...
[15:59:33] <jynus>	 I think there is a ticket about this, I was told by manuel
[15:59:56] <jynus>	 not necessarily an issue, but a monitoring issue, but do not know much really
[16:00:06] <bstorm>	 Yeah. That went to "admins"
[16:00:10] <bstorm>	 for one
[16:00:19] <bstorm>	 and then there's the memory issue :)
[16:01:14] <bstorm>	 It looks like it has a lot of free memory at the moment...
[16:03:17] <bstorm>	 Ah, no it doesn't. 
[16:03:27] <bstorm>	 That's what I get for reading stuff like that during a meeting
[16:04:04] <elukey>	 bstorm: o/ Razz*i and Manuel reviewed the alarm for clouddb1021, and IIRC we decided that it was not that useful for multi-instance, so we added hiera lookups to raise the thresholds
[16:04:30] <bstorm>	 Thanks for the info :) I'll make a ticket to dig a bit deeper today
[16:04:35] <elukey>	 ack :)
[16:05:24] <bstorm>	 The dbs aren't doing much of anything
[16:05:53] <jynus>	 bstorm, @ meeting, maybe I can talk to you when I finish, as I know some of the issues
[16:06:13] <bstorm>	 👍🏻
[16:07:28] <icinga-wm>	 RECOVERY - wikimedia-client-errors-alerts grafana alert on alert1001 is OK: OK: Overview ( https://grafana.wikimedia.org/d/000000566/overview ) is not alerting. https://logstash.wikimedia.org/app/kibana%23/dashboard/AXDBY8Qhh3Uj6x1zCF56 https://grafana.wikimedia.org/d/000000566/
[16:09:01] <bstorm>	 Ah good. The alert did go to the wmcs thingy as well :)
[16:09:01] <wikibugs>	 (03PS1) 10Gergő Tisza: [beta] GrowthExperiments: make link recommendations default in beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/684436
[16:12:52] <wikibugs>	 (03CR) 10CDanis: [C: 03+1] wdqs: shift 1 codfw internal host to codfw public [puppet] - 10https://gerrit.wikimedia.org/r/684435 (https://phabricator.wikimedia.org/T281498) (owner: 10Ryan Kemper)
[16:14:24] <wikibugs>	 (03PS1) 10Jbond: C:gitlab::ssh: add new gilab::ssh class [puppet] - 10https://gerrit.wikimedia.org/r/684437
[16:14:27] <wikibugs>	 (03PS1) 10Jbond: P:gitlab: add ability to manage gitlab sshd instance [puppet] - 10https://gerrit.wikimedia.org/r/684438 (https://phabricator.wikimedia.org/T276148)
[16:15:15] <wikibugs>	 (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/29360/console" [puppet] - 10https://gerrit.wikimedia.org/r/684438 (https://phabricator.wikimedia.org/T276148) (owner: 10Jbond)
[16:15:56] <wikibugs>	 (03PS1) 10Jbond: O:gitlab: manage sshd config [puppet] - 10https://gerrit.wikimedia.org/r/684439 (https://phabricator.wikimedia.org/T276148)
[16:16:52] <wikibugs>	 (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/29361/console" [puppet] - 10https://gerrit.wikimedia.org/r/684439 (https://phabricator.wikimedia.org/T276148) (owner: 10Jbond)
[16:19:27] <wikibugs>	 (03CR) 10Herron: "> Patch Set 4:" (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/677002 (https://phabricator.wikimedia.org/T224565) (owner: 10Herron)
[16:19:37] <legoktm>	 !log legoktm@lists1001:~$ sudo apt install default-mysql-client # for temporary debugging
[16:19:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:20:07] <wikibugs>	 (03PS1) 10Hashar: [WMF] Add XDG_CACHE_HOME to tools/download_file.py [software/gerrit] (wmf/stable-3.2) - 10https://gerrit.wikimedia.org/r/684440
[16:23:27] <wikibugs>	 (03CR) 10Ryan Kemper: [C: 03+2] wdqs: shift 1 codfw internal host to codfw public [puppet] - 10https://gerrit.wikimedia.org/r/684435 (https://phabricator.wikimedia.org/T281498) (owner: 10Ryan Kemper)
[16:24:11] <wikibugs>	 (03PS6) 10Jbond: P:gitlab: Deploy acme chief certificate [puppet] - 10https://gerrit.wikimedia.org/r/670427 (https://phabricator.wikimedia.org/T276673)
[16:26:17] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] P:gitlab: Deploy acme chief certificate [puppet] - 10https://gerrit.wikimedia.org/r/670427 (https://phabricator.wikimedia.org/T276673) (owner: 10Jbond)
[16:27:32] <logmsgbot>	 !log ryankemper@puppetmaster1001 conftool action : set/pooled=yes:weight=10; selector: name=wdqs2004.codfw.wmnet
[16:27:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:29:13] <ryankemper>	 !log T281498 `sudo confctl select 'name=wdqs2004.codfw.wmnet' set/pooled=yes:weight=10` after merge of https://gerrit.wikimedia.org/r/c/operations/puppet/+/684435
[16:29:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:29:21] <stashbot>	 T281498: Transfer one codfw wdqs-internal host over to codfw wdqs (public) - https://phabricator.wikimedia.org/T281498
[16:30:11] <logmsgbot>	 !log ryankemper@cumin1001 START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
[16:30:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:30:19] <stashbot>	 T280563: Reboot elasticsearch* and relforge* to apply kernel security updates - https://phabricator.wikimedia.org/T280563
[16:40:21] <wikibugs>	 (03PS1) 10Jbond: sshd review: do not merge [gitlab-ansible] - 10https://gerrit.wikimedia.org/r/684443 (https://phabricator.wikimedia.org/T276148)
[16:43:10] <icinga-wm>	 PROBLEM - Check systemd state on elastic1060 is CRITICAL: CRITICAL - degraded: The following units failed: prometheus-wmf-elasticsearch-exporter-9200.service,prometheus-wmf-elasticsearch-exporter-9400.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:43:16] <jynus>	 (finished meeting) bstorm, so talk to manuel on ticket
[16:43:25] <wikibugs>	 10SRE, 10ops-codfw, 10DC-Ops: (Need By: TBD) rack/setup/install phab2002 - https://phabricator.wikimedia.org/T280544 (10Papaul)
[16:43:36] <jynus>	 but the model that was used for old labsdbs may need some tweaks
[16:43:38] <icinga-wm>	 PROBLEM - Check systemd state on elastic1066 is CRITICAL: CRITICAL - degraded: The following units failed: prometheus-wmf-elasticsearch-exporter-9200.service,prometheus-wmf-elasticsearch-exporter-9600.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:43:52] <jynus>	 either on how much memory is being used per instance
[16:43:59] <jynus>	 or the monitoring limits of it
[16:44:06] <bstorm>	 Yeah. Will do :)
[16:44:22] <jynus>	 tuning is very workload dependent, so what works for production won't work for clouddbs
[16:44:45] <jynus>	 of both resources and monitoring
[16:45:11] <wikibugs>	 (03PS1) 10Ladsgroup: mailman3: Copy the config file before disabling the list [puppet] - 10https://gerrit.wikimedia.org/r/684444 (https://phabricator.wikimedia.org/T280322)
[16:45:12] <icinga-wm>	 RECOVERY - Check systemd state on elastic1060 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:53:14] <wikibugs>	 (03PS2) 10Ladsgroup: mailman3: Copy the config file before disabling the list [puppet] - 10https://gerrit.wikimedia.org/r/684444 (https://phabricator.wikimedia.org/T280322)
[16:55:03] <wikibugs>	 (03PS2) 10Jbond: C:gitlab::ssh: add new gilab::ssh class [puppet] - 10https://gerrit.wikimedia.org/r/684437
[16:55:33] <wikibugs>	 (03CR) 10Legoktm: [C: 03+2] mailman3: Copy the config file before disabling the list [puppet] - 10https://gerrit.wikimedia.org/r/684444 (https://phabricator.wikimedia.org/T280322) (owner: 10Ladsgroup)
[16:57:41] <wikibugs>	 10SRE, 10GitLab (Initialization), 10Patch-For-Review, 10Release-Engineering-Team (Doing), 10User-brennen: SSH Access of Git data in GitLab - https://phabricator.wikimedia.org/T276148 (10jbond) >>! In T276148#7050178, @Sergey.Trofimovsky.SF wrote: > Here it is, requesting settings review: >  > https://ger...
[16:58:30] <wikibugs>	 (03PS9) 10JMeybohm: kube-apiserver: Update admission controller config [puppet] - 10https://gerrit.wikimedia.org/r/677922 (https://phabricator.wikimedia.org/T270063)
[17:00:05] <jouncebot>	 ryankemper: Your horoscope predicts another unfortunate Wikidata Query Service weekly deploy deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210503T1700).
[17:11:24] <icinga-wm>	 RECOVERY - Check systemd state on elastic1066 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[17:12:35] <wikibugs>	 (03PS3) 10Jdlrobson: Enable new language button for all logged in users outside test projects [mediawiki-config] - 10https://gerrit.wikimedia.org/r/682758 (https://phabricator.wikimedia.org/T280526)
[17:12:51] <wikibugs>	 (03CR) 10Jdlrobson: "Jan will deploy this at 11am UTC tomorrow." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/682758 (https://phabricator.wikimedia.org/T280526) (owner: 10Jdlrobson)
[17:14:19] <wikibugs>	 (03CR) 10JMeybohm: "Think I finally managed. PCC with ml clusters not explicitly enabling any admission plugins (for me to test the struct type etc.) at: http" [puppet] - 10https://gerrit.wikimedia.org/r/677922 (https://phabricator.wikimedia.org/T270063) (owner: 10JMeybohm)
[17:14:47] <wikibugs>	 (03PS10) 10JMeybohm: kube-apiserver: Update admission controller config [puppet] - 10https://gerrit.wikimedia.org/r/677922 (https://phabricator.wikimedia.org/T270063)
[17:17:04] <wikibugs>	 10SRE, 10SRE-Access-Requests: SRE Onboarding for Marc Mandere - https://phabricator.wikimedia.org/T281344 (10Dzahn)
[17:18:26] <wikibugs>	 10SRE, 10SRE-Access-Requests: SRE Onboarding for Marc Mandere - https://phabricator.wikimedia.org/T281344 (10Dzahn) @MMandere Just subscribed you to 2 new mailing lists called "ops". That is the name we had before we became SRE.  also see:  https://lists.wikimedia.org/mailman/listinfo/ops  https://lists.wikime...
[17:20:09] <hashar>	 !log Restarting CI Jenkins due to "Gearman worker contint2001.wikimedia.org_manager" thread dying unexpectedly # T281737
[17:20:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:20:17] <stashbot>	 T281737: Zuul can't stop jobs or set the build description - https://phabricator.wikimedia.org/T281737
[17:21:13] <wikibugs>	 (03PS1) 10Volans: setup.py: relax elasticsearch dependencies [software/spicerack] - 10https://gerrit.wikimedia.org/r/684476
[17:25:33] <wikibugs>	 (03PS2) 10Jdlrobson: Prepare for new configuration option [mediawiki-config] - 10https://gerrit.wikimedia.org/r/683720 (https://phabricator.wikimedia.org/T277951)
[17:29:49] <wikibugs>	 (03CR) 10Ryan Kemper: [C: 03+1] "LGTM! Thanks for the detailed context in the commit" [software/spicerack] - 10https://gerrit.wikimedia.org/r/684476 (owner: 10Volans)
[17:30:41] <wikibugs>	 (03CR) 10Volans: "> Patch Set 1: Code-Review+1" [software/spicerack] - 10https://gerrit.wikimedia.org/r/684476 (owner: 10Volans)
[17:37:55] <wikibugs>	 (03CR) 10Bstorm: "I see you found the traffic quirks in this over at If3e7a29b5c17a012cdd2" [dns] - 10https://gerrit.wikimedia.org/r/684099 (owner: 10Majavah)
[17:39:42] <wikibugs>	 (03CR) 10Bstorm: "Adding some traffic team folks in case there are gotchas around that." [puppet] - 10https://gerrit.wikimedia.org/r/684100 (owner: 10Majavah)
[17:44:34] <logmsgbot>	 !log ryankemper@cumin1001 END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
[17:44:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:44:43] <stashbot>	 T280563: Reboot elasticsearch* and relforge* to apply kernel security updates - https://phabricator.wikimedia.org/T280563
[17:48:13] <wikibugs>	 (03CR) 10Bstorm: "Fun question that I don't have the answer to just yet: Is TLS is done by envoy, can we simply add the new name to hieradata/role/common/wm" [puppet] - 10https://gerrit.wikimedia.org/r/684100 (owner: 10Majavah)
[17:48:19] <wikibugs>	 (03CR) 10Volans: [C: 03+2] setup.py: relax elasticsearch dependencies [software/spicerack] - 10https://gerrit.wikimedia.org/r/684476 (owner: 10Volans)
[17:55:26] <wikibugs>	 10SRE, 10SRE-Access-Requests: SRE Onboarding for Marc Mandere - https://phabricator.wikimedia.org/T281344 (10Dzahn)
[17:55:52] <wikibugs>	 10SRE, 10SRE-Access-Requests: SRE Onboarding for Marc Mandere - https://phabricator.wikimedia.org/T281344 (10Dzahn) - added to private exim aliases incl. root and dns-admin
[17:56:16] <wikibugs>	 (03Merged) 10jenkins-bot: setup.py: relax elasticsearch dependencies [software/spicerack] - 10https://gerrit.wikimedia.org/r/684476 (owner: 10Volans)
[18:00:04] <jouncebot>	 RoanKattouw, Niharika, and Urbanecm: (Dis)respected human, time to deploy Morning backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210503T1800). Please do the needful.
[18:00:05] <jouncebot>	 jan_drewniak, tgr, and Majavah: A patch you scheduled for Morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[18:00:09] <Majavah>	 here
[18:00:15] <jan_drewniak>	 o/
[18:00:16] <Majavah>	 mine are both beta only
[18:00:34] <wikibugs>	 10SRE, 10Wikimedia-Mailing-lists: Upgrade mailing lists from mailman2 to 3 in batches - https://phabricator.wikimedia.org/T280322 (10Ladsgroup)
[18:00:45] <tgr_>	 mine is also beta only
[18:01:24] <Urbanecm>	 I can deploy today
[18:01:40] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to Deployment shell for derick - https://phabricator.wikimedia.org/T281564 (10Dzahn) @xSavitar Hi, ticket looks good. I'll handle it as the "clinic duty" person this week.  @thcipriani Let's start with your approval. Do you approve?
[18:02:23] <wikibugs>	 (03CR) 10Urbanecm: [C: 03+2] Hotfix: loadRelatedArticles should consider existence of container element [extensions/RelatedArticles] (wmf/1.37.0-wmf.3) - 10https://gerrit.wikimedia.org/r/684393 (https://phabricator.wikimedia.org/T281547) (owner: 10Jdrewniak)
[18:02:27] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to Deployment shell for derick - https://phabricator.wikimedia.org/T281564 (10thcipriani)
[18:02:43] <wikibugs>	 (03CR) 10Urbanecm: [C: 03+2] [beta] GrowthExperiments: make link recommendations default in beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/684436 (owner: 10Gergő Tisza)
[18:02:57] <Urbanecm>	 tgr_: +2'ed, will be live within ~30 minutes (but i bet you know it :))
[18:04:01] <wikibugs>	 (03PS2) 10Urbanecm: beta: Use upload.wikimedia.beta.wmflabs.o for uploads [mediawiki-config] - 10https://gerrit.wikimedia.org/r/684381 (https://phabricator.wikimedia.org/T281650) (owner: 10Majavah)
[18:04:06] <wikibugs>	 (03CR) 10Urbanecm: [C: 03+2] beta: Use upload.wikimedia.beta.wmflabs.o for uploads [mediawiki-config] - 10https://gerrit.wikimedia.org/r/684381 (https://phabricator.wikimedia.org/T281650) (owner: 10Majavah)
[18:04:08] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to Deployment shell for derick - https://phabricator.wikimedia.org/T281564 (10thcipriani) >>! In T281564#7054274, @Dzahn wrote: > @xSavitar Hi, ticket looks good. I'll handle it as the "clinic duty" person this week. >  > @thcipriani Let's start with your approva...
[18:05:01] <wikibugs>	 (03CR) 10Urbanecm: [C: 03+2] beta: Switch to deployment-urldownloader03 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/684384 (owner: 10Majavah)
[18:05:30] <Urbanecm>	 Majavah: +2'ed, will get deployed soon :). I'll sync CS.php changes to prod too, although they should be no-op.
[18:05:38] <wikibugs>	 (03Merged) 10jenkins-bot: [beta] GrowthExperiments: make link recommendations default in beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/684436 (owner: 10Gergő Tisza)
[18:05:41] <wikibugs>	 10SRE, 10Wikimedia-Mailing-lists: Upgrade mailing lists from mailman2 to 3 in batches - https://phabricator.wikimedia.org/T280322 (10Ladsgroup) Group A is done.  It was really messy.  - There are wildcard bans being overridden by userlist, please don't do that.  - Some mailing lists simply don't have an owner (!)
[18:06:05] <Majavah>	 ty!
[18:07:07] <wikibugs>	 (03Merged) 10jenkins-bot: beta: Switch to deployment-urldownloader03 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/684384 (owner: 10Majavah)
[18:09:49] <wikibugs>	 (03PS3) 10Urbanecm: beta: Use upload.wikimedia.beta.wmflabs.o for uploads [mediawiki-config] - 10https://gerrit.wikimedia.org/r/684381 (https://phabricator.wikimedia.org/T281650) (owner: 10Majavah)
[18:09:55] <wikibugs>	 (03CR) 10Urbanecm: [C: 03+2] beta: Use upload.wikimedia.beta.wmflabs.o for uploads [mediawiki-config] - 10https://gerrit.wikimedia.org/r/684381 (https://phabricator.wikimedia.org/T281650) (owner: 10Majavah)
[18:11:54] <wikibugs>	 (03Merged) 10jenkins-bot: beta: Use upload.wikimedia.beta.wmflabs.o for uploads [mediawiki-config] - 10https://gerrit.wikimedia.org/r/684381 (https://phabricator.wikimedia.org/T281650) (owner: 10Majavah)
[18:14:39] <logmsgbot>	 !log urbanecm@deploy1002 Synchronized wmf-config/CommonSettings.php: bc1bc903169e4982c0c5a930094bed9f22616293: NOOP: beta: Use upload.wikimedia.beta.wmflabs.o for uploads (T281650; 1/2) (duration: 00m 58s)
[18:14:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:14:47] <stashbot>	 T281650: Move upload.beta.wmflabs.org to upload.wikimedia.beta.wmflabs.org - https://phabricator.wikimedia.org/T281650
[18:15:54] <wikibugs>	 (03Merged) 10jenkins-bot: Hotfix: loadRelatedArticles should consider existence of container element [extensions/RelatedArticles] (wmf/1.37.0-wmf.3) - 10https://gerrit.wikimedia.org/r/684393 (https://phabricator.wikimedia.org/T281547) (owner: 10Jdrewniak)
[18:15:57] <logmsgbot>	 !log urbanecm@deploy1002 Synchronized wmf-config/filebackend.php: bc1bc903169e4982c0c5a930094bed9f22616293: NOOP: beta: Use upload.wikimedia.beta.wmflabs.o for uploads (T281650; 2/2) (duration: 00m 57s)
[18:16:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:16:41] <Urbanecm>	 jan_drewniak: pulled onto mwdebug1001, can you test it there, please?
[18:16:52] <jan_drewniak>	 Urbanecm: sure thing
[18:17:32] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=netbox_device_statistics site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[18:18:10] <jan_drewniak>	 Urbanecm: ok, looks good
[18:18:14] <Urbanecm>	 thanks, syncing
[18:18:31] <wikibugs>	 (03PS1) 10Ottomata: refine - Remove webproxy for eventlogging_analytics job [puppet] - 10https://gerrit.wikimedia.org/r/684482 (https://phabricator.wikimedia.org/T247510)
[18:19:49] <logmsgbot>	 !log urbanecm@deploy1002 Synchronized php-1.37.0-wmf.3/extensions/RelatedArticles/resources/ext.relatedArticles.readMore.bootstrap/index.js: cf9d9da3bf272d33c2d9b29d9172b1c81bfd8beb: Hotfix: loadRelatedArticles should consider existence of container element (T281547) (duration: 00m 57s)
[18:19:55] <Urbanecm>	 jan_drewniak: should be live
[18:19:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:19:57] <stashbot>	 T281547: TypeError: Cannot read property 'top' of undefined - https://phabricator.wikimedia.org/T281547
[18:19:58] <Urbanecm>	 anything else, anyone? :)
[18:20:06] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[18:20:07] <jan_drewniak>	 Urbanecm: thanks!
[18:20:10] <Urbanecm>	 any time
[18:20:55] <Urbanecm>	 !log Morning B&C window done
[18:21:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:21:10] <wikibugs>	 (03CR) 10Ottomata: [C: 03+2] refine - Remove webproxy for eventlogging_analytics job [puppet] - 10https://gerrit.wikimedia.org/r/684482 (https://phabricator.wikimedia.org/T247510) (owner: 10Ottomata)
[18:29:34] <tabbycat>	 Amir1: hi, you there?
[18:30:57] <legoktm>	 tabbycat: is it mailing list related?
[18:31:13] <tabbycat>	 legoktm: yup
[18:31:26] <tabbycat>	 it's a bit sensitive too, if you don't mind me PMing?
[18:31:31] <legoktm>	 go for it
[18:31:37] <tabbycat>	 ok thanks
[18:32:09] <Amir1>	 tabbycat: I'm
[18:32:28] <Amir1>	 tabbycat: let me know if there is anything I can do
[18:33:19] <wikibugs>	 (03PS1) 10Bstorm: maintain_dbusers: add new multi-instance analytics dedicated host [puppet] - 10https://gerrit.wikimedia.org/r/684485 (https://phabricator.wikimedia.org/T281287)
[18:33:26] <RhinosF1>	 I found a spelling error on the cu list description
[18:33:49] <RhinosF1>	 Last 3 letters are missing
[18:33:54] <RhinosF1>	 So maybe cut off?
[18:34:17] <Amir1>	 RhinosF1: yeah... I should add that as well
[18:34:49] <RhinosF1>	 I was being nosey to see what got moved
[18:36:53] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to Deployment shell for derick - https://phabricator.wikimedia.org/T281564 (10Dzahn) Thanks @thcipriani   I confirm L3 has already been signed as well.
[18:38:36] <tabbycat>	 Amir1: thanks, I'm talking to legoktm about it :)
[18:38:43] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to Deployment shell for derick - https://phabricator.wikimedia.org/T281564 (10Dzahn)
[18:48:42] <wikibugs>	 (03PS3) 10Jbond: C:gitlab::ssh: add new gilab::ssh class [puppet] - 10https://gerrit.wikimedia.org/r/684437
[18:48:44] <wikibugs>	 (03PS1) 10Jbond: P:gitlab: add basic gitlab class [puppet] - 10https://gerrit.wikimedia.org/r/684486
[18:48:46] <wikibugs>	 (03PS1) 10Jbond: P:gitlab: manage gitlab with gitlab module [puppet] - 10https://gerrit.wikimedia.org/r/684487
[18:51:10] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] P:gitlab: add basic gitlab class [puppet] - 10https://gerrit.wikimedia.org/r/684486 (owner: 10Jbond)
[19:01:45] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to Deployment shell for derick - https://phabricator.wikimedia.org/T281564 (10Dzahn) @xSavitar All boxes are checked except the "sign valid NDA with legal".   Assuming you haven't already done this, I will add @KFrancis to get this going.   @KFrancis Hello, this...
[19:01:53] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to Deployment shell for derick - https://phabricator.wikimedia.org/T281564 (10Dzahn) p:05Triage→03Medium
[19:02:38] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=netbox_device_statistics site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[19:03:00] <wikibugs>	 10SRE, 10Dumps-Generation, 10SRE-Access-Requests: Create new group for root access to snapshot*, dumpsdata* and labstore1006,7 with holger in it - https://phabricator.wikimedia.org/T277629 (10Dzahn) 05Open→03Stalled
[19:03:23] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to Deployment shell for derick - https://phabricator.wikimedia.org/T281564 (10Dzahn) a:03Dzahn
[19:04:27] <wikibugs>	 10SRE, 10DBA, 10Wikimedia-Mailing-lists: Delete lists-next.wikimedia.org - https://phabricator.wikimedia.org/T281548 (10Dzahn) p:05Triage→03Low
[19:04:38] <wikibugs>	 10SRE, 10Services, 10Service-deployment-requests: New Service Request miscweb - https://phabricator.wikimedia.org/T281538 (10Dzahn) p:05Triage→03Medium
[19:04:52] <wikibugs>	 10SRE, 10Datacenter-Switchover, 10Performance-Team (Radar): June 2021 Datacenter switchover - https://phabricator.wikimedia.org/T281515 (10Dzahn) p:05Triage→03Medium
[19:05:14] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[19:07:12] <wikibugs>	 10SRE, 10ops-codfw: Degraded RAID on wdqs2007 - https://phabricator.wikimedia.org/T281504 (10Dzahn) p:05Triage→03Medium
[19:07:29] <wikibugs>	 10SRE, 10ops-codfw, 10Wikidata-Query-Service: Degraded RAID on wdqs2007 - https://phabricator.wikimedia.org/T281504 (10Dzahn)
[19:13:34] <wikibugs>	 10SRE, 10ops-codfw, 10Wikidata-Query-Service: Degraded RAID on wdqs2007 - https://phabricator.wikimedia.org/T281504 (10RKemper) a:03RKemper
[19:14:42] <wikibugs>	 10SRE, 10ops-codfw, 10Wikidata-Query-Service: Degraded RAID on wdqs2007 - https://phabricator.wikimedia.org/T281504 (10Dzahn)
[19:14:44] <wikibugs>	 10SRE, 10ops-codfw, 10DC-Ops, 10Discovery-Search (Current work): hw troubleshooting: ssh unreachable for wdqs2007.codfw.wmnet - https://phabricator.wikimedia.org/T281437 (10Dzahn)
[19:21:03] <wikibugs>	 10SRE, 10Wikimedia-Mailing-lists: Rename mailinglists eliso, and eliso-anoncoj - https://phabricator.wikimedia.org/T281686 (10Ladsgroup) Unfortunately, renaming a mailing list is not that easy in mailman3: https://lists.mailman3.org/archives/list/mailman-users@mailman3.org/thread/Q3YHKZKUALBWIESNOQLRBFRNJ6F3O7...
[19:21:29] <ryankemper>	 !log T280382 [WDQS] `sudo confctl select 'name=wdqs1004.eqiad.wmnet' set/pooled=no` (`wdqs1004` failed re-image [not sure why yet] and won't let me ssh in to depool so using conftool instead)
[19:21:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:21:41] <stashbot>	 T280382: WDQS hosts low on /srv disk space - https://phabricator.wikimedia.org/T280382
[19:21:46] <logmsgbot>	 !log ryankemper@puppetmaster1001 conftool action : set/pooled=no; selector: name=wdqs1004.eqiad.wmnet
[19:21:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:24:35] <logmsgbot>	 !log ryankemper@cumin1001 START - Cookbook sre.wdqs.data-transfer
[19:24:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:24:57] <ryankemper>	 !log T280382 `sudo -i cookbook sre.wdqs.data-transfer --without-lvs --source wdqs1003.eqiad.wmnet --dest wdqs1010.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`
[19:25:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:31:36] <wikibugs>	 10SRE, 10Wikimedia-Mailing-lists, 10Upstream: Mailman3 non-members interface is confusing - https://phabricator.wikimedia.org/T281746 (10Dzahn) p:05Triage→03Medium
[19:32:03] <wikibugs>	 10SRE, 10Wikimedia-Mailing-lists: Rename mailinglists eliso, and eliso-anoncoj - https://phabricator.wikimedia.org/T281686 (10Dzahn) p:05Triage→03Low
[19:36:25] <wikibugs>	 (03PS1) 10Andrew Bogott: wmcs-policy-tests.py: add Trove policy tests [puppet] - 10https://gerrit.wikimedia.org/r/684494 (https://phabricator.wikimedia.org/T279845)
[19:39:03] <logmsgbot>	 !log ryankemper@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2007.codfw.wmnet with reason: REIMAGE
[19:39:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:39:43] <wikibugs>	 (03PS2) 10Jbond: P:gitlab: add basic gitlab class [puppet] - 10https://gerrit.wikimedia.org/r/684486
[19:40:51] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] P:gitlab: add basic gitlab class [puppet] - 10https://gerrit.wikimedia.org/r/684486 (owner: 10Jbond)
[19:40:54] <logmsgbot>	 !log ryankemper@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2007.codfw.wmnet with reason: REIMAGE
[19:41:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:53:27] <wikibugs>	 (03CR) 10Herron: "> Patch Set 3:" [puppet] - 10https://gerrit.wikimedia.org/r/683695 (https://phabricator.wikimedia.org/T233134) (owner: 10Herron)
[20:00:04] <jouncebot>	 chrisalbon and accraze: Time to snap out of that daydream and deploy Services – Graphoid / ORES. Get on with it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210503T2000).
[20:22:05] <wikibugs>	 (03CR) 10Herron: "> Patch Set 2:" [deployment-charts] - 10https://gerrit.wikimedia.org/r/683047 (https://phabricator.wikimedia.org/T279342) (owner: 10Herron)
[20:24:01] <wikibugs>	 (03CR) 10Herron: "> Patch Set 2:" (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/683706 (https://phabricator.wikimedia.org/T225005) (owner: 10Herron)
[20:25:57] <wikibugs>	 (03CR) 10Herron: "Ok, I think this is ready for another look" [puppet] - 10https://gerrit.wikimedia.org/r/683044 (https://phabricator.wikimedia.org/T225005) (owner: 10Herron)
[20:35:47] <wikibugs>	 10SRE, 10ops-codfw, 10Wikidata-Query-Service: Degraded RAID on wdqs2007 - https://phabricator.wikimedia.org/T281504 (10RKemper) 05Open→03Resolved
[20:35:49] <wikibugs>	 10SRE, 10ops-codfw, 10DC-Ops, 10Discovery-Search (Current work): hw troubleshooting: ssh unreachable for wdqs2007.codfw.wmnet - https://phabricator.wikimedia.org/T281437 (10RKemper)
[20:36:18] <wikibugs>	 10SRE, 10ops-codfw, 10Wikidata-Query-Service: Degraded RAID on wdqs2007 - https://phabricator.wikimedia.org/T281504 (10RKemper) Re-image of `wdqs2007` was completed successfully.
[20:37:05] <logmsgbot>	 !log ryankemper@cumin1001 START - Cookbook sre.wdqs.data-transfer
[20:37:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:37:30] <ryankemper>	 !log T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
[20:37:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:37:39] <stashbot>	 T280382: WDQS hosts low on /srv disk space - https://phabricator.wikimedia.org/T280382
[20:42:31] <logmsgbot>	 !log ryankemper@cumin1001 END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
[20:42:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:43:19] <wikibugs>	 10SRE, 10Mail, 10Wikimedia-Mailing-lists: In Mailman3 if a list has no owners, mail goes to root@ - https://phabricator.wikimedia.org/T281753 (10Legoktm)
[20:44:01] <wikibugs>	 10SRE, 10Mail, 10Wikimedia-Mailing-lists: In Mailman3 if a list has no owners, mail goes to root@ - https://phabricator.wikimedia.org/T281753 (10Legoktm)
[20:44:46] <logmsgbot>	 !log ryankemper@cumin1001 END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
[20:44:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:46:24] <icinga-wm>	 RECOVERY - Check systemd state on wdqs2007 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[20:46:36] <icinga-wm>	 PROBLEM - WDQS high update lag on wdqs1003 is CRITICAL: 4879 ge 3600 https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook%23Update_lag https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
[20:48:24] <wikibugs>	 10SRE, 10Wikimedia-Mailing-lists, 10Upstream: mailman3: Let users choose the UI language - https://phabricator.wikimedia.org/T281747 (10MarcoAurelio) Thanks for your comments. Indeed there's no dropdown to select in which language you'd like to see the UI in. In my case, the translations shown to me are some...
[20:53:17] <wikibugs>	 (03PS5) 10Jdlrobson: Replace $wgRelatedArticlesFooterWhitelistedSkins [mediawiki-config] - 10https://gerrit.wikimedia.org/r/680814 (owner: 10Reedy)
[20:53:34] <wikibugs>	 (03CR) 10Jdlrobson: [C: 03+1] "Reedy can you remove your -2 here? I'll get this deployed later." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/680814 (owner: 10Reedy)
[20:54:25] <wikibugs>	 (03PS3) 10Jdlrobson: Prepare for new configuration option [mediawiki-config] - 10https://gerrit.wikimedia.org/r/683720 (https://phabricator.wikimedia.org/T277951)
[20:54:33] <wikibugs>	 (03CR) 10Jdlrobson: [C: 03+1] "Thanks! :)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/680814 (owner: 10Reedy)
[20:56:31] <wikibugs>	 10SRE, 10Wikimedia-Mailing-lists, 10Upstream: mailman3: Let users choose the UI language - https://phabricator.wikimedia.org/T281747 (10Legoktm)
[20:56:49] <ryankemper>	 !log T280382 [WDQS] `ryankemper@wdqs2001:~$ sudo run-puppet-agent --force`
[20:56:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:56:57] <stashbot>	 T280382: WDQS hosts low on /srv disk space - https://phabricator.wikimedia.org/T280382
[20:57:44] <icinga-wm>	 PROBLEM - Check systemd state on stat1005 is CRITICAL: CRITICAL - degraded: The following units failed: jupyter-aarora-singleuser.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[21:00:04] <jouncebot>	 Reedy and sbassett: Your horoscope predicts another unfortunate Weekly Security deployment window deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210503T2100).
[21:01:01] <wikibugs>	 (03PS2) 10Krinkle: Move ExternalStore log group from debug to error [mediawiki-config] - 10https://gerrit.wikimedia.org/r/682322 (https://phabricator.wikimedia.org/T281048) (owner: 10Reedy)
[21:01:35] <wikibugs>	 (03PS3) 10Krinkle: logging: Raise ExternalStore min level from debug to warning [mediawiki-config] - 10https://gerrit.wikimedia.org/r/682322 (https://phabricator.wikimedia.org/T281048) (owner: 10Reedy)
[21:02:45] <ryankemper>	 !log T280382 `wdqs1010.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2        2.6T  975G  1.5T  39% /srv`
[21:02:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:02:54] <stashbot>	 T280382: WDQS hosts low on /srv disk space - https://phabricator.wikimedia.org/T280382
[21:04:44] <wikibugs>	 10SRE, 10Wikimedia-Mailing-lists: Rename mailinglists eliso, and eliso-anoncoj - https://phabricator.wikimedia.org/T281686 (10Legoktm) >>! In T281686#7054623, @Ladsgroup wrote: > Unfortunately, renaming a mailing list is not that easy in mailman3: > https://lists.mailman3.org/archives/list/mailman-users@mailma...
[21:05:59] <logmsgbot>	 !log ryankemper@cumin1001 START - Cookbook sre.wdqs.data-transfer
[21:06:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:06:07] <ryankemper>	 !log T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`
[21:06:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:09:32] <wikibugs>	 (03PS4) 10Krinkle: logging: Raise ExternalStore min level from debug to warning [mediawiki-config] - 10https://gerrit.wikimedia.org/r/682322 (https://phabricator.wikimedia.org/T281048) (owner: 10Reedy)
[21:09:40] <ryankemper>	 !log T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs1011.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
[21:09:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:09:48] <stashbot>	 T280382: WDQS hosts low on /srv disk space - https://phabricator.wikimedia.org/T280382
[21:10:03] <wikibugs>	 (03CR) 10Krinkle: [C: 03+2] logging: Raise ExternalStore min level from debug to warning [mediawiki-config] - 10https://gerrit.wikimedia.org/r/682322 (https://phabricator.wikimedia.org/T281048) (owner: 10Reedy)
[21:11:10] <wikibugs>	 (03Merged) 10jenkins-bot: logging: Raise ExternalStore min level from debug to warning [mediawiki-config] - 10https://gerrit.wikimedia.org/r/682322 (https://phabricator.wikimedia.org/T281048) (owner: 10Reedy)
[21:14:44] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs1015 is CRITICAL: PYBAL CRITICAL - CRITICAL - wdqs-internal_80: Servers wdqs1011.eqiad.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[21:15:12] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs1016 is CRITICAL: PYBAL CRITICAL - CRITICAL - wdqs-internal_80: Servers wdqs1011.eqiad.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[21:17:28] <icinga-wm>	 PROBLEM - PyBal IPVS diff check on lvs1016 is CRITICAL: CRITICAL: Hosts in IPVS but unknown to PyBal: set([wdqs1011.eqiad.wmnet]) https://wikitech.wikimedia.org/wiki/PyBal
[21:18:23] <ryankemper>	 ^ looking
[21:19:34] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=routinator site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[21:19:38] <icinga-wm>	 PROBLEM - PyBal IPVS diff check on lvs1015 is CRITICAL: CRITICAL: Hosts in IPVS but unknown to PyBal: set([wdqs1011.eqiad.wmnet]) https://wikitech.wikimedia.org/wiki/PyBal
[21:19:50] <logmsgbot>	 !log ryankemper@puppetmaster1001 conftool action : set/pooled=no; selector: name=wdqs1011.eqiad.wmnet
[21:19:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:20:20] <icinga-wm>	 PROBLEM - SSH on phab2001.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[21:20:27] <ryankemper>	 !log T280382 [WDQS] `ryankemper@puppetmaster1001:~$ sudo confctl select 'name=wdqs1011.eqiad.wmnet' set/pooled=no`
[21:20:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:20:35] <stashbot>	 T280382: WDQS hosts low on /srv disk space - https://phabricator.wikimedia.org/T280382
[21:22:03] <ryankemper>	 !log [WDQS] `ryankemper@wdqs1003:~$ sudo pool`
[21:22:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:22:34] <icinga-wm>	 RECOVERY - PyBal IPVS diff check on lvs1016 is OK: OK: no difference between hosts in IPVS/PyBal https://wikitech.wikimedia.org/wiki/PyBal
[21:22:36] <icinga-wm>	 RECOVERY - PyBal IPVS diff check on lvs1015 is OK: OK: no difference between hosts in IPVS/PyBal https://wikitech.wikimedia.org/wiki/PyBal
[21:22:36] <ryankemper>	 Forcing a re-check to clear these alerts
[21:22:40] <ryankemper>	 (done)
[21:23:26] <icinga-wm>	 RECOVERY - Check systemd state on stat1005 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[21:25:25] <logmsgbot>	 !log ryankemper@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1011.eqiad.wmnet with reason: REIMAGE
[21:25:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:27:27] <logmsgbot>	 !log ryankemper@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1011.eqiad.wmnet with reason: REIMAGE
[21:27:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:32:31] <logmsgbot>	 !log krinkle@deploy1002 Synchronized wmf-config/InitialiseSettings.php: d95b91648 (duration: 00m 58s)
[21:32:32] <wikibugs>	 10SRE, 10MediaWiki-Revision-backend, 10observability, 10MW-1.37-notes (1.37.0-wmf.3; 2021-04-27), 10Performance-Team (Radar): mwlog1001 is running out of free space on /srv/mw-log - https://phabricator.wikimedia.org/T281048 (10Krinkle)
[21:32:35] <wikibugs>	 10SRE, 10Patch-For-Review: try planet/people on bullseye - https://phabricator.wikimedia.org/T280989 (10Dzahn) >>! In T280989#7050524, @jcrespo wrote: > There is in fact 5 categories, with different meanings and alerting levels, from "Fresh" to "All failures", as seen at: https://wikitech.wikimedia.org/wiki/Ba...
[21:32:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:32:44] <wikibugs>	 10SRE, 10MediaWiki-Revision-backend, 10observability, 10MW-1.37-notes (1.37.0-wmf.3; 2021-04-27), 10Performance-Team (Radar): mwlog1001 is running out of free space on /srv/mw-log - https://phabricator.wikimedia.org/T281048 (10Krinkle) 05Open→03Resolved a:03Krinkle
[21:32:54] <wikibugs>	 10SRE, 10MediaWiki-Revision-backend, 10Performance-Team, 10observability, 10MW-1.37-notes (1.37.0-wmf.3; 2021-04-27): mwlog1001 is running out of free space on /srv/mw-log - https://phabricator.wikimedia.org/T281048 (10Krinkle)
[21:36:34] <wikibugs>	 (03PS1) 10Dzahn: Revert "bacula: add people1003 job to monitoring ignorelist" [puppet] - 10https://gerrit.wikimedia.org/r/684463
[21:40:33] <wikibugs>	 (03CR) 10Bstorm: "This looks like a good idea. The only thing that gives me pause is whether or not this file is actually used for anything and if we should" [puppet] - 10https://gerrit.wikimedia.org/r/684115 (https://phabricator.wikimedia.org/T198673) (owner: 10Krinkle)
[21:42:37] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[21:42:54] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to Deployment shell for derick - https://phabricator.wikimedia.org/T281564 (10KFrancis) @Dzahn I am confirming Alangi Derick has a signed NDA on file with legal.  Thanks!
[21:46:00] <logmsgbot>	 !log ryankemper@cumin1001 START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
[21:46:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:46:09] <stashbot>	 T280563: Reboot elasticsearch* and relforge* to apply kernel security updates - https://phabricator.wikimedia.org/T280563
[21:47:25] <ryankemper>	 !log T280563 `sudo -i cookbook sre.elasticsearch.rolling-operation search_eqiad "eqiad reboot to apply sec updates" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id T280563`
[21:47:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:51:12] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] Revert "bacula: add people1003 job to monitoring ignorelist" [puppet] - 10https://gerrit.wikimedia.org/r/684463 (owner: 10Dzahn)
[21:52:43] <logmsgbot>	 !log ryankemper@cumin1001 END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
[21:52:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:52:51] <stashbot>	 T280563: Reboot elasticsearch* and relforge* to apply kernel security updates - https://phabricator.wikimedia.org/T280563
[21:54:24] <wikibugs>	 10SRE, 10Patch-For-Review: try planet/people on bullseye - https://phabricator.wikimedia.org/T280989 (10Dzahn) I reverted the addition to the ignore list.  Setup is done, there is no reason why it should fail. Let's see what happens.  I am refreshing Icinga etc.   https://gerrit.wikimedia.org/r/c/operations/puppe...
[21:54:48] <ryankemper>	 !log T280563 eqiad reboot failed with: `curator.exceptions.FailedExecution: Exception encountered.  Rerun with loglevel DEBUG and/or check Elasticsearch logs for more information. Exception: ConnectionTimeout caused by - ReadTimeoutError(HTTPSConnectionPool(host='search.svc.eqiad.wmnet', port=9243): Read timed out. (read timeout=10))`
[21:54:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:55:02] <logmsgbot>	 !log ryankemper@cumin1001 START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
[21:55:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:56:04] <ryankemper>	 !log T280563 `sudo -i cookbook sre.elasticsearch.rolling-operation search_eqiad "eqiad reboot to apply sec updates" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id T280563`
[21:56:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:00:44] <icinga-wm>	 ACKNOWLEDGEMENT - Check systemd state on lists1001 is CRITICAL: CRITICAL - degraded: The following units failed: check_exclude_backups.service daniel_zahn https://phabricator.wikimedia.org/T280744 https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[22:02:10] <icinga-wm>	 PROBLEM - Hadoop NodeManager on an-worker1131 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23Yarn_Nodemanager_process
[22:02:32] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1131 is CRITICAL: CRITICAL - degraded: The following units failed: hadoop-yarn-nodemanager.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[22:02:58] <icinga-wm>	 PROBLEM - Hadoop NodeManager on an-worker1123 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23Yarn_Nodemanager_process
[22:08:38] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=netbox_device_statistics site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[22:10:22] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[22:14:28] <mutante>	 [backup1001:~] $ sudo check_bacula.py --icinga
[22:14:31] <mutante>	 !log [backup1001:~] $ sudo check_bacula.py --icinga
[22:14:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:15:54] <icinga-wm>	 ACKNOWLEDGEMENT - Backup freshness on backup1001 is CRITICAL: All failures: 1 (people1003), Fresh: 102 jobs daniel_zahn https://phabricator.wikimedia.org/T280989#7055122 https://wikitech.wikimedia.org/wiki/Bacula%23Monitoring
[22:16:34] <icinga-wm>	 PROBLEM - Check systemd state on elastic1067 is CRITICAL: CRITICAL - degraded: The following units failed: prometheus-wmf-elasticsearch-exporter-9200.service,prometheus-wmf-elasticsearch-exporter-9600.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[22:16:34] <icinga-wm>	 PROBLEM - Check systemd state on elastic1045 is CRITICAL: CRITICAL - degraded: The following units failed: prometheus-wmf-elasticsearch-exporter-9200.service,prometheus-wmf-elasticsearch-exporter-9600.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[22:17:12] <icinga-wm>	 PROBLEM - Check systemd state on an-worker1123 is CRITICAL: CRITICAL - degraded: The following units failed: hadoop-yarn-nodemanager.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[22:17:22] <legoktm>	 !log ran disable_list for: iegcom wikien-l fundraiser spcommittee-private-l spcommittee-l mediation-en-l test-second wikifr-colloque-l
[22:17:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:17:58] <icinga-wm>	 PROBLEM - Check systemd state on elastic1034 is CRITICAL: CRITICAL - degraded: The following units failed: prometheus-wmf-elasticsearch-exporter-9200.service,prometheus-wmf-elasticsearch-exporter-9400.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[22:17:59] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to Deployment shell for derick - https://phabricator.wikimedia.org/T281564 (10Dzahn) Thank you @KFrancis , perfect. Will go ahead.
[22:18:16] <wikibugs>	 10SRE, 10SRE-Access-Requests: Requesting access to Deployment shell for derick - https://phabricator.wikimedia.org/T281564 (10Dzahn)
[22:24:42] <wikibugs>	 10SRE, 10Patch-For-Review: try planet/people on bullseye - https://phabricator.wikimedia.org/T280989 (10Dzahn) ` == jobs_with_all_failures (1) ==  people1003.eqiad.wmnet-Monthly-1st-Sun-production-home  `   ` [backup1001:~] $ sudo check_bacula.py people1003.eqiad.wmnet-Monthly-1st-Sun-production-home 2021-04-2...
[22:25:00] <icinga-wm>	 RECOVERY - Hadoop NodeManager on an-worker1131 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23Yarn_Nodemanager_process
[22:25:10] <icinga-wm>	 RECOVERY - Check systemd state on elastic1067 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[22:25:26] <icinga-wm>	 RECOVERY - Check systemd state on an-worker1131 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[22:25:52] <icinga-wm>	 RECOVERY - Check systemd state on an-worker1123 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[22:25:58] <icinga-wm>	 RECOVERY - Hadoop NodeManager on an-worker1123 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Alerts%23Yarn_Nodemanager_process
[22:26:44] <icinga-wm>	 PROBLEM - Check systemd state on elastic1032 is CRITICAL: CRITICAL - degraded: The following units failed: prometheus-wmf-elasticsearch-exporter-9200.service,prometheus-wmf-elasticsearch-exporter-9400.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[22:27:26] <icinga-wm>	 RECOVERY - Check systemd state on elastic1045 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[22:29:50] <icinga-wm>	 RECOVERY - WDQS high update lag on wdqs1003 is OK: (C)3600 ge (W)1200 ge 1154 https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook%23Update_lag https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
[22:35:24] <icinga-wm>	 PROBLEM - IPv4 ping to eqsin on ripe-atlas-eqsin is CRITICAL: CRITICAL - failed 44 probes of 718 (alerts on 35) - https://atlas.ripe.net/measurements/11645085/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[22:40:26] <icinga-wm>	 RECOVERY - Check systemd state on elastic1032 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[22:42:42] <icinga-wm>	 RECOVERY - Check systemd state on elastic1034 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[22:50:06] <icinga-wm>	 PROBLEM - Check systemd state on elastic1033 is CRITICAL: CRITICAL - degraded: The following units failed: prometheus-wmf-elasticsearch-exporter-9200.service,prometheus-wmf-elasticsearch-exporter-9400.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[22:50:24] <icinga-wm>	 PROBLEM - Check systemd state on elastic1048 is CRITICAL: CRITICAL - degraded: The following units failed: prometheus-wmf-elasticsearch-exporter-9200.service,prometheus-wmf-elasticsearch-exporter-9600.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[22:53:42] <icinga-wm>	 PROBLEM - IPv4 ping to eqiad on ripe-atlas-eqiad is CRITICAL: CRITICAL - failed 40 probes of 721 (alerts on 35) - https://atlas.ripe.net/measurements/1790945/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[22:56:46] <icinga-wm>	 RECOVERY - Check systemd state on elastic1033 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[23:00:04] <jouncebot>	 RoanKattouw, Niharika, and Urbanecm: (Dis)respected human, time to deploy Evening backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210503T2300). Please do the needful.
[23:00:04] <jouncebot>	 Jdlrobson: A patch you scheduled for Evening backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[23:00:13] <Urbanecm>	 I can deploy today
[23:00:28] <Urbanecm>	 Jdlrobson: around? 🙂
[23:00:52] <icinga-wm>	 PROBLEM - IPv4 ping to esams on ripe-atlas-esams is CRITICAL: CRITICAL - failed 39 probes of 716 (alerts on 35) - https://atlas.ripe.net/measurements/23449935/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[23:01:36] <icinga-wm>	 RECOVERY - Check systemd state on elastic1048 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[23:01:42] <Jdlrobson>	 hello
[23:01:46] <Jdlrobson>	 im here :)
[23:01:52] <icinga-wm>	 RECOVERY - Check systemd state on snapshot1008 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[23:03:45] <Urbanecm>	 cool :)
[23:04:16] <Urbanecm>	 could someone from sre confirm the IPv4 ping alerts are not a reason to worry please?
[23:05:04] <wikibugs>	 10SRE, 10Patch-For-Review: try planet/people on bullseye - https://phabricator.wikimedia.org/T280989 (10jcrespo) I am answering from mail- apologies for any formatting errors.  I can have a deeper look tomorrow.  But first...,  one important thing I forgot to communicate: please do not ack/downtime the bacula...
[23:08:41] <Jdlrobson>	 cdanis @elukey are these a concern?  ^
[23:08:49] <rzl>	 Urbanecm: not sure exactly what's going on there, but you're fine to deploy
[23:09:03] <Urbanecm>	 thanks a lot. Let's start then :).
[23:09:06] <Jdlrobson>	 Thanks rzl 
[23:09:24] <rzl>	 thanks for checking!
[23:09:56] <wikibugs>	 (03PS6) 10Urbanecm: Replace $wgRelatedArticlesFooterWhitelistedSkins [mediawiki-config] - 10https://gerrit.wikimedia.org/r/680814 (owner: 10Reedy)
[23:10:09] <wikibugs>	 (03CR) 10Urbanecm: [C: 03+2] Replace $wgRelatedArticlesFooterWhitelistedSkins [mediawiki-config] - 10https://gerrit.wikimedia.org/r/680814 (owner: 10Reedy)
[23:10:56] <wikibugs>	 (03Merged) 10jenkins-bot: Replace $wgRelatedArticlesFooterWhitelistedSkins [mediawiki-config] - 10https://gerrit.wikimedia.org/r/680814 (owner: 10Reedy)
[23:12:17] <Urbanecm>	 Jdlrobson: pulled onto mwdebug1001, can you check?
[23:12:21] <wikibugs>	 10ops-eqiad, 10DC-Ops: (Need By: TBD) rack/setup/install an-web1001 - https://phabricator.wikimedia.org/T281787 (10RobH)
[23:12:22] <Jdlrobson>	 looking
[23:12:30] <wikibugs>	 10ops-eqiad, 10DC-Ops: (Need By: TBD) rack/setup/install an-web1001 - https://phabricator.wikimedia.org/T281787 (10RobH)
[23:13:21] <wikibugs>	 (03CR) 10Bstorm: [C: 03+2] wikireplicas: redirect all database CNAMEs to the new system [puppet] - 10https://gerrit.wikimedia.org/r/683929 (https://phabricator.wikimedia.org/T260389) (owner: 10Bstorm)
[23:13:23] <Jdlrobson>	 Urbanecm: LGTM, that one can be synced.
[23:13:33] <Urbanecm>	 thanks, syncing
[23:14:03] <logmsgbot>	 !log urbanecm@deploy1002 sync-file aborted: 7c47ee17b3936fb1f79590187a9e0028276e4a9d: Replace $wgRelatedArticlesFooterWhitelistedSkins (T277958)¨ (duration: 00m 01s)
[23:14:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:14:12] <wikibugs>	 (03PS4) 10Urbanecm: Prepare for new configuration option [mediawiki-config] - 10https://gerrit.wikimedia.org/r/683720 (https://phabricator.wikimedia.org/T277951) (owner: 10Jdlrobson)
[23:14:12] <stashbot>	 T277958: Address Voice and Tone issues in RelatedArticles - https://phabricator.wikimedia.org/T277958
[23:14:17] <wikibugs>	 (03CR) 10Urbanecm: [C: 03+2] Prepare for new configuration option [mediawiki-config] - 10https://gerrit.wikimedia.org/r/683720 (https://phabricator.wikimedia.org/T277951) (owner: 10Jdlrobson)
[23:15:03] <logmsgbot>	 !log urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: 7c47ee17b3936fb1f79590187a9e0028276e4a9d: Replace $wgRelatedArticlesFooterWhitelistedSkins (T277958) (duration: 00m 57s)
[23:15:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:15:13] <Urbanecm>	 should be live Jdlrobson 
[23:15:27] <Jdlrobson>	 Urbanecm: yay
[23:15:31] <Jdlrobson>	 will watch the logs
[23:15:36] <wikibugs>	 (03Merged) 10jenkins-bot: Prepare for new configuration option [mediawiki-config] - 10https://gerrit.wikimedia.org/r/683720 (https://phabricator.wikimedia.org/T277951) (owner: 10Jdlrobson)
[23:15:41] <Urbanecm>	 Jdlrobson: i appreciate that
[23:16:03] <Urbanecm>	 Jdlrobson: second one is on mwdebug1001, please test
[23:16:09] <Jdlrobson>	 Urbanecm: on it..
[23:17:18] <Jdlrobson>	 Urbanecm: this one is also good to go. 
[23:17:24] <Urbanecm>	 excellent, syncing
[23:18:51] <logmsgbot>	 !log urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: 230ef5716b34ca83348667f289180313b76ce8a3: Prepare for new configuration option (T277951) (duration: 00m 57s)
[23:18:54] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=netbox_device_statistics site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[23:18:57] <Urbanecm>	 and also live.
[23:18:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:19:00] <Urbanecm>	 anything else Jdlrobson ?
[23:19:01] <stashbot>	 T277951: Address Voice and Tone issues in MobileFrontend - https://phabricator.wikimedia.org/T277951
[23:19:49] <Jdlrobson>	 Urbanecm: nope that's it (provided no log spikes in next 10 mins)
[23:19:57] <Jdlrobson>	 I'll keep an eye on things but risk is low
[23:20:02] <Jdlrobson>	 thanks for your help!
[23:20:05] <Jdlrobson>	 glad it was a quick one!
[23:20:18] <Urbanecm>	 okay. I should be reachable for next half an hour or so should you need me :)
[23:21:10] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[23:21:47] <Jdlrobson>	 good to know :)
[23:22:44] <icinga-wm>	 RECOVERY - SSH on phab2001.mgmt is OK: SSH OK - OpenSSH_6.6 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[23:25:54] <icinga-wm>	 RECOVERY - IPv4 ping to esams on ripe-atlas-esams is OK: OK - failed 35 probes of 716 (alerts on 35) - https://atlas.ripe.net/measurements/23449935/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[23:29:12] <icinga-wm>	 PROBLEM - WDQS SPARQL on wdqs1011 is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - string http://www.w3.org/2001/XML... not found on https://query.wikidata.org:443/bigdata/namespace/wdq/sparql?query=SELECT%20*%20WHERE%20%7Bwikibase%3ADump%20schema%3AdateModified%20%3Fy%7D%20LIMIT%201 - 532 bytes in 1.075 second response time https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook
[23:35:40] <Jdlrobson>	 Urbanecm: i think we're safe :) Hope you have a good evening/night/morning!
[23:35:48] <Urbanecm>	 you too!
[23:37:04] <icinga-wm>	 PROBLEM - IPv4 ping to esams on ripe-atlas-esams is CRITICAL: CRITICAL - failed 40 probes of 716 (alerts on 35) - https://atlas.ripe.net/measurements/23449935/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[23:40:35] <wikibugs>	 (03CR) 10Bstorm: "I was a little worried about using a heredoc, but puppet compiler says this is totally legit! https://puppet-compiler.wmflabs.org/compiler" [puppet] - 10https://gerrit.wikimedia.org/r/684485 (https://phabricator.wikimedia.org/T281287) (owner: 10Bstorm)
[23:42:41] <wikibugs>	 (03CR) 10Bstorm: [C: 03+2] maintain_dbusers: add new multi-instance analytics dedicated host [puppet] - 10https://gerrit.wikimedia.org/r/684485 (https://phabricator.wikimedia.org/T281287) (owner: 10Bstorm)
[23:57:26] <icinga-wm>	 PROBLEM - Check systemd state on elastic1053 is CRITICAL: CRITICAL - degraded: The following units failed: prometheus-wmf-elasticsearch-exporter-9200.service,prometheus-wmf-elasticsearch-exporter-9400.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[23:57:54] <icinga-wm>	 PROBLEM - Check systemd state on elastic1035 is CRITICAL: CRITICAL - degraded: The following units failed: prometheus-wmf-elasticsearch-exporter-9200.service,prometheus-wmf-elasticsearch-exporter-9600.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[23:58:48] <icinga-wm>	 PROBLEM - Check systemd state on elastic1044 is CRITICAL: CRITICAL - degraded: The following units failed: prometheus-wmf-elasticsearch-exporter-9200.service,prometheus-wmf-elasticsearch-exporter-9600.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state