[00:00:51] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:01:59] RECOVERY - Check systemd state on deneb is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:13:24] !log installing python3-dbg on lists1001
[00:13:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:30:53] PROBLEM - Check systemd state on deneb is CRITICAL: CRITICAL - degraded: The following units failed: docker-reporter-base-images.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:49:01] PROBLEM - Postgres Replication Lag on maps1007 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 14595355760 and 1176 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[01:21:07] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=netbox_device_statistics site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[01:23:43] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[01:30:41] PROBLEM - mailman list info on lists1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[01:30:56] yeah yeah
[01:31:03] !log restarted mailman3-web
[01:31:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:33:09] RECOVERY - mailman list info on lists1001 is OK: OK - Certificate lists.wikimedia.org will expire on Wed 30 Jun 2021 09:00:48 AM GMT +0000.
https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[01:36:47] RECOVERY - SSH on contint2001.mgmt is OK: SSH OK - OpenSSH_6.6 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[02:03:55] (PS1) Legoktm: mailman: Disable hyperkitty export endpoint [puppet] - https://gerrit.wikimedia.org/r/692108 (https://phabricator.wikimedia.org/T282957)
[02:07:12] (PS2) Legoktm: mailman: Disable hyperkitty export endpoint [puppet] - https://gerrit.wikimedia.org/r/692108 (https://phabricator.wikimedia.org/T282957)
[02:08:04] (CR) Legoktm: [C: +2] mailman: Disable hyperkitty export endpoint [puppet] - https://gerrit.wikimedia.org/r/692108 (https://phabricator.wikimedia.org/T282957) (owner: Legoktm)
[02:10:26] !log uninstalled python3-dbg on lists1001
[02:10:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:13:19] mailman3 should stop going down every few hours now
[03:30:19] PROBLEM - Postgres Replication Lag on puppetdb2002 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB puppetdb (host:localhost) 110975776 and 3 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[03:33:05] RECOVERY - Postgres Replication Lag on puppetdb2002 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB puppetdb (host:localhost) 457656 and 0 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[04:21:07] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=netbox_device_statistics site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[04:23:47] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[04:25:36] (PS3) Kormat: mariadb: Promote db1173 to s6 eqiad master.
[puppet] - https://gerrit.wikimedia.org/r/686505 (https://phabricator.wikimedia.org/T282124)
[04:27:51] jouncebot: next
[04:27:51] In 6 hour(s) and 2 minute(s): Wikimedia Portals Update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210517T1030)
[04:31:17] SRE, DBA, Wikimedia-Mailing-lists, Schema-change, User-notice: Mailman3 schema change: change utf8 columns to utf8mb4 - https://phabricator.wikimedia.org/T282621 (Marostegui) From what I can see the tables aren't huge, so it might not take a long time. However, on the last time we had to alte...
[04:31:48] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1124', diff saved to https://phabricator.wikimedia.org/P15974 and previous config saved to /var/cache/conftool/dbconfig/20210517-043148-marostegui.json
[04:31:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:32:04] jouncebot: reload
[04:32:16] jouncebot: next
[04:32:16] In 5 hour(s) and 57 minute(s): Wikimedia Portals Update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210517T1030)
[04:32:21] grr
[04:35:52] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1112 T280492', diff saved to https://phabricator.wikimedia.org/P15975 and previous config saved to /var/cache/conftool/dbconfig/20210517-043551-marostegui.json
[04:35:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:35:56] T280492: Upgrade all sanitarium masters to 10.4 and Buster - https://phabricator.wikimedia.org/T280492
[04:41:16] jouncebot: refresh
[04:41:16] I refreshed my knowledge about deployments.
[04:41:24] jouncebot: next
[04:41:24] In 0 hour(s) and 18 minute(s): Database primary switchover for s6 (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210517T0500)
[04:41:40] kormat: refresh, not reload :/
[04:41:53] Majavah: yeah, thanks :)
[04:43:40] SRE, ops-eqiad, Data-Services, Epic, cloud-services-team (Hardware): Move labstore1004 and labstore1005 to 10G Ethernet - https://phabricator.wikimedia.org/T266198 (Bstorm)
[04:46:23] !log kormat@cumin1001 START - Cookbook sre.hosts.downtime for 1:00:00 on 26 hosts with reason: Master switchover s6 T282124
[04:46:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:46:27] T282124: Switchover s6 from db1131 to db1173 - https://phabricator.wikimedia.org/T282124
[04:46:33] !log kormat@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 26 hosts with reason: Master switchover s6 T282124
[04:46:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:46:58] !log kormat@cumin1001 dbctl commit (dc=all): 'Set db1173 with weight 0 T282124', diff saved to https://phabricator.wikimedia.org/P15976 and previous config saved to /var/cache/conftool/dbconfig/20210517-044657-kormat.json
[04:47:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:50:58] !log marostegui@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on db1112.eqiad.wmnet with reason: REIMAGE
[04:51:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:53:01] !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1112.eqiad.wmnet with reason: REIMAGE
[04:53:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:00:05] Deploy window No deploys all day! See Deployments/Emergencies if things are broken.
(https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210516T0700)
[05:00:05] kormat and marostegui: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for Database primary switchover for s6 . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210517T0500).
[05:00:10] here
[05:00:25] o/
[05:01:44] 3 replicas left to move
[05:03:31] (CR) Kormat: [C: +2] mariadb: Promote db1173 to s6 eqiad master. [puppet] - https://gerrit.wikimedia.org/r/686505 (https://phabricator.wikimedia.org/T282124) (owner: Kormat)
[05:03:58] (PS1) Marostegui: db1112: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/692111 (https://phabricator.wikimedia.org/T280492)
[05:04:45] (CR) Marostegui: [C: +2] db1112: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/692111 (https://phabricator.wikimedia.org/T280492) (owner: Marostegui)
[05:05:05] !log Starting s6 eqiad failover from db1131 to db1173 - T282124
[05:05:08] here we go
[05:05:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:05:09] T282124: Switchover s6 from db1131 to db1173 - https://phabricator.wikimedia.org/T282124
[05:05:12] let's go!
[05:05:27] !log kormat@cumin1001 dbctl commit (dc=all): 'Set s6 eqiad as read-only for maintenance - T282124', diff saved to https://phabricator.wikimedia.org/P15977 and previous config saved to /var/cache/conftool/dbconfig/20210517-050526-kormat.json
[05:05:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:05:54] can confirm s6 is read-only
[05:06:03] same
[05:07:40] !log kormat@cumin1001 dbctl commit (dc=all): 'Promote db1173 to s6 master and set section read-write T282124', diff saved to https://phabricator.wikimedia.org/P15978 and previous config saved to /var/cache/conftool/dbconfig/20210517-050740-kormat.json
[05:07:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:07:53] r/w again
[05:08:02] I can write
[05:09:33] recentchanges seems to be moving
[05:09:59] (CR) Kormat: [C: +2] wmnet: Update s6-master to db1173 [dns] - https://gerrit.wikimedia.org/r/686513 (https://phabricator.wikimedia.org/T282124) (owner: Kormat)
[05:10:46] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1124 (re)pooling @ 25%: Repool db1124', diff saved to https://phabricator.wikimedia.org/P15979 and previous config saved to /var/cache/conftool/dbconfig/20210517-051045-root.json
[05:10:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:13:13] !log kormat@cumin1001 dbctl commit (dc=all): 'Depool db1131 until it's reimaged to buster T282124', diff saved to https://phabricator.wikimedia.org/P15980 and previous config saved to /var/cache/conftool/dbconfig/20210517-051312-kormat.json
[05:13:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:13:17] T282124: Switchover s6 from db1131 to db1173 - https://phabricator.wikimedia.org/T282124
[05:15:12] (PS1) Marostegui: instances.yaml: Remove db1079 from dbctl [puppet] - https://gerrit.wikimedia.org/r/692113 (https://phabricator.wikimedia.org/T282079)
[05:16:35] (CR) Marostegui: [C: +2]
instances.yaml: Remove db1079 from dbctl [puppet] - https://gerrit.wikimedia.org/r/692113 (https://phabricator.wikimedia.org/T282079) (owner: Marostegui)
[05:17:29] !log marostegui@cumin1001 dbctl commit (dc=all): 'Remove db1079 from dbctl T282079', diff saved to https://phabricator.wikimedia.org/P15981 and previous config saved to /var/cache/conftool/dbconfig/20210517-051728-marostegui.json
[05:17:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:17:33] T282079: decommission db1079.eqiad.wmnet - https://phabricator.wikimedia.org/T282079
[05:25:50] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1124 (re)pooling @ 50%: Repool db1124', diff saved to https://phabricator.wikimedia.org/P15982 and previous config saved to /var/cache/conftool/dbconfig/20210517-052549-root.json
[05:25:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:29:17] (PS1) Marostegui: mariadb: Decommission db1079 [puppet] - https://gerrit.wikimedia.org/r/692114 (https://phabricator.wikimedia.org/T282079)
[05:32:48] !log marostegui@cumin1001 START - Cookbook sre.hosts.decommission for hosts db1079.eqiad.wmnet
[05:32:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:40:53] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1124 (re)pooling @ 75%: Repool db1124', diff saved to https://phabricator.wikimedia.org/P15983 and previous config saved to /var/cache/conftool/dbconfig/20210517-054053-root.json
[05:40:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:41:30] (PS1) Elukey: admin: add user stei to ldap only [puppet] - https://gerrit.wikimedia.org/r/692117 (https://phabricator.wikimedia.org/T282947)
[05:42:21] (CR) Marostegui: [C: +2] mariadb: Decommission db1079 [puppet] - https://gerrit.wikimedia.org/r/692114 (https://phabricator.wikimedia.org/T282079) (owner: Marostegui)
[05:42:33] !log marostegui@cumin1001 END (PASS) - Cookbook
sre.hosts.decommission (exit_code=0) for hosts db1079.eqiad.wmnet
[05:42:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:43:06] (CR) Elukey: [C: +2] admin: add user stei to ldap only [puppet] - https://gerrit.wikimedia.org/r/692117 (https://phabricator.wikimedia.org/T282947) (owner: Elukey)
[05:43:53] ops-eqiad, DBA, DC-Ops, decommission-hardware: decommission db1079.eqiad.wmnet - https://phabricator.wikimedia.org/T282079 (Marostegui) a:Marostegui→Cmjohnson
[05:44:03] ops-eqiad, DC-Ops, decommission-hardware: decommission db1079.eqiad.wmnet - https://phabricator.wikimedia.org/T282079 (Marostegui)
[05:44:08] ops-eqiad, DC-Ops, decommission-hardware: decommission db1079.eqiad.wmnet - https://phabricator.wikimedia.org/T282079 (Marostegui) Ready for DC-Ops
[05:44:29] SRE, DBA, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Marostegui)
[05:46:36] SRE, Analytics, LDAP-Access-Requests, CommRel-Specialists-Support (Apr-Jun-2021), Patch-For-Review: Superset/Turnilo access for User:STei - https://phabricator.wikimedia.org/T282947 (elukey) Open→Resolved The LDAP user `stei` has a @wikimedia.org email and I see the manager approval f...
[05:46:41] SRE, Analytics, LDAP-Access-Requests, CommRel-Specialists-Support (Apr-Jun-2021): Please grant CRS access to Superset/Turnilo (deadline EOD Monday 17) - https://phabricator.wikimedia.org/T282589 (elukey)
[05:55:57] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1124 (re)pooling @ 100%: Repool db1124', diff saved to https://phabricator.wikimedia.org/P15984 and previous config saved to /var/cache/conftool/dbconfig/20210517-055556-root.json
[05:55:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:01:32] !log restarting mariadb on db1131 to pick up report_host T266483
[06:01:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:01:36] T266483: Enable report_host for mariadb - https://phabricator.wikimedia.org/T266483
[06:12:32] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1170:3317', diff saved to https://phabricator.wikimedia.org/P15985 and previous config saved to /var/cache/conftool/dbconfig/20210517-061232-marostegui.json
[06:12:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:21:26] (CR) Elukey: "I have used the instructions in https://istio.io/latest/docs/tasks/traffic-management/ingress/ingress-control/#configuring-ingress-using-a" [docker-images/production-images] - https://gerrit.wikimedia.org/r/688211 (https://phabricator.wikimedia.org/T278192) (owner: Elukey)
[06:50:07] (PS1) Marostegui: install_server: Remove labsdb entry [puppet] - https://gerrit.wikimedia.org/r/692246 (https://phabricator.wikimedia.org/T282662)
[06:52:17] (CR) Ayounsi: [C: +1] cr/firewall.conf: allow openstack Trove port TCP/8779 [homer/public] - https://gerrit.wikimedia.org/r/691140 (https://phabricator.wikimedia.org/T282809) (owner: Arturo Borrero Gonzalez)
[07:09:53] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=netbox_device_statistics site=codfw
https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[07:12:07] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[07:18:47] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=netbox_device_statistics site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[07:20:59] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[07:22:05] PROBLEM - Thanos compact has not run on alert1001 is CRITICAL: 4.503e+05 ge 24 https://wikitech.wikimedia.org/wiki/Thanos%23Alerts https://grafana.wikimedia.org/d/651943d05a8123e32867b4673963f42b/thanos-compact
[07:28:24] SRE: Offboard Cas - https://phabricator.wikimedia.org/T282993 (MoritzMuehlenhoff)
[07:33:23] RECOVERY - Thanos compact has not run on alert1001 is OK: (C)24 ge (W)12 ge 0.01752 https://wikitech.wikimedia.org/wiki/Thanos%23Alerts https://grafana.wikimedia.org/d/651943d05a8123e32867b4673963f42b/thanos-compact
[07:54:10] (CR) Muehlenhoff: "> Patch Set 1:" [puppet] - https://gerrit.wikimedia.org/r/690405 (owner: Jbond)
[07:57:09] (PS1) Ema: varnish: fix check_vcl_reload state check [puppet] - https://gerrit.wikimedia.org/r/692249 (https://phabricator.wikimedia.org/T282880)
[07:58:11] SRE, Traffic, Patch-For-Review: Revisit varnish dynamic backends mechanism - https://phabricator.wikimedia.org/T282880 (ema) p:Triage→Medium
[08:03:10] (CR) Filippo Giunchedi: [C: +2] prometheus: Remove absented crons [puppet] -
https://gerrit.wikimedia.org/r/689787 (https://phabricator.wikimedia.org/T273673) (owner: Ladsgroup)
[08:03:54] (CR) JMeybohm: [V: +1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/29575/console" [puppet] - https://gerrit.wikimedia.org/r/691106 (https://phabricator.wikimedia.org/T256762) (owner: Alexandros Kosiaris)
[08:04:56] (CR) JMeybohm: [V: +1 C: +1] docker-registry: Clean up old nginx http endpoint [puppet] - https://gerrit.wikimedia.org/r/691106 (https://phabricator.wikimedia.org/T256762) (owner: Alexandros Kosiaris)
[08:05:26] (CR) JMeybohm: [C: +1] docker-registry: Remove Docker-Distribution-API-version header [puppet] - https://gerrit.wikimedia.org/r/691107 (https://phabricator.wikimedia.org/T256762) (owner: Alexandros Kosiaris)
[08:05:53] (CR) JMeybohm: [C: +1] docker-registry: Remove absented nginx-site resource [puppet] - https://gerrit.wikimedia.org/r/691110 (https://phabricator.wikimedia.org/T256762) (owner: Alexandros Kosiaris)
[08:06:21] Puppet, SRE, SRE-tools, Patch-For-Review, and 4 others: Forward port Python2 files to Python3 in Puppet Repository - https://phabricator.wikimedia.org/T247364 (MoritzMuehlenhoff) a:crusnov→None
[08:07:23] (PS1) Muehlenhoff: Revert "openldap/offboard-user.py: Port to Python 3" [puppet] - https://gerrit.wikimedia.org/r/692251 (https://phabricator.wikimedia.org/T247364)
[08:09:27] (CR) Volans: [C: +1] "Safe path for now."
[puppet] - https://gerrit.wikimedia.org/r/692251 (https://phabricator.wikimedia.org/T247364) (owner: Muehlenhoff)
[08:11:16] (PS1) Muehlenhoff: Remove access for crusnov [puppet] - https://gerrit.wikimedia.org/r/692252
[08:12:00] (CR) Ayounsi: [C: +2] Remove Ariel from network devices [homer/public] - https://gerrit.wikimedia.org/r/689718 (owner: Ayounsi)
[08:12:44] (Merged) jenkins-bot: Remove Ariel from network devices [homer/public] - https://gerrit.wikimedia.org/r/689718 (owner: Ayounsi)
[08:15:49] SRE: Offboard Cas - https://phabricator.wikimedia.org/T282993 (MoritzMuehlenhoff)
[08:16:29] (CR) Muehlenhoff: [C: +2] Remove access for crusnov [puppet] - https://gerrit.wikimedia.org/r/692252 (owner: Muehlenhoff)
[08:19:55] jouncebot: now
[08:19:55] No deployments scheduled for the next 2 hour(s) and 10 minute(s)
[08:20:13] (CR) Muehlenhoff: [C: +2] Revert "openldap/offboard-user.py: Port to Python 3" [puppet] - https://gerrit.wikimedia.org/r/692251 (https://phabricator.wikimedia.org/T247364) (owner: Muehlenhoff)
[08:20:32] (PS2) Urbanecm: Add svwiki 20th anniversary logos [mediawiki-config] - https://gerrit.wikimedia.org/r/687785 (https://phabricator.wikimedia.org/T282389) (owner: Zabe)
[08:20:37] (CR) Urbanecm: [C: +2] Add svwiki 20th anniversary logos [mediawiki-config] - https://gerrit.wikimedia.org/r/687785 (https://phabricator.wikimedia.org/T282389) (owner: Zabe)
[08:20:43] (PS2) Urbanecm: Use svwiki 20th anniversary logos [mediawiki-config] - https://gerrit.wikimedia.org/r/687787 (https://phabricator.wikimedia.org/T282389) (owner: Zabe)
[08:20:46] (CR) Urbanecm: [C: +2] Use svwiki 20th anniversary logos [mediawiki-config] - https://gerrit.wikimedia.org/r/687787 (https://phabricator.wikimedia.org/T282389) (owner: Zabe)
[08:21:28] (Merged) jenkins-bot: Add svwiki 20th anniversary logos [mediawiki-config] - https://gerrit.wikimedia.org/r/687785
(https://phabricator.wikimedia.org/T282389) (owner: Zabe)
[08:21:31] (Merged) jenkins-bot: Use svwiki 20th anniversary logos [mediawiki-config] - https://gerrit.wikimedia.org/r/687787 (https://phabricator.wikimedia.org/T282389) (owner: Zabe)
[08:24:53] !log urbanecm@deploy1002 Synchronized static/images/project-logos/: 0f356a3: Add svwiki 20th anniversary logos (T282389) (duration: 01m 12s)
[08:24:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:24:58] T282389: Requesting temporary logo change for sv.wikipedia.org - https://phabricator.wikimedia.org/T282389
[08:26:43] !log urbanecm@deploy1002 Synchronized wmf-config/logos.php: 93e61f7: Use svwiki 20th anniversary logos (T282389) (duration: 01m 08s)
[08:26:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:26:57] RECOVERY - BGP status on cr2-esams is OK: BGP OK - up: 428, down: 2, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[08:27:12] (CR) JMeybohm: docker-registry: Re-apply Cache-Control rules (1 comment) [puppet] - https://gerrit.wikimedia.org/r/691108 (https://phabricator.wikimedia.org/T256762) (owner: Alexandros Kosiaris)
[08:28:36] !log wikiadmin@10.64.48.109(centralauth)> delete from global_group_restrictions where ggr_group="Indic_Bots"; # T282968
[08:28:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:28:39] T282968: Remove wikiset restriction for non-existent global group - https://phabricator.wikimedia.org/T282968
[08:30:54] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 25%: Repool db1170:3317', diff saved to https://phabricator.wikimedia.org/P15986 and previous config saved to /var/cache/conftool/dbconfig/20210517-083053-root.json
[08:30:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:33:40] SRE: Offboard Cas - https://phabricator.wikimedia.org/T282993 (fgiunchedi)
[08:35:15] (CR)
Elukey: "Little "nit": today I tried from a fresh install of minikube, and with the option of never pulling images from remote I got this error whi" [docker-images/production-images] - https://gerrit.wikimedia.org/r/688211 (https://phabricator.wikimedia.org/T278192) (owner: Elukey)
[08:35:21] Anyone around today that can help sort out a helmfile issue for a service?
[08:36:06] When I try to upgrade the server it downgrades the chart, so I've got one server running the new chart and an old version, and another running the old version and the new chart.
[08:36:52] deployment-prep is all correct/current, so I'm not sure how to fix the helmfiles.
[08:37:35] (PS1) Muehlenhoff: Remove Cas from Icinga config [puppet] - https://gerrit.wikimedia.org/r/692253
[08:39:11] (CR) JMeybohm: [C: -1] httpd: add a resursive chmod to ensure log files are group writable (1 comment) [docker-images/production-images] - https://gerrit.wikimedia.org/r/691293 (owner: Dzahn)
[08:39:24] Puppet, SRE-tools, Python3-Porting, User-crusnov, User-jbond: Port dstat related scripts to Python 3 - https://phabricator.wikimedia.org/T277910 (Volans) a:crusnov→None
[08:41:05] (CR) MMandere: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/692249 (https://phabricator.wikimedia.org/T282880) (owner: Ema)
[08:43:11] (CR) Muehlenhoff: [C: +2] Remove Cas from Icinga config [puppet] - https://gerrit.wikimedia.org/r/692253 (owner: Muehlenhoff)
[08:44:05] SRE: Offboard Cas - https://phabricator.wikimedia.org/T282993 (MoritzMuehlenhoff)
[08:45:00] !log depool cp5016
[08:45:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:45:58] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 50%: Repool db1170:3317', diff saved to https://phabricator.wikimedia.org/P15987 and previous config saved to /var/cache/conftool/dbconfig/20210517-084557-root.json
[08:45:59] Logged the message at
https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:47:02] (PS2) Ema: varnish: fix check_vcl_reload state check [puppet] - https://gerrit.wikimedia.org/r/692249 (https://phabricator.wikimedia.org/T282880)
[08:48:31] PROBLEM - puppet last run on cp5016 is CRITICAL: CRITICAL: Puppet last ran 2 days ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:48:59] awesome timing.. I'm running puppet there right now
[08:49:15] (CR) Ema: [C: +2] varnish: fix check_vcl_reload state check [puppet] - https://gerrit.wikimedia.org/r/692249 (https://phabricator.wikimedia.org/T282880) (owner: Ema)
[08:52:16] (CR) JMeybohm: [C: +1] "Just a wording nit remaining from my side, so I'll +1 ! 🎉" (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/670220 (https://phabricator.wikimedia.org/T265327) (owner: Giuseppe Lavagetto)
[08:52:26] RECOVERY - ats-tls HTTPS wikiworkshop.org ECDSA on cp5016 is OK: SSL OK - OCSP staple validity for wikiworkshop.org has 205655 seconds left:Certificate wikiworkshop.org contains all required SANs:Certificate wikiworkshop.org (ECDSA) valid until 2021-08-07 17:00:13 +0000 (expires in 82 days) https://wikitech.wikimedia.org/wiki/HTTPS
[08:52:26] RECOVERY - ats-tls HTTPS wikiworkshop.org RSA on cp5016 is OK: SSL OK - OCSP staple validity for wikiworkshop.org has 205655 seconds left:Certificate wikiworkshop.org contains all required SANs:Certificate wikiworkshop.org (ECDSA) valid until 2021-08-07 17:00:13 +0000 (expires in 82 days) https://wikitech.wikimedia.org/wiki/HTTPS
[08:52:28] RECOVERY - puppet last run on cp5016 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:52:40] !log pool cp5016
[08:52:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:56:42] PROBLEM - Confd vcl based reload on cp5002 is CRITICAL: reload-vcl failed to run since 118h, 15 minutes.
https://wikitech.wikimedia.org/wiki/Varnish
[08:57:48] ema ^^is that you? :)
[08:58:40] vgutierrez: this is the reload-vcl icinga script working again for the first time after 2 years :)
[09:00:18] PROBLEM - Confd vcl based reload on cp5013 is CRITICAL: reload-vcl failed to run since 118h, 19 minutes. https://wikitech.wikimedia.org/wiki/Varnish
[09:00:52] PROBLEM - Confd vcl based reload on cp5014 is CRITICAL: reload-vcl failed to run since 118h, 20 minutes. https://wikitech.wikimedia.org/wiki/Varnish
[09:00:53] we're gonna get a few of those, sorry for the spam
[09:01:01] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 75%: Repool db1170:3317', diff saved to https://phabricator.wikimedia.org/P15988 and previous config saved to /var/cache/conftool/dbconfig/20210517-090101-root.json
[09:01:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:02:44] (CR) Jcrespo: "So this can be merged, right?" [puppet] - https://gerrit.wikimedia.org/r/685494 (https://phabricator.wikimedia.org/T280751) (owner: Jcrespo)
[09:03:26] (CR) Marostegui: "yep!"
[puppet] - https://gerrit.wikimedia.org/r/685494 (https://phabricator.wikimedia.org/T280751) (owner: Jcrespo)
[09:04:07] (PS1) Muehlenhoff: Enable SLO for piwik [puppet] - https://gerrit.wikimedia.org/r/692258
[09:05:20] (PS4) Jcrespo: dbbackups: Switchover backup generation for s6 on eqiad from db1139 to db1140 [puppet] - https://gerrit.wikimedia.org/r/685494 (https://phabricator.wikimedia.org/T280751)
[09:06:18] !log cp_eqsin: run confd-reload-vcl manually to fix /var/run/reload-vcl-state T282880
[09:06:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:06:22] T282880: Revisit varnish dynamic backends mechanism - https://phabricator.wikimedia.org/T282880
[09:06:41] (CR) Cathal Mooney: [C: +2] cr/firewall.conf: allow openstack Trove port TCP/8779 [homer/public] - https://gerrit.wikimedia.org/r/691140 (https://phabricator.wikimedia.org/T282809) (owner: Arturo Borrero Gonzalez)
[09:07:17] (Merged) jenkins-bot: cr/firewall.conf: allow openstack Trove port TCP/8779 [homer/public] - https://gerrit.wikimedia.org/r/691140 (https://phabricator.wikimedia.org/T282809) (owner: Arturo Borrero Gonzalez)
[09:07:18] RECOVERY - Confd vcl based reload on cp5013 is OK: reload-vcl successfully ran 0h, 1 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish
[09:07:36] (CR) Jcrespo: [C: +2] dbbackups: Switchover backup generation for s6 on eqiad from db1139 to db1140 [puppet] - https://gerrit.wikimedia.org/r/685494 (https://phabricator.wikimedia.org/T280751) (owner: Jcrespo)
[09:08:28] (PS1) Jbond: (do not merge) openldap/offboard-user.py: Port to Python 3 [puppet] - https://gerrit.wikimedia.org/r/692071
[09:08:52] RECOVERY - Confd vcl based reload on cp5002 is OK: reload-vcl successfully ran 0h, 1 minutes ago. https://wikitech.wikimedia.org/wiki/Varnish
[09:09:40] RECOVERY - Confd vcl based reload on cp5014 is OK: reload-vcl successfully ran 0h, 1 minutes ago.
https://wikitech.wikimedia.org/wiki/Varnish
[09:09:55] (CR) jerkins-bot: [V: -1] (do not merge) openldap/offboard-user.py: Port to Python 3 [puppet] - https://gerrit.wikimedia.org/r/692071 (owner: Jbond)
[09:15:44] SRE, netbox, Patch-For-Review: Add SSO support to netbox - https://phabricator.wikimedia.org/T244849 (Volans) a:crusnov→None
[09:16:05] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 100%: Repool db1170:3317', diff saved to https://phabricator.wikimedia.org/P15989 and previous config saved to /var/cache/conftool/dbconfig/20210517-091604-root.json
[09:16:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:16:36] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1127', diff saved to https://phabricator.wikimedia.org/P15990 and previous config saved to /var/cache/conftool/dbconfig/20210517-091636-marostegui.json
[09:16:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:18:02] (PS1) Muehlenhoff: Enable SLO for Turnilo [puppet] - https://gerrit.wikimedia.org/r/692262
[09:18:16] (PS1) Arturo Borrero Gonzalez: scripts/webservice: don't check for the release arg in k8s backend [software/tools-webservice] - https://gerrit.wikimedia.org/r/692263 (https://phabricator.wikimedia.org/T282972)
[09:18:29] !log Restarting CI Jenkins to upgrade the Gearman plugin # T281737
[09:18:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:18:32] T281737: Zuul can't stop jobs or set the build description - https://phabricator.wikimedia.org/T281737
[09:18:44] SRE, Traffic, Patch-For-Review: Make Netbox Active/Active - https://phabricator.wikimedia.org/T234997 (Volans) a:crusnov→None
[09:22:25] SRE, DC-Ops, SRE-tools: Host decommission improvements - https://phabricator.wikimedia.org/T231066 (Volans)
[09:25:33] (CR) Muehlenhoff: "PCC: https://puppet-compiler.wmflabs.org/compiler1001/29577/"
[puppet] - 10https://gerrit.wikimedia.org/r/692262 (owner: 10Muehlenhoff) [09:26:17] (03PS3) 10Alexandros Kosiaris: docker-registry: Clean up old nginx http endpoint [puppet] - 10https://gerrit.wikimedia.org/r/691106 (https://phabricator.wikimedia.org/T256762) [09:26:33] (03CR) 10Muehlenhoff: "PCC: https://puppet-compiler.wmflabs.org/compiler1001/29576/" [puppet] - 10https://gerrit.wikimedia.org/r/692258 (owner: 10Muehlenhoff) [09:27:10] (03CR) 10Jbond: [C: 03+1] "lgtm" [puppet] - 10https://gerrit.wikimedia.org/r/692262 (owner: 10Muehlenhoff) [09:28:35] (03CR) 10Jbond: "LGTM, I'm not sure but this could be the first one that uses post messages?" [puppet] - 10https://gerrit.wikimedia.org/r/692258 (owner: 10Muehlenhoff) [09:28:37] (03CR) 10Alexandros Kosiaris: [C: 03+2] "Thanks!" [puppet] - 10https://gerrit.wikimedia.org/r/691106 (https://phabricator.wikimedia.org/T256762) (owner: 10Alexandros Kosiaris) [09:28:38] topranks: hi! was https://gerrit.wikimedia.org/r/691140 already deployed to the routers? 
[09:29:00] I am just about to do so [09:29:01] (03CR) 10Alexandros Kosiaris: [C: 03+2] docker-registry: Remove Docker-Distribution-API-version header [puppet] - 10https://gerrit.wikimedia.org/r/691107 (https://phabricator.wikimedia.org/T256762) (owner: 10Alexandros Kosiaris) [09:29:05] ah, thanks [09:29:11] !log push CR691140 to eqiad and codfw core routers - T282809 [09:29:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:29:15] T282809: Allow access to Trove API endpoints (port 8779) from cloud-vps instances - https://phabricator.wikimedia.org/T282809 [09:30:59] (03CR) 10Kormat: [C: 03+1] prometheus-mysqld-exporter: Update generator to remove multisource exception [puppet] - 10https://gerrit.wikimedia.org/r/690402 (https://phabricator.wikimedia.org/T282662) (owner: 10Jcrespo) [09:32:23] 10SRE, 10netbox, 10Patch-For-Review: Setup Swift Storage for Netbox image (was: netbox won't allow me to upload photos of the rack) - https://phabricator.wikimedia.org/T209182 (10Volans) a:05crusnov→03None [09:33:11] !log installing libimage-exiftool-perl security updates [09:33:11] (03CR) 10Kormat: [C: 03+1] install_server: Remove labsdb entry [puppet] - 10https://gerrit.wikimedia.org/r/692246 (https://phabricator.wikimedia.org/T282662) (owner: 10Marostegui) [09:33:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:33:20] (03CR) 10Marostegui: [C: 03+2] install_server: Remove labsdb entry [puppet] - 10https://gerrit.wikimedia.org/r/692246 (https://phabricator.wikimedia.org/T282662) (owner: 10Marostegui) [09:33:41] akosiaris: can I merge your two changes? [09:36:03] akosiaris: let me know when you could help me sort out citoid helmfiles :). [09:39:16] marostegui: yes please [09:39:33] mvolz: o/ how may I help ? [09:39:40] ok! [09:40:17] Majavah: those changes have gone in, looks ok. Are you able to check it's working? [09:41:08] topranks: working fine, thank you! [09:41:19] wohoo! 
my first every change :) [09:41:21] 10SRE: x509-bundle as used by envoy::tlsproxy fails on single certificate file - https://phabricator.wikimedia.org/T283001 (10fgiunchedi) [09:41:27] *ever [09:43:14] !log Restarted CI Jenkins to update the instant-messaging and ircbot plugins # T271122 [09:43:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:43:18] T271122: Restore IRC alerts for beta-scap-eqiad job - https://phabricator.wikimedia.org/T271122 [09:43:56] 10SRE, 10Datacenter-Switchover, 10Performance-Team (Radar): June 2021 Datacenter switchover - https://phabricator.wikimedia.org/T281515 (10Aklapper) [09:44:05] (03PS1) 10Marostegui: analytic.pp: s/labsdb/clouddb [puppet] - 10https://gerrit.wikimedia.org/r/692269 (https://phabricator.wikimedia.org/T282662) [09:45:07] (03CR) 10Marostegui: [C: 03+2] analytic.pp: s/labsdb/clouddb [puppet] - 10https://gerrit.wikimedia.org/r/692269 (https://phabricator.wikimedia.org/T282662) (owner: 10Marostegui) [09:46:23] (03CR) 10Majavah: [C: 03+2] scripts/webservice: don't check for the release arg in k8s backend [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/692263 (https://phabricator.wikimedia.org/T282972) (owner: 10Arturo Borrero Gonzalez) [09:46:36] akosiaris: so, I messed up my deploy Thursday [09:46:59] but basically, I have one server running the new chart and an old version of citoid [09:47:15] but when I deployed the other one, it runs the new version, but it downgraded the chart [09:47:19] (03Merged) 10jenkins-bot: scripts/webservice: don't check for the release arg in k8s backend [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/692263 (https://phabricator.wikimedia.org/T282972) (owner: 10Arturo Borrero Gonzalez) [09:47:28] so I can't deploy the second server because it will downgrade the chart [09:48:36] staging and codfw have the downgraded chart and the upgraded citoid version [09:48:55] I didn't notice the chart was getting downgraded on them [09:49:12] 
eqiad is still on the new chart .17 [09:49:35] but if I try to upgrade it, the diff has it getting downgraded to .16 [09:49:59] akosiaris: thoughts? [09:50:42] (03PS1) 10Muehlenhoff: Fix typo in username [puppet] - 10https://gerrit.wikimedia.org/r/692271 (https://phabricator.wikimedia.org/T282774) [09:52:54] image should be docker-registry.discovery.wmnet/wikimedia/mediawiki-services-citoid:2021-05-13-083446-production and chart should be citoid-0.0.17 [09:53:12] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1127 (re)pooling @ 25%: Repool db1127', diff saved to https://phabricator.wikimedia.org/P15991 and previous config saved to /var/cache/conftool/dbconfig/20210517-095312-root.json [09:53:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:55:00] (03CR) 10Arturo Borrero Gonzalez: [C: 04-1] "> Patch Set 1:" [puppet] - 10https://gerrit.wikimedia.org/r/691154 (owner: 10Arturo Borrero Gonzalez) [09:55:25] mvolz: that is... weird [09:55:35] :( [09:55:37] I see that eqiad uses 0.0.17 [09:55:38] 10Puppet, 10User-jbond: hiera_lookup: Allow query against checkout of labs/private in addition to checkout of operations/puppet - https://phabricator.wikimedia.org/T216647 (10Volans) 05Open→03Declined The above way is the more canonical and safe way to check a hiera key. Declined for now. [09:55:47] but the others 0.0.16 [09:56:06] PROBLEM - Docker registry HTTP interface on registry2003 is CRITICAL: connect to address 10.192.0.39 and port 81: Connection refused https://wikitech.wikimedia.org/wiki/Docker [09:56:08] but interestingly, eqiad was last deployed 1 week before [09:56:14] Yeah, I deployed codfw on Thursday without realising it downgraded the chart [09:56:29] * akosiaris fixing the registry port 81 error [09:57:18] 10SRE, 10netbox, 10Patch-For-Review: Add SSO support to netbox - https://phabricator.wikimedia.org/T244849 (10Volans) a:03jbond As agreed on IRC assigning to John to not lose momentum on this. 
[09:58:15] (03PS1) 10Jbond: P:sretest: test puppetdb_query [puppet] - 10https://gerrit.wikimedia.org/r/692273 [09:58:58] And apparently forgot to deploy eqiad, I assume :D [09:59:15] (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/29578/console" [puppet] - 10https://gerrit.wikimedia.org/r/692273 (owner: 10Jbond) [09:59:16] Which I guess is good because when I realised that and tried to redeploy I realised it was the wrong chart. [09:59:43] Although I guess I was really not paying much attention Thursday overall :( [09:59:51] (03PS1) 10Alexandros Kosiaris: docker-registry: Remove monitoring for port 81 [puppet] - 10https://gerrit.wikimedia.org/r/692275 [10:00:22] (03CR) 10Alexandros Kosiaris: [C: 03+2] docker-registry: Remove monitoring for port 81 [puppet] - 10https://gerrit.wikimedia.org/r/692275 (owner: 10Alexandros Kosiaris) [10:00:38] (03PS1) 10Elukey: admin: add expiry date and contact for user sannita [puppet] - 10https://gerrit.wikimedia.org/r/692277 (https://phabricator.wikimedia.org/T282600) [10:01:16] (03CR) 10jerkins-bot: [V: 04-1] docker-registry: Remove monitoring for port 81 [puppet] - 10https://gerrit.wikimedia.org/r/692275 (owner: 10Alexandros Kosiaris) [10:01:19] (03PS2) 10Alexandros Kosiaris: docker-registry: Remove monitoring for port 81 [puppet] - 10https://gerrit.wikimedia.org/r/692275 (https://phabricator.wikimedia.org/T256762) [10:01:56] RECOVERY - rpki grafana alert on alert1001 is OK: OK: RPKI ( https://grafana.wikimedia.org/d/UwUa77GZk/rpki ) is not alerting. 
https://wikitech.wikimedia.org/wiki/RPKI%23Grafana_alerts https://grafana.wikimedia.org/d/UwUa77GZk/ [10:01:59] (03PS2) 10Jbond: P:sretest: test puppetdb_query [puppet] - 10https://gerrit.wikimedia.org/r/692273 [10:02:06] (03PS2) 10Elukey: admin: add expiry date and contact for user sannita [puppet] - 10https://gerrit.wikimedia.org/r/692277 (https://phabricator.wikimedia.org/T282600) [10:03:09] (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/29579/console" [puppet] - 10https://gerrit.wikimedia.org/r/692273 (owner: 10Jbond) [10:03:28] (03CR) 10Muehlenhoff: admin: add expiry date and contact for user sannita (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/692277 (https://phabricator.wikimedia.org/T282600) (owner: 10Elukey) [10:03:48] mvolz: what's actually concerning is that https://helm-charts.wikimedia.org/stable/index.yaml doesn't seem to have 0.0.17 in there [10:04:02] (03CR) 10Elukey: admin: add expiry date and contact for user sannita (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/692277 (https://phabricator.wikimedia.org/T282600) (owner: 10Elukey) [10:04:04] this is ... weird [10:04:11] :( [10:04:20] (03CR) 10Elukey: [C: 03+2] admin: add expiry date and contact for user sannita [puppet] - 10https://gerrit.wikimedia.org/r/692277 (https://phabricator.wikimedia.org/T282600) (owner: 10Elukey) [10:04:51] akosiaris: o/ ok to merge? [10:04:52] elukey: merging yours too [10:05:00] super thanks [10:05:53] 10SRE, 10LDAP-Access-Requests, 10CommRel-Specialists-Support (Apr-Jun-2021), 10Patch-For-Review: Grant access to LDAP nda for Sannita - https://phabricator.wikimedia.org/T282600 (10elukey) To keep archives happy - after a chat with Moritz I moved user `sannita` to the `wmf` ldap group, and updated puppet w... 
[10:07:42] (03PS3) 10Jbond: P:sretest: test puppetdb_query [puppet] - 10https://gerrit.wikimedia.org/r/692273 [10:07:48] PROBLEM - Docker registry HTTP interface on registry1003 is CRITICAL: connect to address 10.64.0.93 and port 81: Connection refused https://wikitech.wikimedia.org/wiki/Docker [10:08:16] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1127 (re)pooling @ 50%: Repool db1127', diff saved to https://phabricator.wikimedia.org/P15992 and previous config saved to /var/cache/conftool/dbconfig/20210517-100815-root.json [10:08:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:09:08] PROBLEM - Docker registry HTTP interface on registry2004 is CRITICAL: connect to address 10.192.16.49 and port 81: Connection refused https://wikitech.wikimedia.org/wiki/Docker [10:10:23] (03PS1) 10Ladsgroup: prometheus: Remove absented crons [puppet] - 10https://gerrit.wikimedia.org/r/692280 (https://phabricator.wikimedia.org/T273673) [10:12:41] (03Abandoned) 10Jbond: P:sretest: test puppetdb_query [puppet] - 10https://gerrit.wikimedia.org/r/692273 (owner: 10Jbond) [10:13:01] (03CR) 10Elukey: "> Patch Set 19:" [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/688211 (https://phabricator.wikimedia.org/T278192) (owner: 10Elukey) [10:14:10] (03CR) 10Elukey: "> Patch Set 19:" [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/688211 (https://phabricator.wikimedia.org/T278192) (owner: 10Elukey) [10:15:04] PROBLEM - Docker registry HTTP interface on registry1004 is CRITICAL: connect to address 10.64.32.143 and port 81: Connection refused https://wikitech.wikimedia.org/wiki/Docker [10:18:48] (03PS1) 10Jdrewniak: Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/692281 (https://phabricator.wikimedia.org/T128546) [10:23:20] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1127 (re)pooling @ 75%: Repool db1127', diff saved to https://phabricator.wikimedia.org/P15993 and previous 
config saved to /var/cache/conftool/dbconfig/20210517-102319-root.json [10:23:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:29:40] (03PS1) 10Jbond: wmflib::cache::nodes: test new puppetdb function [puppet] - 10https://gerrit.wikimedia.org/r/692286 [10:29:42] (03PS1) 10Jbond: (do not merge) test puppetdb function [puppet] - 10https://gerrit.wikimedia.org/r/692287 [10:30:05] jan_drewniak: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) Wikimedia Portals Update deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210517T1030). [10:30:57] !log installing postgresql-11 security updates [10:30:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:31:41] (03CR) 10Jdrewniak: [C: 03+2] Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/692281 (https://phabricator.wikimedia.org/T128546) (owner: 10Jdrewniak) [10:32:34] (03Merged) 10jenkins-bot: Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/692281 (https://phabricator.wikimedia.org/T128546) (owner: 10Jdrewniak) [10:32:36] (03PS2) 10Jbond: (do not merge) test puppetdb function [puppet] - 10https://gerrit.wikimedia.org/r/692287 [10:32:50] (03PS1) 10Urbanecm: urwiki: Grant `editprotected` to eliminators [mediawiki-config] - 10https://gerrit.wikimedia.org/r/692288 (https://phabricator.wikimedia.org/T281274) [10:33:50] (03PS2) 10Jbond: wmflib::cache::nodes: test new puppetdb function [puppet] - 10https://gerrit.wikimedia.org/r/692286 [10:34:05] (03PS3) 10Jbond: (do not merge) test puppetdb function [puppet] - 10https://gerrit.wikimedia.org/r/692287 [10:35:03] (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/29581/console" [puppet] - 10https://gerrit.wikimedia.org/r/692287 (owner: 10Jbond) [10:36:30] !log jdrewniak@deploy1002 Synchronized portals/wikipedia.org/assets: Wikimedia Portals 
Update: [[gerrit:692281| Bumping portals to master (T128546)]] (duration: 01m 08s) [10:36:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:36:34] T128546: [Recurring Task] Update Wikipedia and sister projects portals statistics - https://phabricator.wikimedia.org/T128546 [10:37:38] !log jdrewniak@deploy1002 Synchronized portals: Wikimedia Portals Update: [[gerrit:692281| Bumping portals to master (T128546)]] (duration: 01m 07s) [10:37:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:38:23] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1127 (re)pooling @ 100%: Repool db1127', diff saved to https://phabricator.wikimedia.org/P15994 and previous config saved to /var/cache/conftool/dbconfig/20210517-103823-root.json [10:38:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:38:52] topranks: thanks for merging https://gerrit.wikimedia.org/r/c/operations/homer/public/+/691140 ! and welcome by the way! FYI #wikimedia-sre is less noisy channel [10:38:55] (03PS3) 10Jbond: wmflib::cache::nodes: test new puppetdb function [puppet] - 10https://gerrit.wikimedia.org/r/692286 [10:39:14] (03PS4) 10Jbond: (do not merge) test puppetdb function [puppet] - 10https://gerrit.wikimedia.org/r/692287 [10:40:00] akosiaris: should I "upgrade" eqiad in the meantime while we figure out what happened to the chart? Or leave it? [10:40:10] (03PS2) 10Zabe: Update bnwiki project logo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/691245 (https://phabricator.wikimedia.org/T282886) [10:40:22] mvolz: let it be for now, I am trying to figure out that chart version issue [10:40:29] cool :) [10:40:30] I 'll let you know once I have something. 
[10:42:19] 👍️ [10:48:11] (03PS4) 10Jbond: wmflib::cache::nodes: test new puppetdb function [puppet] - 10https://gerrit.wikimedia.org/r/692286 [10:48:17] (03PS5) 10Jbond: (do not merge) test puppetdb function [puppet] - 10https://gerrit.wikimedia.org/r/692287 [10:50:59] (03PS5) 10Jbond: wmflib::cache::nodes: test new puppetdb function [puppet] - 10https://gerrit.wikimedia.org/r/692286 [10:51:04] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=netbox_device_statistics site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [10:51:05] 10SRE, 10Wikimedia-Mailing-lists: Mailman 3: Changing email address seems to break subscription for listadmins list - https://phabricator.wikimedia.org/T282328 (10MarcoAurelio) @Tgr This happened to me as well a week or two ago, but in my case the email addresses where identical (that is, I got twice subscribe... [10:51:34] (03PS6) 10Jbond: (do not merge) test puppetdb function [puppet] - 10https://gerrit.wikimedia.org/r/692287 [10:53:20] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [10:54:48] (03PS6) 10Jbond: wmflib::cache::nodes: test new puppetdb function [puppet] - 10https://gerrit.wikimedia.org/r/692286 [10:55:05] (03PS7) 10Jbond: (do not merge) test puppetdb function [puppet] - 10https://gerrit.wikimedia.org/r/692287 [11:00:05] Amir1, Lucas_WMDE, awight, and Urbanecm: That opportune time is upon us again. Time for a European mid-day backport window deploy. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210517T1100). [11:00:05] Zabe: A patch you scheduled for European mid-day backport window is about to be deployed. Please be around during the process. 
Note: If you break AND fix the wikis, you will be rewarded with a sticker. [11:00:13] i can deploy today [11:01:31] Zabe: hello, around? [11:01:59] (03CR) 10Urbanecm: [C: 03+2] Update bnwiki project logo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/691245 (https://phabricator.wikimedia.org/T282886) (owner: 10Zabe) [11:02:40] (03Merged) 10jenkins-bot: Update bnwiki project logo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/691245 (https://phabricator.wikimedia.org/T282886) (owner: 10Zabe) [11:02:57] Urbanecm: sorry, [11:03:02] I'm here [11:03:05] cool [11:03:53] !log [urbanecm@mwmaint1002 ~/uploads]$ mwscript importImages.php --wiki=commonswiki --comment-ext=txt --sleep=3600 --user=Lusccasdeutsch . # T278856 [11:03:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:03:58] T278856: Server side upload for Lusccasdeutsch (master task) - https://phabricator.wikimedia.org/T278856 [11:04:45] syncing the logo change Zabe [11:04:55] (03PS2) 10Urbanecm: Enable NewUserMessage on ptwikinews [mediawiki-config] - 10https://gerrit.wikimedia.org/r/691132 (https://phabricator.wikimedia.org/T282845) (owner: 10Zabe) [11:04:58] (03CR) 10Urbanecm: [C: 03+2] Enable NewUserMessage on ptwikinews [mediawiki-config] - 10https://gerrit.wikimedia.org/r/691132 (https://phabricator.wikimedia.org/T282845) (owner: 10Zabe) [11:05:48] (03Merged) 10jenkins-bot: Enable NewUserMessage on ptwikinews [mediawiki-config] - 10https://gerrit.wikimedia.org/r/691132 (https://phabricator.wikimedia.org/T282845) (owner: 10Zabe) [11:06:21] !log urbanecm@deploy1002 Synchronized static/images/project-logos/: b1da7aa0517074cfa74c52c3889e4b185828d5c8: Update bnwiki project logo (T282886) (duration: 01m 42s) [11:06:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:06:25] T282886: Update bnwiki logo (for legacy vector) - https://phabricator.wikimedia.org/T282886 [11:06:39] Zabe: and the other one is at mwdebug1001 now [11:07:21] !log 
hnowlan@cumin1001 START - Cookbook sre.hosts.downtime for 0:30:00 on aqs1012.eqiad.wmnet with reason: Testing removing from new AQS cluster [11:07:21] !log hnowlan@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on aqs1012.eqiad.wmnet with reason: Testing removing from new AQS cluster [11:07:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:07:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:07:35] !log Purge https://en.wikipedia.org/static/images/project-logos/{bnwiki,bnwiki-1.5x,bnwiki-2x}.png (T282886) [11:07:35] Urbanecm: works the supposed way [11:07:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:08:57] Zabe: are you sure? [11:09:03] i just created https://pt.wikinews.org/wiki/Utilizador_Discuss%C3%A3o:MU_test_124 [11:09:09] (the acc i mean) [11:09:34] 10SRE, 10DBA: Migrate MySQLs to use ROW-based replication - https://phabricator.wikimedia.org/T109179 (10LSobanski) [11:11:25] It's a bit broken. is that an onwiki thing? [11:11:45] yep. 
see docs at https://www.mediawiki.org/wiki/Extension:NewUserMessage ;) [11:11:57] apparently substitution is enabled, but the template is not substitution-ready [11:12:42] see https://pt.wikinews.org/wiki/Utilizador:Martin_Urbanec/Testes and source :) [11:13:43] I'm going to disable substitution, and sync [11:13:45] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1158', diff saved to https://phabricator.wikimedia.org/P15995 and previous config saved to /var/cache/conftool/dbconfig/20210517-111343-marostegui.json [11:13:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:14:38] 13:14, 17 May 2021 Martin Urbanec talk contribs block deleted page MediaWiki:Newusermessage-substitute (system administrator: template currently used for welcoming (Bem-vindo(a)) does not support substitution, see https://w.wiki/3Lem (phab:T282845)) (view/restore) [11:14:39] T282845: Add NewUserMessage extension on ptwikinews - https://phabricator.wikimedia.org/T282845 [11:14:57] thanks. 
[11:15:42] https://pt.wikinews.org/wiki/Utilizador_Discuss%C3%A3o:MU_test_125, much better [11:16:14] (03PS7) 10Jbond: wmflib::cache::nodes: test new puppetdb function [puppet] - 10https://gerrit.wikimedia.org/r/692286 [11:16:27] (03PS8) 10Jbond: (do not merge) test puppetdb function [puppet] - 10https://gerrit.wikimedia.org/r/692287 [11:17:16] !log urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: 36d29a667bacebc880632c6a6a3614f4b1f5aa2e: Enable NewUserMessage on ptwikinews (T282845) (duration: 01m 09s) [11:17:19] Zabe: done [11:17:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:17:55] (03PS9) 10Jbond: (do not merge) test puppetdb function [puppet] - 10https://gerrit.wikimedia.org/r/692287 [11:18:00] thanks :) [11:18:19] np [11:18:46] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=netbox_device_statistics site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [11:19:17] (03PS2) 10Urbanecm: urwiki: Grant `editprotected` to eliminators [mediawiki-config] - 10https://gerrit.wikimedia.org/r/692288 (https://phabricator.wikimedia.org/T281274) [11:19:20] (03CR) 10Urbanecm: [C: 03+2] urwiki: Grant `editprotected` to eliminators [mediawiki-config] - 10https://gerrit.wikimedia.org/r/692288 (https://phabricator.wikimedia.org/T281274) (owner: 10Urbanecm) [11:19:22] mvolz: ok, I think I solved it. Some kind of corruption seemed to have happened just for that 1 version of the chart (alongside another chart and version pair). 
I'll file a task to dig deeper into it, but my first guess is a race condition [11:19:37] mvolz: Go ahead and update all environments, helmfile should work fine now [11:20:05] (03Merged) 10jenkins-bot: urwiki: Grant `editprotected` to eliminators [mediawiki-config] - 10https://gerrit.wikimedia.org/r/692288 (https://phabricator.wikimedia.org/T281274) (owner: 10Urbanecm) [11:20:50] (03PS8) 10Jbond: wmflib::cache::nodes: test new puppetdb function [puppet] - 10https://gerrit.wikimedia.org/r/692286 [11:21:05] (03PS10) 10Jbond: (do not merge) test puppetdb function [puppet] - 10https://gerrit.wikimedia.org/r/692287 [11:21:08] (03PS2) 10Urbanecm: Enable SandboxLink at azwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/691754 (https://phabricator.wikimedia.org/T282954) [11:21:13] (03CR) 10Urbanecm: [C: 03+2] Enable SandboxLink at azwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/691754 (https://phabricator.wikimedia.org/T282954) (owner: 10Urbanecm) [11:21:58] (03Merged) 10jenkins-bot: Enable SandboxLink at azwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/691754 (https://phabricator.wikimedia.org/T282954) (owner: 10Urbanecm) [11:22:01] !log urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: 32e43439c88147439109403ea2805da648fef97f: urwiki: Grant `editprotected` to eliminators (T281274) (duration: 01m 08s) [11:22:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:22:05] T281274: Eliminator not able to edit "cascade-protected" pages despite having the right on urwiki - https://phabricator.wikimedia.org/T281274 [11:23:20] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [11:23:39] (03PS9) 10Jbond: wmflib::cache::nodes: test new puppetdb function [puppet] - 10https://gerrit.wikimedia.org/r/692286 [11:23:51] (03PS11) 10Jbond: (do not merge) test puppetdb function [puppet] - 10https://gerrit.wikimedia.org/r/692287 [11:24:32] !log urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: 1e06f83293be63bd32703731ef1386e63d4ae94a: Enable SandboxLink at azwiki (T282954) (duration: 01m 08s) [11:24:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:24:36] T282954: Creating personal "sandbox" pages on AzWiki. - https://phabricator.wikimedia.org/T282954 [11:26:20] (03PS10) 10Jbond: wmflib::cache::nodes: test new puppetdb function [puppet] - 10https://gerrit.wikimedia.org/r/692286 [11:26:31] (03PS12) 10Jbond: (do not merge) test puppetdb function [puppet] - 10https://gerrit.wikimedia.org/r/692287 [11:27:26] !log hnowlan@cumin1001 START - Cookbook sre.hosts.downtime for 1:00:00 on aqs1012.eqiad.wmnet with reason: Testing removing from new AQS cluster [11:27:26] !log hnowlan@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on aqs1012.eqiad.wmnet with reason: Testing removing from new AQS cluster [11:27:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:27:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:28:12] (03PS11) 10Jbond: wmflib::cache::nodes: test new puppetdb function [puppet] - 10https://gerrit.wikimedia.org/r/692286 [11:28:34] (03PS13) 10Jbond: (do not merge) test puppetdb function [puppet] - 10https://gerrit.wikimedia.org/r/692287 [11:32:11] (03CR) 10Alexandros Kosiaris: docker-registry: Re-apply Cache-Control rules (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/691108 (https://phabricator.wikimedia.org/T256762) (owner: 10Alexandros Kosiaris) [11:34:00] 
10SRE, 10Traffic: Revisit varnish dynamic backends mechanism - https://phabricator.wikimedia.org/T282880 (10jbond) > and is driven by the cache::nodes['upload']['eqsin'] hiera setting. In relation to this would it be better to pull this information directly from puppetdb. This would mean that list would cont... [11:36:35] (03PS2) 10Alexandros Kosiaris: docker-registry: Remove absented nginx-site resource [puppet] - 10https://gerrit.wikimedia.org/r/691110 (https://phabricator.wikimedia.org/T256762) [11:39:20] (03CR) 10Alexandros Kosiaris: [C: 03+2] docker-registry: Remove absented nginx-site resource [puppet] - 10https://gerrit.wikimedia.org/r/691110 (https://phabricator.wikimedia.org/T256762) (owner: 10Alexandros Kosiaris) [11:39:33] ok great, thank you! I'll do it after the backport window is finished. (12 utc ish) [11:41:59] 10SRE, 10Traffic: Configure dns and puppet repositories for new drmrs datacenter - https://phabricator.wikimedia.org/T282787 (10MMandere) >>! In T282787#7085273, @Volans wrote: > Other random things that needs to be updated sooner or later. I hope you don't mind if I drop them here, feel free to move them to a... 
[11:46:04] (03PS4) 10Urbanecm: Make the Malaysian talk namespaces names consistent [mediawiki-config] - 10https://gerrit.wikimedia.org/r/687064 (owner: 10Amire80) [11:46:08] (03CR) 10Urbanecm: [C: 03+2] Make the Malaysian talk namespaces names consistent [mediawiki-config] - 10https://gerrit.wikimedia.org/r/687064 (owner: 10Amire80) [11:47:19] (03Merged) 10jenkins-bot: Make the Malaysian talk namespaces names consistent [mediawiki-config] - 10https://gerrit.wikimedia.org/r/687064 (owner: 10Amire80) [11:49:28] meh [11:49:48] !log 11:49:22 Synchronized wmf-config/InitialiseSettings.php: a73fe2d: Make the Malaysian talk namespaces names consistent (duration: 01m 08s) [11:49:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:50:19] !log [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=mswiki --fix [11:50:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:50:33] !log [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=mswikibooks --fix [11:50:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:50:38] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1158 (re)pooling @ 25%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15996 and previous config saved to /var/cache/conftool/dbconfig/20210517-115037-root.json [11:50:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:53:43] mvolz: fwiw I'm done with B&C now [11:55:23] !log Deploy schema change on s8 codfw, lag will appear in codfw T266486 T268392 T273360 [11:55:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:55:28] T268392: Schema change for watchlist.wl_notificationtimestamp going binary(14) from varbinary(14) - https://phabricator.wikimedia.org/T268392 [11:55:29] T273360: Schema change for dropping default of img_timestamp and making it binary(14) - https://phabricator.wikimedia.org/T273360 [11:55:29] T266486: Schema 
change to turn user_last_timestamp.user_newtalk to binary(14) - https://phabricator.wikimedia.org/T266486 [12:00:46] (03PS3) 10Jbond: admin: drop christinedk do not merge before 17/05/2021 [puppet] - 10https://gerrit.wikimedia.org/r/690405 [12:04:33] !log mvolz@deploy1002 helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' . [12:04:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:04:40] (03CR) 10JMeybohm: [C: 03+1] docker-registry: Re-apply Cache-Control rules [puppet] - 10https://gerrit.wikimedia.org/r/691108 (https://phabricator.wikimedia.org/T256762) (owner: 10Alexandros Kosiaris) [12:05:42] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1158 (re)pooling @ 50%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15997 and previous config saved to /var/cache/conftool/dbconfig/20210517-120541-root.json [12:05:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:07:14] !log mvolz@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' . [12:07:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:08:56] !log mvolz@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' . [12:08:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:09:01] 10SRE, 10Analytics, 10Traffic, 10Patch-For-Review: Add Traffic's notion of "from public cloud" to Analytics webrequest data - https://phabricator.wikimedia.org/T279380 (10JAllemandou) @CDanis the patch for Druid is there - sorry for not having acted quicker. 
[12:10:33] 10SRE, 10Analytics, 10Analytics-Kanban, 10Traffic, 10Patch-For-Review: Add Traffic's notion of "from public cloud" to Analytics webrequest data - https://phabricator.wikimedia.org/T279380 (10JAllemandou) a:03JAllemandou [12:11:11] all done, everything looks good ^-^ [12:18:59] (03CR) 10Kormat: [C: 03+1] dbbackups: remove db2097 s6 section for this codfw backup source [puppet] - 10https://gerrit.wikimedia.org/r/685726 (https://phabricator.wikimedia.org/T280751) (owner: 10Jcrespo) [12:20:45] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1158 (re)pooling @ 75%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15998 and previous config saved to /var/cache/conftool/dbconfig/20210517-122045-root.json [12:20:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:27:11] (03CR) 10Jbond: [C: 03+2] admin: drop christinedk do not merge before 17/05/2021 [puppet] - 10https://gerrit.wikimedia.org/r/690405 (owner: 10Jbond) [12:28:10] (03Abandoned) 10Jcrespo: dbbackups: Switchover s6 eqiad database backups from db1139 to db1140 [puppet] - 10https://gerrit.wikimedia.org/r/681622 (https://phabricator.wikimedia.org/T280751) (owner: 10Jcrespo) [12:31:49] (03PS1) 10Marostegui: mysqld_exporter_config.py: Replace db1083 [puppet] - 10https://gerrit.wikimedia.org/r/692311 (https://phabricator.wikimedia.org/T281445) [12:32:04] (03CR) 10Marostegui: [C: 03+2] prometheus-mysqld-exporter: Update generator to remove multisource exception [puppet] - 10https://gerrit.wikimedia.org/r/690402 (https://phabricator.wikimedia.org/T282662) (owner: 10Jcrespo) [12:32:35] (03PS2) 10Marostegui: mysqld_exporter_config.py: Replace db1083 [puppet] - 10https://gerrit.wikimedia.org/r/692311 (https://phabricator.wikimedia.org/T281445) [12:35:12] (03CR) 10Marostegui: [C: 03+2] mysqld_exporter_config.py: Replace db1083 [puppet] - 10https://gerrit.wikimedia.org/r/692311 (https://phabricator.wikimedia.org/T281445) (owner: 10Marostegui) [12:35:49] !log 
marostegui@cumin1001 dbctl commit (dc=all): 'db1158 (re)pooling @ 100%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15999 and previous config saved to /var/cache/conftool/dbconfig/20210517-123548-root.json [12:35:50] (03PS1) 10Jbond: Revert "admin: drop christinedk do not merge before 17/05/2021" [puppet] - 10https://gerrit.wikimedia.org/r/692072 [12:35:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:38:34] (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] cloud nfs: Change primary cluster rate limits dramatically [puppet] - 10https://gerrit.wikimedia.org/r/691267 (https://phabricator.wikimedia.org/T218338) (owner: 10Bstorm) [12:52:04] (03PS1) 10Kormat: install_server: Switch db1131 to buster. [puppet] - 10https://gerrit.wikimedia.org/r/692314 (https://phabricator.wikimedia.org/T280751) [12:52:06] (03PS1) 10Kormat: db1131: Disable notifications. [puppet] - 10https://gerrit.wikimedia.org/r/692315 (https://phabricator.wikimedia.org/T280751) [12:54:33] (03PS1) 10Jbond: cfssl::cert: Add support for wildcard names [puppet] - 10https://gerrit.wikimedia.org/r/692316 (https://phabricator.wikimedia.org/T282930) [12:55:17] (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS (NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/29591/console" [puppet] - 10https://gerrit.wikimedia.org/r/692316 (https://phabricator.wikimedia.org/T282930) (owner: 10Jbond) [12:57:22] (03CR) 10Kormat: [C: 03+2] db1131: Disable notifications. [puppet] - 10https://gerrit.wikimedia.org/r/692315 (https://phabricator.wikimedia.org/T280751) (owner: 10Kormat) [12:57:28] 10SRE, 10SRE-Access-Requests: Requesting access to production for joanna_borun - https://phabricator.wikimedia.org/T282661 (10ayounsi) 05Open→03Resolved a:03ayounsi All good. [12:57:32] (03CR) 10Kormat: [C: 03+2] install_server: Switch db1131 to buster. 
[puppet] - 10https://gerrit.wikimedia.org/r/692314 (https://phabricator.wikimedia.org/T280751) (owner: 10Kormat) [12:57:43] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1101:3318', diff saved to https://phabricator.wikimedia.org/P16000 and previous config saved to /var/cache/conftool/dbconfig/20210517-125742-marostegui.json [12:57:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:58:42] (03CR) 10Jbond: [C: 03+2] Revert "admin: drop christinedk do not merge before 17/05/2021" [puppet] - 10https://gerrit.wikimedia.org/r/692072 (owner: 10Jbond) [12:59:20] (03PS1) 10Jbond: admin: drop christinedk do not merge before 17/05/2021 [puppet] - 10https://gerrit.wikimedia.org/r/692073 [12:59:30] (03CR) 10Volans: "nits and a question inline" (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/692316 (https://phabricator.wikimedia.org/T282930) (owner: 10Jbond) [12:59:51] (03PS2) 10Majavah: cfssl::cert: Add support for wildcard names [puppet] - 10https://gerrit.wikimedia.org/r/692316 (https://phabricator.wikimedia.org/T282930) (owner: 10Jbond) [13:01:29] (03PS3) 10Majavah: cfssl::cert: Add support for wildcard names [puppet] - 10https://gerrit.wikimedia.org/r/692316 (https://phabricator.wikimedia.org/T282930) (owner: 10Jbond) [13:02:36] PROBLEM - Check systemd state on sodium is CRITICAL: CRITICAL - degraded: The following units failed: update-ubuntu-mirror.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [13:05:48] (03PS1) 10Majavah: scap: Use https for beta swagger checks [puppet] - 10https://gerrit.wikimedia.org/r/692318 (https://phabricator.wikimedia.org/T278686) [13:09:36] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1144:3314, db1144:3315 for kernel and mysql upgrade', diff saved to https://phabricator.wikimedia.org/P16001 and previous config saved to /var/cache/conftool/dbconfig/20210517-130935-marostegui.json [13:09:38] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:09:56] (03CR) 10Hashar: "> Patch Set 1:" (032 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/689673 (owner: 10Hashar) [13:10:04] !log Upgrade kernel and mysql (10.4.19) on db1144:3314, db1144:3315 [13:10:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:10:14] (03CR) 10Hashar: multiversion: enhance buildDBList output (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/689673 (owner: 10Hashar) [13:10:48] (03PS2) 10Hashar: multiversion: enhance buildDBList output [mediawiki-config] - 10https://gerrit.wikimedia.org/r/689673 [13:17:13] (03PS2) 10Muehlenhoff: Fix typo in username [puppet] - 10https://gerrit.wikimedia.org/r/692271 (https://phabricator.wikimedia.org/T282774) [13:19:24] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 25%: Repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P16002 and previous config saved to /var/cache/conftool/dbconfig/20210517-131924-root.json [13:19:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:19:28] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 25%: Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P16003 and previous config saved to /var/cache/conftool/dbconfig/20210517-131927-root.json [13:19:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:19:33] (03PS1) 10Zabe: Revert "Add a throttle rule for for edit-a-thon" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/692320 (https://phabricator.wikimedia.org/T275237) [13:20:52] (03PS2) 10Zabe: Revert "Add a throttle rule for for edit-a-thon" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/692320 (https://phabricator.wikimedia.org/T275237) [13:27:44] (03CR) 10Jgreen: [C: 03+2] Monitor civiproxy nginx port [puppet] - 10https://gerrit.wikimedia.org/r/691277 (https://phabricator.wikimedia.org/T281321) 
(owner: 10Dwisehaupt) [13:30:54] (03CR) 10Majavah: P::kubernetes::deployment_server: Do not use ipv6 on beta (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/691494 (https://phabricator.wikimedia.org/T281986) (owner: 10Majavah) [13:31:16] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 25%: Repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P16004 and previous config saved to /var/cache/conftool/dbconfig/20210517-133116-root.json [13:31:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:32:08] (03PS1) 10Kormat: install_server: Update db1131 MAC address. [puppet] - 10https://gerrit.wikimedia.org/r/692324 [13:32:55] (03CR) 10Kormat: [C: 03+2] install_server: Update db1131 MAC address. [puppet] - 10https://gerrit.wikimedia.org/r/692324 (owner: 10Kormat) [13:34:28] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 50%: Repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P16005 and previous config saved to /var/cache/conftool/dbconfig/20210517-133427-root.json [13:34:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:34:31] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 50%: Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P16006 and previous config saved to /var/cache/conftool/dbconfig/20210517-133431-root.json [13:34:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:35:57] 10SRE, 10Wikimedia-Mailing-lists, 10translatewiki.net: Add mailman-templates to translatewiki.net - https://phabricator.wikimedia.org/T282022 (10Nikerabbit) [13:37:25] 10SRE, 10Wikimedia-Mailing-lists, 10translatewiki.net: Add mailman-templates to translatewiki.net - https://phabricator.wikimedia.org/T282022 (10Nikerabbit) @Legoktm I added a few questions to the description and Nemo_bis has left similar ones in the qqq patch. Logo is not needed. 
[13:40:14] (03CR) 10Elukey: [C: 03+2] "Luca + LDAP + Friday is not a great combination, it is official. Thanks!" [puppet] - 10https://gerrit.wikimedia.org/r/692271 (https://phabricator.wikimedia.org/T282774) (owner: 10Muehlenhoff) [13:46:20] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 50%: Repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P16007 and previous config saved to /var/cache/conftool/dbconfig/20210517-134619-root.json [13:46:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:46:24] 10SRE, 10Wikimedia-Mailing-lists, 10translatewiki.net: Add mailman-templates to translatewiki.net - https://phabricator.wikimedia.org/T282022 (10Nikerabbit) [13:46:37] (03CR) 10Cathal Mooney: [C: 03+1] remove Zayo from transit providers [homer/public] - 10https://gerrit.wikimedia.org/r/659383 (owner: 10CDanis) [13:46:40] (03CR) 10Cathal Mooney: [C: 03+2] remove Zayo from transit providers [homer/public] - 10https://gerrit.wikimedia.org/r/659383 (owner: 10CDanis) [13:47:12] !log kormat@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on db1131.eqiad.wmnet with reason: REIMAGE [13:47:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:47:16] (03Merged) 10jenkins-bot: remove Zayo from transit providers [homer/public] - 10https://gerrit.wikimedia.org/r/659383 (owner: 10CDanis) [13:49:32] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 75%: Repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P16008 and previous config saved to /var/cache/conftool/dbconfig/20210517-134931-root.json [13:49:33] !log kormat@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1131.eqiad.wmnet with reason: REIMAGE [13:49:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:49:35] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 75%: Repool db1144:3315', 
diff saved to https://phabricator.wikimedia.org/P16009 and previous config saved to /var/cache/conftool/dbconfig/20210517-134934-root.json [13:49:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:49:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:52:46] 10SRE, 10Wikimedia-Mailing-lists, 10translatewiki.net: Add mailman-templates to translatewiki.net - https://phabricator.wikimedia.org/T282022 (10Nikerabbit) [13:52:54] (03CR) 10Jbond: cfssl::cert: Add support for wildcard names (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/692316 (https://phabricator.wikimedia.org/T282930) (owner: 10Jbond) [13:52:57] (03CR) 10Jbond: [C: 03+2] cfssl::cert: Add support for wildcard names [puppet] - 10https://gerrit.wikimedia.org/r/692316 (https://phabricator.wikimedia.org/T282930) (owner: 10Jbond) [13:53:28] (03CR) 10Alexandros Kosiaris: [C: 04-2] P::kubernetes::deployment_server: Do not use ipv6 on beta (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/691494 (https://phabricator.wikimedia.org/T281986) (owner: 10Majavah) [13:53:44] (03CR) 10Jbond: [C: 03+2] scap: Use https for beta swagger checks [puppet] - 10https://gerrit.wikimedia.org/r/692318 (https://phabricator.wikimedia.org/T278686) (owner: 10Majavah) [14:01:24] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 75%: Repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P16010 and previous config saved to /var/cache/conftool/dbconfig/20210517-140123-root.json [14:01:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:04:35] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 100%: Repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P16011 and previous config saved to /var/cache/conftool/dbconfig/20210517-140435-root.json [14:04:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:04:39] !log 
marostegui@cumin1001 dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 100%: Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P16012 and previous config saved to /var/cache/conftool/dbconfig/20210517-140438-root.json [14:04:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:07:11] (03PS2) 10Jbond: admin: drop christinedk do not merge before 17/05/2021 [puppet] - 10https://gerrit.wikimedia.org/r/692073 [14:08:45] (03CR) 10Hnowlan: [V: 03+2 C: 03+2] New envoy upstream version 1.15.5 [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/689950 (owner: 10Hnowlan) [14:11:57] (03PS1) 10MMandere: conftool-data/node: Add drmrs nodes [puppet] - 10https://gerrit.wikimedia.org/r/692331 (https://phabricator.wikimedia.org/T282787) [14:11:59] (03PS1) 10MMandere: hieradata: Add drmrs domain to puppet master allow list [puppet] - 10https://gerrit.wikimedia.org/r/692332 (https://phabricator.wikimedia.org/T282787) [14:12:01] (03PS1) 10MMandere: hieradata/cloud: Add drmrs to ntp peers list [puppet] - 10https://gerrit.wikimedia.org/r/692333 (https://phabricator.wikimedia.org/T282787) [14:12:58] PROBLEM - Unmerged changes on repository puppet on puppetmaster1002 is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet, ref HEAD..origin/production). https://wikitech.wikimedia.org/wiki/Monitoring/unmerged_changes [14:13:06] PROBLEM - Unmerged changes on repository puppet on puppetmaster2001 is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet, ref HEAD..origin/production). 
https://wikitech.wikimedia.org/wiki/Monitoring/unmerged_changes [14:14:09] (03CR) 10Hnowlan: [C: 03+2] api-gateway: bump Envoy version [deployment-charts] - 10https://gerrit.wikimedia.org/r/690404 (owner: 10Hnowlan) [14:14:24] PROBLEM - Unmerged changes on repository puppet on puppetmaster2003 is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet, ref HEAD..origin/production). https://wikitech.wikimedia.org/wiki/Monitoring/unmerged_changes [14:14:37] (03PS2) 10Herron: add "cdofw" to typos [puppet] - 10https://gerrit.wikimedia.org/r/674863 [14:15:16] (03CR) 10BBlack: "Couldn't we also genericize this for all other such cases by making the whole role name an argument, instead of a cache::foo-specific func" [puppet] - 10https://gerrit.wikimedia.org/r/692286 (owner: 10Jbond) [14:15:28] (03Merged) 10jenkins-bot: api-gateway: bump Envoy version [deployment-charts] - 10https://gerrit.wikimedia.org/r/690404 (owner: 10Hnowlan) [14:16:28] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 100%: Repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P16013 and previous config saved to /var/cache/conftool/dbconfig/20210517-141627-root.json [14:16:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:16:44] (03CR) 10Herron: [C: 03+2] add "cdofw" to typos [puppet] - 10https://gerrit.wikimedia.org/r/674863 (owner: 10Herron) [14:17:38] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1099:3318', diff saved to https://phabricator.wikimedia.org/P16014 and previous config saved to /var/cache/conftool/dbconfig/20210517-141737-marostegui.json [14:17:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:17:48] PROBLEM - Unmerged changes on repository puppet on puppetmaster2002 is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet, ref HEAD..origin/production). 
https://wikitech.wikimedia.org/wiki/Monitoring/unmerged_changes [14:18:52] (03CR) 10Jbond: "> Patch Set 11:" [puppet] - 10https://gerrit.wikimedia.org/r/692286 (owner: 10Jbond) [14:19:48] RECOVERY - Unmerged changes on repository puppet on puppetmaster1002 is OK: No changes to merge. https://wikitech.wikimedia.org/wiki/Monitoring/unmerged_changes [14:19:56] RECOVERY - Unmerged changes on repository puppet on puppetmaster2001 is OK: No changes to merge. https://wikitech.wikimedia.org/wiki/Monitoring/unmerged_changes [14:20:06] RECOVERY - Unmerged changes on repository puppet on puppetmaster2002 is OK: No changes to merge. https://wikitech.wikimedia.org/wiki/Monitoring/unmerged_changes [14:20:26] (03CR) 10BBlack: "> Patch Set 11:" [puppet] - 10https://gerrit.wikimedia.org/r/692286 (owner: 10Jbond) [14:29:11] (03CR) 10Jbond: "> Yeah, like that :)" [puppet] - 10https://gerrit.wikimedia.org/r/692286 (owner: 10Jbond) [14:30:57] (03PS1) 10Cwhite: logstash: remove logstash-output-statsd plugin [puppet] - 10https://gerrit.wikimedia.org/r/692337 [14:31:01] (03PS1) 10Muehlenhoff: profile::mariadb::packages_client: On bullseye use wmf-mariadb105-client [puppet] - 10https://gerrit.wikimedia.org/r/692338 [14:33:03] (03CR) 10Jcrespo: "Isn't this a duplicate of: https://gerrit.wikimedia.org/r/c/operations/puppet/+/686393 ?" [puppet] - 10https://gerrit.wikimedia.org/r/692338 (owner: 10Muehlenhoff) [14:33:30] (03PS2) 10Cwhite: logstash: remove logstash-output-statsd plugin [puppet] - 10https://gerrit.wikimedia.org/r/692337 [14:33:51] (03PS1) 10Filippo Giunchedi: hieradata: use public_domain for grafana vhosts [puppet] - 10https://gerrit.wikimedia.org/r/692339 [14:34:20] (03CR) 10Jcrespo: "Apologies, I meant if this wasn't a duplicate of https://gerrit.wikimedia.org/r/c/operations/puppet/+/686393 ?" 
[puppet] - 10https://gerrit.wikimedia.org/r/692338 (owner: 10Muehlenhoff) [14:35:16] (03CR) 10Cwhite: [C: 03+2] logstash: clean up mtail config [puppet] - 10https://gerrit.wikimedia.org/r/672771 (https://phabricator.wikimedia.org/T277080) (owner: 10Cwhite) [14:37:50] (03PS2) 10Filippo Giunchedi: hieradata: use public_domain for grafana vhosts [puppet] - 10https://gerrit.wikimedia.org/r/692339 [14:38:38] (03CR) 10Filippo Giunchedi: [V: 03+1] "PCC SUCCESS (NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/29593/console" [puppet] - 10https://gerrit.wikimedia.org/r/692339 (owner: 10Filippo Giunchedi) [14:39:39] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=jmx_puppetdb site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [14:41:21] !log hnowlan@deploy1002 helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' . [14:41:22] !log hnowlan@deploy1002 helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' . 
[14:41:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:41:25] (03PS1) 10Jcrespo: dbbackups: Remove s6 stretch backup source instance on eqiad [puppet] - 10https://gerrit.wikimedia.org/r/692341 (https://phabricator.wikimedia.org/T280751) [14:41:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:42:33] (03CR) 10Cwhite: [C: 03+1] hieradata: use public_domain for grafana vhosts [puppet] - 10https://gerrit.wikimedia.org/r/692339 (owner: 10Filippo Giunchedi) [14:43:37] (03PS2) 10Jbond: openldap/offboard-user.py: Port to Python 3 [puppet] - 10https://gerrit.wikimedia.org/r/692071 [14:44:00] (03CR) 10Jcrespo: "As per your advice, I will now proceed to merge the removal of the codfw s6 stretch (https://gerrit.wikimedia.org/r/c/operations/puppet/+/" [puppet] - 10https://gerrit.wikimedia.org/r/692341 (https://phabricator.wikimedia.org/T280751) (owner: 10Jcrespo) [14:44:20] (03CR) 10Jbond: "tested and ready for review" [puppet] - 10https://gerrit.wikimedia.org/r/692071 (owner: 10Jbond) [14:44:36] (03CR) 10Filippo Giunchedi: [V: 03+1 C: 03+2] hieradata: use public_domain for grafana vhosts [puppet] - 10https://gerrit.wikimedia.org/r/692339 (owner: 10Filippo Giunchedi) [14:45:33] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [14:45:40] (03CR) 10Filippo Giunchedi: [C: 03+1] "LGTM, I'm assuming we can clean up manually if needed" [puppet] - 10https://gerrit.wikimedia.org/r/692337 (owner: 10Cwhite) [14:48:00] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 25%: Repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P16015 and previous config saved to /var/cache/conftool/dbconfig/20210517-144800-root.json [14:48:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:49:43] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=netbox_device_statistics site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [14:50:19] !log hnowlan@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' . [14:50:19] !log hnowlan@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' . [14:50:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:50:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:51:57] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [14:52:37] RECOVERY - Unmerged changes on repository puppet on puppetmaster2003 is OK: No changes to merge. https://wikitech.wikimedia.org/wiki/Monitoring/unmerged_changes [14:53:13] !log hnowlan@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' . [14:53:13] !log hnowlan@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' . 
[14:53:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:53:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:57:57] 10SRE: Offboard Cas - https://phabricator.wikimedia.org/T282993 (10MoritzMuehlenhoff) [15:00:00] 10SRE: Offboard Cas - https://phabricator.wikimedia.org/T282993 (10herron) [15:03:04] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 50%: Repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P16016 and previous config saved to /var/cache/conftool/dbconfig/20210517-150303-root.json [15:03:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:04:54] (03CR) 10Ahmon Dancy: [C: 03+1] multiversion: enhance buildDBList output [mediawiki-config] - 10https://gerrit.wikimedia.org/r/689673 (owner: 10Hashar) [15:05:26] (03PS12) 10Jbond: wmflib::cache::nodes: test new puppetdb function [puppet] - 10https://gerrit.wikimedia.org/r/692286 [15:06:15] !log elukey@deploy1002 Started deploy [ores/deploy@3e1ff5f]: Update editquality submodule after Turkish Wikipedia's labelling campain - T257359 [15:06:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:06:19] T257359: Update Turkish Wikipedia's labeling campaign for 2020 - https://phabricator.wikimedia.org/T257359 [15:07:10] (03PS14) 10Jbond: (do not merge) test puppetdb function [puppet] - 10https://gerrit.wikimedia.org/r/692287 [15:08:57] (03PS15) 10Jbond: (do not merge) test puppetdb function [puppet] - 10https://gerrit.wikimedia.org/r/692287 [15:10:59] (03PS16) 10Jbond: (do not merge) test puppetdb function [puppet] - 10https://gerrit.wikimedia.org/r/692287 [15:17:21] (03PS13) 10Jbond: wmflib::cache::nodes: test new puppetdb function [puppet] - 10https://gerrit.wikimedia.org/r/692286 [15:17:37] (03PS17) 10Jbond: (do not merge) test puppetdb function [puppet] - 10https://gerrit.wikimedia.org/r/692287 [15:17:40] (03PS1) 10Nray: Allow `languageinheader` 
query param to fully control treatment of languages [skins/Vector] (wmf/1.37.0-wmf.5) - 10https://gerrit.wikimedia.org/r/692075 (https://phabricator.wikimedia.org/T282543) [15:18:08] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 75%: Repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P16017 and previous config saved to /var/cache/conftool/dbconfig/20210517-151807-root.json [15:18:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:19:55] (03PS14) 10Jbond: wmflib::cache::nodes: test new puppetdb function [puppet] - 10https://gerrit.wikimedia.org/r/692286 [15:20:15] (03PS18) 10Jbond: (do not merge) test puppetdb function [puppet] - 10https://gerrit.wikimedia.org/r/692287 [15:24:27] (03PS1) 10Ahmon Dancy: group2 wikis to 1.37.0-wmf.5 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/692346 [15:24:29] (03CR) 10Ahmon Dancy: [C: 03+2] group2 wikis to 1.37.0-wmf.5 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/692346 (owner: 10Ahmon Dancy) [15:25:20] (03Merged) 10jenkins-bot: group2 wikis to 1.37.0-wmf.5 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/692346 (owner: 10Ahmon Dancy) [15:26:03] !log elukey@deploy1002 Finished deploy [ores/deploy@3e1ff5f]: Update editquality submodule after Turkish Wikipedia's labelling campain - T257359 (duration: 19m 48s) [15:26:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:26:07] T257359: Update Turkish Wikipedia's labeling campaign for 2020 - https://phabricator.wikimedia.org/T257359 [15:27:08] !log dancy@deploy1002 rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.5 [15:27:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:28:25] (03Abandoned) 10Nray: Fix 'final_state: vector' bug in VectorPrefDiffInstrumentation [extensions/WikimediaEvents] (wmf/1.37.0-wmf.4) - 10https://gerrit.wikimedia.org/r/690789 (https://phabricator.wikimedia.org/T261842) (owner: 
10Nray) [15:29:09] (03PS15) 10Jbond: wmflib::cache::nodes: test new puppetdb function [puppet] - 10https://gerrit.wikimedia.org/r/692286 [15:29:37] 10SRE, 10Data-Persistence-Backup, 10netops: Understand (and mitigate) the backup speed differences between backup1002->backup2002 and backup2002->backup1002 - https://phabricator.wikimedia.org/T274234 (10LSobanski) Bonus question, is there an option for some traffic shaping / QoS to remediate the above autom... [15:29:39] (03PS19) 10Jbond: (do not merge) test puppetdb function [puppet] - 10https://gerrit.wikimedia.org/r/692287 [15:32:07] (03PS20) 10Jbond: (do not merge) test puppetdb function [puppet] - 10https://gerrit.wikimedia.org/r/692287 [15:33:12] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 100%: Repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P16018 and previous config saved to /var/cache/conftool/dbconfig/20210517-153311-root.json [15:33:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:35:25] (03CR) 10Muehlenhoff: [C: 03+1] "Looks good (and tested with a decom run)." 
[puppet] - 10https://gerrit.wikimedia.org/r/692071 (owner: 10Jbond) [15:35:50] (03CR) 10Jbond: "> Patch Set 11:" [puppet] - 10https://gerrit.wikimedia.org/r/692286 (owner: 10Jbond) [15:37:08] (03PS16) 10Jbond: wmflib::role_hosts: new function return list of hosts running a role [puppet] - 10https://gerrit.wikimedia.org/r/692286 (https://phabricator.wikimedia.org/T282880) [15:39:16] (03CR) 10Volans: [C: 03+1] "Looks reasonable to me" [puppet] - 10https://gerrit.wikimedia.org/r/692071 (owner: 10Jbond) [15:40:31] 10SRE, 10Analytics, 10Discovery, 10Event-Platform, and 2 others: Avoid accepting Kafka messages with whacky timestamps - https://phabricator.wikimedia.org/T282887 (10odimitrijevic) p:05Triage→03High [15:41:27] 10SRE, 10Traffic, 10Patch-For-Review: Revisit varnish dynamic backends mechanism - https://phabricator.wikimedia.org/T282880 (10jbond) > I created a quick [[ https://gerrit.wikimedia.org/r/c/operations/puppet/+/692286/10 | PoC ]] which would be called with e.g `wmflib::cache::nodes('upload', 'eqsin')` After... 
[15:42:24] (03CR) 10Jbond: [C: 03+2] openldap/offboard-user.py: Port to Python 3 [puppet] - 10https://gerrit.wikimedia.org/r/692071 (owner: 10Jbond) [15:43:13] 10SRE, 10ops-eqiad, 10Analytics-Clusters, 10DC-Ops: (Need By: TBD) rack/setup/install an-web1001 - https://phabricator.wikimedia.org/T281787 (10Ottomata) [15:43:15] (03CR) 10Muehlenhoff: "> Patch Set 1:" [puppet] - 10https://gerrit.wikimedia.org/r/692338 (owner: 10Muehlenhoff) [15:43:24] (03Abandoned) 10Muehlenhoff: profile::mariadb::packages_client: On bullseye use wmf-mariadb105-client [puppet] - 10https://gerrit.wikimedia.org/r/692338 (owner: 10Muehlenhoff) [15:44:16] 10SRE, 10fundraising-tech-ops: (Need By: TBD) rack/setup/install payments100[5-8] - https://phabricator.wikimedia.org/T266481 (10Jgreen) 05Open→03Resolved [15:44:22] (03PS1) 10Marostegui: install_server: Reimage db1106 to Buster [puppet] - 10https://gerrit.wikimedia.org/r/692348 (https://phabricator.wikimedia.org/T280492) [15:45:10] (03CR) 10Marostegui: [C: 03+2] install_server: Reimage db1106 to Buster [puppet] - 10https://gerrit.wikimedia.org/r/692348 (https://phabricator.wikimedia.org/T280492) (owner: 10Marostegui) [15:45:24] 10SRE, 10Analytics-Clusters: Switch kafka/Hadoop away from java::security - https://phabricator.wikimedia.org/T282454 (10Ottomata) a:03Ottomata [15:45:29] jbond42: is your puppet change ok to be merged? [15:45:34] 10SRE, 10Analytics-Clusters, 10Analytics-Kanban: Switch kafka/Hadoop away from java::security - https://phabricator.wikimedia.org/T282454 (10Ottomata) [15:45:47] (03CR) 10Volans: wmflib::role_hosts: new function return list of hosts running a role (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/692286 (https://phabricator.wikimedia.org/T282880) (owner: 10Jbond) [15:48:11] jbond42: my change is safe to be merged anytime, proceed whenever you like [15:48:26] thanks merging now [15:48:52] thanks! 
[15:53:12] (03PS1) 10Marostegui: Revert "db1112: Disable notifications" [puppet] - 10https://gerrit.wikimedia.org/r/692078 [15:58:24] (03CR) 10BryanDavis: "> Patch Set 8:" [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/621776 (https://phabricator.wikimedia.org/T169695) (owner: 10BryanDavis) [16:03:58] (03CR) 10Majavah: [C: 04-1] "> Patch Set 8:" [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/621776 (https://phabricator.wikimedia.org/T169695) (owner: 10BryanDavis) [16:06:15] 10SRE, 10Analytics, 10Discovery, 10Event-Platform, and 2 others: Avoid accepting Kafka messages with whacky timestamps - https://phabricator.wikimedia.org/T282887 (10Ottomata) I'd say this is medium to low priority and is something that needs to be worked on in collaboration with maintainers of other Kafka... [16:08:12] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P16019 and previous config saved to /var/cache/conftool/dbconfig/20210517-160811-root.json [16:08:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:09:54] (03CR) 10Marostegui: [C: 03+2] Revert "db1112: Disable notifications" [puppet] - 10https://gerrit.wikimedia.org/r/692078 (owner: 10Marostegui) [16:20:43] 10SRE: Offboard Cas - https://phabricator.wikimedia.org/T282993 (10MoritzMuehlenhoff) [16:21:03] 10SRE: Offboard Cas - https://phabricator.wikimedia.org/T282993 (10MoritzMuehlenhoff) 05Open→03Resolved a:03MoritzMuehlenhoff This is completed [16:23:16] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P16020 and previous config saved to /var/cache/conftool/dbconfig/20210517-162315-root.json [16:23:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:38:19] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: Repool db1112', diff saved 
to https://phabricator.wikimedia.org/P16021 and previous config saved to /var/cache/conftool/dbconfig/20210517-163819-root.json [16:38:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:49:01] (03PS1) 10Jsn.sherman: [BETA CLUSTER] Disable `TheWikipediaLibrary` for votewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/692353 (https://phabricator.wikimedia.org/T283003) [16:50:13] (03PS3) 10Jforrester: multiversion: enhance buildDBList output [mediawiki-config] - 10https://gerrit.wikimedia.org/r/689673 (owner: 10Hashar) [16:50:15] (03CR) 10Jforrester: multiversion: enhance buildDBList output (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/689673 (owner: 10Hashar) [16:50:47] (03CR) 10Jforrester: [C: 03+1] multiversion: enhance buildDBList output [mediawiki-config] - 10https://gerrit.wikimedia.org/r/689673 (owner: 10Hashar) [16:53:23] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P16022 and previous config saved to /var/cache/conftool/dbconfig/20210517-165322-root.json [16:53:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:55:01] (03CR) 10Jbond: wmflib::role_hosts: new function return list of hosts running a role (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/692286 (https://phabricator.wikimedia.org/T282880) (owner: 10Jbond) [16:58:03] (03PS2) 10Jcrespo: dbbackups: remove db2097 s6 section for this codfw backup source [puppet] - 10https://gerrit.wikimedia.org/r/685726 (https://phabricator.wikimedia.org/T280751) [16:59:34] jbond42: I have a bunch of drive-by comments re: https://gerrit.wikimedia.org/r/c/operations/puppet/+/692286 but have to go shortly, will address tomorrow and/or post-merge [17:00:04] ryankemper: My dear minions, it's time we take the moon! Just kidding. Time for Wikidata Query Service weekly deploy deploy. 
(https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210517T1700). [17:01:51] godog: i dont think there is any rish on that so i can wait for the comments thanks [17:01:58] *rush [17:02:27] jbond42: ok! will ping you tomorrow too [17:02:41] ack thanks enjoy your evening [17:05:16] (03CR) 10Jcrespo: [C: 03+2] dbbackups: remove db2097 s6 section for this codfw backup source [puppet] - 10https://gerrit.wikimedia.org/r/685726 (https://phabricator.wikimedia.org/T280751) (owner: 10Jcrespo) [17:05:47] (03PS4) 10Jcrespo: mariadb: Remove s3 from db2098 [puppet] - 10https://gerrit.wikimedia.org/r/681448 (https://phabricator.wikimedia.org/T280492) [17:06:31] 10SRE, 10Analytics, 10netops: Audit analytics firewall filters - https://phabricator.wikimedia.org/T279429 (10ayounsi) @razzi from our IRC chat, the way I'd approach it is: - for all the removed IPs, check if the host still exist, most of the cases it's just that the host is gone and the ACL never got updat... [17:07:08] (03CR) 10Jcrespo: [C: 03+2] mariadb: Remove s3 from db2098 [puppet] - 10https://gerrit.wikimedia.org/r/681448 (https://phabricator.wikimedia.org/T280492) (owner: 10Jcrespo) [17:11:11] (03Abandoned) 10Jsn.sherman: [BETA CLUSTER] Disable `TheWikipediaLibrary` for votewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/692353 (https://phabricator.wikimedia.org/T283003) (owner: 10Jsn.sherman) [17:11:38] (03PS1) 10Andrew Bogott: Trove: open up a lot of read-only policies [puppet] - 10https://gerrit.wikimedia.org/r/692354 (https://phabricator.wikimedia.org/T282809) [17:12:50] (03PS2) 10Andrew Bogott: Trove: open up a lot of read-only policies [puppet] - 10https://gerrit.wikimedia.org/r/692354 (https://phabricator.wikimedia.org/T282809) [17:18:07] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=netbox_device_statistics site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets 
[17:19:07] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [17:21:17] (03PS1) 10Jsn.sherman: [BETA CLUSTER] TheWikipediaLibrary requires CentralAuth and GlobalPreferences [mediawiki-config] - 10https://gerrit.wikimedia.org/r/692356 (https://phabricator.wikimedia.org/T283003) [17:22:58] (03CR) 10jerkins-bot: [V: 04-1] [BETA CLUSTER] TheWikipediaLibrary requires CentralAuth and GlobalPreferences [mediawiki-config] - 10https://gerrit.wikimedia.org/r/692356 (https://phabricator.wikimedia.org/T283003) (owner: 10Jsn.sherman) [17:25:14] (03PS2) 10Jsn.sherman: [BETA CLUSTER] TheWikipediaLibrary requires CentralAuth and GlobalPreferences [mediawiki-config] - 10https://gerrit.wikimedia.org/r/692356 (https://phabricator.wikimedia.org/T283003) [17:34:14] 10SRE, 10LDAP-Access-Requests: Grant Access to wmf for ODimitrijevic - https://phabricator.wikimedia.org/T282836 (10odimitrijevic) [17:46:46] 10SRE, 10LDAP-Access-Requests: Grant Access to ldap/wmf for JStephenson1980 - https://phabricator.wikimedia.org/T282521 (10ayounsi) a:03ayounsi [18:00:05] RoanKattouw, Niharika, and Urbanecm: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for Morning backport window . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210517T1800). [18:00:05] nray and Zabe: A patch you scheduled for Morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [18:00:28] i can deploy today [18:00:44] nray: Zabe: hello, are you around? [18:00:52] o/ hi Urbanecm ! 
[18:00:53] o/ [18:00:55] i'm here [18:00:56] (03CR) 10Urbanecm: [C: 03+2] Allow `languageinheader` query param to fully control treatment of languages [skins/Vector] (wmf/1.37.0-wmf.5) - 10https://gerrit.wikimedia.org/r/692075 (https://phabricator.wikimedia.org/T282543) (owner: 10Nray) [18:01:00] (03CR) 10Urbanecm: [C: 03+2] Revert "Add a throttle rule for for edit-a-thon" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/692320 (https://phabricator.wikimedia.org/T275237) (owner: 10Zabe) [18:01:04] great :). Let's do it then. [18:03:09] (03Merged) 10jenkins-bot: Revert "Add a throttle rule for for edit-a-thon" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/692320 (https://phabricator.wikimedia.org/T275237) (owner: 10Zabe) [18:06:03] Zabe: your patch will be synced soon [18:11:52] (03PS1) 10Ahmon Dancy: WIP: initial train-versions.json [mediawiki-config] (sandbox/dancy) - 10https://gerrit.wikimedia.org/r/692362 [18:11:54] (03CR) 10Ahmon Dancy: [C: 03+2] WIP: initial train-versions.json [mediawiki-config] (sandbox/dancy) - 10https://gerrit.wikimedia.org/r/692362 (owner: 10Ahmon Dancy) [18:13:04] (03Merged) 10jenkins-bot: WIP: initial train-versions.json [mediawiki-config] (sandbox/dancy) - 10https://gerrit.wikimedia.org/r/692362 (owner: 10Ahmon Dancy) [18:14:59] 10SRE, 10ops-codfw, 10DC-Ops, 10Discovery-Search (Current work): hw troubleshooting: ssh unreachable for wdqs2007.codfw.wmnet - https://phabricator.wikimedia.org/T281437 (10RKemper) >>! In T281437#7087127, @wiki_willy wrote: > Hi @RKemper - Papaul will be back on the 24th. Would you be able to hold off un... [18:19:10] !log urbanecm@deploy1002 Synchronized wmf-config/throttle.php: c30f92b5: Remove expired throttle rule (duration: 00m 59s) [18:19:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:19:14] Zabe: done [18:20:05] thx [18:20:11] np [18:21:19] (03CR) 10Hashar: "So I actually have a question now. 
I have generated the .dblist files a five days ago (May 12th), how can we check I am not overwritting " (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/689673 (owner: 10Hashar) [18:21:54] (03Merged) 10jenkins-bot: Allow `languageinheader` query param to fully control treatment of languages [skins/Vector] (wmf/1.37.0-wmf.5) - 10https://gerrit.wikimedia.org/r/692075 (https://phabricator.wikimedia.org/T282543) (owner: 10Nray) [18:22:51] nray: your patch is available at mwdebug1001, can you test, please? [18:22:58] yes, testing now, thanks! [18:26:00] Urbanecm: looks great, you can proceed! [18:26:05] thank you, syncing [18:26:28] (03CR) 10Ahmon Dancy: "> Patch Set 3:" (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/689673 (owner: 10Hashar) [18:27:35] !log urbanecm@deploy1002 Synchronized php-1.37.0-wmf.5/skins/Vector/includes/FeatureManagement/Requirements/LanguageInHeaderTreatmentRequirement.php: e180b99: Allow `languageinheader` query param to fully control treatment of languages (T282543) (duration: 00m 58s) [18:27:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:27:39] T282543: Add parameter that allows language button to be seen across all wikis - https://phabricator.wikimedia.org/T282543 [18:27:40] nray: here you go [18:27:42] anything else? [18:27:52] that's all. Thanks so much ! 
[18:32:02] np :) [18:37:22] (03PS1) 10Ahmon Dancy: 1.37.0-wmf.3 -> 123452342234 [mediawiki-config] (sandbox/dancy) - 10https://gerrit.wikimedia.org/r/692365 [18:38:03] 10SRE, 10Traffic, 10Patch-For-Review: Deploy Wikidough: Experimental DNS-over-HTTPS (DoH) public resolver - https://phabricator.wikimedia.org/T252132 (10ssingh) [18:38:42] (03PS1) 10Ahmon Dancy: update 1.37.0-wmf.4 [mediawiki-config] (sandbox/dancy) - 10https://gerrit.wikimedia.org/r/692366 [18:40:43] (03PS1) 10Ssingh: bird: add Wikidough's /24 to vips_filter (accept) [puppet] - 10https://gerrit.wikimedia.org/r/692367 (https://phabricator.wikimedia.org/T283027) [18:43:53] (03CR) 10Ahmon Dancy: [C: 03+2] 1.37.0-wmf.3 -> 123452342234 [mediawiki-config] (sandbox/dancy) - 10https://gerrit.wikimedia.org/r/692365 (owner: 10Ahmon Dancy) [18:43:55] (03CR) 10Ahmon Dancy: [C: 03+2] update 1.37.0-wmf.4 [mediawiki-config] (sandbox/dancy) - 10https://gerrit.wikimedia.org/r/692366 (owner: 10Ahmon Dancy) [18:44:41] (03Merged) 10jenkins-bot: 1.37.0-wmf.3 -> 123452342234 [mediawiki-config] (sandbox/dancy) - 10https://gerrit.wikimedia.org/r/692365 (owner: 10Ahmon Dancy) [18:44:43] (03CR) 10jerkins-bot: [V: 04-1] update 1.37.0-wmf.4 [mediawiki-config] (sandbox/dancy) - 10https://gerrit.wikimedia.org/r/692366 (owner: 10Ahmon Dancy) [18:45:07] 10SRE, 10ops-codfw, 10DC-Ops, 10Discovery-Search (Current work): hw troubleshooting: ssh unreachable for wdqs2007.codfw.wmnet - https://phabricator.wikimedia.org/T281437 (10RobH) I've setup SR1059928911 self dispatch for a part & a Dell Technician to swap the SSD. The Dell Tech should contact me to schedu... 
[18:45:32] 10SRE, 10ops-codfw, 10DC-Ops, 10Discovery-Search (Current work): hw troubleshooting: ssh unreachable for wdqs2007.codfw.wmnet - https://phabricator.wikimedia.org/T281437 (10RobH) a:05Papaul→03RobH [18:50:21] 10SRE, 10Wikimedia-Mailing-lists, 10Upstream: Unnecessary horizontal scrollbars - https://phabricator.wikimedia.org/T283028 (10Ladsgroup) p:05Triage→03Low [18:51:01] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=netbox_device_statistics site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [18:53:23] (03PS1) 10Ssingh: WIP: wikidough: update role to work towards anycast support [puppet] - 10https://gerrit.wikimedia.org/r/692368 (https://phabricator.wikimedia.org/T283027) [18:53:27] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [18:56:30] (03CR) 10Ssingh: "[Do not merge! This commit has the incorrect parameters.]" [puppet] - 10https://gerrit.wikimedia.org/r/692368 (https://phabricator.wikimedia.org/T283027) (owner: 10Ssingh) [18:57:29] 10SRE, 10Wikimedia-Mailing-lists, 10Upstream: Unnecessary horizontal scrollbars - https://phabricator.wikimedia.org/T283028 (10Legoktm) There's a breakpoint in the CSS around 950px, if your window size is just slightly above that then the menu gets cut off. 
[18:57:59] RECOVERY - Check systemd state on sodium is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [19:00:07] 10SRE, 10Wikimedia-Mailing-lists, 10Upstream: Unnecessary horizontal scrollbars - https://phabricator.wikimedia.org/T283028 (10Reedy) Viewport is 1280 x 622 px [19:05:01] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_citoid_cluster_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [19:07:19] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [19:09:03] (03PS6) 10Mforns: Migrate VirtualPageView to EventPlatform on group 0 and 1 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/689205 (https://phabricator.wikimedia.org/T238138) [19:09:07] 10SRE, 10Wikimedia-Mailing-lists, 10Patch-For-Review: Implement static redirects from pipermail archives to hyperkitty archives - https://phabricator.wikimedia.org/T280731 (10Quiddity) Just noting some missing redirects, per discussion at [[https://www.mediawiki.org/wiki/Topic:W92g75vedoknbde1|Template talk:... 
[19:11:40] (03CR) 10Ottomata: [C: 03+2] Migrate VirtualPageView to EventPlatform on group 0 and 1 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/689205 (https://phabricator.wikimedia.org/T238138) (owner: 10Mforns) [19:12:01] (03PS1) 10Zabe: Avoid using wfGetLB() [puppet] - 10https://gerrit.wikimedia.org/r/692370 [19:13:33] !log otto@deploy1002 Synchronized wmf-config/InitialiseSettings.php: Migrate VirtualPageView to Event Platform on group 0 and group 1 - T238138 (duration: 00m 59s) [19:13:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:13:38] T238138: VirtualPageView Event Platform Migration - https://phabricator.wikimedia.org/T238138 [19:14:31] (03PS2) 10Zabe: Avoid using wfGetLB() [puppet] - 10https://gerrit.wikimedia.org/r/692370 [19:15:02] (03PS3) 10Zabe: scap: avoid using wfGetLB() [puppet] - 10https://gerrit.wikimedia.org/r/692370 [19:18:51] (03PS1) 10Effie Mouzeli: WIP: mcrouter: update comments in mcrouter image [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/692371 [19:21:02] (03PS2) 10Effie Mouzeli: WIP: mcrouter: update comments in mcrouter image [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/692371 [19:21:11] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=netbox_device_statistics site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [19:23:29] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [19:25:11] 10SRE, 10Wikimedia-Mailing-lists, 10Patch-For-Review: Implement static redirects from pipermail archives to hyperkitty archives - https://phabricator.wikimedia.org/T280731 (10Legoktm) >>! 
In T280731#7093914, @Quiddity wrote: > * the links below that all go to Mailman 2. The archives //have been// correctly c... [19:29:52] 10SRE, 10Okapi [Wikimedia Enterprise], 10Platform Engineering: Securely connect Wikimedia Enterprise Infrastructure with WMF Kafka Streams - https://phabricator.wikimedia.org/T280628 (10Ottomata) FYI, Cloud Services is working on a more standard way to expose stuff in production to Cloud: https://wikitech.wi... [19:41:44] (03PS1) 10Ottomata: Finalize WikidataCompletionSearchClicks Event Platform migration [mediawiki-config] - 10https://gerrit.wikimedia.org/r/692377 (https://phabricator.wikimedia.org/T282140) [19:43:45] (03CR) 10Ottomata: [C: 03+2] Finalize WikidataCompletionSearchClicks Event Platform migration [mediawiki-config] - 10https://gerrit.wikimedia.org/r/692377 (https://phabricator.wikimedia.org/T282140) (owner: 10Ottomata) [19:45:54] !log otto@deploy1002 Synchronized wmf-config/InitialiseSettings.php: Finalize WikidataCompletionSearchClicks Event Platform migration - T282140 (duration: 00m 58s) [19:45:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:45:59] T282140: WikidataCompletionSearchClicks Event Platform Migration - https://phabricator.wikimedia.org/T282140 [19:49:00] (03PS1) 10Andrew Bogott: wmfkeystonehooks: Narrow our security group search to the new project [puppet] - 10https://gerrit.wikimedia.org/r/692378 (https://phabricator.wikimedia.org/T282876) [19:49:38] (03CR) 10jerkins-bot: [V: 04-1] wmfkeystonehooks: Narrow our security group search to the new project [puppet] - 10https://gerrit.wikimedia.org/r/692378 (https://phabricator.wikimedia.org/T282876) (owner: 10Andrew Bogott) [19:51:21] (03PS2) 10Andrew Bogott: wmfkeystonehooks: Narrow our security group search to the new project [puppet] - 10https://gerrit.wikimedia.org/r/692378 (https://phabricator.wikimedia.org/T282876) [19:59:23] 10SRE, 10LDAP-Access-Requests, 10SRE-Access-Requests: Requesting access to 
analytics-privatedata-users, Kerberos, and LDAP for ODimitrijevic - https://phabricator.wikimedia.org/T282836 (10Ottomata) [20:00:05] chrisalbon and accraze: I, the Bot under the Fountain, allow thee, The Deployer, to do Services – Graphoid / ORES deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210517T2000). [20:01:36] 10SRE, 10LDAP-Access-Requests, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users, Kerberos, and LDAP for ODimitrijevic - https://phabricator.wikimedia.org/T282836 (10Ottomata) I expanded this ticket to include all the access Olja needs. She shouldn't need `nda` LDAP, as `wmf` should su... [20:02:54] 10SRE, 10LDAP-Access-Requests, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users, Kerberos, and LDAP for ODimitrijevic - https://phabricator.wikimedia.org/T282836 (10Ottomata) Approved by me (Olja, soon you will be the new Analytics Cluster approver! :) ) Needs approval by your manager... [20:02:58] 10SRE, 10LDAP-Access-Requests, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users, Kerberos, and LDAP for ODimitrijevic - https://phabricator.wikimedia.org/T282836 (10Ottomata) [20:04:21] (03PS1) 10Volans: Include codfw router loopbacks prefix [dns] - 10https://gerrit.wikimedia.org/r/692383 [20:04:41] 10SRE, 10LDAP-Access-Requests, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users, Kerberos, and LDAP for ODimitrijevic - https://phabricator.wikimedia.org/T282836 (10Ottomata) [20:05:25] (03CR) 10Ayounsi: [C: 03+1] Include codfw router loopbacks prefix [dns] - 10https://gerrit.wikimedia.org/r/692383 (owner: 10Volans) [20:07:21] (03PS2) 10Volans: Include codfw router loopbacks prefix [dns] - 10https://gerrit.wikimedia.org/r/692383 [20:08:12] (03CR) 10Ayounsi: [C: 03+1] Include codfw router loopbacks prefix [dns] - 10https://gerrit.wikimedia.org/r/692383 (owner: 10Volans) [20:08:29] (03CR) 10Volans: [C: 03+2] Include codfw router loopbacks prefix [dns] - 
10https://gerrit.wikimedia.org/r/692383 (owner: 10Volans) [20:18:36] (03PS2) 10Ssingh: WIP: wikidough: update role to work towards anycast support [puppet] - 10https://gerrit.wikimedia.org/r/692368 (https://phabricator.wikimedia.org/T283027) [20:19:27] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=netbox_device_statistics site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [20:21:51] (03CR) 10Ssingh: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/29601/console" [puppet] - 10https://gerrit.wikimedia.org/r/692368 (https://phabricator.wikimedia.org/T283027) (owner: 10Ssingh) [20:26:57] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [20:28:24] (03PS3) 10Herron: logstash: add logstash101[012] to elk7 cluster as ES backends [puppet] - 10https://gerrit.wikimedia.org/r/689994 (https://phabricator.wikimedia.org/T281266) [20:32:56] (03Abandoned) 10Ahmon Dancy: update 1.37.0-wmf.4 [mediawiki-config] (sandbox/dancy) - 10https://gerrit.wikimedia.org/r/692366 (owner: 10Ahmon Dancy) [20:35:11] (03PS4) 10Herron: logstash: add logstash101[012] to elk7 cluster as ES backends [puppet] - 10https://gerrit.wikimedia.org/r/689994 (https://phabricator.wikimedia.org/T281266) [20:35:41] (03PS1) 10Ahmon Dancy: Update train-versions.json [mediawiki-config] (sandbox/dancy) - 10https://gerrit.wikimedia.org/r/692397 [20:35:43] (03CR) 10Ahmon Dancy: [C: 03+2] Update train-versions.json [mediawiki-config] (sandbox/dancy) - 10https://gerrit.wikimedia.org/r/692397 (owner: 10Ahmon Dancy) [20:36:36] (03Merged) 10jenkins-bot: Update train-versions.json [mediawiki-config] (sandbox/dancy) - 10https://gerrit.wikimedia.org/r/692397 
(owner: 10Ahmon Dancy) [20:36:39] (03CR) 10Herron: [C: 03+2] logstash: add logstash101[012] to elk7 cluster as ES backends [puppet] - 10https://gerrit.wikimedia.org/r/689994 (https://phabricator.wikimedia.org/T281266) (owner: 10Herron) [20:37:07] (03PS1) 10Ahmon Dancy: Update train-versions.json [mediawiki-config] (sandbox/dancy) - 10https://gerrit.wikimedia.org/r/692399 [20:37:09] (03CR) 10Ahmon Dancy: [C: 03+2] Update train-versions.json [mediawiki-config] (sandbox/dancy) - 10https://gerrit.wikimedia.org/r/692399 (owner: 10Ahmon Dancy) [20:37:44] (03PS1) 10Ahmon Dancy: Update train-versions.json [mediawiki-config] (sandbox/dancy) - 10https://gerrit.wikimedia.org/r/692400 [20:37:46] (03CR) 10Ahmon Dancy: [C: 03+2] Update train-versions.json [mediawiki-config] (sandbox/dancy) - 10https://gerrit.wikimedia.org/r/692400 (owner: 10Ahmon Dancy) [20:37:53] (03Merged) 10jenkins-bot: Update train-versions.json [mediawiki-config] (sandbox/dancy) - 10https://gerrit.wikimedia.org/r/692399 (owner: 10Ahmon Dancy) [20:38:30] (03Merged) 10jenkins-bot: Update train-versions.json [mediawiki-config] (sandbox/dancy) - 10https://gerrit.wikimedia.org/r/692400 (owner: 10Ahmon Dancy) [20:41:40] (03PS1) 10Clarakosi: api-gateway: Implement new ratelimit configurations from envoy 1.16 [deployment-charts] - 10https://gerrit.wikimedia.org/r/692404 (https://phabricator.wikimedia.org/T260591) [20:42:05] 10SRE, 10Wikimedia-Mailing-lists, 10SecTeam-Processed, 10SecTeam-wikimedia-project-event, and 3 others: mailman3-web got stuck on lists1001, possible DoS - https://phabricator.wikimedia.org/T282957 (10sbassett) [20:42:07] (03PS1) 10Ahmon Dancy: Update train-versions.json [mediawiki-config] (sandbox/dancy) - 10https://gerrit.wikimedia.org/r/692405 [20:42:09] (03CR) 10Ahmon Dancy: [C: 03+2] Update train-versions.json [mediawiki-config] (sandbox/dancy) - 10https://gerrit.wikimedia.org/r/692405 (owner: 10Ahmon Dancy) [20:42:57] (03Merged) 10jenkins-bot: Update train-versions.json [mediawiki-config] 
(sandbox/dancy) - 10https://gerrit.wikimedia.org/r/692405 (owner: 10Ahmon Dancy) [20:44:57] (03PS1) 10Ahmon Dancy: Update train-versions.json [mediawiki-config] (sandbox/dancy) - 10https://gerrit.wikimedia.org/r/692407 [20:44:59] (03CR) 10Ahmon Dancy: [C: 03+2] Update train-versions.json [mediawiki-config] (sandbox/dancy) - 10https://gerrit.wikimedia.org/r/692407 (owner: 10Ahmon Dancy) [20:45:01] (03PS1) 10Ahmon Dancy: Update train-versions.json [mediawiki-config] (sandbox/dancy) - 10https://gerrit.wikimedia.org/r/692408 [20:45:03] (03CR) 10Ahmon Dancy: [C: 03+2] Update train-versions.json [mediawiki-config] (sandbox/dancy) - 10https://gerrit.wikimedia.org/r/692408 (owner: 10Ahmon Dancy) [20:45:43] (03Merged) 10jenkins-bot: Update train-versions.json [mediawiki-config] (sandbox/dancy) - 10https://gerrit.wikimedia.org/r/692407 (owner: 10Ahmon Dancy) [20:45:45] (03CR) 10jerkins-bot: [V: 04-1] Update train-versions.json [mediawiki-config] (sandbox/dancy) - 10https://gerrit.wikimedia.org/r/692408 (owner: 10Ahmon Dancy) [20:45:48] (03PS2) 10Ahmon Dancy: Update train-versions.json [mediawiki-config] (sandbox/dancy) - 10https://gerrit.wikimedia.org/r/692408 [20:45:50] (03CR) 10Ahmon Dancy: [C: 03+2] Update train-versions.json [mediawiki-config] (sandbox/dancy) - 10https://gerrit.wikimedia.org/r/692408 (owner: 10Ahmon Dancy) [20:46:02] (03CR) 10Legoktm: [C: 03+1] docker-registry: Re-apply Cache-Control rules (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/691108 (https://phabricator.wikimedia.org/T256762) (owner: 10Alexandros Kosiaris) [20:46:34] (03Merged) 10jenkins-bot: Update train-versions.json [mediawiki-config] (sandbox/dancy) - 10https://gerrit.wikimedia.org/r/692408 (owner: 10Ahmon Dancy) [20:47:31] (03PS1) 10Ahmon Dancy: Update train-versions.json [mediawiki-config] (sandbox/dancy) - 10https://gerrit.wikimedia.org/r/692410 [20:47:33] (03CR) 10Ahmon Dancy: [C: 03+2] Update train-versions.json [mediawiki-config] (sandbox/dancy) - 
10https://gerrit.wikimedia.org/r/692410 (owner: 10Ahmon Dancy) [20:47:35] (03PS1) 10Ahmon Dancy: Update train-versions.json [mediawiki-config] (sandbox/dancy) - 10https://gerrit.wikimedia.org/r/692411 [20:47:37] (03CR) 10Ahmon Dancy: [C: 03+2] Update train-versions.json [mediawiki-config] (sandbox/dancy) - 10https://gerrit.wikimedia.org/r/692411 (owner: 10Ahmon Dancy) [20:48:18] (03Merged) 10jenkins-bot: Update train-versions.json [mediawiki-config] (sandbox/dancy) - 10https://gerrit.wikimedia.org/r/692410 (owner: 10Ahmon Dancy) [20:48:21] (03Merged) 10jenkins-bot: Update train-versions.json [mediawiki-config] (sandbox/dancy) - 10https://gerrit.wikimedia.org/r/692411 (owner: 10Ahmon Dancy) [20:52:25] (03PS2) 10Legoktm: Revert "mailman3: Enable debug logging" [puppet] - 10https://gerrit.wikimedia.org/r/687409 (owner: 10Ladsgroup) [20:53:07] (03CR) 10Legoktm: [C: 03+2] Revert "mailman3: Enable debug logging" [puppet] - 10https://gerrit.wikimedia.org/r/687409 (owner: 10Ladsgroup) [20:55:12] (03PS1) 10Zabe: robots.php: avoid using ContentHandler::getContentText() [mediawiki-config] - 10https://gerrit.wikimedia.org/r/692413 [20:55:41] (03PS2) 10Legoktm: exim4: Add blackhole for "disabled-lists" [puppet] - 10https://gerrit.wikimedia.org/r/686879 (https://phabricator.wikimedia.org/T281779) (owner: 10Ladsgroup) [20:56:43] (03PS2) 10Zabe: robots.php: avoid using ContentHandler::getContentText() [mediawiki-config] - 10https://gerrit.wikimedia.org/r/692413 [20:59:24] 10ops-eqiad, 10DC-Ops: (Need By: TBD) rack/setup/install ganeti102[34] - https://phabricator.wikimedia.org/T283036 (10RobH) [20:59:42] 10ops-eqiad, 10DC-Ops: (Need By: TBD) rack/setup/install ganeti102[34] - https://phabricator.wikimedia.org/T283036 (10RobH) a:03Jclark-ctr [21:00:05] Reedy and sbassett: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for Weekly Security deployment window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210517T2100). 
[21:00:18] 10ops-eqiad, 10DC-Ops: (Need By: TBD) rack/setup/install ganeti102[34] - https://phabricator.wikimedia.org/T283036 (10RobH) [21:01:37] (03CR) 10Legoktm: [C: 04-1] scap: avoid using wfGetLB() (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/692370 (owner: 10Zabe) [21:04:48] (03CR) 10Legoktm: [C: 03+2] exim4: Add blackhole for "disabled-lists" [puppet] - 10https://gerrit.wikimedia.org/r/686879 (https://phabricator.wikimedia.org/T281779) (owner: 10Ladsgroup) [21:04:51] 10SRE, 10Wikimedia-Mailing-lists, 10Patch-For-Review: Implement static redirects from pipermail archives to hyperkitty archives - https://phabricator.wikimedia.org/T280731 (10Quiddity) Ah, I understand. I think your plan is acceptable - the old links still work, and the content is now available in the new s... [21:07:12] (03PS4) 10Zabe: scap: avoid using wfGetLB() [puppet] - 10https://gerrit.wikimedia.org/r/692370 [21:08:19] (03CR) 10Zabe: scap: avoid using wfGetLB() (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/692370 (owner: 10Zabe) [21:10:10] (03CR) 10RhinosF1: [C: 03+1] [BETA CLUSTER] TheWikipediaLibrary requires CentralAuth and GlobalPreferences [mediawiki-config] - 10https://gerrit.wikimedia.org/r/692356 (https://phabricator.wikimedia.org/T283003) (owner: 10Jsn.sherman) [21:12:47] (03CR) 10Krinkle: "Do we know of use cases / users of this script currently? 
It doesn't look like the kind of thing you'd just want to run in production as-" [puppet] - 10https://gerrit.wikimedia.org/r/692370 (owner: 10Zabe) [21:14:50] (03CR) 10Legoktm: [C: 04-1] [BETA CLUSTER] TheWikipediaLibrary requires CentralAuth and GlobalPreferences (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/692356 (https://phabricator.wikimedia.org/T283003) (owner: 10Jsn.sherman) [21:15:39] (03CR) 10Bstorm: [C: 03+2] cloud nfs: Change primary cluster rate limits dramatically [puppet] - 10https://gerrit.wikimedia.org/r/691267 (https://phabricator.wikimedia.org/T218338) (owner: 10Bstorm) [21:17:27] (03CR) 10Bstorm: "> Patch Set 2:" [puppet] - 10https://gerrit.wikimedia.org/r/690055 (https://phabricator.wikimedia.org/T282264) (owner: 10Bstorm) [21:18:45] (03CR) 10Legoktm: "Hmm, I initially thought it was for dumps, but see now it's mysqldump. codesearch didn't find any uses, let me add DBAs as reviewers to ma" [puppet] - 10https://gerrit.wikimedia.org/r/692370 (owner: 10Zabe) [21:20:03] 10SRE, 10Analytics, 10Discovery, 10Event-Platform, and 2 others: Avoid accepting Kafka messages with whacky timestamps - https://phabricator.wikimedia.org/T282887 (10Milimetric) p:05High→03Medium [21:21:57] (03PS3) 10Zabe: robots.php: avoid using ContentHandler::getContentText() [mediawiki-config] - 10https://gerrit.wikimedia.org/r/692413 [21:25:52] Hey all - going to attempt to deploy a security patch for T260865 [21:46:45] !log Deployed security patch (and ran scap sync-l10n) for T260865 [21:46:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:47:34] (03CR) 10Bstorm: [C: 03+2] "Merging after discussion in IRC." 
[puppet] - 10https://gerrit.wikimedia.org/r/690055 (https://phabricator.wikimedia.org/T282264) (owner: 10Bstorm) [21:57:49] (03CR) 10Andrew Bogott: [C: 03+2] wmfkeystonehooks: Narrow our security group search to the new project [puppet] - 10https://gerrit.wikimedia.org/r/692378 (https://phabricator.wikimedia.org/T282876) (owner: 10Andrew Bogott) [21:58:18] (03PS1) 10Bstorm: cloudstore: remove the delete-after bit in the rsync code [puppet] - 10https://gerrit.wikimedia.org/r/692426 (https://phabricator.wikimedia.org/T224747) [21:59:23] (03CR) 10Bstorm: [C: 03+2] cloudstore: remove the delete-after bit in the rsync code [puppet] - 10https://gerrit.wikimedia.org/r/692426 (https://phabricator.wikimedia.org/T224747) (owner: 10Bstorm) [22:20:35] (03PS1) 10Bstorm: cloud-vps: enable the cert monitor for acme-chief [puppet] - 10https://gerrit.wikimedia.org/r/692434 (https://phabricator.wikimedia.org/T282264) [22:25:36] (03CR) 10Legoktm: [C: 03+1] cloud-vps: enable the cert monitor for acme-chief [puppet] - 10https://gerrit.wikimedia.org/r/692434 (https://phabricator.wikimedia.org/T282264) (owner: 10Bstorm) [22:26:01] (03CR) 10Bstorm: [C: 03+2] cloud-vps: enable the cert monitor for acme-chief [puppet] - 10https://gerrit.wikimedia.org/r/692434 (https://phabricator.wikimedia.org/T282264) (owner: 10Bstorm) [22:59:07] (03PS3) 10Jsn.sherman: [BETA CLUSTER] TheWikipediaLibrary requires CentralAuth and GlobalPreferences [mediawiki-config] - 10https://gerrit.wikimedia.org/r/692356 (https://phabricator.wikimedia.org/T283003) [23:00:05] RoanKattouw, Niharika, and Urbanecm: May I have your attention please! Evening backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210517T2300) [23:00:05] No GERRIT patches in the queue for this window AFAICS. 
[23:00:16] SRE, DBA, Wikimedia-Mailing-lists, Schema-change, User-notice: Mailman3 schema change: change utf8 columns to utf8mb4 - https://phabricator.wikimedia.org/T282621 (Legoktm) How about 2021-05-19 06:00 UTC? Or any other day at that time Note: I reduced the lists of alters based on discussions w...
[23:00:52] (CR) Jsn.sherman: [BETA CLUSTER] TheWikipediaLibrary requires CentralAuth and GlobalPreferences (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/692356 (https://phabricator.wikimedia.org/T283003) (owner: Jsn.sherman)
[23:03:20] (PS1) Bstorm: paws: monitor the frontend certs maintained by acme-chief [puppet] - https://gerrit.wikimedia.org/r/692448 (https://phabricator.wikimedia.org/T282264)
[23:11:46] (CR) Urbanecm: [C: -1] "-1 for the indent" (2 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/692356 (https://phabricator.wikimedia.org/T283003) (owner: Jsn.sherman)
[23:14:30] (PS4) Jsn.sherman: [BETA CLUSTER] TheWikipediaLibrary requires CentralAuth and GlobalPreferences [mediawiki-config] - https://gerrit.wikimedia.org/r/692356 (https://phabricator.wikimedia.org/T283003)
[23:17:02] (CR) Jsn.sherman: [BETA CLUSTER] TheWikipediaLibrary requires CentralAuth and GlobalPreferences (2 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/692356 (https://phabricator.wikimedia.org/T283003) (owner: Jsn.sherman)
[23:20:18] legoktm: I'm about to merge https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/692356/ to unbreak T283003. I see you reviewed, so asking in case I missed sth there.
[23:20:19] T283003: beta votewiki: Fatal error: Uncaught ExtensionDependencyError: TheWikipediaLibrary requires CentralAuth to be installed. - https://phabricator.wikimedia.org/T283003
[23:20:23] (CR) Urbanecm: [C: +1] "LGTM" [mediawiki-config] - https://gerrit.wikimedia.org/r/692356 (https://phabricator.wikimedia.org/T283003) (owner: Jsn.sherman)
[23:20:36] * legoktm looks again
[23:21:08] (CR) Legoktm: [C: -1] [BETA CLUSTER] TheWikipediaLibrary requires CentralAuth and GlobalPreferences (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/692356 (https://phabricator.wikimedia.org/T283003) (owner: Jsn.sherman)
[23:21:17] E_TOOMANYGLOBALEXTENSIONS
[23:21:24] damn
[23:21:27] thanks
[23:24:26] (CR) Bstorm: [C: +2] "Seems to do the right thing. https://puppet-compiler.wmflabs.org/compiler1003/29603/alert1001.wikimedia.org/index.html" [puppet] - https://gerrit.wikimedia.org/r/692448 (https://phabricator.wikimedia.org/T282264) (owner: Bstorm)
[23:25:15] * Urbanecm goes to update iwiki cache
[23:25:55] (PS1) Urbanecm: Update interwiki cache [mediawiki-config] - https://gerrit.wikimedia.org/r/692449
[23:25:57] (CR) Urbanecm: [C: +2] Update interwiki cache [mediawiki-config] - https://gerrit.wikimedia.org/r/692449 (owner: Urbanecm)
[23:26:25] (PS5) Jsn.sherman: [BETA CLUSTER] TheWikipediaLibrary requires CentralAuth and GlobalPreferences [mediawiki-config] - https://gerrit.wikimedia.org/r/692356 (https://phabricator.wikimedia.org/T283003)
[23:26:39] (Merged) jenkins-bot: Update interwiki cache [mediawiki-config] - https://gerrit.wikimedia.org/r/692449 (owner: Urbanecm)
[23:27:42] !log urbanecm@deploy1002 Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 01m 55s)
[23:27:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:28:37] (CR) Legoktm: [C: +1] "woot" [mediawiki-config] - https://gerrit.wikimedia.org/r/692356 (https://phabricator.wikimedia.org/T283003) (owner: Jsn.sherman)
[23:29:11] (CR) Urbanecm: [C: +2] "let's ship it" [mediawiki-config] - https://gerrit.wikimedia.org/r/692356 (https://phabricator.wikimedia.org/T283003) (owner: Jsn.sherman)
[23:29:13] (CR) Jsn.sherman: [BETA CLUSTER] TheWikipediaLibrary requires CentralAuth and GlobalPreferences (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/692356 (https://phabricator.wikimedia.org/T283003) (owner: Jsn.sherman)
[23:29:58] (Merged) jenkins-bot: [BETA CLUSTER] TheWikipediaLibrary requires CentralAuth and GlobalPreferences [mediawiki-config] - https://gerrit.wikimedia.org/r/692356 (https://phabricator.wikimedia.org/T283003) (owner: Jsn.sherman)
[23:33:19] !log urbanecm@deploy1002 update-interwiki-cache aborted: Update interwiki cache for Beta Cluster (duration: 00m 01s)
[23:33:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:34:52] (PS1) Urbanecm: Update interwiki cache for Beta Cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/692451
[23:35:57] (CR) Urbanecm: [C: +2] Update interwiki cache for Beta Cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/692451 (owner: Urbanecm)
[23:36:12] (Abandoned) Urbanecm: Update EmacsWiki in Interwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/577740 (https://phabricator.wikimedia.org/T227053) (owner: Fomafix)
[23:36:53] (Merged) jenkins-bot: Update interwiki cache for Beta Cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/692451 (owner: Urbanecm)
[23:42:19] (CR) Krinkle: [C: -1] multiversion: enhance buildDBList output (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/689673 (owner: Hashar)
[23:42:54] (PS1) Urbanecm: Move otrs-wiki.wikimedia.org to vrt-wiki.wikimedia.org (part 2) [mediawiki-config] - https://gerrit.wikimedia.org/r/692452 (https://phabricator.wikimedia.org/T280400)
[23:43:17] (CR) Krinkle: [C: -1] multiversion: enhance buildDBList output (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/689673 (owner: Hashar)
[23:44:36] (PS2) Urbanecm: Move otrs-wiki.wikimedia.org to vrt-wiki.wikimedia.org (part 1) [mediawiki-config] - https://gerrit.wikimedia.org/r/690680 (https://phabricator.wikimedia.org/T280400)
[23:52:14] PROBLEM - BGP status on cr3-knams is CRITICAL: BGP CRITICAL - AS1257/IPv4: Active - Tele2, AS1257/IPv6: Active - Tele2 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status