[00:00:20] (CR) jerkins-bot: [V: -1] Fix changes list "hide myself" feature [core] (wmf/1.37.0-wmf.4) - https://gerrit.wikimedia.org/r/689059 (https://phabricator.wikimedia.org/T282183) (owner: Tim Starling)
[00:13:26] RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 241, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[00:13:30] RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 145, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[00:24:41] (PS2) Tim Starling: Fix changes list "hide myself" feature [core] (wmf/1.37.0-wmf.4) - https://gerrit.wikimedia.org/r/689059 (https://phabricator.wikimedia.org/T282183)
[00:25:04] (PS2) Tim Starling: Fix changes list "hide myself" feature [core] (wmf/1.37.0-wmf.5) - https://gerrit.wikimedia.org/r/689060 (https://phabricator.wikimedia.org/T282183)
[00:25:35] (CR) Tim Starling: [C: +2] Fix changes list "hide myself" feature [core] (wmf/1.37.0-wmf.5) - https://gerrit.wikimedia.org/r/689060 (https://phabricator.wikimedia.org/T282183) (owner: Tim Starling)
[00:25:47] (CR) Tim Starling: [C: +2] Fix changes list "hide myself" feature [core] (wmf/1.37.0-wmf.4) - https://gerrit.wikimedia.org/r/689059 (https://phabricator.wikimedia.org/T282183) (owner: Tim Starling)
[00:28:19] SRE, Patch-For-Review: try planet/people on bullseye / upgrade people.wikimedia.org backends to bullseye - https://phabricator.wikimedia.org/T280989 (Dzahn)
[00:28:27] SRE, DBA, Wikimedia-Mailing-lists, Schema-change: Mailman3 schema change: change utf8 columns to utf8mb4 - https://phabricator.wikimedia.org/T282621 (Legoktm)
[00:38:01] (PS1) Dzahn: DHCP: add people2002 MAC address and use bullseye installer [puppet] - https://gerrit.wikimedia.org/r/689375 (https://phabricator.wikimedia.org/T280989)
[00:51:39] !log [people1002:/home] $ sudo find . -type d -name public_html -exec chmod 555 {} \;
[00:51:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:54:03] !log made public_html dirs on people1002 readonly to make it obvious it is not the active backend anymore
[00:54:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:56:58] (Merged) jenkins-bot: Fix changes list "hide myself" feature [core] (wmf/1.37.0-wmf.5) - https://gerrit.wikimedia.org/r/689060 (https://phabricator.wikimedia.org/T282183) (owner: Tim Starling)
[00:57:04] (Merged) jenkins-bot: Fix changes list "hide myself" feature [core] (wmf/1.37.0-wmf.4) - https://gerrit.wikimedia.org/r/689059 (https://phabricator.wikimedia.org/T282183) (owner: Tim Starling)
[01:02:54] (CR) Dzahn: [C: +2] DHCP: add people2002 MAC address and use bullseye installer [puppet] - https://gerrit.wikimedia.org/r/689375 (https://phabricator.wikimedia.org/T280989) (owner: Dzahn)
[01:02:59] (PS2) Dzahn: DHCP: add people2002 MAC address and use bullseye installer [puppet] - https://gerrit.wikimedia.org/r/689375 (https://phabricator.wikimedia.org/T280989)
[01:07:43] (PS1) Dzahn: peopleweb: allow one last rsync before setting people1002 readonly [puppet] - https://gerrit.wikimedia.org/r/689380
[01:09:11] (CR) Dzahn: [C: +2] peopleweb: allow one last rsync before setting people1002 readonly [puppet] - https://gerrit.wikimedia.org/r/689380 (owner: Dzahn)
[01:09:17] (PS2) Dzahn: peopleweb: allow one last rsync before setting people1002 readonly [puppet] - https://gerrit.wikimedia.org/r/689380
[01:16:44] SRE, Patch-For-Review: try planet/people on bullseye / upgrade people.wikimedia.org backends to bullseye - https://phabricator.wikimedia.org/T280989 (Dzahn) 00:51 < mutante> !log [people1002:/home] $ sudo find . -type d -name public_html -exec chmod 555 {} \; 00:54 < mutante> !log made public_html dirs o...
[01:17:04] !log tstarling@deploy1002 Synchronized php-1.37.0-wmf.4/includes/specialpage/ChangesListSpecialPage.php: T282183 fix hidemyself in RC and watchlist (duration: 01m 16s)
[01:17:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:17:08] T282183: Hiding own edits on watchlist and recentchanges results in hiding all IP edits. - https://phabricator.wikimedia.org/T282183
[01:18:53] (PS1) Dzahn: peopleweb: set people1003 as rsync source, people2002 as new des [puppet] - https://gerrit.wikimedia.org/r/689387
[01:19:15] (PS2) Dzahn: peopleweb: set people1003 as rsync source, people2002 as new des [puppet] - https://gerrit.wikimedia.org/r/689387
[01:19:30] (PS3) Dzahn: peopleweb: set people1003 as rsync source, people2002 as new dest [puppet] - https://gerrit.wikimedia.org/r/689387
[01:19:34] !log tstarling@deploy1002 Synchronized php-1.37.0-wmf.5/includes/specialpage/ChangesListSpecialPage.php: T282183 fix hidemyself in RC and watchlist (duration: 01m 08s)
[01:19:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:27:06] (CR) Dzahn: [C: +2] peopleweb: set people1003 as rsync source, people2002 as new dest [puppet] - https://gerrit.wikimedia.org/r/689387 (owner: Dzahn)
[01:31:31] (CR) Dzahn: [C: +2] backups: add people2002 to ignore file to avoid false positive monitoring alert [puppet] - https://gerrit.wikimedia.org/r/689259 (owner: Dzahn)
[01:31:54] (PS2) Dzahn: backups: add people2002 to ignore file to avoid false positive monitoring alert [puppet] - https://gerrit.wikimedia.org/r/689259 (https://phabricator.wikimedia.org/T280989)
[01:35:48] !log people2002 - created new VM resembling people2001, signed puppet cert request, initial puppet run T280989
[01:35:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:35:52] T280989: try planet/people on bullseye / upgrade people.wikimedia.org backends to bullseye - https://phabricator.wikimedia.org/T280989
[01:36:37] (PS1) Dzahn: site: add people2002 with peopleweb role [puppet] - https://gerrit.wikimedia.org/r/689407 (https://phabricator.wikimedia.org/T280989)
[01:37:02] (CR) jerkins-bot: [V: -1] site: add people2002 with peopleweb role [puppet] - https://gerrit.wikimedia.org/r/689407 (https://phabricator.wikimedia.org/T280989) (owner: Dzahn)
[01:42:07] !log dzahn@cumin1001 START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on people2002.codfw.wmnet with reason: new host
[01:42:08] !log dzahn@cumin1001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on people2002.codfw.wmnet with reason: new host
[01:42:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:42:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:48:25] (PS1) Dzahn: site: add people2002 with insetup role [puppet] - https://gerrit.wikimedia.org/r/689412 (https://phabricator.wikimedia.org/T280989)
[01:49:00] (CR) Dzahn: [C: +2] site: add people2002 with insetup role [puppet] - https://gerrit.wikimedia.org/r/689412 (https://phabricator.wikimedia.org/T280989) (owner: Dzahn)
[02:27:29] (PS1) Andrew Bogott: Trove: inject a public key into new Trove VMs [puppet] - https://gerrit.wikimedia.org/r/689432
[02:28:28] (CR) Andrew Bogott: [C: +2] Trove: inject a public key into new Trove VMs [puppet] - https://gerrit.wikimedia.org/r/689432 (owner: Andrew Bogott)
[02:49:28] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=netbox_device_statistics site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[02:51:36] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[03:43:44] (PS1) Andrew Bogott: trove-guestagent.conf: use /dev/sdb (scsi drivers) rather than vdb [puppet] - https://gerrit.wikimedia.org/r/689458
[03:44:40] (CR) Andrew Bogott: [C: +2] trove-guestagent.conf: use /dev/sdb (scsi drivers) rather than vdb [puppet] - https://gerrit.wikimedia.org/r/689458 (owner: Andrew Bogott)
[04:13:37] (PS1) Andrew Bogott: OpenStack Trove: add quay.io credentials for client docker access [puppet] - https://gerrit.wikimedia.org/r/689467
[04:15:02] (CR) jerkins-bot: [V: -1] OpenStack Trove: add quay.io credentials for client docker access [puppet] - https://gerrit.wikimedia.org/r/689467 (owner: Andrew Bogott)
[04:21:52] (PS1) Andrew Bogott: Added fake trove quay.io creds [labs/private] - https://gerrit.wikimedia.org/r/689471
[04:22:03] (CR) Andrew Bogott: [V: +2 C: +2] Added fake trove quay.io creds [labs/private] - https://gerrit.wikimedia.org/r/689471 (owner: Andrew Bogott)
[04:23:27] (PS2) Andrew Bogott: OpenStack Trove: add quay.io credentials for client docker access [puppet] - https://gerrit.wikimedia.org/r/689467
[04:24:52] (CR) jerkins-bot: [V: -1] OpenStack Trove: add quay.io credentials for client docker access [puppet] - https://gerrit.wikimedia.org/r/689467 (owner: Andrew Bogott)
[04:27:04] (PS3) Andrew Bogott: OpenStack Trove: add quay.io credentials for client docker access [puppet] - https://gerrit.wikimedia.org/r/689467
[04:28:40] (CR) Andrew Bogott: [C: +2] OpenStack Trove: add quay.io credentials for client docker access [puppet] - https://gerrit.wikimedia.org/r/689467 (owner: Andrew Bogott)
[04:36:22] !log importing archives of wikitech-l (T280322)
[04:36:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:36:26] T280322: Upgrade mailing lists from mailman2 to 3 in batches - https://phabricator.wikimedia.org/T280322
[04:36:57] SRE, Wikimedia-Mailing-lists, User-Ladsgroup: Upgrade mailing lists from mailman2 to 3 in batches - https://phabricator.wikimedia.org/T280322 (Ladsgroup)
[04:37:36] (PS1) Marostegui: db2108: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/689478 (https://phabricator.wikimedia.org/T282535)
[04:37:55] (CR) Marostegui: [C: +2] production-m5.sql: Remove testmailman users [puppet] - https://gerrit.wikimedia.org/r/688975 (https://phabricator.wikimedia.org/T281548) (owner: Marostegui)
[04:38:20] (CR) Marostegui: [C: +2] db2108: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/689478 (https://phabricator.wikimedia.org/T282535) (owner: Marostegui)
[04:38:36] SRE, Wikimedia-Mailing-lists, User-Ladsgroup: Upgrade mailing lists from mailman2 to 3 in batches - https://phabricator.wikimedia.org/T280322 (Ladsgroup) Group H is basically done, hyperkitty import failed on wikitech-l and unblock-en-l, I try to import these two. wikien-l and unblock-zh had issues f...
[04:38:58] !log Drop testing mailman3 databases T281548
[04:39:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:39:01] T281548: Delete lists-next.wikimedia.org - https://phabricator.wikimedia.org/T281548
[04:41:03] SRE, DBA, Wikimedia-Mailing-lists, Patch-For-Review: Delete lists-next.wikimedia.org - https://phabricator.wikimedia.org/T281548 (Marostegui) Databases and users dropped. The DB side of things complete.
[04:41:10] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db2108 T282535', diff saved to https://phabricator.wikimedia.org/P15919 and previous config saved to /var/cache/conftool/dbconfig/20210512-044109-marostegui.json
[04:41:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:41:13] T282535: Move db2108 from s2 to s7 - https://phabricator.wikimedia.org/T282535
[04:42:22] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db2121 T282535', diff saved to https://phabricator.wikimedia.org/P15920 and previous config saved to /var/cache/conftool/dbconfig/20210512-044222-marostegui.json
[04:42:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:47:06] (PS1) Marostegui: mariadb: Move db2108 from s2 to s7. [puppet] - https://gerrit.wikimedia.org/r/689483 (https://phabricator.wikimedia.org/T282535)
[04:47:28] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1181', diff saved to https://phabricator.wikimedia.org/P15922 and previous config saved to /var/cache/conftool/dbconfig/20210512-044728-marostegui.json
[04:47:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:49:05] (CR) Marostegui: [C: +2] mariadb: Move db2108 from s2 to s7. [puppet] - https://gerrit.wikimedia.org/r/689483 (https://phabricator.wikimedia.org/T282535) (owner: Marostegui)
[04:56:54] SRE, DBA, Wikimedia-Mailing-lists, Schema-change, User-notice: Mailman3 schema change: change utf8 columns to utf8mb4 - https://phabricator.wikimedia.org/T282621 (Ladsgroup) Wouldn't a message in tech news be enough?
[05:00:43] !log Stop MySQL on labsdb1009 labsdb1010 labsdb1011 T282524 T282523 T282522
[05:00:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:00:49] T282524: decommission labsdb1011.eqiad.wmnet - https://phabricator.wikimedia.org/T282524
[05:00:49] T282522: decommission labsdb1009.eqiad.wmnet - https://phabricator.wikimedia.org/T282522
[05:00:50] T282523: decommission labsdb1010.eqiad.wmnet - https://phabricator.wikimedia.org/T282523
[05:03:50] SRE, DBA, Wikimedia-Mailing-lists, Patch-For-Review: Delete lists-next.wikimedia.org - https://phabricator.wikimedia.org/T281548 (Legoktm) Open→Resolved a:Legoktm All done then!
[05:09:48] (PS1) Marostegui: dbproxy1019: Depool clouddb1013 [puppet] - https://gerrit.wikimedia.org/r/689491 (https://phabricator.wikimedia.org/T277867)
[05:10:16] (CR) jerkins-bot: [V: -1] dbproxy1019: Depool clouddb1013 [puppet] - https://gerrit.wikimedia.org/r/689491 (https://phabricator.wikimedia.org/T277867) (owner: Marostegui)
[05:11:47] (PS2) Marostegui: dbproxy1019: Depool clouddb1013 [puppet] - https://gerrit.wikimedia.org/r/689491 (https://phabricator.wikimedia.org/T277867)
[05:17:34] (CR) Marostegui: [C: +2] dbproxy1019: Depool clouddb1013 [puppet] - https://gerrit.wikimedia.org/r/689491 (https://phabricator.wikimedia.org/T277867) (owner: Marostegui)
[05:20:47] (PS1) Marostegui: Revert "dbproxy1019: Depool clouddb1013" [puppet] - https://gerrit.wikimedia.org/r/689064
[05:22:59] (CR) Marostegui: [C: +2] Revert "dbproxy1019: Depool clouddb1013" [puppet] - https://gerrit.wikimedia.org/r/689064 (owner: Marostegui)
[05:29:20] (PS1) Marostegui: db1074: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/689501 (https://phabricator.wikimedia.org/T281959)
[05:30:11] (CR) Marostegui: [C: +2] db1074: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/689501 (https://phabricator.wikimedia.org/T281959) (owner: Marostegui)
[05:31:52] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1181 (re)pooling @ 25%: Repool db1181', diff saved to https://phabricator.wikimedia.org/P15923 and previous config saved to /var/cache/conftool/dbconfig/20210512-053151-root.json
[05:31:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:42:31] RECOVERY - MariaDB memory on clouddb1013 is OK: OK Memory 53% used https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting
[05:46:55] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1181 (re)pooling @ 50%: Repool db1181', diff saved to https://phabricator.wikimedia.org/P15924 and previous config saved to /var/cache/conftool/dbconfig/20210512-054655-root.json
[05:46:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:52:46] (CR) Marostegui: [C: +1] "I am fine with this, as long as this is temporary and we do work on moving wikitech to the standard app servers." [puppet] - https://gerrit.wikimedia.org/r/689092 (https://phabricator.wikimedia.org/T282209) (owner: Andrew Bogott)
[06:01:59] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1181 (re)pooling @ 75%: Repool db1181', diff saved to https://phabricator.wikimedia.org/P15925 and previous config saved to /var/cache/conftool/dbconfig/20210512-060158-root.json
[06:02:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:03:23] (PS1) Marostegui: Revert "install_server: Reimage db2108" [puppet] - https://gerrit.wikimedia.org/r/689529
[06:03:30] (PS2) Marostegui: Revert "install_server: Reimage db2108" [puppet] - https://gerrit.wikimedia.org/r/689529
[06:04:16] (CR) Marostegui: [C: +2] Revert "install_server: Reimage db2108" [puppet] - https://gerrit.wikimedia.org/r/689529 (owner: Marostegui)
[06:08:17] !log marostegui@cumin1001 dbctl commit (dc=all): 'Move db2148 to also serve vslow in s2 T282535', diff saved to https://phabricator.wikimedia.org/P15926 and previous config saved to /var/cache/conftool/dbconfig/20210512-060817-marostegui.json
[06:08:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:08:21] T282535: Move db2108 from s2 to s7 - https://phabricator.wikimedia.org/T282535
[06:17:02] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1181 (re)pooling @ 100%: Repool db1181', diff saved to https://phabricator.wikimedia.org/P15927 and previous config saved to /var/cache/conftool/dbconfig/20210512-061702-root.json
[06:17:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:20:46] !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db2121 and db2108 in s7 T282535', diff saved to https://phabricator.wikimedia.org/P15928 and previous config saved to /var/cache/conftool/dbconfig/20210512-062046-marostegui.json
[06:20:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:20:50] T282535: Move db2108 from s2 to s7 - https://phabricator.wikimedia.org/T282535
[06:21:18] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1079', diff saved to https://phabricator.wikimedia.org/P15929 and previous config saved to /var/cache/conftool/dbconfig/20210512-062118-marostegui.json
[06:21:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:25:06] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1121 (re)pooling @ 25%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15930 and previous config saved to /var/cache/conftool/dbconfig/20210512-062506-root.json
[06:25:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:32:25] (PS1) Legoktm: Revert "Create buildPersonalPage method for SkinTemplate class, add menu item to personal menu.." [core] (wmf/1.37.0-wmf.5) - https://gerrit.wikimedia.org/r/689532 (https://phabricator.wikimedia.org/T276561)
[06:40:10] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1121 (re)pooling @ 50%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15931 and previous config saved to /var/cache/conftool/dbconfig/20210512-064009-root.json
[06:40:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:54:03] (PS1) Marostegui: install_server: Switch db1112 to buster [puppet] - https://gerrit.wikimedia.org/r/689523 (https://phabricator.wikimedia.org/T280492)
[06:55:14] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1121 (re)pooling @ 75%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15932 and previous config saved to /var/cache/conftool/dbconfig/20210512-065513-root.json
[06:55:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:55:33] (CR) Marostegui: [C: +2] install_server: Switch db1112 to buster [puppet] - https://gerrit.wikimedia.org/r/689523 (https://phabricator.wikimedia.org/T280492) (owner: Marostegui)
[07:00:20] (PS1) Marostegui: db1079: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/689524 (https://phabricator.wikimedia.org/T282079)
[07:01:16] (CR) Marostegui: [C: +2] db1079: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/689524 (https://phabricator.wikimedia.org/T282079) (owner: Marostegui)
[07:02:02] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15933 and previous config saved to /var/cache/conftool/dbconfig/20210512-070202-marostegui.json
[07:02:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:02:37] SRE, ops-codfw, DC-Ops, Discovery-Search: hw troubleshooting: failure to power up for elastic2043.codfw.wmnet - https://phabricator.wikimedia.org/T281327 (RKemper) @Papaul Per the above, let's go ahead and open the case with Dell for the failure
[07:09:24] (PS1) Ryan Kemper: wdqs: hack issue blocking reimage on some hosts [puppet] - https://gerrit.wikimedia.org/r/689525 (https://phabricator.wikimedia.org/T280382)
[07:10:17] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1121 (re)pooling @ 100%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15934 and previous config saved to /var/cache/conftool/dbconfig/20210512-071017-root.json
[07:10:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:11:40] (CR) Ryan Kemper: "I haven't tested this to see if it works - is there an easy way to?" [puppet] - https://gerrit.wikimedia.org/r/689525 (https://phabricator.wikimedia.org/T280382) (owner: Ryan Kemper)
[07:25:08] (PS1) Giuseppe Lavagetto: Fix an error with the new networkpolicy common_templates [deployment-charts] - https://gerrit.wikimedia.org/r/689666
[07:25:38] (CR) jerkins-bot: [V: -1] Fix an error with the new networkpolicy common_templates [deployment-charts] - https://gerrit.wikimedia.org/r/689666 (owner: Giuseppe Lavagetto)
[07:35:41] PROBLEM - Work requests waiting in Zuul Gearman server on contint2001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [150.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[07:41:49] (PS2) Muehlenhoff: motd: Remove support for jessie [puppet] - https://gerrit.wikimedia.org/r/671098
[07:44:07] SRE, Prod-Kubernetes, serviceops, Kubernetes, and 2 others: Upgrade Calico - https://phabricator.wikimedia.org/T207804 (JMeybohm)
[07:44:34] SRE, Prod-Kubernetes, serviceops, Kubernetes: Set resource requests and limits for calico PODs - https://phabricator.wikimedia.org/T277877 (JMeybohm) Open→Resolved Calico components are running with resource definitions in all clusters now.
[07:46:20] (CR) Muehlenhoff: [C: +2] motd: Remove support for jessie [puppet] - https://gerrit.wikimedia.org/r/671098 (owner: Muehlenhoff)
[07:53:57] (PS2) Muehlenhoff: standard_packages: Remove support for jessie [puppet] - https://gerrit.wikimedia.org/r/671095
[07:54:21] (PS1) Jcrespo: dbbackups: Disable temporarily rw-backups, enable ro-backups [puppet] - https://gerrit.wikimedia.org/r/689672 (https://phabricator.wikimedia.org/T282249)
[07:55:27] (PS1) Hashar: multiversion: enhance buildDBList output [mediawiki-config] - https://gerrit.wikimedia.org/r/689673
[07:57:36] (CR) Vgutierrez: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/685497 (https://phabricator.wikimedia.org/T281673) (owner: Jbond)
[08:00:37] SRE, LDAP-Access-Requests, CommRel-Specialists-Support (Apr-Jun-2021): Please grant CRS access to Superset/Turnilo - https://phabricator.wikimedia.org/T282589 (Elitre)
[08:00:44] (PS10) Elukey: WIP - Add istio base images build support [docker-images/production-images] - https://gerrit.wikimedia.org/r/688211 (https://phabricator.wikimedia.org/T278192)
[08:03:50] (CR) Muehlenhoff: [C: +2] standard_packages: Remove support for jessie [puppet] - https://gerrit.wikimedia.org/r/671095 (owner: Muehlenhoff)
[08:05:09] (PS2) Muehlenhoff: apt: Remove support for jessie [puppet] - https://gerrit.wikimedia.org/r/671092
[08:08:31] (Abandoned) Alexandros Kosiaris: base: Remove the jessie if clause, move packages to array [puppet] - https://gerrit.wikimedia.org/r/685740 (owner: Alexandros Kosiaris)
[08:09:25] (PS5) Matthias Mullie: Enable Extension:MediaSearch on testcommons [mediawiki-config] - https://gerrit.wikimedia.org/r/682102 (https://phabricator.wikimedia.org/T265939)
[08:09:27] (PS2) Matthias Mullie: Enable Extension:MediaSearch on commons [mediawiki-config] - https://gerrit.wikimedia.org/r/682105 (https://phabricator.wikimedia.org/T265939)
[08:10:13] (CR) Matthias Mullie: Enable Extension:MediaSearch on commons (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/682105 (https://phabricator.wikimedia.org/T265939) (owner: Matthias Mullie)
[08:11:21] (CR) Filippo Giunchedi: [C: +1] rsyslog: add ecs_170 template [puppet] - https://gerrit.wikimedia.org/r/688502 (https://phabricator.wikimedia.org/T234565) (owner: Cwhite)
[08:12:25] (PS11) Elukey: WIP - Add istio base images build support [docker-images/production-images] - https://gerrit.wikimedia.org/r/688211 (https://phabricator.wikimedia.org/T278192)
[08:13:54] (CR) Jbond: [C: +2] P:trafficserver::backend: update the source of the ATS trusted ca bundle [puppet] - https://gerrit.wikimedia.org/r/685497 (https://phabricator.wikimedia.org/T281673) (owner: Jbond)
[08:14:07] (PS1) Volans: cumin: remove jessie from distro aliases [puppet] - https://gerrit.wikimedia.org/r/689686 (https://phabricator.wikimedia.org/T224549)
[08:15:53] (CR) Muehlenhoff: [C: +2] apt: Remove support for jessie [puppet] - https://gerrit.wikimedia.org/r/671092 (owner: Muehlenhoff)
[08:16:22] (CR) Muehlenhoff: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/689686 (https://phabricator.wikimedia.org/T224549) (owner: Volans)
[08:17:51] (CR) Volans: [C: +2] cumin: remove jessie from distro aliases [puppet] - https://gerrit.wikimedia.org/r/689686 (https://phabricator.wikimedia.org/T224549) (owner: Volans)
[08:18:16] moritzm: can I merge your patch too?
[08:18:22] apt: Remove support for jessie
[08:19:12] (CR) Filippo Giunchedi: [C: +1] "LGTM, just as speculation and non-blocking: does openstack support json logging natively?" [puppet] - https://gerrit.wikimedia.org/r/689262 (https://phabricator.wikimedia.org/T234565) (owner: Cwhite)
[08:19:29] (PS2) Muehlenhoff: aptrepo: Remove support for jessie [puppet] - https://gerrit.wikimedia.org/r/671091
[08:19:44] volans: ack, please go
[08:19:56] RECOVERY - Work requests waiting in Zuul Gearman server on contint2001 is OK: OK: Less than 100.00% above the threshold [90.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[08:19:57] great, doing
[08:20:04] {done}
[08:22:39] (CR) Gehel: [C: +1] "LGTM. This will require a cluster restart after deployment." [puppet] - https://gerrit.wikimedia.org/r/688309 (owner: ZPapierski)
[08:23:01] !log rolling restart of ats
[08:23:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:23:53] (CR) Muehlenhoff: [C: +2] aptrepo: Remove support for jessie [puppet] - https://gerrit.wikimedia.org/r/671091 (owner: Muehlenhoff)
[08:25:10] (CR) Volans: "> Patch Set 1:" [puppet] - https://gerrit.wikimedia.org/r/689525 (https://phabricator.wikimedia.org/T280382) (owner: Ryan Kemper)
[08:28:36] (PS2) Muehlenhoff: base: Remove support for jessie [puppet] - https://gerrit.wikimedia.org/r/671103
[08:31:21] (PS2) Giuseppe Lavagetto: Fix the networkpolicy helpers [deployment-charts] - https://gerrit.wikimedia.org/r/689666
[08:34:09] (PS1) Marostegui: instances.yaml: Remove db1074 from dbctl [puppet] - https://gerrit.wikimedia.org/r/689692 (https://phabricator.wikimedia.org/T281959)
[08:35:12] (PS2) Kormat: mariadb: Promote db1173 to s6 eqiad master. [puppet] - https://gerrit.wikimedia.org/r/686505 (https://phabricator.wikimedia.org/T282124)
[08:35:42] (PS2) Kormat: wmnet: Update s6-master to db1173 [dns] - https://gerrit.wikimedia.org/r/686513 (https://phabricator.wikimedia.org/T282124)
[08:36:02] (CR) Marostegui: [C: +2] instances.yaml: Remove db1074 from dbctl [puppet] - https://gerrit.wikimedia.org/r/689692 (https://phabricator.wikimedia.org/T281959) (owner: Marostegui)
[08:36:36] (CR) Giuseppe Lavagetto: [C: +2] Fix the networkpolicy helpers [deployment-charts] - https://gerrit.wikimedia.org/r/689666 (owner: Giuseppe Lavagetto)
[08:38:17] (Merged) jenkins-bot: Fix the networkpolicy helpers [deployment-charts] - https://gerrit.wikimedia.org/r/689666 (owner: Giuseppe Lavagetto)
[08:38:23] (Restored) Matthias Mullie: Enable media change tags on wikipedias [mediawiki-config] - https://gerrit.wikimedia.org/r/674882 (https://phabricator.wikimedia.org/T266067) (owner: Matthias Mullie)
[08:38:29] (CR) Muehlenhoff: [C: +2] base: Remove support for jessie [puppet] - https://gerrit.wikimedia.org/r/671103 (owner: Muehlenhoff)
[08:40:08] (PS2) Matthias Mullie: Enable media change tags on wikipedias [mediawiki-config] - https://gerrit.wikimedia.org/r/674882 (https://phabricator.wikimedia.org/T266067)
[08:41:44] (PS2) Muehlenhoff: debian: Remove support for jessie [puppet] - https://gerrit.wikimedia.org/r/671097
[08:43:14] (CR) jerkins-bot: [V: -1] debian: Remove support for jessie [puppet] - https://gerrit.wikimedia.org/r/671097 (owner: Muehlenhoff)
[08:43:36] (PS1) Jbond: admin::useres: add curl-fe/be/lvs aliases [puppet] - https://gerrit.wikimedia.org/r/689693
[08:45:37] SRE, LDAP-Access-Requests, CommRel-Specialists-Support (Apr-Jun-2021): Please grant CRS access to Superset/Turnilo - https://phabricator.wikimedia.org/T282589 (Aklapper) It would be most helpful to have only one request per one ticket. See https://wikitech.wikimedia.org/wiki/Analytics/Data_access#Req...
[08:46:11] (CR) Jbond: [C: +2] admin::useres: add curl-fe/be/lvs aliases [puppet] - https://gerrit.wikimedia.org/r/689693 (owner: Jbond)
[08:47:55] !log marostegui@cumin1001 dbctl commit (dc=all): 'Remove db1074 from dbctl T281959', diff saved to https://phabricator.wikimedia.org/P15935 and previous config saved to /var/cache/conftool/dbconfig/20210512-084755-marostegui.json
[08:47:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:47:58] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 25%: Repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15936 and previous config saved to /var/cache/conftool/dbconfig/20210512-084757-root.json
[08:47:58] T281959: decommission db1074.eqiad.wmnet - https://phabricator.wikimedia.org/T281959
[08:47:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:49:50] (PS1) Jbond: admin: user file jbond: use full hostname [puppet] - https://gerrit.wikimedia.org/r/689696
[08:51:08] (PS1) Marostegui: Revert "db1121: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/689539
[08:52:04] (CR) Marostegui: [C: +2] Revert "db1121: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/689539 (owner: Marostegui)
[08:52:44] (PS2) Jbond: admin: user file jbond: use full hostname [puppet] - https://gerrit.wikimedia.org/r/689696
[08:53:34] (PS10) Giuseppe Lavagetto: Add canary support in scaffolding [deployment-charts] - https://gerrit.wikimedia.org/r/685748 (https://phabricator.wikimedia.org/T282148) (owner: Effie Mouzeli)
[08:53:36] (PS19) Giuseppe Lavagetto: Helm chart to run MediaWiki [deployment-charts] - https://gerrit.wikimedia.org/r/670220 (https://phabricator.wikimedia.org/T265327)
[08:53:39] (CR) Jbond: [C: +2] admin: user file jbond: use full hostname [puppet] - https://gerrit.wikimedia.org/r/689696 (owner: Jbond)
[08:56:10] (CR) Marostegui: [C: +1] mariadb: Promote db1173 to s6 eqiad master. [puppet] - https://gerrit.wikimedia.org/r/686505 (https://phabricator.wikimedia.org/T282124) (owner: Kormat)
[08:56:27] (CR) Marostegui: [C: +1] wmnet: Update s6-master to db1173 [dns] - https://gerrit.wikimedia.org/r/686513 (https://phabricator.wikimedia.org/T282124) (owner: Kormat)
[09:00:58] (PS1) Jbond: admin: jbond fix alias commands [puppet] - https://gerrit.wikimedia.org/r/689700
[09:03:01] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 50%: Repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15937 and previous config saved to /var/cache/conftool/dbconfig/20210512-090301-root.json
[09:03:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:04:14] (CR) DannyS712: [C: -1] "Please wait for clarification on whether crats should still be able to revoke the rights" [mediawiki-config] - https://gerrit.wikimedia.org/r/689321 (https://phabricator.wikimedia.org/T282624) (owner: Zabe)
[09:05:23] (CR) Jbond: [C: +2] admin: jbond fix alias commands [puppet] - https://gerrit.wikimedia.org/r/689700 (owner: Jbond)
[09:05:44] (PS3) Jbond: debian: Remove support for jessie [puppet] - https://gerrit.wikimedia.org/r/671097 (owner: Muehlenhoff)
[09:07:19] (CR) jerkins-bot: [V: -1] debian: Remove support for jessie [puppet] - https://gerrit.wikimedia.org/r/671097 (owner: Muehlenhoff)
[09:10:05] !log marostegui@cumin1001 START - Cookbook sre.hosts.decommission for hosts db1074.eqiad.wmnet
[09:10:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:11:34] (PS1) Marostegui: mariadb: Decommission db1074 [puppet] - https://gerrit.wikimedia.org/r/689702 (https://phabricator.wikimedia.org/T281959)
[09:14:31] (CR) Marostegui: [C: +2] mariadb: Decommission db1074 [puppet] -
10https://gerrit.wikimedia.org/r/689702 (https://phabricator.wikimedia.org/T281959) (owner: 10Marostegui) [09:17:08] (03PS1) 10Muehlenhoff: Remove remaining debian::codename tests for jessie [puppet] - 10https://gerrit.wikimedia.org/r/689703 [09:18:05] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 75%: Repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15938 and previous config saved to /var/cache/conftool/dbconfig/20210512-091804-root.json [09:18:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:20:00] !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1074.eqiad.wmnet [09:20:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:20:11] (03PS12) 10Elukey: WIP - Add istio base images build support [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/688211 (https://phabricator.wikimedia.org/T278192) [09:20:51] 10ops-eqiad, 10DBA, 10decommission-hardware: decommission db1074.eqiad.wmnet - https://phabricator.wikimedia.org/T281959 (10Marostegui) [09:20:57] 10ops-eqiad, 10DBA, 10decommission-hardware: decommission db1074.eqiad.wmnet - https://phabricator.wikimedia.org/T281959 (10Marostegui) This is ready for #dc-ops [09:21:29] 10ops-eqiad, 10DC-Ops, 10decommission-hardware: decommission db1074.eqiad.wmnet - https://phabricator.wikimedia.org/T281959 (10Marostegui) [09:21:50] 10SRE, 10DBA, 10Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (10Marostegui) [09:23:23] PROBLEM - k8s API server requests latencies on kubestagemaster2001 is CRITICAL: instance=10.192.48.10 verb=PATCH https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api [09:28:09] RECOVERY - k8s API server requests latencies on kubestagemaster2001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api [09:28:17] (03PS1) 10Kosta Harlan: AddLink: Auto-advance to save dialog on mobile with "No" [extensions/GrowthExperiments] (wmf/1.37.0-wmf.5) - 10https://gerrit.wikimedia.org/r/689540 (https://phabricator.wikimedia.org/T282424) [09:32:28] (03PS1) 10Ladsgroup: prometheus: Migrate node_varnishd_mmap_count cron to systemd timer [puppet] - 10https://gerrit.wikimedia.org/r/689715 (https://phabricator.wikimedia.org/T273673) [09:32:41] (03CR) 10Muehlenhoff: "PCC: https://puppet-compiler.wmflabs.org/compiler1002/29522/" [puppet] - 10https://gerrit.wikimedia.org/r/689703 (owner: 10Muehlenhoff) [09:33:04] (03CR) 10Ladsgroup: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/689715 (https://phabricator.wikimedia.org/T273673) (owner: 10Ladsgroup) [09:33:09] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 100%: Repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15939 and previous config saved to /var/cache/conftool/dbconfig/20210512-093308-root.json [09:33:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:33:34] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1101:3317', diff saved to https://phabricator.wikimedia.org/P15940 and previous config saved to /var/cache/conftool/dbconfig/20210512-093333-marostegui.json [09:33:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:33:42] (03CR) 10Muehlenhoff: [C: 03+2] Remove remaining debian::codename tests for jessie [puppet] - 10https://gerrit.wikimedia.org/r/689703 (owner: 10Muehlenhoff) [09:34:32] (03PS4) 10Muehlenhoff: debian: Remove support for jessie [puppet] - 10https://gerrit.wikimedia.org/r/671097 [09:36:08] (03PS1) 10Volans: Build artifacts for Debian bullseye [software/homer/deploy] - 10https://gerrit.wikimedia.org/r/689716 [09:36:38] (03CR) 10Muehlenhoff: [C: 03+2] debian: Remove 
support for jessie [puppet] - 10https://gerrit.wikimedia.org/r/671097 (owner: 10Muehlenhoff) [09:36:48] (03PS1) 10Kosta Harlan: Add a link: change how RecommendedLinkToolbarDialog determines when to update the current recommendation [extensions/GrowthExperiments] (wmf/1.37.0-wmf.5) - 10https://gerrit.wikimedia.org/r/689541 [09:39:37] (03CR) 10Vgutierrez: [C: 03+1] prometheus: Migrate node_varnishd_mmap_count cron to systemd timer [puppet] - 10https://gerrit.wikimedia.org/r/689715 (https://phabricator.wikimedia.org/T273673) (owner: 10Ladsgroup) [09:41:58] (03PS1) 10Kosta Harlan: Add a link: fix link inspector calculations for RTL [extensions/GrowthExperiments] (wmf/1.37.0-wmf.5) - 10https://gerrit.wikimedia.org/r/689542 (https://phabricator.wikimedia.org/T282506) [09:42:23] (03PS1) 10Muehlenhoff: Update a few comments related to jessie [puppet] - 10https://gerrit.wikimedia.org/r/689717 [09:42:54] (03PS2) 10Volans: Build artifacts for Debian bullseye [software/homer/deploy] - 10https://gerrit.wikimedia.org/r/689716 (https://phabricator.wikimedia.org/T281596) [09:44:02] (03PS2) 10Muehlenhoff: Update a few comments related to jessie [puppet] - 10https://gerrit.wikimedia.org/r/689717 [09:45:02] (03PS1) 10Kosta Harlan: Add a link: explicitly set annotation icon's margin for mixed directionality [extensions/GrowthExperiments] (wmf/1.37.0-wmf.5) - 10https://gerrit.wikimedia.org/r/689543 (https://phabricator.wikimedia.org/T282401) [09:48:10] (03CR) 10Volans: Update a few comments related to jessie (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/689717 (owner: 10Muehlenhoff) [09:51:35] (03CR) 10Ayounsi: [C: 03+1] Build artifacts for Debian bullseye [software/homer/deploy] - 10https://gerrit.wikimedia.org/r/689716 (https://phabricator.wikimedia.org/T281596) (owner: 10Volans) [09:52:03] (03PS1) 10Ayounsi: Remove Ariel from network devices [homer/public] - 10https://gerrit.wikimedia.org/r/689718 [09:52:49] (03CR) 10Volans: openldap: Convert the weekday 
cross-validate-accounts from cron to systemd. (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/688423 (owner: 10RLazarus) [09:53:22] (03CR) 10Volans: [V: 03+2 C: 03+2] Build artifacts for Debian bullseye [software/homer/deploy] - 10https://gerrit.wikimedia.org/r/689716 (https://phabricator.wikimedia.org/T281596) (owner: 10Volans) [09:55:09] !log volans@cumin2002 START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet with reason: Initial deploy to cumin2002 - volans@cumin2002 [09:55:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:55:30] (03PS3) 10Muehlenhoff: Update a few comments related to jessie [puppet] - 10https://gerrit.wikimedia.org/r/689717 [09:55:49] (03PS1) 10Kosta Harlan: Add a link: open help panel's suggested-edits panel instead of home [extensions/GrowthExperiments] (wmf/1.37.0-wmf.5) - 10https://gerrit.wikimedia.org/r/689544 (https://phabricator.wikimedia.org/T278488) [09:56:16] (03CR) 10Muehlenhoff: Update a few comments related to jessie (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/689717 (owner: 10Muehlenhoff) [10:00:27] (03CR) 10Giuseppe Lavagetto: Helm chart to run MediaWiki (0311 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/670220 (https://phabricator.wikimedia.org/T265327) (owner: 10Giuseppe Lavagetto) [10:00:50] (03CR) 10Giuseppe Lavagetto: Helm chart to run MediaWiki (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/670220 (https://phabricator.wikimedia.org/T265327) (owner: 10Giuseppe Lavagetto) [10:01:10] !log reboot poolcounter2003 and poolcounter2004 [10:01:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:01:17] (03PS20) 10Giuseppe Lavagetto: Helm chart to run MediaWiki [deployment-charts] - 10https://gerrit.wikimedia.org/r/670220 (https://phabricator.wikimedia.org/T265327) [10:01:36] !log jiji@cumin1001 START - Cookbook sre.hosts.reboot-single for host poolcounter2003.codfw.wmnet [10:01:37] 
Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:02:27] (03CR) 10jerkins-bot: [V: 04-1] Helm chart to run MediaWiki [deployment-charts] - 10https://gerrit.wikimedia.org/r/670220 (https://phabricator.wikimedia.org/T265327) (owner: 10Giuseppe Lavagetto) [10:07:36] (03CR) 10Muehlenhoff: Update a few comments related to jessie (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/689717 (owner: 10Muehlenhoff) [10:10:32] (03PS1) 10Marostegui: site.pp: Remove labsdb comments [puppet] - 10https://gerrit.wikimedia.org/r/689732 (https://phabricator.wikimedia.org/T282662) [10:11:29] (03CR) 10Marostegui: [C: 03+2] site.pp: Remove labsdb comments [puppet] - 10https://gerrit.wikimedia.org/r/689732 (https://phabricator.wikimedia.org/T282662) (owner: 10Marostegui) [10:11:46] moritzm: ok to merge your change? [10:14:11] !log jiji@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host poolcounter2003.codfw.wmnet [10:14:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:15:05] PROBLEM - Check systemd state on poolcounter2003 is CRITICAL: CRITICAL - degraded: The following units failed: ifup@ens5.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [10:17:14] (03CR) 10Alexandros Kosiaris: [C: 04-1] "This pro" (032 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/685748 (https://phabricator.wikimedia.org/T282148) (owner: 10Effie Mouzeli) [10:26:34] (03PS6) 10Hashar: [WMF] script to build our plugins [software/gerrit] (wmf/stable-3.2) - 10https://gerrit.wikimedia.org/r/684411 [10:28:43] (03CR) 10Muehlenhoff: [C: 03+2] Update a few comments related to jessie [puppet] - 10https://gerrit.wikimedia.org/r/689717 (owner: 10Muehlenhoff) [10:29:56] !log jiji@cumin1001 START - Cookbook sre.hosts.reboot-single for host poolcounter2004.codfw.wmnet [10:29:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:31:06] !log jiji@cumin1001 END 
(PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host poolcounter2004.codfw.wmnet [10:31:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:32:05] !log volans@cumin2002 END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet with reason: Initial deploy to cumin2002 - volans@cumin2002 [10:32:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:32:53] (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] profile:mariadb:core: Hack in access from labwebs to s6 [puppet] - 10https://gerrit.wikimedia.org/r/689092 (https://phabricator.wikimedia.org/T282209) (owner: 10Andrew Bogott) [10:34:37] RECOVERY - Check systemd state on poolcounter2003 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [10:39:19] (03PS1) 10Muehlenhoff: Remove/update more obsolete comments related to jessie [puppet] - 10https://gerrit.wikimedia.org/r/689747 [10:42:46] (03PS1) 10Muehlenhoff: poolcounter::client::python: Remove now obsolete check [puppet] - 10https://gerrit.wikimedia.org/r/689748 [10:43:49] (03PS2) 10Vgutierrez: acme_chief: Improve OCSPResponse error handling [software/acme-chief] - 10https://gerrit.wikimedia.org/r/689068 (https://phabricator.wikimedia.org/T282490) [10:46:32] !log aborrero@cumin1001 START - Cookbook sre.hosts.downtime for 180 days, 0:00:00 on cloudvirt1038.eqiad.wmnet with reason: T276922 [10:46:33] !log aborrero@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 180 days, 0:00:00 on cloudvirt1038.eqiad.wmnet with reason: T276922 [10:46:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:46:36] T276922: cloudvirt1038: PCIe error - https://phabricator.wikimedia.org/T276922 [10:46:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:50:14] (03CR) 10Volans: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/689747 (owner: 
10Muehlenhoff) [10:52:30] (03PS1) 10Volans: Update debian bullseye frozen requirements [software/homer/deploy] - 10https://gerrit.wikimedia.org/r/689754 [10:52:39] (03CR) 10Muehlenhoff: [C: 03+2] Remove/update more obsolete comments related to jessie [puppet] - 10https://gerrit.wikimedia.org/r/689747 (owner: 10Muehlenhoff) [10:53:35] (03PS1) 10Volans: python_deploy: refactor to not use --relocatable [puppet] - 10https://gerrit.wikimedia.org/r/689755 [10:54:11] (03CR) 10Volans: "Tested on cumin2002, homer works fine with the symlink to the venv." [puppet] - 10https://gerrit.wikimedia.org/r/689755 (owner: 10Volans) [10:59:25] (03PS1) 10Vgutierrez: Release 0.30 [software/acme-chief] - 10https://gerrit.wikimedia.org/r/689756 (https://phabricator.wikimedia.org/T282490) [11:00:04] Amir1, Lucas_WMDE, awight, and Urbanecm: My dear minions, it's time we take the moon! Just kidding. Time for European mid-day backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210512T1100). [11:00:04] kostajh and Zabe: A patch you scheduled for European mid-day backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. 
[11:00:12] I can deploy today [11:00:19] o/ [11:00:33] kostajh: let me know if i should merge your backports, or wait for more feedback in the slack discussion [11:01:04] hi [11:01:21] Urbanecm: I think you can merge them [11:01:26] (03PS2) 10Urbanecm: zhwikinews: Allow sysops to grant/revoke transwiki group [mediawiki-config] - 10https://gerrit.wikimedia.org/r/671084 (https://phabricator.wikimedia.org/T273405) (owner: 10Gerrit Patch Uploader) [11:01:34] (03PS3) 10Urbanecm: zhwikinews: Allow sysops to grant/revoke transwiki group [mediawiki-config] - 10https://gerrit.wikimedia.org/r/671084 (https://phabricator.wikimedia.org/T273405) (owner: 10Gerrit Patch Uploader) [11:01:44] okay, doing kostajh [11:01:47] (03CR) 10Urbanecm: [C: 03+2] AddLink: Auto-advance to save dialog on mobile with "No" [extensions/GrowthExperiments] (wmf/1.37.0-wmf.5) - 10https://gerrit.wikimedia.org/r/689540 (https://phabricator.wikimedia.org/T282424) (owner: 10Kosta Harlan) [11:01:49] (03CR) 10Urbanecm: [C: 03+2] Add a link: change how RecommendedLinkToolbarDialog determines when to update the current recommendation [extensions/GrowthExperiments] (wmf/1.37.0-wmf.5) - 10https://gerrit.wikimedia.org/r/689541 (owner: 10Kosta Harlan) [11:01:51] (03CR) 10Urbanecm: [C: 03+2] Add a link: fix link inspector calculations for RTL [extensions/GrowthExperiments] (wmf/1.37.0-wmf.5) - 10https://gerrit.wikimedia.org/r/689542 (https://phabricator.wikimedia.org/T282506) (owner: 10Kosta Harlan) [11:01:54] (03CR) 10Urbanecm: [C: 03+2] Add a link: explicitly set annotation icon's margin for mixed directionality [extensions/GrowthExperiments] (wmf/1.37.0-wmf.5) - 10https://gerrit.wikimedia.org/r/689543 (https://phabricator.wikimedia.org/T282401) (owner: 10Kosta Harlan) [11:01:57] (03CR) 10Urbanecm: [C: 03+2] Add a link: open help panel's suggested-edits panel instead of home [extensions/GrowthExperiments] (wmf/1.37.0-wmf.5) - 10https://gerrit.wikimedia.org/r/689544 (https://phabricator.wikimedia.org/T278488) 
(owner: 10Kosta Harlan) [11:02:00] (03CR) 10Urbanecm: [C: 03+2] zhwikinews: Allow sysops to grant/revoke transwiki group [mediawiki-config] - 10https://gerrit.wikimedia.org/r/671084 (https://phabricator.wikimedia.org/T273405) (owner: 10Gerrit Patch Uploader) [11:02:22] (03Abandoned) 10Urbanecm: Allow sysop to add/remove transwiki for zhwikinews [mediawiki-config] - 10https://gerrit.wikimedia.org/r/660795 (https://phabricator.wikimedia.org/T273405) (owner: 10Hamish) [11:02:28] (03CR) 10Jbond: [C: 04-1] "-1 is for the git.diff error. other comments are just questions and suggestions." (0313 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/685721 (owner: 10Giuseppe Lavagetto) [11:03:26] (03Merged) 10jenkins-bot: zhwikinews: Allow sysops to grant/revoke transwiki group [mediawiki-config] - 10https://gerrit.wikimedia.org/r/671084 (https://phabricator.wikimedia.org/T273405) (owner: 10Gerrit Patch Uploader) [11:04:32] Zabe: pulled onto mwdebug1001, please check [11:04:50] (03PS1) 10Ayounsi: Add Joanna to data.yaml [puppet] - 10https://gerrit.wikimedia.org/r/689757 (https://phabricator.wikimedia.org/T282661) [11:05:09] (03CR) 10Jbond: [C: 04-1] "update, i just noticed i got errors when running `rake run_locally`" [deployment-charts] - 10https://gerrit.wikimedia.org/r/685721 (owner: 10Giuseppe Lavagetto) [11:05:32] (03PS1) 10Volans: sre.deploy.python-code: run as user the commands [cookbooks] - 10https://gerrit.wikimedia.org/r/689758 [11:05:41] (03CR) 10Ayounsi: [C: 03+2] Add Joanna to data.yaml [puppet] - 10https://gerrit.wikimedia.org/r/689757 (https://phabricator.wikimedia.org/T282661) (owner: 10Ayounsi) [11:05:45] Urbanecm: works the supposed way [11:06:21] thanks, syncing [11:09:25] !log urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: 9939edb27f8a43def7fefe1eae734b078dea003a: zhwikinews: Allow sysops to grant/revoke transwiki group (T273405) (duration: 02m 17s) [11:09:27] Zabe: should be live now [11:09:27] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:09:28] T273405: Make zhwikinews administrator obtains the grant/delete right of the cross-wiki importer - https://phabricator.wikimedia.org/T273405 [11:09:40] yes, thanks :) [11:11:36] (03CR) 10Muehlenhoff: [C: 03+2] poolcounter::client::python: Remove now obsolete check [puppet] - 10https://gerrit.wikimedia.org/r/689748 (owner: 10Muehlenhoff) [11:12:15] kostajh: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/689543 has failed test, https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-php72-selenium-docker/50772/. Can you have a look [11:12:16] ? [11:12:22] Urbanecm: looking [11:13:32] Urbanecm: that looks like selenium being flaky. Could you recheck it please? [11:13:47] certainly. Thought so, just wanted to double confirm. [11:14:46] (03CR) 10Urbanecm: Add a link: explicitly set annotation icon's margin for mixed directionality [extensions/GrowthExperiments] (wmf/1.37.0-wmf.5) - 10https://gerrit.wikimedia.org/r/689543 (https://phabricator.wikimedia.org/T282401) (owner: 10Kosta Harlan) [11:14:49] (03CR) 10Urbanecm: [C: 03+2] Add a link: explicitly set annotation icon's margin for mixed directionality [extensions/GrowthExperiments] (wmf/1.37.0-wmf.5) - 10https://gerrit.wikimedia.org/r/689543 (https://phabricator.wikimedia.org/T282401) (owner: 10Kosta Harlan) [11:15:12] (03Merged) 10jenkins-bot: AddLink: Auto-advance to save dialog on mobile with "No" [extensions/GrowthExperiments] (wmf/1.37.0-wmf.5) - 10https://gerrit.wikimedia.org/r/689540 (https://phabricator.wikimedia.org/T282424) (owner: 10Kosta Harlan) [11:15:15] (03Merged) 10jenkins-bot: Add a link: change how RecommendedLinkToolbarDialog determines when to update the current recommendation [extensions/GrowthExperiments] (wmf/1.37.0-wmf.5) - 10https://gerrit.wikimedia.org/r/689541 (owner: 10Kosta Harlan) [11:15:17] (03PS1) 10Ayounsi: Netbox: add alerting to the Netbox report [puppet] - 
10https://gerrit.wikimedia.org/r/689759 [11:15:19] (03Merged) 10jenkins-bot: Add a link: fix link inspector calculations for RTL [extensions/GrowthExperiments] (wmf/1.37.0-wmf.5) - 10https://gerrit.wikimedia.org/r/689542 (https://phabricator.wikimedia.org/T282506) (owner: 10Kosta Harlan) [11:15:21] (03CR) 10jerkins-bot: [V: 04-1] Add a link: explicitly set annotation icon's margin for mixed directionality [extensions/GrowthExperiments] (wmf/1.37.0-wmf.5) - 10https://gerrit.wikimedia.org/r/689543 (https://phabricator.wikimedia.org/T282401) (owner: 10Kosta Harlan) [11:15:35] (03CR) 10Urbanecm: [C: 03+2] Add a link: explicitly set annotation icon's margin for mixed directionality [extensions/GrowthExperiments] (wmf/1.37.0-wmf.5) - 10https://gerrit.wikimedia.org/r/689543 (https://phabricator.wikimedia.org/T282401) (owner: 10Kosta Harlan) [11:15:37] (03PS1) 10Cathal Mooney: Added cmooney user to Icinga conf [puppet] - 10https://gerrit.wikimedia.org/r/689760 [11:15:48] attempt #2 [11:16:31] (03PS1) 10Muehlenhoff: tendril: Remove support for jessie [puppet] - 10https://gerrit.wikimedia.org/r/689761 [11:16:48] (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/689761 (owner: 10Muehlenhoff) [11:16:57] (03CR) 10Cathal Mooney: "hey hope this looks ok." 
[puppet] - 10https://gerrit.wikimedia.org/r/689760 (owner: 10Cathal Mooney) [11:18:38] (03CR) 10Ayounsi: [C: 03+1] Added cmooney user to Icinga conf [puppet] - 10https://gerrit.wikimedia.org/r/689760 (owner: 10Cathal Mooney) [11:22:34] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 25%: Repool db1101:3317', diff saved to https://phabricator.wikimedia.org/P15942 and previous config saved to /var/cache/conftool/dbconfig/20210512-112234-root.json [11:22:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:24:16] (03Merged) 10jenkins-bot: Add a link: open help panel's suggested-edits panel instead of home [extensions/GrowthExperiments] (wmf/1.37.0-wmf.5) - 10https://gerrit.wikimedia.org/r/689544 (https://phabricator.wikimedia.org/T278488) (owner: 10Kosta Harlan) [11:25:55] (03CR) 10Cathal Mooney: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/689760 (owner: 10Cathal Mooney) [11:26:28] (03Merged) 10jenkins-bot: Add a link: explicitly set annotation icon's margin for mixed directionality [extensions/GrowthExperiments] (wmf/1.37.0-wmf.5) - 10https://gerrit.wikimedia.org/r/689543 (https://phabricator.wikimedia.org/T282401) (owner: 10Kosta Harlan) [11:27:27] kostajh: everything should be at mwdebug1001, can you take a quick look? [11:27:43] (03CR) 10Marostegui: [C: 03+1] tendril: Remove support for jessie [puppet] - 10https://gerrit.wikimedia.org/r/689761 (owner: 10Muehlenhoff) [11:27:52] Urbanecm: yep looking [11:27:58] thanks [11:30:20] Urbanecm: hmm, https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/689544 is there for sure? [11:30:35] (03PS1) 10Muehlenhoff: Enable single-signout for tendril [puppet] - 10https://gerrit.wikimedia.org/r/689770 [11:30:37] and confirming that it's mwdebug1001? 
[11:30:48] let me check [11:31:03] (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/689770 (owner: 10Muehlenhoff) [11:31:20] 10SRE, 10ops-codfw, 10DC-Ops, 10cloud-services-team (Hardware): (Need By: TBD) rack/setup/install cloudnet2004-dev - https://phabricator.wikimedia.org/T275676 (10aborrero) 05Resolved→03Open hey @Papaul could you please clarify the secondary interface in netbox? I see this: {F34451716} I guess when i... [11:31:55] yes, that commit is at mwdebug1001 kostajh [11:33:16] (03CR) 10Muehlenhoff: [C: 03+2] tendril: Remove support for jessie [puppet] - 10https://gerrit.wikimedia.org/r/689761 (owner: 10Muehlenhoff) [11:33:28] (03PS1) 10Arturo Borrero Gonzalez: cloud: replace cloudnet2003-dev with cloudnet2004-dev [puppet] - 10https://gerrit.wikimedia.org/r/689777 (https://phabricator.wikimedia.org/T281381) [11:34:21] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] cloud: replace cloudnet2003-dev with cloudnet2004-dev [puppet] - 10https://gerrit.wikimedia.org/r/689777 (https://phabricator.wikimedia.org/T281381) (owner: 10Arturo Borrero Gonzalez) [11:34:33] Urbanecm: well, I don't see the expected behavior from that or from https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/689540, which is odd [11:34:45] (03PS2) 10Muehlenhoff: Enable single-signout for tendril [puppet] - 10https://gerrit.wikimedia.org/r/689770 [11:34:53] i see it's JS only change. did you try with debug=1? 
[11:35:03] (03CR) 10Muehlenhoff: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/689770 (owner: 10Muehlenhoff) [11:36:03] (03CR) 10Kormat: [C: 04-1] profile:mariadb:core: Hack in access from labwebs to s6 (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/689092 (https://phabricator.wikimedia.org/T282209) (owner: 10Andrew Bogott) [11:36:52] I have my browser set to bypass cache but sure, let me check [11:37:20] iirc resource loader modules are cached on the server too, so it might take a while to actually send you the new ones [11:37:39] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 50%: Repool db1101:3317', diff saved to https://phabricator.wikimedia.org/P15943 and previous config saved to /var/cache/conftool/dbconfig/20210512-113737-root.json [11:37:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:40:07] 10SRE: Track remaining jessie systems in production - https://phabricator.wikimedia.org/T224549 (10MoritzMuehlenhoff) [11:40:36] 10SRE: Track remaining jessie systems in production - https://phabricator.wikimedia.org/T224549 (10MoritzMuehlenhoff) 05Open→03Resolved a:03MoritzMuehlenhoff This is completed \o/ [11:40:56] Urbanecm: perhaps that's it. Anyway, given that this is all behind a feature flag and should not in theory impact user on growth experiments wikis, I think it's OK to sync. I'm still waiting for ?debug=1 RL to finish loading :) [11:41:00] (03CR) 10Cathal Mooney: "https://puppet-compiler.wmflabs.org/compiler1002/29525/" [puppet] - 10https://gerrit.wikimedia.org/r/689760 (owner: 10Cathal Mooney) [11:41:06] okay, syncing :) [11:43:08] 10SRE: Handle archival of jessie suite in Debian archive - https://phabricator.wikimedia.org/T257019 (10MoritzMuehlenhoff) 05Open→03Declined No longer needed. 
[11:43:33] !log urbanecm@deploy1002 Synchronized php-1.37.0-wmf.5/extensions/GrowthExperiments/: 6cc2530: c268d08: b89592e: 7620953: 8fd7610: GrowthExperiments backports (duration: 01m 17s) [11:43:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:43:38] here you go kostajh [11:43:40] anything else? [11:44:05] (03PS8) 10Kormat: mariadb: Convert pt-heartbeat to a systemd service. [puppet] - 10https://gerrit.wikimedia.org/r/665324 (https://phabricator.wikimedia.org/T252528) [11:44:29] (03PS1) 10Muehlenhoff: Remove obsolete Hiera file [puppet] - 10https://gerrit.wikimedia.org/r/689784 (https://phabricator.wikimedia.org/T282576) [11:44:31] (03PS1) 10Ladsgroup: prometheus: Migrate node_sge cron to systemd timer [puppet] - 10https://gerrit.wikimedia.org/r/689785 (https://phabricator.wikimedia.org/T273673) [11:44:37] Urbanecm: no, thank you! [11:44:43] any time :) [11:44:53] (03CR) 10Cathal Mooney: [C: 03+2] Added cmooney user to Icinga conf [puppet] - 10https://gerrit.wikimedia.org/r/689760 (owner: 10Cathal Mooney) [11:48:56] (03PS1) 10Jbond: install_server: add new installer to test raid0 configuration: [puppet] - 10https://gerrit.wikimedia.org/r/689786 (https://phabricator.wikimedia.org/T280382) [11:50:06] (03PS1) 10Ladsgroup: prometheus: Remove absented crons [puppet] - 10https://gerrit.wikimedia.org/r/689787 (https://phabricator.wikimedia.org/T273673) [11:50:48] (03CR) 10Jbond: "This looks like it might be generally useful, from what i have read it would only be harmful for servers with raid0 that was build < kerne" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/689525 (https://phabricator.wikimedia.org/T280382) (owner: 10Ryan Kemper) [11:50:55] (03CR) 10Jbond: [C: 04-1] wdqs: hack issue blocking reimage on some hosts [puppet] - 10https://gerrit.wikimedia.org/r/689525 (https://phabricator.wikimedia.org/T280382) (owner: 10Ryan Kemper) [11:51:46] (03PS2) 10Ladsgroup: prometheus: Migrate node_sge cron to systemd timer [puppet] 
- 10https://gerrit.wikimedia.org/r/689785 (https://phabricator.wikimedia.org/T273673) [11:52:43] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 75%: Repool db1101:3317', diff saved to https://phabricator.wikimedia.org/P15944 and previous config saved to /var/cache/conftool/dbconfig/20210512-115242-root.json [11:52:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:52:54] (03PS2) 10Ladsgroup: prometheus: Remove absented crons [puppet] - 10https://gerrit.wikimedia.org/r/689787 (https://phabricator.wikimedia.org/T273673) [11:53:01] (03PS3) 10Ladsgroup: prometheus: Remove absented crons [puppet] - 10https://gerrit.wikimedia.org/r/689787 (https://phabricator.wikimedia.org/T273673) [11:54:00] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] prometheus: Migrate node_sge cron to systemd timer [puppet] - 10https://gerrit.wikimedia.org/r/689785 (https://phabricator.wikimedia.org/T273673) (owner: 10Ladsgroup) [11:54:30] (03CR) 10Marostegui: [C: 03+1] "it should be fine by now, once we start using a bullseye host for our daily client operations this might need to be changed to 10.4 in cas" [puppet] - 10https://gerrit.wikimedia.org/r/686393 (https://phabricator.wikimedia.org/T276589) (owner: 10Jcrespo) [11:54:34] (03PS1) 10Matthias Mullie: Add MediaSearch assessment filter map [mediawiki-config] - 10https://gerrit.wikimedia.org/r/689788 (https://phabricator.wikimedia.org/T276257) [11:57:47] (03CR) 10Jbond: [C: 03+1] "this LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/665324 (https://phabricator.wikimedia.org/T252528) (owner: 10Kormat) [12:01:48] (03CR) 10Jbond: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/689770 (owner: 10Muehlenhoff) [12:02:20] (03CR) 10Jbond: [C: 03+1] "lgtm" [homer/public] - 10https://gerrit.wikimedia.org/r/689718 (owner: 10Ayounsi) [12:07:47] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 100%: Repool db1101:3317', diff saved to 
https://phabricator.wikimedia.org/P15945 and previous config saved to /var/cache/conftool/dbconfig/20210512-120746-root.json [12:07:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:10:05] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1174', diff saved to https://phabricator.wikimedia.org/P15946 and previous config saved to /var/cache/conftool/dbconfig/20210512-121004-marostegui.json [12:10:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:12:02] (03CR) 10Volans: "nit inline, LGTM otherwise" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/689759 (owner: 10Ayounsi) [12:13:11] (03PS2) 10Ayounsi: Netbox: add alerting to the Netbox report [puppet] - 10https://gerrit.wikimedia.org/r/689759 [12:13:53] (03CR) 10Ayounsi: [C: 03+1] "Great version numbers." [software/homer/deploy] - 10https://gerrit.wikimedia.org/r/689754 (owner: 10Volans) [12:14:31] (03CR) 10Ayounsi: [C: 03+2] Netbox: add alerting to the Netbox report (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/689759 (owner: 10Ayounsi) [12:15:08] (03CR) 10Volans: [V: 03+2 C: 03+2] Update debian bullseye frozen requirements [software/homer/deploy] - 10https://gerrit.wikimedia.org/r/689754 (owner: 10Volans) [12:15:44] (03PS1) 10Kosta Harlan: Add Link: refine exclusion rules for finding link text matches [extensions/GrowthExperiments] (wmf/1.37.0-wmf.5) - 10https://gerrit.wikimedia.org/r/689545 (https://phabricator.wikimedia.org/T267694) [12:18:20] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=netbox_device_statistics site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [12:20:37] (03PS1) 10Volans: sre.hosts.decommission: add repo to search list [cookbooks] - 10https://gerrit.wikimedia.org/r/689797 (https://phabricator.wikimedia.org/T281314) [12:24:13] PROBLEM - HTTPS-policy on policy.wikimedia.org is 
CRITICAL: SSL CRITICAL - Certificate policy.wikimedia.org expired https://phabricator.wikimedia.org/tag/wmf-legal/ [12:26:59] (03CR) 10Jbond: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/689755 (owner: 10Volans) [12:27:23] (03CR) 10Volans: [C: 03+2] python_deploy: refactor to not use --relocatable [puppet] - 10https://gerrit.wikimedia.org/r/689755 (owner: 10Volans) [12:32:23] (03CR) 10Jbond: [C: 03+1] "CR lgtm but not familiar enough with deploy to comment on the correct user" [cookbooks] - 10https://gerrit.wikimedia.org/r/689758 (owner: 10Volans) [12:36:44] (03CR) 10Volans: "> Patch Set 1: Code-Review+1" [cookbooks] - 10https://gerrit.wikimedia.org/r/689758 (owner: 10Volans) [12:38:40] (03PS2) 10Elukey: prometheus: Migrate node_varnishd_mmap_count cron to systemd timer [puppet] - 10https://gerrit.wikimedia.org/r/689715 (https://phabricator.wikimedia.org/T273673) (owner: 10Ladsgroup) [12:39:48] (03PS7) 10Hashar: [WMF] script to build our plugins [software/gerrit] (wmf/stable-3.2) - 10https://gerrit.wikimedia.org/r/684411 [12:41:16] (03CR) 10jerkins-bot: [V: 04-1] [WMF] script to build our plugins [software/gerrit] (wmf/stable-3.2) - 10https://gerrit.wikimedia.org/r/684411 (owner: 10Hashar) [12:42:01] !log aborrero@cumin2001 START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet2004-dev.codfw.wmnet with reason: REIMAGE [12:42:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:43:48] 10SRE, 10LDAP-Access-Requests: Grant Access to ldap/wmf for JStephenson1980 - https://phabricator.wikimedia.org/T282521 (10JStephenson) My contract contact person is Kassia Echavarri-Queen kechavarriqueen@wikimedia.org (Community Resources Director) [12:44:23] !log aborrero@cumin2001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet2004-dev.codfw.wmnet with reason: REIMAGE [12:44:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:47:37] !log marostegui@cumin1001 dbctl commit 
(dc=all): 'db1174 (re)pooling @ 25%: Repool db1174', diff saved to https://phabricator.wikimedia.org/P15947 and previous config saved to /var/cache/conftool/dbconfig/20210512-124736-root.json [12:47:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:50:00] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [12:52:56] (03PS1) 10Arturo Borrero Gonzalez: openstack: neutron: more cloudgw cleanups [puppet] - 10https://gerrit.wikimedia.org/r/689831 (https://phabricator.wikimedia.org/T270704) [12:54:23] (03CR) 10jerkins-bot: [V: 04-1] openstack: neutron: more cloudgw cleanups [puppet] - 10https://gerrit.wikimedia.org/r/689831 (https://phabricator.wikimedia.org/T270704) (owner: 10Arturo Borrero Gonzalez) [12:55:26] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] "PCC: https://puppet-compiler.wmflabs.org/compiler1001/29526/" [puppet] - 10https://gerrit.wikimedia.org/r/689831 (https://phabricator.wikimedia.org/T270704) (owner: 10Arturo Borrero Gonzalez) [12:56:14] (03PS2) 10Arturo Borrero Gonzalez: openstack: neutron: more cloudgw cleanups [puppet] - 10https://gerrit.wikimedia.org/r/689831 (https://phabricator.wikimedia.org/T270704) [12:56:22] 10SRE, 10DBA, 10Traffic: dbtree.wm.o stopped working after enforcing Puppet CA issued certs for ATS backend origin servers - https://phabricator.wikimedia.org/T282531 (10jbond) 05Open→03Resolved a:05Vgutierrez→03jbond This is working again [12:56:51] (03CR) 10Jbond: [C: 03+2] O:debmonitor::server: Switch debmonitor.wikimedia.org ssl to cfssl [puppet] - 10https://gerrit.wikimedia.org/r/685576 (https://phabricator.wikimedia.org/T281673) (owner: 10Jbond) [12:57:59] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] openstack: neutron: more cloudgw cleanups [puppet] - 10https://gerrit.wikimedia.org/r/689831 
(https://phabricator.wikimedia.org/T270704) (owner: 10Arturo Borrero Gonzalez) [12:58:33] (03CR) 10Volans: [C: 03+2] sre.deploy.python-code: run as user the commands [cookbooks] - 10https://gerrit.wikimedia.org/r/689758 (owner: 10Volans) [13:00:00] 10SRE, 10DBA, 10Traffic: dbtree.wm.o stopped working after enforcing Puppet CA issued certs for ATS backend origin servers - https://phabricator.wikimedia.org/T282531 (10Marostegui) Thank you both! [13:02:15] (03Merged) 10jenkins-bot: sre.deploy.python-code: run as user the commands [cookbooks] - 10https://gerrit.wikimedia.org/r/689758 (owner: 10Volans) [13:02:42] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1174 (re)pooling @ 50%: Repool db1174', diff saved to https://phabricator.wikimedia.org/P15948 and previous config saved to /var/cache/conftool/dbconfig/20210512-130239-root.json [13:02:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:05:58] !log volans@cumin2002 START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet with reason: Test deploy procedure on cumin2002 - volans@cumin2002 [13:06:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:06:46] !log volans@cumin2002 END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet with reason: Test deploy procedure on cumin2002 - volans@cumin2002 [13:06:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:08:04] RECOVERY - HTTPS-policy on policy.wikimedia.org is OK: SSL OK - Certificate policy.wikimedia.org valid until 2021-07-13 21:55:16 +0000 (expires in 62 days) https://phabricator.wikimedia.org/tag/wmf-legal/ [13:13:19] !log aborrero@cumin2001 START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet2004-dev.codfw.wmnet with reason: REIMAGE [13:13:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:15:23] !log aborrero@cumin2001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 
2:00:00 on cloudnet2004-dev.codfw.wmnet with reason: REIMAGE [13:15:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:17:45] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1174 (re)pooling @ 75%: Repool db1174', diff saved to https://phabricator.wikimedia.org/P15949 and previous config saved to /var/cache/conftool/dbconfig/20210512-131745-root.json [13:17:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:18:20] (03CR) 10Elukey: [C: 03+2] prometheus: Migrate node_varnishd_mmap_count cron to systemd timer [puppet] - 10https://gerrit.wikimedia.org/r/689715 (https://phabricator.wikimedia.org/T273673) (owner: 10Ladsgroup) [13:21:54] (03CR) 10RLazarus: [V: 03+1] openldap: Convert the weekday cross-validate-accounts from cron to systemd. (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/688423 (owner: 10RLazarus) [13:23:33] (03CR) 10RLazarus: [C: 03+1] "🎉" [puppet] - 10https://gerrit.wikimedia.org/r/689686 (https://phabricator.wikimedia.org/T224549) (owner: 10Volans) [13:24:16] (03CR) 10Muehlenhoff: [C: 03+2] Enable single-signout for tendril [puppet] - 10https://gerrit.wikimedia.org/r/689770 (owner: 10Muehlenhoff) [13:26:18] (03CR) 10Volans: "reply inline" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/688423 (owner: 10RLazarus) [13:26:27] (03PS1) 10Cathal Mooney: Changed username in Icinga cgi.cfg [puppet] - 10https://gerrit.wikimedia.org/r/689854 [13:31:02] (03CR) 10Muehlenhoff: [C: 03+1] "Looks good. These are passed raw to the Icinga CGIs and they expect your CN in the exact capitalisation found in that config file. 
(Which " [puppet] - 10https://gerrit.wikimedia.org/r/689854 (owner: 10Cathal Mooney) [13:32:03] (03CR) 10Muehlenhoff: "Confirmed to work as expected with tendril" [puppet] - 10https://gerrit.wikimedia.org/r/689770 (owner: 10Muehlenhoff) [13:32:49] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1174 (re)pooling @ 100%: Repool db1174', diff saved to https://phabricator.wikimedia.org/P15950 and previous config saved to /var/cache/conftool/dbconfig/20210512-133248-root.json [13:32:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:33:34] (03CR) 10Cathal Mooney: "If you wouldn't mind giving this a +1 Arzhel thanks." [puppet] - 10https://gerrit.wikimedia.org/r/689854 (owner: 10Cathal Mooney) [13:34:30] (03PS8) 10Hashar: [WMF] script to build our plugins [software/gerrit] (wmf/stable-3.2) - 10https://gerrit.wikimedia.org/r/684411 [13:35:52] (03CR) 10Cathal Mooney: "https://puppet-compiler.wmflabs.org/compiler1001/29527/" [puppet] - 10https://gerrit.wikimedia.org/r/689854 (owner: 10Cathal Mooney) [13:36:24] (03PS1) 10Muehlenhoff: Remove unused legacy service aliases [puppet] - 10https://gerrit.wikimedia.org/r/689857 [13:36:36] (03PS2) 10Muehlenhoff: Remove unused legacy service aliases [puppet] - 10https://gerrit.wikimedia.org/r/689857 [13:36:54] (03CR) 10jerkins-bot: [V: 04-1] [WMF] script to build our plugins [software/gerrit] (wmf/stable-3.2) - 10https://gerrit.wikimedia.org/r/684411 (owner: 10Hashar) [13:37:16] (03CR) 10Ayounsi: [C: 03+1] Changed username in Icinga cgi.cfg [puppet] - 10https://gerrit.wikimedia.org/r/689854 (owner: 10Cathal Mooney) [13:41:57] (03CR) 10Cathal Mooney: [C: 03+2] Changed username in Icinga cgi.cfg [puppet] - 10https://gerrit.wikimedia.org/r/689854 (owner: 10Cathal Mooney) [13:44:23] (03PS1) 10Hashar: download_bower: download to GERRIT_CACHE_HOME when set [software/gerrit] (wmf/stable-3.2) - 10https://gerrit.wikimedia.org/r/689859 [13:44:26] 10SRE, 10GitLab (Initialization), 10Release-Engineering-Team 
(Doing), 10User-brennen: Define auth strategy for GitLab - https://phabricator.wikimedia.org/T274461 (10jbond) > with EE its possible we could have a hybrid set up with SSO used for user authentication and ldap used for group syncing Just wanted... [13:46:09] (03PS2) 10RLazarus: openldap: Convert the weekday cross-validate-accounts from cron to systemd. [puppet] - 10https://gerrit.wikimedia.org/r/688423 [13:46:38] (03CR) 10jerkins-bot: [V: 04-1] download_bower: download to GERRIT_CACHE_HOME when set [software/gerrit] (wmf/stable-3.2) - 10https://gerrit.wikimedia.org/r/689859 (owner: 10Hashar) [13:46:43] (03CR) 10RLazarus: openldap: Convert the weekday cross-validate-accounts from cron to systemd. (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/688423 (owner: 10RLazarus) [13:48:22] (03PS2) 10Ottomata: refine - finalize WikidataCompletionSearchClicks migration to event platform [puppet] - 10https://gerrit.wikimedia.org/r/685911 (https://phabricator.wikimedia.org/T282140) [13:48:24] (03PS1) 10Ottomata: refine - Ensure mediawiki_job refine is absent [puppet] - 10https://gerrit.wikimedia.org/r/689866 (https://phabricator.wikimedia.org/T281605) [13:49:24] (03PS2) 10Ottomata: refine - Ensure mediawiki_job refine is absent [puppet] - 10https://gerrit.wikimedia.org/r/689866 (https://phabricator.wikimedia.org/T281605) [13:49:31] (03CR) 10Volans: [C: 03+1] "LGTM, thanks" [puppet] - 10https://gerrit.wikimedia.org/r/688423 (owner: 10RLazarus) [13:51:46] (03CR) 10Ottomata: [C: 03+2] refine - Ensure mediawiki_job refine is absent [puppet] - 10https://gerrit.wikimedia.org/r/689866 (https://phabricator.wikimedia.org/T281605) (owner: 10Ottomata) [13:53:32] XioNoX: ok to puppet merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/689854/ ? [13:53:39] would ping cathal mooney but i don't know IRC nick! :) [13:54:16] it looks ok so i'm puppet merging just FYI [13:54:26] ottomata: yep [13:54:35] topranks: ^ [13:55:09] ottomata: Hi it's me Cathal. 
[13:55:20] (03PS1) 10Marostegui: dbproxy1019: Depool clouddb1014 for upgrade. [puppet] - 10https://gerrit.wikimedia.org/r/689868 (https://phabricator.wikimedia.org/T277867) [13:55:22] hello it's me andrew otto! nice to meetcha [13:55:27] sorry I was delayed between submitting CR and doing the merge. [13:55:34] s'ok happens all the time, just double checking [13:55:41] by the time I tried to do the puppet-merge yours was running. [13:55:45] :) [13:55:56] should just be 4 lines changed in cgi.cfg for Icinga. [13:56:01] (03CR) 10Marostegui: [C: 03+2] dbproxy1019: Depool clouddb1014 for upgrade. [puppet] - 10https://gerrit.wikimedia.org/r/689868 (https://phabricator.wikimedia.org/T277867) (owner: 10Marostegui) [13:56:20] Nice to meet you too :) [13:57:06] !log uploaded wmfmariadbpy 0.6.1 for bullseye [13:57:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:57:51] (03PS1) 10Ottomata: refine - Remove absented refine_mediawiki_job [puppet] - 10https://gerrit.wikimedia.org/r/689870 (https://phabricator.wikimedia.org/T281605) [13:58:23] (03PS1) 10Marostegui: Revert "dbproxy1019: Depool clouddb1014 for upgrade." [puppet] - 10https://gerrit.wikimedia.org/r/689886 [13:59:03] (03CR) 10Marostegui: [C: 03+2] Revert "dbproxy1019: Depool clouddb1014 for upgrade." 
[puppet] - 10https://gerrit.wikimedia.org/r/689886 (owner: 10Marostegui) [13:59:31] (03CR) 10Ottomata: [C: 03+2] refine - Remove absented refine_mediawiki_job [puppet] - 10https://gerrit.wikimedia.org/r/689870 (https://phabricator.wikimedia.org/T281605) (owner: 10Ottomata) [14:01:27] !log Upgraded mysql on clouddb1014 [14:01:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:02:00] (03PS3) 10Ottomata: refine - finalize WikidataCompletionSearchClicks migration to event platform [puppet] - 10https://gerrit.wikimedia.org/r/685911 (https://phabricator.wikimedia.org/T282140) [14:02:19] !log Upgrade mysql on clouddb1015 [14:02:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:02:28] (03PS1) 10Marostegui: dbproxy1019: Depool clouddb1015 [puppet] - 10https://gerrit.wikimedia.org/r/689877 (https://phabricator.wikimedia.org/T277867) [14:03:32] (03CR) 10Marostegui: [C: 03+2] dbproxy1019: Depool clouddb1015 [puppet] - 10https://gerrit.wikimedia.org/r/689877 (https://phabricator.wikimedia.org/T277867) (owner: 10Marostegui) [14:03:34] (03PS1) 10Urbanecm: enwiki: Growth features: Change help panel links [mediawiki-config] - 10https://gerrit.wikimedia.org/r/689878 (https://phabricator.wikimedia.org/T281896) [14:03:54] (03PS1) 10Marostegui: Revert "dbproxy1019: Depool clouddb1015" [puppet] - 10https://gerrit.wikimedia.org/r/689887 [14:04:07] (03CR) 10Ottomata: [C: 03+2] refine - finalize WikidataCompletionSearchClicks migration to event platform [puppet] - 10https://gerrit.wikimedia.org/r/685911 (https://phabricator.wikimedia.org/T282140) (owner: 10Ottomata) [14:04:28] (03CR) 10Giuseppe Lavagetto: Add canary support in scaffolding (032 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/685748 (https://phabricator.wikimedia.org/T282148) (owner: 10Effie Mouzeli) [14:04:57] (03CR) 10Marostegui: [C: 03+2] Revert "dbproxy1019: Depool clouddb1015" [puppet] - 10https://gerrit.wikimedia.org/r/689887 (owner: 
10Marostegui) [14:08:49] (03CR) 10Hashar: "recheck with job setting XDG_DATA_HOME=/tmp/data" [software/gerrit] (wmf/stable-3.2) - 10https://gerrit.wikimedia.org/r/689859 (owner: 10Hashar) [14:13:25] 10SRE, 10ops-codfw, 10DC-Ops: (Need By: TBD) rack/setup/install ganeti202[56] - https://phabricator.wikimedia.org/T282603 (10MoritzMuehlenhoff) [14:14:50] 10SRE, 10Commons, 10MediaWiki-Uploading, 10Structured Data Engineering, and 3 others: Various errors when trying to upload large files (Could not acquire lock, Service Temporarily Unavailable, 503 Backend fetch failed, 502 Next Hop Connection Failed) - https://phabricator.wikimedia.org/T280926 (10CBogen) [14:16:21] (03PS1) 10Zabe: Disable Education Program namespaces in cswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/689883 (https://phabricator.wikimedia.org/T282691) [14:19:41] (03PS4) 10Andrew Bogott: profile:mariadb:core: Hack in access from labwebs to s6 [puppet] - 10https://gerrit.wikimedia.org/r/689092 (https://phabricator.wikimedia.org/T282209) [14:23:01] (03CR) 10Andrew Bogott: "https://puppet-compiler.wmflabs.org/compiler1001/29528/" [puppet] - 10https://gerrit.wikimedia.org/r/689092 (https://phabricator.wikimedia.org/T282209) (owner: 10Andrew Bogott) [14:29:24] 10SRE, 10ops-codfw, 10DC-Ops: (Need By: TBD) rack/setup/install ganeti202[56] - https://phabricator.wikimedia.org/T282603 (10RobH) a:05MoritzMuehlenhoff→03Papaul [14:29:27] 10SRE, 10ops-codfw, 10DC-Ops: (Need By: TBD) rack/setup/install ganeti202[56] - https://phabricator.wikimedia.org/T282603 (10RobH) a:05Papaul→03Jclark-ctr [14:30:10] 10SRE, 10ops-codfw, 10DC-Ops: (Need By: TBD) rack/setup/install ganeti202[56] - https://phabricator.wikimedia.org/T282603 (10RobH) a:05Jclark-ctr→03Papaul [14:31:08] (03CR) 10Giuseppe Lavagetto: Add diff tasks to rake (0312 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/685721 (owner: 10Giuseppe Lavagetto) [14:36:45] (03CR) 10Giuseppe Lavagetto: "> Patch Set 5:" [deployment-charts] - 
10https://gerrit.wikimedia.org/r/685721 (owner: 10Giuseppe Lavagetto) [14:36:47] (03CR) 10Filippo Giunchedi: [C: 03+1] "LGTM, thanks!" [puppet] - 10https://gerrit.wikimedia.org/r/689857 (owner: 10Muehlenhoff) [14:40:28] (03PS5) 10Kormat: profile:mariadb:core: Hack in access from labwebs to s6 [puppet] - 10https://gerrit.wikimedia.org/r/689092 (https://phabricator.wikimedia.org/T282209) (owner: 10Andrew Bogott) [14:41:23] 10SRE, 10ops-codfw, 10DC-Ops, 10cloud-services-team (Hardware): (Need By: TBD) rack/setup/install cloudnet2004-dev - https://phabricator.wikimedia.org/T275676 (10aborrero) [14:41:54] (03CR) 10jerkins-bot: [V: 04-1] profile:mariadb:core: Hack in access from labwebs to s6 [puppet] - 10https://gerrit.wikimedia.org/r/689092 (https://phabricator.wikimedia.org/T282209) (owner: 10Andrew Bogott) [14:45:04] !log aborrero@cumin2001 START - Cookbook sre.hosts.decommission for hosts cloudnet2003-dev.codfw.wmnet [14:45:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:54:53] (03PS6) 10Kormat: profile:mariadb:core: Hack in access from labwebs to s6 [puppet] - 10https://gerrit.wikimedia.org/r/689092 (https://phabricator.wikimedia.org/T282209) (owner: 10Andrew Bogott) [14:54:56] (03CR) 10Ottomata: [C: 03+1] "We should merge this, yes?" [puppet] - 10https://gerrit.wikimedia.org/r/677822 (owner: 10Elukey) [14:55:01] (03PS1) 10Mforns: analytics:refinery:job:data_purge: remove -skipTrash from drop_event [puppet] - 10https://gerrit.wikimedia.org/r/689925 (https://phabricator.wikimedia.org/T273789) [14:55:20] (03CR) 10Elukey: [V: 03+1] "Yep!" 
[puppet] - 10https://gerrit.wikimedia.org/r/677822 (owner: 10Elukey) [14:56:21] (03CR) 10jerkins-bot: [V: 04-1] profile:mariadb:core: Hack in access from labwebs to s6 [puppet] - 10https://gerrit.wikimedia.org/r/689092 (https://phabricator.wikimedia.org/T282209) (owner: 10Andrew Bogott) [14:57:49] (03CR) 10Zabe: [C: 04-1] "still in discussion" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/689321 (https://phabricator.wikimedia.org/T282624) (owner: 10Zabe) [14:59:33] (03CR) 10Ebernhardson: "Certainly fine for relforge, this kind of limit protects a system that gets arbitrary user queries but relforge doesn't. Separately, no s" [puppet] - 10https://gerrit.wikimedia.org/r/688309 (owner: 10ZPapierski) [15:00:42] (03PS7) 10Kormat: profile:mariadb:core: Hack in access from labwebs to s6 [puppet] - 10https://gerrit.wikimedia.org/r/689092 (https://phabricator.wikimedia.org/T282209) (owner: 10Andrew Bogott) [15:04:52] (03PS8) 10Kormat: profile:mariadb:core: Hack in access from labwebs to s6 [puppet] - 10https://gerrit.wikimedia.org/r/689092 (https://phabricator.wikimedia.org/T282209) (owner: 10Andrew Bogott) [15:07:44] (03CR) 10Kormat: [V: 03+1] "PCC SUCCESS (DIFF 3 NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/29531/console" [puppet] - 10https://gerrit.wikimedia.org/r/689092 (https://phabricator.wikimedia.org/T282209) (owner: 10Andrew Bogott) [15:08:59] (03CR) 10Kormat: [V: 03+1 C: 03+1] "Ok, this LGTM now :)" [puppet] - 10https://gerrit.wikimedia.org/r/689092 (https://phabricator.wikimedia.org/T282209) (owner: 10Andrew Bogott) [15:09:25] (03CR) 10Legoktm: [C: 03+1] sre.hosts.decommission: add repo to search list [cookbooks] - 10https://gerrit.wikimedia.org/r/689797 (https://phabricator.wikimedia.org/T281314) (owner: 10Volans) [15:12:15] (03PS2) 10Volans: sre.hosts.decommission: add repo to search list [cookbooks] - 10https://gerrit.wikimedia.org/r/689797 (https://phabricator.wikimedia.org/T281314) [15:14:15] (03CR) 10RLazarus: 
[C: 03+2] openldap: Convert the weekday cross-validate-accounts from cron to systemd. [puppet] - 10https://gerrit.wikimedia.org/r/688423 (owner: 10RLazarus) [15:14:39] (03CR) 10Hashar: "recheck after fixing the archiving config" [software/gerrit] (wmf/stable-3.2) - 10https://gerrit.wikimedia.org/r/689859 (owner: 10Hashar) [15:15:55] (03CR) 10Kosta Harlan: [C: 03+1] linkrecommendation: Match gunicorn status code in statsd [deployment-charts] - 10https://gerrit.wikimedia.org/r/685788 (owner: 10Alexandros Kosiaris) [15:16:43] 10SRE, 10DBA, 10Wikimedia-Mailing-lists, 10Schema-change, 10User-notice: Mailman3 schema change: change utf8 columns to utf8mb4 - https://phabricator.wikimedia.org/T282621 (10LSobanski) a:03Marostegui Assigning to Manuel for review, can be moved to Blocked afterwards until the announcement goes out. [15:17:34] (03PS6) 10Giuseppe Lavagetto: Add diff tasks to rake [deployment-charts] - 10https://gerrit.wikimedia.org/r/685721 [15:17:36] (03PS2) 10Giuseppe Lavagetto: Rakefile: split more of it into submodules [deployment-charts] - 10https://gerrit.wikimedia.org/r/688265 [15:21:07] (03CR) 10Jdlrobson: "Reverts gonna cause some more issues for us in Vector. I have an alternative patch coming." [core] (wmf/1.37.0-wmf.5) - 10https://gerrit.wikimedia.org/r/689532 (https://phabricator.wikimedia.org/T276561) (owner: 10Legoktm) [15:24:35] (03CR) 10Jdlrobson: [C: 04-1] Revert "Create buildPersonalPage method for SkinTemplate class, add menu item to personal menu.." 
[core] (wmf/1.37.0-wmf.5) - 10https://gerrit.wikimedia.org/r/689532 (https://phabricator.wikimedia.org/T276561) (owner: 10Legoktm) [15:26:20] (03PS1) 10Jdlrobson: Modern keys must be unset [core] (wmf/1.37.0-wmf.5) - 10https://gerrit.wikimedia.org/r/689889 (https://phabricator.wikimedia.org/T282646) [15:26:43] (03PS2) 10Mforns: Migrate VirtualPageView to EventPlatform on all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/689205 (https://phabricator.wikimedia.org/T238138) [15:26:58] (03CR) 10Jdlrobson: [C: 04-1] "Let's backport https://gerrit.wikimedia.org/r/c/mediawiki/core/+/689889 instead so we don't break the testing planned for Vector this week" [core] (wmf/1.37.0-wmf.5) - 10https://gerrit.wikimedia.org/r/689532 (https://phabricator.wikimedia.org/T276561) (owner: 10Legoktm) [15:27:20] (03PS3) 10Mforns: Migrate VirtualPageView to EventPlatform on all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/689205 (https://phabricator.wikimedia.org/T238138) [15:29:56] 10SRE: Provide a pxe-bootable rescue image - https://phabricator.wikimedia.org/T78135 (10fgiunchedi) [15:29:56] (03PS4) 10Mforns: Migrate VirtualPageView to EventPlatform on group 0 and 1 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/689205 (https://phabricator.wikimedia.org/T238138) [15:30:08] 10SRE, 10Continuous-Integration-Infrastructure, 10observability, 10Goal, 10Release-Engineering-Team (Seen): Add Prometheus exporter to Jenkins instances - https://phabricator.wikimedia.org/T182759 (10fgiunchedi) [15:44:55] 10SRE, 10Wikimedia-Mailing-lists: 'Held Unsubscriptions' keeps sending email notifications in Mailman3 - https://phabricator.wikimedia.org/T282319 (10Legoktm) @Ciell can you forward me one of the emails you're getting about the held unsubscription? legoktm@wikimedia.org [15:44:57] 10SRE, 10LDAP-Access-Requests, 10CommRel-Specialists-Support (Apr-Jun-2021): Please grant CRS access to Superset/Turnilo - https://phabricator.wikimedia.org/T282589 (10Elitre) >>! 
In T282589#7081117, @Aklapper wrote: > It would be most helpful to have only one request per one ticket. See https://wikitech.wik... [15:47:31] (03CR) 10Volans: [C: 03+2] sre.hosts.decommission: add repo to search list [cookbooks] - 10https://gerrit.wikimedia.org/r/689797 (https://phabricator.wikimedia.org/T281314) (owner: 10Volans) [15:51:12] !log aborrero@cumin2001 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudnet2003-dev.codfw.wmnet [15:51:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:52:54] (03Merged) 10jenkins-bot: sre.hosts.decommission: add repo to search list [cookbooks] - 10https://gerrit.wikimedia.org/r/689797 (https://phabricator.wikimedia.org/T281314) (owner: 10Volans) [15:53:53] (03CR) 10Andrew Bogott: [C: 03+1] "lgtm and sorry this was so much trouble" [puppet] - 10https://gerrit.wikimedia.org/r/689092 (https://phabricator.wikimedia.org/T282209) (owner: 10Andrew Bogott) [15:55:32] 10SRE, 10Wikimedia-Mailing-lists: 'Held Unsubscriptions' keeps sending email notifications in Mailman3 - https://phabricator.wikimedia.org/T282319 (10Ciell) Forwarded an example reminder. 
[15:56:39] (03PS1) 10Giuseppe Lavagetto: Builder: use the full image tag, not just the name when pulling [docker-images/docker-pkg] - 10https://gerrit.wikimedia.org/r/689945 [15:58:49] (03CR) 10Hashar: "recheck XDG_DATA_HOME set in the image" [software/gerrit] (wmf/stable-3.2) - 10https://gerrit.wikimedia.org/r/689859 (owner: 10Hashar) [15:59:25] (03CR) 10jerkins-bot: [V: 04-1] Builder: use the full image tag, not just the name when pulling [docker-images/docker-pkg] - 10https://gerrit.wikimedia.org/r/689945 (owner: 10Giuseppe Lavagetto) [16:00:08] PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (Zotero and citoid alive) is CRITICAL: Test Zotero and citoid alive returned the unexpected status 503 (expecting: 200) https://wikitech.wikimedia.org/wiki/Citoid [16:00:34] 10SRE, 10LDAP-Access-Requests, 10CommRel-Specialists-Support (Apr-Jun-2021): Please grant CRS access to Superset/Turnilo - https://phabricator.wikimedia.org/T282589 (10Keegan) Keegan [16:02:34] RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Citoid [16:07:49] (03PS1) 10Arturo Borrero Gonzalez: cloudnet2003-dev: decomission [puppet] - 10https://gerrit.wikimedia.org/r/689948 (https://phabricator.wikimedia.org/T282696) [16:09:22] (03PS1) 10Ahmon Dancy: docker_registry_ha: Rename style.css to registry-homepage-builder.css [puppet] - 10https://gerrit.wikimedia.org/r/689949 [16:09:39] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] cloudnet2003-dev: decomission [puppet] - 10https://gerrit.wikimedia.org/r/689948 (https://phabricator.wikimedia.org/T282696) (owner: 10Arturo Borrero Gonzalez) [16:10:37] (03PS1) 10Hnowlan: New envoy upstream version 1.15.5 [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/689950 [16:15:08] !log including envoyproxy_1.15.5-1_amd64.changes with reprepro [16:15:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:17:04] (03PS1) 10RLazarus: 
openldap: Exit status 1 from cross-validate-accounts if errors are found. [puppet] - 10https://gerrit.wikimedia.org/r/689951 [16:17:38] (03CR) 10jerkins-bot: [V: 04-1] openldap: Exit status 1 from cross-validate-accounts if errors are found. [puppet] - 10https://gerrit.wikimedia.org/r/689951 (owner: 10RLazarus) [16:18:45] (03PS2) 10RLazarus: openldap: Exit status 1 from cross-validate-accounts if errors are found. [puppet] - 10https://gerrit.wikimedia.org/r/689951 [16:21:00] 10SRE, 10Wikimedia-Mailing-lists, 10Upstream: Mailman 3: per-list language preferences don't work - https://phabricator.wikimedia.org/T282279 (10Legoktm) Filed upstream as https://gitlab.com/mailman/postorius/-/issues/522 >>! In T282279#7071843, @Tgr wrote: > Is the PATCH request internal? In the browser, I... [16:22:15] (03CR) 10Legoktm: [C: 03+2] "PCC shows a no-op: https://puppet-compiler.wmflabs.org/compiler1003/29532/registry2003.codfw.wmnet/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/689949 (owner: 10Ahmon Dancy) [16:23:54] 10SRE, 10GitLab (Initialization), 10Release-Engineering-Team (Doing), 10User-brennen: Define auth strategy for GitLab - https://phabricator.wikimedia.org/T274461 (10brennen) > Just wanted to make a comment that SAML syncing may also be possible directly in EE Just for clarity: CE is a hard requirement. >... [16:25:39] 10SRE, 10DBA, 10Wikimedia-Mailing-lists, 10Schema-change, 10User-notice: Mailman3 schema change: change utf8 columns to utf8mb4 - https://phabricator.wikimedia.org/T282621 (10Legoktm) >>! In T282621#7080773, @Ladsgroup wrote: > Wouldn't a message in tech news be enough? tbh unless we expect it to be lik... 
[16:37:09] (03CR) 10Cwhite: [C: 03+2] logstash: collect kaios_app.error stream into logstash clienterror input [puppet] - 10https://gerrit.wikimedia.org/r/686803 (https://phabricator.wikimedia.org/T281507) (owner: 10Cwhite) [16:38:17] (03PS1) 10Herron: wip [puppet] - 10https://gerrit.wikimedia.org/r/689977 [16:39:39] (03PS1) 10Ahmon Dancy: registry-homepage-builder.py: Add footer identifying the generating program [puppet] - 10https://gerrit.wikimedia.org/r/689980 [16:40:09] (03CR) 10jerkins-bot: [V: 04-1] registry-homepage-builder.py: Add footer identifying the generating program [puppet] - 10https://gerrit.wikimedia.org/r/689980 (owner: 10Ahmon Dancy) [16:42:09] (03PS2) 10Ahmon Dancy: registry-homepage-builder.py: Add footer identifying the generating program [puppet] - 10https://gerrit.wikimedia.org/r/689980 [16:42:39] (03CR) 10jerkins-bot: [V: 04-1] registry-homepage-builder.py: Add footer identifying the generating program [puppet] - 10https://gerrit.wikimedia.org/r/689980 (owner: 10Ahmon Dancy) [16:43:20] (03PS3) 10Ahmon Dancy: registry-homepage-builder.py: Add footer identifying the generating program [puppet] - 10https://gerrit.wikimedia.org/r/689980 [16:43:33] Cursed whitespace complaints! 
[16:48:04] (03PS1) 10Cwhite: Revert "logstash: collect kaios_app.error stream into logstash clienterror input" [puppet] - 10https://gerrit.wikimedia.org/r/689891 [16:48:15] (03CR) 10Cwhite: [V: 03+2 C: 03+2] Revert "logstash: collect kaios_app.error stream into logstash clienterror input" [puppet] - 10https://gerrit.wikimedia.org/r/689891 (owner: 10Cwhite) [16:48:54] 10SRE, 10LDAP-Access-Requests, 10CommRel-Specialists-Support (Apr-Jun-2021): Please grant CRS access to Superset/Turnilo (deadline EOD Monday 17) - https://phabricator.wikimedia.org/T282589 (10Elitre) [16:49:12] 10SRE, 10LDAP-Access-Requests, 10CommRel-Specialists-Support (Apr-Jun-2021): Please grant CRS access to Superset/Turnilo (deadline EOD Monday 17) - https://phabricator.wikimedia.org/T282589 (10Elitre) [16:49:47] 10SRE, 10LDAP-Access-Requests, 10CommRel-Specialists-Support (Apr-Jun-2021): Please grant CRS access to Superset/Turnilo (deadline EOD Monday 17) - https://phabricator.wikimedia.org/T282589 (10Elitre) [16:51:29] (03CR) 10Cwhite: "Error: Invalid topics: [codfw\.mediawiki\.client\.error|codfw\.kaios_app\.error]" [puppet] - 10https://gerrit.wikimedia.org/r/689891 (owner: 10Cwhite) [16:52:29] 10SRE, 10Analytics, 10LDAP-Access-Requests, 10CommRel-Specialists-Support (Apr-Jun-2021): Please grant CRS access to Superset/Turnilo (deadline EOD Monday 17) - https://phabricator.wikimedia.org/T282589 (10elukey) [16:56:44] (03PS1) 10Cwhite: logstash: collect kaios_app.error stream into logstash clienterror input [puppet] - 10https://gerrit.wikimedia.org/r/689986 (https://phabricator.wikimedia.org/T281507) [16:58:52] (03CR) 10Cwhite: [C: 03+2] logstash: collect kaios_app.error stream into logstash clienterror input [puppet] - 10https://gerrit.wikimedia.org/r/689986 (https://phabricator.wikimedia.org/T281507) (owner: 10Cwhite) [17:00:04] (03Abandoned) 10Jdlrobson: Revert "Create buildPersonalPage method for SkinTemplate class, add menu item to personal menu.." 
[core] (wmf/1.37.0-wmf.5) - 10https://gerrit.wikimedia.org/r/689532 (https://phabricator.wikimedia.org/T276561) (owner: 10Legoktm) [17:05:02] (03CR) 10Herron: "https://puppet-compiler.wmflabs.org/compiler1001/29535/" [puppet] - 10https://gerrit.wikimedia.org/r/689977 (https://phabricator.wikimedia.org/T281266) (owner: 10Herron) [17:05:41] (03CR) 10Jbond: "LGTM however I'm still not confident the splat is right. (feel free to just tell me it is and move on ;))" (033 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/685721 (owner: 10Giuseppe Lavagetto) [17:07:14] (03PS1) 10Zabe: Update wordmark and tagline for kawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/689987 (https://phabricator.wikimedia.org/T278251) [17:08:38] (03CR) 10Jbond: Add diff tasks to rake (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/685721 (owner: 10Giuseppe Lavagetto) [17:09:35] (03CR) 10Volans: [C: 03+1] "I didn't test it but LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/689951 (owner: 10RLazarus) [17:09:55] (03CR) 10Herron: [C: 03+2] mail: move default mail relay config out of standard module [puppet] - 10https://gerrit.wikimedia.org/r/686633 (https://phabricator.wikimedia.org/T232343) (owner: 10Herron) [17:11:18] (03CR) 10RLazarus: [C: 03+2] openldap: Exit status 1 from cross-validate-accounts if errors are found. [puppet] - 10https://gerrit.wikimedia.org/r/689951 (owner: 10RLazarus) [17:13:31] (03CR) 10Jbond: [C: 03+1] "lgtm" [puppet] - 10https://gerrit.wikimedia.org/r/689857 (owner: 10Muehlenhoff) [17:16:31] (03PS1) 10Volans: various: use git -C instead of cd && git [cookbooks] - 10https://gerrit.wikimedia.org/r/689988 [17:16:33] 10SRE, 10GitLab (Initialization), 10Release-Engineering-Team (Doing), 10User-brennen: Define auth strategy for GitLab - https://phabricator.wikimedia.org/T274461 (10jbond) >>! In T274461#7082643, @brennen wrote: >> Just wanted to make a comment that SAML syncing may also be possible directly in EE > > Jus... 
[17:18:25] (03CR) 10Jbond: [C: 03+1] "lgtm" [cookbooks] - 10https://gerrit.wikimedia.org/r/689988 (owner: 10Volans) [17:21:34] PROBLEM - Check systemd state on mwmaint1002 is CRITICAL: CRITICAL - degraded: The following units failed: daily_account_consistency_check.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [17:22:09] (03PS1) 10Herron: logstash: add logstash101[012] to elk7 cluster as ES backends [puppet] - 10https://gerrit.wikimedia.org/r/689994 (https://phabricator.wikimedia.org/T281266) [17:23:55] (03PS5) 10Ottomata: Migrate VirtualPageView to EventPlatform on group 0 and 1 wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/689205 (https://phabricator.wikimedia.org/T238138) (owner: 10Mforns) [17:27:32] (03CR) 10Herron: "https://puppet-compiler.wmflabs.org/compiler1002/29536/" [puppet] - 10https://gerrit.wikimedia.org/r/689994 (https://phabricator.wikimedia.org/T281266) (owner: 10Herron) [17:29:11] (03CR) 10Herron: [C: 03+2] "good catch thanks" [puppet] - 10https://gerrit.wikimedia.org/r/689784 (https://phabricator.wikimedia.org/T282576) (owner: 10Muehlenhoff) [17:32:19] (03CR) 10Herron: [C: 03+1] "> Patch Set 2:" [puppet] - 10https://gerrit.wikimedia.org/r/689160 (https://phabricator.wikimedia.org/T234565) (owner: 10Cwhite) [17:37:16] jouncebot: next [17:37:17] In 0 hour(s) and 22 minute(s): Train log triage with CPT (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210512T1800) [17:37:17] In 0 hour(s) and 22 minute(s): Morning backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210512T1800) [17:41:03] (03PS3) 10DCausse: [WIP] rdf-streaming-updater application mode experiment [deployment-charts] - 10https://gerrit.wikimedia.org/r/686550 [18:00:04] dancy and brennen: That opportune time is upon us again. Time for a Train log triage with CPT deploy. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210512T1800). 
[18:00:04] RoanKattouw, Niharika, and Urbanecm: How many deployers does it take to do Morning backport window deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210512T1800). [18:00:04] Tgr, jdlrobson, Urbanecm, and Zabe: A patch you scheduled for Morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [18:00:13] i can deploy today! [18:00:14] o/ [18:00:18] o/ [18:00:22] Jdlrobson: around? [18:00:36] (03CR) 10Urbanecm: [C: 03+2] Add Link: refine exclusion rules for finding link text matches [extensions/GrowthExperiments] (wmf/1.37.0-wmf.5) - 10https://gerrit.wikimedia.org/r/689545 (https://phabricator.wikimedia.org/T267694) (owner: 10Kosta Harlan) [18:01:11] (03CR) 10Urbanecm: [C: 03+2] enwiki: Growth features: Change help panel links [mediawiki-config] - 10https://gerrit.wikimedia.org/r/689878 (https://phabricator.wikimedia.org/T281896) (owner: 10Urbanecm) [18:01:42] 10SRE, 10ops-codfw, 10DC-Ops, 10cloud-services-team (Hardware): (Need By: TBD) rack/setup/install cloudnet2004-dev - https://phabricator.wikimedia.org/T275676 (10wiki_willy) Hi @aborrero - just a heads up, Papaul is on paternity leave and is scheduled to return on the 24th. Thanks, Willy [18:02:29] (03CR) 10Urbanecm: [C: 04-1] "Can you run the SVGs through an optimizer please?" 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/689987 (https://phabricator.wikimedia.org/T278251) (owner: 10Zabe) [18:02:59] (03Merged) 10jenkins-bot: enwiki: Growth features: Change help panel links [mediawiki-config] - 10https://gerrit.wikimedia.org/r/689878 (https://phabricator.wikimedia.org/T281896) (owner: 10Urbanecm) [18:03:17] Urbanecm: present [18:03:20] meeting over ran :) [18:03:21] thanks [18:04:40] (03CR) 10Urbanecm: [V: 03+2 C: 03+2] "forcemerging as this fails on GrowthExpreriments test in master" [core] (wmf/1.37.0-wmf.5) - 10https://gerrit.wikimedia.org/r/689889 (https://phabricator.wikimedia.org/T282646) (owner: 10Jdlrobson) [18:05:38] !log urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: 11defd4181103598222df34d9f1aa6dc428f66cd: enwiki: Growth features: Change help panel links (T281896) (duration: 01m 23s) [18:05:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:05:43] T281896: Deploy Growth features on English Wikipedia - https://phabricator.wikimedia.org/T281896 [18:06:22] Jdlrobson: pulled onto mwdebug1001, can you test? [18:07:03] (03PS3) 10Zabe: Update wordmark and tagline for kawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/689987 (https://phabricator.wikimedia.org/T278251) [18:07:32] Urbanecm: testing [18:08:27] Urbanecm: confirmed - please sync! [18:08:29] (03CR) 10Zabe: "I'm not that much into svgs, I think they are optimized now ?" 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/689987 (https://phabricator.wikimedia.org/T278251) (owner: 10Zabe) [18:08:30] syncing [18:11:00] !log urbanecm@deploy1002 Synchronized php-1.37.0-wmf.5/includes/skins/SkinTemplate.php: 7f1491337d1eef2629fea8031f066c490ea86987: Modern keys must be unset (T282646) (duration: 01m 08s) [18:11:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:11:04] T282646: User page shows up as article tab in MonoBook, Timeless skins - https://phabricator.wikimedia.org/T282646 [18:11:10] synced Jdlrobson [18:11:32] (03CR) 10Urbanecm: [C: 03+2] Disable Education Program namespaces in cswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/689883 (https://phabricator.wikimedia.org/T282691) (owner: 10Zabe) [18:11:36] (03PS2) 10Urbanecm: Disable Education Program namespaces in cswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/689883 (https://phabricator.wikimedia.org/T282691) (owner: 10Zabe) [18:11:40] (03CR) 10Urbanecm: [C: 03+2] Disable Education Program namespaces in cswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/689883 (https://phabricator.wikimedia.org/T282691) (owner: 10Zabe) [18:12:25] thanks Urbanecm ! [18:12:28] any time [18:12:40] bug gone! phew! [18:13:04] great! [18:13:58] (03Merged) 10jenkins-bot: Disable Education Program namespaces in cswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/689883 (https://phabricator.wikimedia.org/T282691) (owner: 10Zabe) [18:14:00] (03PS4) 10Urbanecm: Update wordmark and tagline for kawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/689987 (https://phabricator.wikimedia.org/T278251) (owner: 10Zabe) [18:14:05] (03CR) 10Urbanecm: [C: 03+2] Update wordmark and tagline for kawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/689987 (https://phabricator.wikimedia.org/T278251) (owner: 10Zabe) [18:14:25] Zabe: can you check the cswiki patch at mwdebug1001? 
[18:15:02] Urbanecm: works the supposed waay [18:16:22] (03Merged) 10jenkins-bot: Update wordmark and tagline for kawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/689987 (https://phabricator.wikimedia.org/T278251) (owner: 10Zabe) [18:16:42] PROBLEM - Work requests waiting in Zuul Gearman server on contint2001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [150.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1 [18:16:43] !log urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: 0cd3297b79e92bd39c0cebd1591a14591f57ecb0: Disable Education Program namespaces in cswiki (T282691) (duration: 01m 15s) [18:16:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:16:47] T282691: Remove namespace 446 and 447 from cswiki - https://phabricator.wikimedia.org/T282691 [18:18:45] Zabe: can you test the second patch at mwdebug1001 please? [18:18:55] yes [18:19:35] please do then :) [18:19:58] Urbanecm: works :) [18:20:23] thanks [18:20:51] tgr_: your patch has a test error that was discussed earlier in slack ("20:04:56 Error: Call to undefined method Mock_PageViewService_963a7339::supports()"). https://integration.wikimedia.org/ci/job/wmf-quibble-vendor-mysql-php72-docker/63691/console [18:21:19] thanks Zabe, syncing [18:21:37] Urbanecm: don't all patches have that now? [18:22:00] i'm bit surprised it started to happen it for our own repo too [18:22:12] but yes, all other MW patches do [18:22:33] I imagine it's from the extension-gate test, not the normal test [18:22:49] right [18:22:52] (03PS2) 10Dzahn: site: add peoplweb role to people2002 [puppet] - 10https://gerrit.wikimedia.org/r/689407 (https://phabricator.wikimedia.org/T280989) [18:23:09] I don't see the error though [18:23:36] you meant https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/689545 right? 
[18:23:51] positive [18:24:03] there's an error listed for that in https://integration.wikimedia.org/zuul/ [18:24:22] RECOVERY - nova-compute proc minimum on cloudvirt1038 is OK: PROCS OK: 1 process with regex args ^/usr/bin/pytho[n].* /usr/bin/nova-compute https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting [18:25:41] !log urbanecm@deploy1002 sync-file aborted: eb65aff2eccec58f14721958f2b9218266eedeb4: Update wordmark and tagline for kawiki (T278251) (duration: 00m 00s) [18:25:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:25:45] T278251: Change wordmark and tagline for Georgian Wikipedia (kawiki) - https://phabricator.wikimedia.org/T278251 [18:26:53] !log urbanecm@deploy1002 Synchronized static/images/mobile/: eb65aff2eccec58f14721958f2b9218266eedeb4: Update wordmark and tagline for kawiki (T278251; 1/2) (duration: 01m 06s) [18:26:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:28:18] !log urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: eb65aff2eccec58f14721958f2b9218266eedeb4: Update wordmark and tagline for kawiki (T278251; 2/2) (duration: 01m 09s) [18:28:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:28:24] should be live Zabe [18:29:42] (03CR) 10Dzahn: [C: 03+2] site: add peoplweb role to people2002 [puppet] - 10https://gerrit.wikimedia.org/r/689407 (https://phabricator.wikimedia.org/T280989) (owner: 10Dzahn) [18:30:03] Urbanecm: T282720 is the CI break fix [18:30:06] T282720: GrowthExperiments / SuggestedEditsTest: Call to undefined method Mock_PageViewService_963a7339::supports() - https://phabricator.wikimedia.org/T282720 [18:30:21] looking [18:30:24] T282720 I mean [18:30:37] duh, Linux clipboards [18:30:45] those two IDs are same [18:30:48] https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/690007 [18:30:57] I just got a very short connection failed browsing phab [18:31:48] and also 
kostajh just +2'ed it, so...i think that's good [18:34:20] (03CR) 10jerkins-bot: [V: 04-1] Add Link: refine exclusion rules for finding link text matches [extensions/GrowthExperiments] (wmf/1.37.0-wmf.5) - 10https://gerrit.wikimedia.org/r/689545 (https://phabricator.wikimedia.org/T267694) (owner: 10Kosta Harlan) [18:37:18] thanks [18:37:48] np [18:38:12] (03CR) 10Urbanecm: [V: 03+2 C: 03+2] "passed everything but gate, which is fixed already. merging." [extensions/GrowthExperiments] (wmf/1.37.0-wmf.5) - 10https://gerrit.wikimedia.org/r/689545 (https://phabricator.wikimedia.org/T267694) (owner: 10Kosta Harlan) [18:38:20] (03PS1) 10Gergő Tisza: Skip SuggestedEditsTest when PageViewInfo is not installed [extensions/GrowthExperiments] (wmf/1.37.0-wmf.4) - 10https://gerrit.wikimedia.org/r/689893 (https://phabricator.wikimedia.org/T282720) [18:38:43] (03PS1) 10Gergő Tisza: Skip SuggestedEditsTest when PageViewInfo is not installed [extensions/GrowthExperiments] (wmf/1.37.0-wmf.5) - 10https://gerrit.wikimedia.org/r/689894 (https://phabricator.wikimedia.org/T282720) [18:38:58] tgr_: i already forcemerged it btw, but adding it to wmf.* makes probably sense as well [18:39:06] pulled onto mwdebug1001, can you have a look? [18:39:17] https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/689893 / https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/689894 [18:39:34] (03CR) 10Urbanecm: [C: 03+2] Skip SuggestedEditsTest when PageViewInfo is not installed [extensions/GrowthExperiments] (wmf/1.37.0-wmf.4) - 10https://gerrit.wikimedia.org/r/689893 (https://phabricator.wikimedia.org/T282720) (owner: 10Gergő Tisza) [18:39:36] (03CR) 10Urbanecm: [C: 03+2] Skip SuggestedEditsTest when PageViewInfo is not installed [extensions/GrowthExperiments] (wmf/1.37.0-wmf.5) - 10https://gerrit.wikimedia.org/r/689894 (https://phabricator.wikimedia.org/T282720) (owner: 10Gergő Tisza) [18:40:21] that's for 689545 right? 
It doesn't touch anything that's exposed to production users, no need to test it [18:40:43] oka,y syncing [18:42:20] !log urbanecm@deploy1002 Synchronized php-1.37.0-wmf.5/extensions/GrowthExperiments/: 3999be113362b4cdf0aecb3597bbe42ea06cec7a: Add Link: refine exclusion rules for finding link text matches (duration: 01m 08s) [18:42:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:42:35] should be live tgr_ [18:42:37] anything else? [18:44:15] can you merge and pull https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/689893 / https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/689894 ? [18:44:24] it doesn't need to be synced [18:44:24] doing :) [18:44:33] yup yup, just to unbreak gate for wmf.* [18:46:47] (Primary outbound port utilisation over 80% #page) firing: Primary outbound port utilisation over 80% #page - https://alerts.wikimedia.org [18:48:12] !log rsyncing home dirs of people1003 over to people2002 as well (T280989) [18:48:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:48:16] T280989: try planet/people on bullseye / upgrade people.wikimedia.org backends to bullseye - https://phabricator.wikimedia.org/T280989 [18:50:17] tgr_: it _also_ fails tests [18:51:06] those and similar https://www.irccloud.com/pastebin/CXXcxYJq/ [18:51:47] (Primary outbound port utilisation over 80% #page) resolved: Primary outbound port utilisation over 80% #page - https://alerts.wikimedia.org [18:59:20] Urbanecm: can you force-merge it? 
I'll see if I can fix the other test error [18:59:28] RECOVERY - Work requests waiting in Zuul Gearman server on contint2001 is OK: OK: Less than 100.00% above the threshold [90.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1 [18:59:30] !log [Elastic] Restarted `*search*` services on `elastic2058` [18:59:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:59:39] (why didn't that show up immediately?) [19:00:04] dancy and brennen: #bothumor My software never has bugs. It just develops random features. Rise for MediaWiki train - American Version. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210512T1900). [19:00:44] !log ryankemper@cumin2001 START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin2001 - T280563 [19:00:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:00:47] T280563: Reboot elasticsearch* and relforge* to apply kernel security updates - https://phabricator.wikimedia.org/T280563 [19:00:56] !log T280563 `sudo -i cookbook sre.elasticsearch.rolling-operation search_codfw "codfw reboot" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id T280563` on `ryankemper@cumin2001` tmux session `elastic_restarts` [19:00:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:02:06] (03PS1) 10Ahmon Dancy: group1 wikis to 1.37.0-wmf.5 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/690013 [19:02:08] (03CR) 10Ahmon Dancy: [C: 03+2] group1 wikis to 1.37.0-wmf.5 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/690013 (owner: 10Ahmon Dancy) [19:03:49] (03Merged) 10jenkins-bot: group1 wikis to 1.37.0-wmf.5 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/690013 (owner: 10Ahmon Dancy) [19:05:20] !log T280382 
T281437 `sudo -i wmf-auto-reimage-host -p T280382 wdqs2007.codfw.wmnet` on `ryankemper@cumin2001` tmux session `wdqs_reimage` [19:05:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:05:24] T280382: WDQS hosts low on /srv disk space - https://phabricator.wikimedia.org/T280382 [19:05:24] T281437: hw troubleshooting: ssh unreachable for wdqs2007.codfw.wmnet - https://phabricator.wikimedia.org/T281437 [19:05:38] !log dancy@deploy1002 rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.5 [19:05:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:06:23] (03CR) 10jerkins-bot: [V: 04-1] Skip SuggestedEditsTest when PageViewInfo is not installed [extensions/GrowthExperiments] (wmf/1.37.0-wmf.4) - 10https://gerrit.wikimedia.org/r/689893 (https://phabricator.wikimedia.org/T282720) (owner: 10Gergő Tisza) [19:06:45] !log dancy@deploy1002 Synchronized php: group1 wikis to 1.37.0-wmf.5 (duration: 01m 06s) [19:06:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:07:10] !log ryankemper@cumin2001 END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin2001 - T280563 [19:07:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:07:13] T280563: Reboot elasticsearch* and relforge* to apply kernel security updates - https://phabricator.wikimedia.org/T280563 [19:08:03] (03CR) 10jerkins-bot: [V: 04-1] Skip SuggestedEditsTest when PageViewInfo is not installed [extensions/GrowthExperiments] (wmf/1.37.0-wmf.5) - 10https://gerrit.wikimedia.org/r/689894 (https://phabricator.wikimedia.org/T282720) (owner: 10Gergő Tisza) [19:08:11] (03PS1) 10Ahmon Dancy: group1 wikis to 1.37.0-wmf.4 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/690014 [19:08:13] (03CR) 10Ahmon Dancy: [C: 03+2] group1 wikis to 1.37.0-wmf.4 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/690014 (owner: 10Ahmon Dancy) [19:08:40] RECOVERY - Check systemd state on elastic2058 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [19:09:05] (03Merged) 10jenkins-bot: group1 wikis to 1.37.0-wmf.4 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/690014 (owner: 10Ahmon Dancy) [19:09:25] !log ryankemper@cumin1001 START - Cookbook sre.wdqs.data-transfer [19:09:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:09:33] !log T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1011.eqiad.wmnet --dest wdqs1012.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage` [19:09:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:09:46] dancy: heads up that MediaWiki tests are broken (GrowthExperiments was added to the gate and it didn't go well). I don't think it affects the train. 
[19:09:59] deleting that last line from SAL, slightly wrong tmux session [19:10:00] !log T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1011.eqiad.wmnet --dest wdqs1012.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `wdqs_reimage` [19:10:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:10:17] tgr_: Thx [19:10:42] !log dancy@deploy1002 rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.4 [19:10:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:11:32] Train has been rolled back to group0 for https://phabricator.wikimedia.org/T282735 [19:11:49] !log dancy@deploy1002 Synchronized php: group1 wikis to 1.37.0-wmf.4 (duration: 01m 07s) [19:11:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:13:53] 10SRE, 10ops-codfw: Degraded RAID on wdqs2007 - https://phabricator.wikimedia.org/T282068 (10RKemper) a:03RKemper [19:14:00] 10SRE, 10ops-codfw: Degraded RAID on wdqs2007 - https://phabricator.wikimedia.org/T282068 (10RKemper) Reimaging `wdqs2007` to see if that resolve this warning [19:14:36] (03PS1) 10Dzahn: DHCP: remove people1002 and people2001 [puppet] - 10https://gerrit.wikimedia.org/r/690021 (https://phabricator.wikimedia.org/T280989) [19:15:00] !log ryankemper@cumin1001 END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) [19:15:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:15:21] (03CR) 10Dzahn: [C: 03+2] DHCP: remove people1002 and people2001 [puppet] - 10https://gerrit.wikimedia.org/r/690021 (https://phabricator.wikimedia.org/T280989) (owner: 10Dzahn) [19:15:29] !log ryankemper@cumin1001 START - Cookbook sre.wdqs.data-transfer [19:15:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:15:50] !log T280382 `sudo -i cookbook sre.wdqs.data-transfer --source 
wdqs1011.eqiad.wmnet --dest wdqs1012.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `wdqs_reimage` [19:15:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:15:53] T280382: WDQS hosts low on /srv disk space - https://phabricator.wikimedia.org/T280382 [19:19:13] (03PS4) 10Dave Pifke: logstash: move kafka input configs to profile::logstash::kafka_inputs [puppet] - 10https://gerrit.wikimedia.org/r/683695 (https://phabricator.wikimedia.org/T233134) (owner: 10Herron) [19:20:20] (03CR) 10Dave Pifke: "Rebased to fix merge conflict after Ibf711af8518." [puppet] - 10https://gerrit.wikimedia.org/r/683695 (https://phabricator.wikimedia.org/T233134) (owner: 10Herron) [19:21:40] 10SRE, 10ops-codfw, 10Discovery, 10Discovery-Search (Current work): elastic2033 without bootable devices available - https://phabricator.wikimedia.org/T281621 (10RKemper) 05Open→03Resolved [19:22:44] 10SRE, 10ops-codfw, 10Discovery, 10Discovery-Search (Current work): elastic2033 without bootable devices available - https://phabricator.wikimedia.org/T281621 (10RKemper) Looks like I didn't comment back here but I re-enabled puppet on May 5. The host has been healthy since then. [19:24:33] (03CR) 10Legoktm: [C: 03+2] registry-homepage-builder.py: Add footer identifying the generating program (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/689980 (owner: 10Ahmon Dancy) [19:25:16] !log ryankemper@cumin2001 START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2007.codfw.wmnet with reason: REIMAGE [19:25:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:26:37] 10SRE, 10MediaWiki-extensions-CodeReview, 10Wikimedia-production-error: Exec error "Possibly missing executable file: svn diff" from Special:Code - https://phabricator.wikimedia.org/T204801 (10Krinkle) 05Open→03Declined Doesnt' cause an exception or other prod error. 
The shell output is logged for analys... [19:27:23] !log ryankemper@cumin2001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2007.codfw.wmnet with reason: REIMAGE [19:27:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:28:36] (03PS2) 10Hashar: Builder: use the full image tag, not just the name when pulling [docker-images/docker-pkg] - 10https://gerrit.wikimedia.org/r/689945 (owner: 10Giuseppe Lavagetto) [19:28:59] (03CR) 10Hashar: [C: 03+1] "I have fixed the test to reflect the new reality :]" [docker-images/docker-pkg] - 10https://gerrit.wikimedia.org/r/689945 (owner: 10Giuseppe Lavagetto) [19:35:50] (03PS1) 10Kosta Harlan: Add a link: select annotation view when acceptance changes on desktop [extensions/GrowthExperiments] (wmf/1.37.0-wmf.5) - 10https://gerrit.wikimedia.org/r/689898 (https://phabricator.wikimedia.org/T282175) [19:39:12] (03CR) 10Dave Pifke: "Beta puppet is broken because this is cherry-picked and now generates errors because of Icad66f70 and Ic6e4e2c9. It's been over a year si" [puppet] - 10https://gerrit.wikimedia.org/r/439774 (owner: 10Alex Monk) [19:42:52] tgr_, kostajh: I think we're going to need to wait for the Wikibase mock change for anything else to work, but thereafter things should be OK? Eh. Sorry for the disruption. [19:43:19] James_F: yeah, sounds right [19:46:25] of course selenium will fail [19:46:45] Oh, fun. [19:47:11] The idea of making selenium tests non-voting (and thus almost entirely ignored) is more and more tempting. [19:48:00] Ah, that's just the test pipeline the gate one is still going. [19:48:05] See https://integration.wikimedia.org/zuul/#q=690032 [19:48:54] due to https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Wikibase/+/683914 [19:51:09] can I trigger the gate job manually for a patch? [19:53:59] tgr_: not sure. I assume the Wikibase patch needs a backport to wmf.4 + wmf.5 as well? 
[19:54:02] tgr_: It's possible to trigger a specific job maually, but it's a real mess. [19:54:07] kostajh: It will, yes. [19:54:18] I guess we can just see if https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/689995 merges [19:54:30] Oh, you force-merged it. [19:54:31] Eurgh. [19:54:39] That'll screw up CI for another hour then. [19:55:39] (03PS1) 10Jforrester: repo: Mock getLazyConnectionRef [extensions/Wikibase] (wmf/1.37.0-wmf.4) - 10https://gerrit.wikimedia.org/r/689899 (https://phabricator.wikimedia.org/T282731) [19:55:53] (03PS1) 10Jforrester: repo: Mock getLazyConnectionRef [extensions/Wikibase] (wmf/1.37.0-wmf.5) - 10https://gerrit.wikimedia.org/r/689900 (https://phabricator.wikimedia.org/T282731) [19:56:01] (03PS1) 10Ahmon Dancy: WIP: registry-homepage-builder.py: Include tag digests [puppet] - 10https://gerrit.wikimedia.org/r/690033 [19:56:12] sorry. Wasn't sure if I can get it out normally, although I imagine the Selenium errors would have gone away with enough tries. [19:57:31] we need to finish backporting the other fix too (689894/689893) and that willl also need a force merge [19:58:13] or I guess it could have the Wikibase fix as a dependency? [19:58:17] tgr_: No, PageViewInfo is now there. [19:58:37] Yeah. [19:59:06] in wmf.4/5? I don't see it. [19:59:22] https://gerrit.wikimedia.org/r/q/project:mediawiki%252Fextensions%252FGrowthExperiments+branch:wmf%252F1.37.0-wmf.4+status:merged [19:59:52] 10SRE, 10Analytics, 10LDAP-Access-Requests, 10CommRel-Specialists-Support (Apr-Jun-2021): Please grant CRS access to Superset/Turnilo (deadline EOD Monday 17) - https://phabricator.wikimedia.org/T282589 (10Aklapper) >>! In T282589#7082462, @Elitre wrote: > Are you requesting this to happen as the person in... [20:00:05] dancy and brennen: May I have your attention please! MediaWiki train - American Version. 
(https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210512T1900) [20:00:05] chrisalbon and accraze: That opportune time is upon us again. Time for a Services – Graphoid / ORES deploy. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210512T2000). [20:00:21] jouncebot is very confused [20:00:29] Hmm. Yes. [20:00:32] I thought that was fixed? [20:00:42] in master, yes [20:02:11] No, I meant I thought that jouncebot was fixed. [20:03:07] (03CR) 10Kosta Harlan: [C: 03+1] repo: Mock getLazyConnectionRef [extensions/Wikibase] (wmf/1.37.0-wmf.5) - 10https://gerrit.wikimedia.org/r/689900 (https://phabricator.wikimedia.org/T282731) (owner: 10Jforrester) [20:03:13] (03CR) 10Kosta Harlan: [C: 03+1] repo: Mock getLazyConnectionRef [extensions/Wikibase] (wmf/1.37.0-wmf.4) - 10https://gerrit.wikimedia.org/r/689899 (https://phabricator.wikimedia.org/T282731) (owner: 10Jforrester) [20:06:35] (03CR) 10jerkins-bot: [V: 04-1] Add a link: select annotation view when acceptance changes on desktop [extensions/GrowthExperiments] (wmf/1.37.0-wmf.5) - 10https://gerrit.wikimedia.org/r/689898 (https://phabricator.wikimedia.org/T282175) (owner: 10Kosta Harlan) [20:12:12] (03CR) 10Gergő Tisza: "recheck" [extensions/GrowthExperiments] (wmf/1.37.0-wmf.5) - 10https://gerrit.wikimedia.org/r/689898 (https://phabricator.wikimedia.org/T282175) (owner: 10Kosta Harlan) [20:14:40] tgr_: ^ that one will need the wikibase wmf.5 patch to merge first, I think [20:14:52] Yeah. [20:16:20] > jouncebot is very confused -- kind of confused, but the overlapping deploy windows on [[wikitech:Deployments]] is mostly to blame. [20:18:03] there is a 2 hour train window from 19:00-21:00 and also the ORES window from 20:00-21:00. The new jouncebot code "sees" all active windows at an announce checkpoint which made it double announce the train window when the overlapping ORES window started. [20:18:24] oh, right. I thought I was clicking on the master one. 
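Note on the 20:18:03 explanation: the double announcement can be reproduced with a small sketch. This is not jouncebot's actual code. The `Window` type, both helper functions, and the window titles are hypothetical and only illustrate the behaviour described above, assuming a checkpoint fires at the start of each window. Announcing every window that is active at the checkpoint repeats the train window at 20:00, while announcing only windows that start at the checkpoint does not.

```python
from datetime import datetime
from typing import NamedTuple


class Window(NamedTuple):
    """Hypothetical deploy window; not a jouncebot data structure."""
    title: str
    start: datetime
    end: datetime


def active_windows(windows, now):
    """Every window covering `now` (start inclusive, end exclusive)."""
    return [w for w in windows if w.start <= now < w.end]


def starting_windows(windows, now):
    """Only the windows that begin exactly at `now`."""
    return [w for w in windows if w.start == now]


day = datetime(2021, 5, 12)
windows = [
    Window("MediaWiki train", day.replace(hour=19), day.replace(hour=21)),
    Window("ORES deploy", day.replace(hour=20), day.replace(hour=21)),
]

# Announce checkpoint at the start of the ORES window.
checkpoint = day.replace(hour=20)
announced_if_active = [w.title for w in active_windows(windows, checkpoint)]
announced_if_starting = [w.title for w in starting_windows(windows, checkpoint)]
```

With these two windows, `announced_if_active` contains both titles at 20:00 (the train is announced a second time), whereas `announced_if_starting` contains only the ORES window.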
[20:19:03] hi, just verifying that i should not do any config deployments right now, right? [20:19:06] i saw the train was blocked [20:19:19] but dunno if that also means it is clear to do a config deploy? [20:19:27] anyway should we just force-merge the wmf.4/wmf.5 patches, or is there a better way? [20:19:47] tgr_: Probably. :-( [20:19:50] the GrowthExperiments and Wikibase CI fix will each depend on each other, I imagine [20:19:52] (03CR) 10jerkins-bot: [V: 04-1] repo: Mock getLazyConnectionRef [extensions/Wikibase] (wmf/1.37.0-wmf.4) - 10https://gerrit.wikimedia.org/r/689899 (https://phabricator.wikimedia.org/T282731) (owner: 10Jforrester) [20:19:54] But not without a deploy. [20:20:08] DOn't merge into production branches without deploying, RelEng will just hate you. [20:20:22] they need to be pulled to the deploy host, sure [20:20:55] Assuming the train's not being deployed right now? [20:20:59] dancy: ottomata's question is for you, I think [20:22:41] sigh… there’s another wikibase test error https://integration.wikimedia.org/ci/job/wmf-quibble-vendor-mysql-php72-docker/63720/console [20:23:18] But just for wmf.4? 
[20:28:44] (03PS2) 10Mforns: analytics:refinery:job:data_purge: improve drop-el-unsanitized-events [puppet] - 10https://gerrit.wikimedia.org/r/689925 (https://phabricator.wikimedia.org/T273789) [20:32:39] (03PS3) 10Mforns: analytics:refinery:job:data_purge: improve drop-el-unsanitized-events [puppet] - 10https://gerrit.wikimedia.org/r/689925 (https://phabricator.wikimedia.org/T273789) [20:33:28] !log ryankemper@cumin1001 END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) [20:33:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:34:00] (03CR) 10Cwhite: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/689994 (https://phabricator.wikimedia.org/T281266) (owner: 10Herron) [20:34:38] PROBLEM - WDQS high update lag on wdqs1011 is CRITICAL: 4567 ge 3600 https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook%23Update_lag https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen [20:35:30] kostajh, tgr_: I think we should just land the Wikibase and GrowthExperiments CI fixes in wmf.4/5 now, yes. [20:35:41] (03CR) 10Ottomata: [C: 03+2] "All data from event database will be purged after 90 days woohoo!" 
[puppet] - 10https://gerrit.wikimedia.org/r/689925 (https://phabricator.wikimedia.org/T273789) (owner: 10Mforns) [20:36:21] (03CR) 10Cwhite: [C: 03+1] "LGTM \o/" [puppet] - 10https://gerrit.wikimedia.org/r/689977 (https://phabricator.wikimedia.org/T281266) (owner: 10Herron) [20:37:12] James_F: ack, will merge [20:37:59] (03CR) 10Gergő Tisza: [V: 03+2] Skip SuggestedEditsTest when PageViewInfo is not installed [extensions/GrowthExperiments] (wmf/1.37.0-wmf.4) - 10https://gerrit.wikimedia.org/r/689893 (https://phabricator.wikimedia.org/T282720) (owner: 10Gergő Tisza) [20:38:01] (03CR) 10Gergő Tisza: [V: 03+2] Skip SuggestedEditsTest when PageViewInfo is not installed [extensions/GrowthExperiments] (wmf/1.37.0-wmf.5) - 10https://gerrit.wikimedia.org/r/689894 (https://phabricator.wikimedia.org/T282720) (owner: 10Gergő Tisza) [20:38:42] (03CR) 10Gergő Tisza: [V: 03+2] "Forcing, this and T282731 are both CI breaks." [extensions/GrowthExperiments] (wmf/1.37.0-wmf.4) - 10https://gerrit.wikimedia.org/r/689893 (https://phabricator.wikimedia.org/T282720) (owner: 10Gergő Tisza) [20:38:46] (03CR) 10Gergő Tisza: [V: 03+2] "Forcing, this and T282731 are both CI breaks." 
[extensions/GrowthExperiments] (wmf/1.37.0-wmf.5) - 10https://gerrit.wikimedia.org/r/689894 (https://phabricator.wikimedia.org/T282720) (owner: 10Gergő Tisza) [20:39:15] dancy: heads up that we are merging extension backports (affects tests only) [20:40:11] (03CR) 10Gergő Tisza: "recheck" [extensions/Wikibase] (wmf/1.37.0-wmf.4) - 10https://gerrit.wikimedia.org/r/689899 (https://phabricator.wikimedia.org/T282731) (owner: 10Jforrester) [20:40:15] (03CR) 10Gergő Tisza: "recheck" [extensions/Wikibase] (wmf/1.37.0-wmf.5) - 10https://gerrit.wikimedia.org/r/689900 (https://phabricator.wikimedia.org/T282731) (owner: 10Jforrester) [20:40:53] the new CI error is T282744 [20:40:53] T282744: Wikibase\...\EditFilterHookRunnerTest: Context should not be altered - https://phabricator.wikimedia.org/T282744 [20:41:37] Yeah, is that caused by GrowthExperiments, do you know, or one of the other repos we added? [20:44:18] not in any obvious way. What are the other repos? [20:44:43] Growth > PageViewInfo > Graph and Growth > PageImages [20:44:56] If any repo was going to break a hook contract, I'd blame Graph. [20:45:13] But I'm surprised that Wikibase is asserting how MW hooks are handled. [20:46:48] hm, none of those implement EditFilterMergedContent. [20:47:16] yeah, testing the side effect of your hooks by just invoking the hook runner is not wise. [20:50:41] (03PS1) 10Ottomata: data_purge - Refactor drop-el-unsanitized-events in to drop_event job [puppet] - 10https://gerrit.wikimedia.org/r/690038 (https://phabricator.wikimedia.org/T273789) [20:51:15] I can reproduce it locally, and see that it's just an issue with wmf.4, so let me see what bisect says [20:51:25] Interesting. [20:51:27] has to be GrowthExperiments, the others only have a few hooks and those definitely aren't invoked [20:51:43] maybe UserGetPreferences gets triggered somehow?
[20:52:06] oh, no idea then, I was only looking at master [20:52:13] (03CR) 10Mforns: [C: 03+1] data_purge - Refactor drop-el-unsanitized-events in to drop_event job [puppet] - 10https://gerrit.wikimedia.org/r/690038 (https://phabricator.wikimedia.org/T273789) (owner: 10Ottomata) [20:52:36] (03CR) 10Ottomata: [C: 03+2] data_purge - Refactor drop-el-unsanitized-events in to drop_event job [puppet] - 10https://gerrit.wikimedia.org/r/690038 (https://phabricator.wikimedia.org/T273789) (owner: 10Ottomata) [20:57:13] !log starting new drop_event data purge job to drop all event data older than 90 days in the Hive event database - T273789 [20:57:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:57:17] T273789: Sanitize and ingest all event tables into the event_sanitized database - https://phabricator.wikimedia.org/T273789 [21:00:01] I think we should just remove the context part of that test, in any case. It's bound to break eventually - what it actually tests is not "context wasn't changed" but "context getters weren't invoked". And there are all kinds of hooks triggered during that test (EditFilterMergedContent, UserGetReservedNames...) 
[21:00:37] sounds fine to me, bisect is not cooperating for me at the moment anyway ("Some good revs are not ancestors of the bad rev") [21:03:18] https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/GrowthExperiments/+/a3226bd368ea91b65c0daf2745f484fc42a4616b%5E%21/includes/Config/ConfigHooks.php is what removed the getConfig() call from ConfigHooks after wmf.4 [21:03:51] (03PS1) 10DLynch: PHP VisualEditorFeatureUse logging: properly record session id [extensions/WikiEditor] (wmf/1.37.0-wmf.4) - 10https://gerrit.wikimedia.org/r/689901 (https://phabricator.wikimedia.org/T281409) [21:04:21] (03PS1) 10DLynch: PHP VisualEditorFeatureUse logging: properly record session id [extensions/WikiEditor] (wmf/1.37.0-wmf.5) - 10https://gerrit.wikimedia.org/r/689902 (https://phabricator.wikimedia.org/T281409) [21:05:07] (03CR) 10Jforrester: "Note that this'll fail until T282744 is fixed (wmf.4-specific)." [extensions/WikiEditor] (wmf/1.37.0-wmf.4) - 10https://gerrit.wikimedia.org/r/689901 (https://phabricator.wikimedia.org/T281409) (owner: 10DLynch) [21:06:38] (03CR) 10DLynch: "> Patch Set 1:" [extensions/WikiEditor] (wmf/1.37.0-wmf.4) - 10https://gerrit.wikimedia.org/r/689901 (https://phabricator.wikimedia.org/T281409) (owner: 10DLynch) [21:09:09] 10SRE, 10ops-eqiad, 10Data-Services, 10cloud-services-team (Kanban): Move maps and scratch on cloudstore1008/9 to a DRBD failover similar to labstore1004/5 - https://phabricator.wikimedia.org/T224747 (10wiki_willy) a:05wiki_willy→03Jclark-ctr Hi @Bstorm - @Jclark-ctr is going to check it out and possib... 
[21:09:52] (03PS1) 10Volans: homer: disable diff checker on cumin2001 [puppet] - 10https://gerrit.wikimedia.org/r/690041 [21:10:42] RECOVERY - WDQS high update lag on wdqs1011 is OK: (C)3600 ge (W)1200 ge 830.4 https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook%23Update_lag https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen [21:11:08] (03CR) 10Gergő Tisza: [C: 03+2] repo: Mock getLazyConnectionRef [extensions/Wikibase] (wmf/1.37.0-wmf.5) - 10https://gerrit.wikimedia.org/r/689900 (https://phabricator.wikimedia.org/T282731) (owner: 10Jforrester) [21:12:45] kostajh: could you test https://gerrit.wikimedia.org/r/690042 ? I don't have Wikibase in working order locally [21:13:41] do I need to squash it with the other patch or will Zuul gate them together if I stack them in Gerrit and +2 the same time? [21:15:07] (03PS1) 10Bstorm: cloud drbd: change default for this cluster to eno3 [puppet] - 10https://gerrit.wikimedia.org/r/690043 (https://phabricator.wikimedia.org/T224747) [21:16:49] (03CR) 10Bstorm: [C: 03+2] cloud drbd: change default for this cluster to eno3 [puppet] - 10https://gerrit.wikimedia.org/r/690043 (https://phabricator.wikimedia.org/T224747) (owner: 10Bstorm) [21:17:54] tgr_: looking [21:18:21] tgr_: fwiw, the problematic lines are in ConfigHooks.php [21:18:30] https://www.irccloud.com/pastebin/WvQfhRtj/ [21:19:39] (03PS1) 10Kosta Harlan: EditFilterHookRunnerTest: do not check for context changes [extensions/Wikibase] (wmf/1.37.0-wmf.4) - 10https://gerrit.wikimedia.org/r/689903 (https://phabricator.wikimedia.org/T282744) [21:20:01] tgr_: that works [21:26:31] (03PS5) 10Dave Pifke: logstash: move kafka input configs to profile::logstash::kafka_inputs [puppet] - 10https://gerrit.wikimedia.org/r/683695 (https://phabricator.wikimedia.org/T233134) (owner: 10Herron) [21:28:56] 10SRE, 10ops-eqiad, 10Data-Services, 10Patch-For-Review, 10cloud-services-team (Kanban): Move maps and scratch on 
cloudstore1008/9 to a DRBD failover similar to labstore1004/5 - https://phabricator.wikimedia.org/T224747 (10Jclark-ctr) Updated brook with Correct ports being used. has link [21:29:18] (03PS2) 10Volans: homer: disable diff checker on cumin2001 [puppet] - 10https://gerrit.wikimedia.org/r/690041 [21:32:56] tgr_ / James_F I'm signing off. https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Wikibase/+/689903 should be enough to finish unbreaking everything (thank you tgr_!) [21:33:34] (03CR) 10jerkins-bot: [V: 04-1] PHP VisualEditorFeatureUse logging: properly record session id [extensions/WikiEditor] (wmf/1.37.0-wmf.4) - 10https://gerrit.wikimedia.org/r/689901 (https://phabricator.wikimedia.org/T281409) (owner: 10DLynch) [21:34:18] 10SRE, 10ops-eqiad, 10Data-Services, 10Patch-For-Review, 10cloud-services-team (Kanban): Move maps and scratch on cloudstore1008/9 to a DRBD failover similar to labstore1004/5 - https://phabricator.wikimedia.org/T224747 (10Bstorm) a:05Jclark-ctr→03Bstorm Assigning back to myself to finish up the task... [21:34:41] (03CR) 10Volans: [V: 03+1] "Compiler results at https://puppet-compiler.wmflabs.org/compiler1002/29538/ seems good to me." 
[puppet] - 10https://gerrit.wikimedia.org/r/690041 (owner: 10Volans) [21:37:27] (03PS2) 10Gergő Tisza: EditFilterHookRunnerTest: do not check for context changes [extensions/Wikibase] (wmf/1.37.0-wmf.4) - 10https://gerrit.wikimedia.org/r/689903 (https://phabricator.wikimedia.org/T282744) (owner: 10Kosta Harlan) [21:39:11] (03Abandoned) 10Gergő Tisza: repo: Mock getLazyConnectionRef [extensions/Wikibase] (wmf/1.37.0-wmf.4) - 10https://gerrit.wikimedia.org/r/689899 (https://phabricator.wikimedia.org/T282731) (owner: 10Jforrester) [21:39:22] (03Abandoned) 10Gergő Tisza: EditFilterHookRunnerTest: do not check for context changes [extensions/Wikibase] (wmf/1.37.0-wmf.4) - 10https://gerrit.wikimedia.org/r/689903 (https://phabricator.wikimedia.org/T282744) (owner: 10Kosta Harlan) [21:40:30] (03PS2) 10Dwisehaupt: Add new payments hosts to monitoring [puppet] - 10https://gerrit.wikimedia.org/r/682186 (https://phabricator.wikimedia.org/T266481) [21:45:45] (03CR) 10Gergő Tisza: "Squashed to I1e587e785a698214f6901ef8542d4012a350eda7." [extensions/Wikibase] (wmf/1.37.0-wmf.4) - 10https://gerrit.wikimedia.org/r/689899 (https://phabricator.wikimedia.org/T282731) (owner: 10Jforrester) [21:45:51] (03CR) 10Gergő Tisza: "Squashed to I1e587e785a698214f6901ef8542d4012a350eda7." 
[extensions/Wikibase] (wmf/1.37.0-wmf.4) - 10https://gerrit.wikimedia.org/r/689903 (https://phabricator.wikimedia.org/T282744) (owner: 10Kosta Harlan) [21:46:27] (03PS6) 10Dave Pifke: logstash: move kafka input configs to profile::logstash::kafka_inputs [puppet] - 10https://gerrit.wikimedia.org/r/683695 (https://phabricator.wikimedia.org/T233134) (owner: 10Herron) [21:48:01] (03CR) 10jerkins-bot: [V: 04-1] logstash: move kafka input configs to profile::logstash::kafka_inputs [puppet] - 10https://gerrit.wikimedia.org/r/683695 (https://phabricator.wikimedia.org/T233134) (owner: 10Herron) [21:52:11] (03CR) 10jerkins-bot: [V: 04-1] repo: Mock getLazyConnectionRef [extensions/Wikibase] (wmf/1.37.0-wmf.5) - 10https://gerrit.wikimedia.org/r/689900 (https://phabricator.wikimedia.org/T282731) (owner: 10Jforrester) [21:54:30] !log T280382 `wdqs1012.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/mapper/vg0-srv 2.7T 998G 1.6T 39% /srv` [21:54:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:54:35] T280382: WDQS hosts low on /srv disk space - https://phabricator.wikimedia.org/T280382 [21:56:03] !log ryankemper@cumin2001 START - Cookbook sre.wdqs.data-transfer [21:56:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:56:10] !log T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage` [21:56:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:56:26] (03CR) 10Dave Pifke: [C: 04-1] "I rebased and cherry-picked this in deployment-prep to unblock the people who need a working logstash in that environment." 
[puppet] - 10https://gerrit.wikimedia.org/r/683695 (https://phabricator.wikimedia.org/T233134) (owner: 10Herron) [22:01:23] !log ryankemper@cumin2001 END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) [22:01:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:10:31] 10SRE, 10GitLab (Initialization), 10Release-Engineering-Team (Doing), 10User-brennen: Define auth strategy for GitLab - https://phabricator.wikimedia.org/T274461 (10thcipriani) >>! In T274461#7079233, @jbond wrote: > from what i see and i think @Sergey.Trofimovsky.SF confirmed, this is simply not possible... [22:15:06] PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 236, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [22:15:10] PROBLEM - Router interfaces on cr2-esams is CRITICAL: CRITICAL: host 91.198.174.244, interfaces up: 81, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [22:15:59] (03CR) 10Gergő Tisza: [C: 03+2] "Flaky Selenium test." [extensions/Wikibase] (wmf/1.37.0-wmf.5) - 10https://gerrit.wikimedia.org/r/689900 (https://phabricator.wikimedia.org/T282731) (owner: 10Jforrester) [22:19:24] (03CR) 10Gergő Tisza: "recheck" [extensions/GrowthExperiments] (wmf/1.37.0-wmf.5) - 10https://gerrit.wikimedia.org/r/689898 (https://phabricator.wikimedia.org/T282175) (owner: 10Kosta Harlan) [22:25:56] 10SRE, 10ops-eqiad, 10Data-Services, 10Epic, 10cloud-services-team (Hardware): Move labstore1004 and labstore1005 to 10G Ethernet - https://phabricator.wikimedia.org/T266198 (10Bstorm) [22:26:39] 10SRE, 10ops-eqiad, 10Data-Services, 10Epic, 10cloud-services-team (Hardware): Move labstore1004 and labstore1005 to 10G Ethernet - https://phabricator.wikimedia.org/T266198 (10Bstorm) The new cable is connected and confirmed working. 
I'll make a new task for the reconfig and retiring of the old cable. [22:27:12] (03CR) 10MGChecker: [C: 04-1] "Needs further discussion" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/689321 (https://phabricator.wikimedia.org/T282624) (owner: 10Zabe) [22:28:41] 10SRE, 10ops-codfw: Degraded RAID on wdqs2007 - https://phabricator.wikimedia.org/T282068 (10RKemper) 05Open→03Resolved [22:30:20] 10SRE, 10ops-eqiad, 10Data-Services, 10Epic, 10cloud-services-team (Hardware): Move labstore1004 and labstore1005 to 10G Ethernet - https://phabricator.wikimedia.org/T266198 (10Bstorm) 05Open→03Resolved [22:41:36] (03Merged) 10jenkins-bot: repo: Mock getLazyConnectionRef [extensions/Wikibase] (wmf/1.37.0-wmf.5) - 10https://gerrit.wikimedia.org/r/689900 (https://phabricator.wikimedia.org/T282731) (owner: 10Jforrester) [22:44:39] (03PS2) 10Ryan Kemper: install_server: add new installer to test raid0 configuration: [puppet] - 10https://gerrit.wikimedia.org/r/689786 (https://phabricator.wikimedia.org/T280382) (owner: 10Jbond) [22:49:08] (03PS1) 10Dwisehaupt: Monitor services for new donor_prefs flow [puppet] - 10https://gerrit.wikimedia.org/r/690053 (https://phabricator.wikimedia.org/T125272) [22:49:18] (03CR) 10Gergő Tisza: "recheck" [extensions/GrowthExperiments] (wmf/1.37.0-wmf.5) - 10https://gerrit.wikimedia.org/r/689898 (https://phabricator.wikimedia.org/T282175) (owner: 10Kosta Harlan) [22:49:22] (03CR) 10Ryan Kemper: "@jbond This approach looks great, but I don't fully understand ttyS0 vs ttyS1 here. 
Clearly s0 has the new layout and S1 doesn't, so what " [puppet] - 10https://gerrit.wikimedia.org/r/689786 (https://phabricator.wikimedia.org/T280382) (owner: 10Jbond) [22:52:29] (03PS2) 10DLynch: PHP VisualEditorFeatureUse logging: properly record session id [extensions/WikiEditor] (wmf/1.37.0-wmf.4) - 10https://gerrit.wikimedia.org/r/689901 (https://phabricator.wikimedia.org/T281409) [23:00:05] RoanKattouw, Niharika, and Urbanecm: My dear minions, it's time we take the moon! Just kidding. Time for Evening backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210512T2300). [23:00:05] kemayo: A patch you scheduled for Evening backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [23:00:16] wmf.5 is clear. wmf.4 still broken (courtesy of Wikibase tests taking 30min and then resulting in bogus Selenium errors more often than not). [23:00:39] Present. Mine's for .5 -- I do have an equivalent for .4, but as you say it can't be merged. [23:00:48] tgr_: ok to deploy for me? [23:01:01] yeah, thanks [23:01:11] i can deploy today then :) [23:01:40] (03CR) 10Urbanecm: [C: 03+2] PHP VisualEditorFeatureUse logging: properly record session id [extensions/WikiEditor] (wmf/1.37.0-wmf.5) - 10https://gerrit.wikimedia.org/r/689902 (https://phabricator.wikimedia.org/T281409) (owner: 10DLynch) [23:01:58] I've been repeatedly reloading the backport-CI-fixes patch and sighing at the 30 minute old gate-and-submit message for a bit now. 
:D [23:03:38] they take 30-40 minutes so it should resolve soon [23:12:41] (03CR) 10Urbanecm: [C: 03+2] PHP VisualEditorFeatureUse logging: properly record session id [extensions/WikiEditor] (wmf/1.37.0-wmf.5) - 10https://gerrit.wikimedia.org/r/689902 (https://phabricator.wikimedia.org/T281409) (owner: 10DLynch) [23:13:26] i can't get jenkins to spawn the jobs :/ [23:14:08] (03PS2) 10Urbanecm: PHP VisualEditorFeatureUse logging: properly record session id [extensions/WikiEditor] (wmf/1.37.0-wmf.5) - 10https://gerrit.wikimedia.org/r/689902 (https://phabricator.wikimedia.org/T281409) (owner: 10DLynch) [23:14:30] trying to remove the depends-on, the wmf.5 version was merged already [23:14:41] (03CR) 10Urbanecm: [C: 03+2] PHP VisualEditorFeatureUse logging: properly record session id [extensions/WikiEditor] (wmf/1.37.0-wmf.5) - 10https://gerrit.wikimedia.org/r/689902 (https://phabricator.wikimedia.org/T281409) (owner: 10DLynch) [23:14:44] Worth a shot [23:15:09] it's doing something now [23:15:16] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 238, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [23:15:24] RECOVERY - Router interfaces on cr2-esams is OK: OK: host 91.198.174.244, interfaces up: 83, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [23:17:16] (03CR) 10jerkins-bot: [V: 04-1] PHP VisualEditorFeatureUse logging: properly record session id [extensions/WikiEditor] (wmf/1.37.0-wmf.4) - 10https://gerrit.wikimedia.org/r/689901 (https://phabricator.wikimedia.org/T281409) (owner: 10DLynch) [23:17:19] welp, failed again. [23:17:56] that's wmf.4 tgr_ [23:17:58] That one's my .4 patch. [23:18:19] the wmf.5 one is so far so good [23:18:32] I updated its commit message half an hour or so ago to depend-on the CI backport, so presumably it just coincidentally finished up now.
[23:18:41] no, I mean the wmf.4 CI backport. [23:19:08] Ah, right. Sorry, context confusion. [23:19:54] shall we just force merge it? [23:20:14] the CI one? [23:20:21] or the one I'm deploying? [23:21:04] The Wikibase one. It will slow down CI, but compared to waiting half an hour for it, we wouldn't be much worse off. [23:21:55] as long as it'll fix CI, I think it's fine [23:22:27] If we force-merge the wikibase CI one for .4, can I squeeze https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikiEditor/+/689901 into this window as well? Or should I just go ahead and schedule it for tomorrow? [23:22:53] i'm happy to do the .4 too Kemayo [23:23:14] Cool, thanks. [23:24:22] Matters slightly more since the train is delayed. I had initially thought it'd be a wash since .5 would be out everywhere tomorrow anyway. [23:27:06] !log ryankemper@cumin2001 START - Cookbook sre.wdqs.data-transfer [23:27:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:27:12] !log T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage` [23:27:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:27:15] T280382: WDQS hosts low on /srv disk space - https://phabricator.wikimedia.org/T280382 [23:27:34] bleh, amending log line: [23:27:35] !log T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin2001` tmux session `wdqs_reimage` [23:27:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:29:32] Kemayo: since your patch is about eventlogging fix, it can't be easily tested, right? [23:30:14] Urbanecm: Unfortunately, yeah. 
It's completely server-side, so all the normal avenues of testing are closed. [23:30:40] Once it's deployed I can watch the rate of event validation errors decrease, basically. [23:30:46] got it [23:30:53] i'll just sync it then, once it merges [23:31:00] Sounds good. [23:34:52] (03CR) 10Dave Pifke: [C: 04-1] "With this applied, the Kafka input plugins on deployment-logstash03 are present but are failing to load due to an java.io.EOF exception." [puppet] - 10https://gerrit.wikimedia.org/r/683695 (https://phabricator.wikimedia.org/T233134) (owner: 10Herron) [23:36:35] (03Merged) 10jenkins-bot: PHP VisualEditorFeatureUse logging: properly record session id [extensions/WikiEditor] (wmf/1.37.0-wmf.5) - 10https://gerrit.wikimedia.org/r/689902 (https://phabricator.wikimedia.org/T281409) (owner: 10DLynch) [23:36:59] \o/ [23:37:43] tgr_: how's wmf.4 fix going? [23:37:56] doing now [23:38:16] I think a force merge would have interrupted the gate check [23:38:38] although not if it's on a different branch... didn't think that through [23:39:38] anyway, done. [23:39:41] thanks [23:40:11] !log urbanecm@deploy1002 Synchronized php-1.37.0-wmf.5/extensions/WikiEditor/includes/WikiEditorHooks.php: ef4139628a36eb8b747c610c8d769a802faf2fc3: PHP VisualEditorFeatureUse logging: properly record session id (T281409) (duration: 01m 08s) [23:40:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:40:15] T281409: Post-deployment QA: verify ad blocking instrumentation working as expected - https://phabricator.wikimedia.org/T281409 [23:40:27] I'll update the files on the deploy host. 
[23:40:37] considering the wikieditor backport passed CI in master and wmf.5, I'm going to forcemerge it too for wmf.4 [23:41:20] (03CR) 10Urbanecm: [V: 03+2 C: 03+2] "passed CI in master and wmf.5, forcemerging" [extensions/WikiEditor] (wmf/1.37.0-wmf.4) - 10https://gerrit.wikimedia.org/r/689901 (https://phabricator.wikimedia.org/T281409) (owner: 10DLynch) [23:41:38] 🎉 [23:42:02] (03PS1) 10Bstorm: toolforge: re-enable toolforge certificate monitor [puppet] - 10https://gerrit.wikimedia.org/r/690055 (https://phabricator.wikimedia.org/T282264) [23:43:42] hm, I can't fetch that Wikibase patch on deploy1002 [23:44:04] I thought force merges are immediate? [23:44:09] they are [23:44:40] (03CR) 10Bstorm: "I suspect the original monitor was checking on a redirect as well, so it might not care at all about that. In that case, I could make moni" [puppet] - 10https://gerrit.wikimedia.org/r/690055 (https://phabricator.wikimedia.org/T282264) (owner: 10Bstorm) [23:44:53] tgr_: you forcemerged master branch [23:44:55] not wmf.4 [23:45:26] ugh [23:45:38] it has wmf.4 as topic [23:45:40] but branch is master [23:46:08] see screen https://usercontent.irccloud-cdn.com/file/T7t28wB7/image.png [23:46:18] Well, that's confusing. [23:46:47] syncing Kemayo's wmf.4 backport [23:47:06] I pushed it with git review wmf/1.37.0-wmf.4, is that not the right syntax? [23:47:21] the man page says it is [23:47:43] but yeah, certainly not where it ended up [23:48:02] i do git review -R in those cases [23:48:06] and it definitely worked [23:48:15] (in the past, for me, i mean) [23:48:30] !log urbanecm@deploy1002 Synchronized php-1.37.0-wmf.4/extensions/WikiEditor/includes/WikiEditorHooks.php: 2f6af514c49d47bbec5ce51f9f7263015e039003? 
PHP VisualEditorFeatureUse logging: properly record session id (T281409) (duration: 01m 07s) [23:48:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:48:34] T281409: Post-deployment QA: verify ad blocking instrumentation working as expected - https://phabricator.wikimedia.org/T281409 [23:48:48] Kemayo: your backports should be live. Can you amend the calendar to add the wmf.4 one too? [23:49:39] Urbanecm: done [23:49:43] wait, how did that even merge for master? Should have been an edit conflict, those patches were already merged on master earlier [23:49:52] no idea [23:49:58] Kemayo: thanks. Anything else I can do for you? [23:50:23] Urbanecm: Nothing else I need, thanks for being accommodating! I'll let you get back to the painful CI stuff. [23:50:51] Okay, great :). Bye! [23:52:08] I guess another force merge can't hurt, I'll just cherry-pick it to the right place [23:52:40] yeah, sounds good. Worth reviewing what the master branch actually did, IMO. [23:52:46] (03PS1) 10Gergő Tisza: Backport CI fixes [extensions/Wikibase] (wmf/1.37.0-wmf.4) - 10https://gerrit.wikimedia.org/r/690066 (https://phabricator.wikimedia.org/T282731) [23:55:22] it's an empty merge commit [23:55:32] https://github.com/wikimedia/Wikibase/commit/00009978ed4723e7404fe24ab1bd7c559fc061f8 [23:56:25] interesting [23:56:49] (03CR) 10Gergő Tisza: [V: 03+2 C: 03+2] "Correct branch this time." [extensions/Wikibase] (wmf/1.37.0-wmf.4) - 10https://gerrit.wikimedia.org/r/690066 (https://phabricator.wikimedia.org/T282731) (owner: 10Gergő Tisza) [23:59:42] seems like there was some kind of git-review error and I forgot to add the branch when I pushed it again.