[00:00:04] RoanKattouw, Niharika, and Urbanecm: (Dis)respected human, time to deploy Evening backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20201106T0000). Please do the needful.
[00:00:04] No GERRIT patches in the queue for this window AFAICS.
[00:01:03] RECOVERY - Check systemd state on grafana1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:02:17] just in case, noting that i'm in the midst of a scap sync-world.
[00:08:20] Operations, ops-eqiad, DC-Ops: (Need By: TBD) rack/setup/install ml-deploy100[1-4] - https://phabricator.wikimedia.org/T267050 (RobH)
[00:08:21] (PS1) Bstorm: toolforge bastion: reduce number of wheel-of-misfortune runs [puppet] - https://gerrit.wikimedia.org/r/639641 (https://phabricator.wikimedia.org/T266300)
[00:20:52] Operations, Commons, SRE-swift-storage, Patch-For-Review: Recently more broken files (premature end of file at 5MB size) that were cross-wiki uploaded to Commons - https://phabricator.wikimedia.org/T266903 (tstarling) I was able to reproduce this bug by uploading a large file to my local test wik...
[00:21:27] (CR) Bstorm: [C: +2] toolforge bastion: tweak email wording for process killer [puppet] - https://gerrit.wikimedia.org/r/639620 (https://phabricator.wikimedia.org/T266300) (owner: BryanDavis)
[00:52:59] !log brennen@deploy1001 Finished scap: Synchronizing to pick up i18n for [[gerrit:639505]]. Will resume moving train to group1 on Monday morning (US) (T263182) (duration: 69m 02s)
[00:53:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:53:06] T263182: 1.36.0-wmf.16 deployment blockers - https://phabricator.wikimedia.org/T263182
[00:56:55] (PS6) Ladsgroup: [WIP] varnish: Improve wording of the browser security error a bit [puppet] - https://gerrit.wikimedia.org/r/637850 (https://phabricator.wikimedia.org/T241656)
[01:05:26] Operations, ops-codfw, DC-Ops: (Need By: TBD) rack/setup/install cloudcephmon200[12] - https://phabricator.wikimedia.org/T267378 (Andrew) >>! In T267378#6607781, @RobH wrote: > @andrew: "We only need OS partitions for these." Does this mean just a normal raid10 lvm setup of the 4 disks or what? I'...
[01:05:40] Operations, ops-codfw, DC-Ops: (Need By: TBD) rack/setup/install cloudcephmon200[12] - https://phabricator.wikimedia.org/T267378 (Papaul) VLAN ID 2105 and the VLAN itself is not created yet. @ayounsi is taking care of that.
[01:16:46] Operations, SRE-Access-Requests, Trust-and-Safety: Requesting access to restricted production access and analytics-privatedata-users for Zxane Soo - https://phabricator.wikimedia.org/T267312 (ZS) Team, thanks for working on this. I note that I have to input my own SSH key. which will be the one below...
[01:18:23] PROBLEM - Postgres Replication Lag on maps2003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 18371728 and 1 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[01:19:47] PROBLEM - Postgres Replication Lag on maps2001 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 26814856 and 3 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[01:21:51] PROBLEM - Postgres Replication Lag on maps2003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 19097152 and 3 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[01:25:03] RECOVERY - Postgres Replication Lag on maps2001 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 796960 and 0 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[01:27:05] RECOVERY - Postgres Replication Lag on maps2003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 1268272 and 0 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[01:32:00] (PS7) Ladsgroup: [WIP] varnish: Improve wording of the browser security error a bit [puppet] - https://gerrit.wikimedia.org/r/637850 (https://phabricator.wikimedia.org/T241656)
[01:56:22] Operations, MassMessage, WMF-JobQueue, Platform Team Workboards (Clinic Duty Team): Same MassMessage is being sent more than once - https://phabricator.wikimedia.org/T93049 (Quiddity) One more new example of a duplication. [[https://commons.wikimedia.org/w/index.php?title=Commons:Village_pump&di...
[03:33:12] Operations, Wikimedia-Logstash, Patch-For-Review: Upgrade ELK Stack to version 7 - https://phabricator.wikimedia.org/T234854 (colewhite) >>! In T234854#6605499, @jcrespo wrote: > If you click at the links I sent at T234854#6439791 you can see that I get (as of this writing) no errors on DBQuery on th...
[03:48:14] !log About to begin wdqs deploy, tests passing on canary `wdqs1003`
[03:48:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:48:23] !log ryankemper@deploy1001 Started deploy [wdqs/wdqs@27a5c54]: 0.3.54
[03:48:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:51:36] !log Tests passing on canary `wdqs1003` following initial deployment, proceeding with deploy to rest of fleet
[03:51:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:59:45] !log ryankemper@deploy1001 Finished deploy [wdqs/wdqs@27a5c54]: 0.3.54 (duration: 11m 22s)
[03:59:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:00:52] !log `query.wikidata.org` looks good following deploy, proceeding to post-deploy steps
[04:00:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:01:26] !log Restarted wdqs updater across all hosts: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
[04:01:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:01:52] !log Restarted wdqs categories across test hosts: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
[04:01:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:02:28] !log Restarting wdqs categories one host at a time across all wdqs production instances: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 60 && systemctl restart wdqs-categories && sleep 30 && pool'` (in progress)
[04:02:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:15:19] (CR) Ryan Kemper: [C: +2] cirrus: temporarily disable saneitizer [puppet] - https://gerrit.wikimedia.org/r/637809 (https://phabricator.wikimedia.org/T266911) (owner: Ryan Kemper)
[04:19:47] (CR) Ryan Kemper: [C: +2] cirrus: fix shard_size thresholds [puppet] - https://gerrit.wikimedia.org/r/636811 (https://phabricator.wikimedia.org/T265908) (owner: Ryan Kemper)
[04:31:45] Operations, Discovery-Search (Current work), Patch-For-Review: Reshard commonswiki_file elasticsearch index - https://phabricator.wikimedia.org/T260083 (RKemper) Housekeeping note: see https://phabricator.wikimedia.org/T265908 for the patch that changes the alert thresholds, which should clear the al...
[04:36:33] !log Finished restarting wdqs categories one host at a time across all wdqs production instances
[04:36:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:38:11] !log [Deploy finished] WDQS deploy is complete; the service is healthy per https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&from=1604633917530&to=1604637475930
[04:38:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:51:07] Operations, Machine Learning Platform, ORES, Okapi, and 3 others: ORES redis: max number of clients reached... - https://phabricator.wikimedia.org/T263910 (Ladsgroup) This will very likely fix it: https://github.com/wikimedia/ores/pull/352 I can deploy it if @akosiaris or @calbon approve the PR
[06:08:14] Operations, Product-Infrastructure-Team-Backlog, Traffic, JavaScript, and 2 others: Javascript errors: Unable to add datalayers to map - https://phabricator.wikimedia.org/T267296 (RolandUnger) It is not the first time that the map server is not working properly (T231964, T226412). But these malfu...
[06:41:28] (PS1) Marostegui: mariadb: Set db11[51-76] in setup. [puppet] - https://gerrit.wikimedia.org/r/639668 (https://phabricator.wikimedia.org/T267043)
[06:43:12] (PS2) Marostegui: mariadb: Set db11[51-76] in setup. [puppet] - https://gerrit.wikimedia.org/r/639668 (https://phabricator.wikimedia.org/T267043)
[06:44:02] Operations, ops-codfw, DBA, DC-Ops: (Need By: 2020-11-29) rack/setup/install db214[234] - https://phabricator.wikimedia.org/T267041 (Marostegui)
[06:45:20] (CR) Marostegui: [C: +2] mariadb: Set db11[51-76] in setup. [puppet] - https://gerrit.wikimedia.org/r/639668 (https://phabricator.wikimedia.org/T267043) (owner: Marostegui)
[06:46:22] Operations, ops-eqiad, DBA, DC-Ops, Patch-For-Review: (Need By: 2020-11-29) rack/setup/install db11[51-76] - https://phabricator.wikimedia.org/T267043 (Marostegui) I have merged the puppet changes needed for the initial installation (puppet for `insetup` and the partman recipe). Pending merg...
[06:47:57] Operations, ops-eqiad, DBA, DC-Ops, Patch-For-Review: (Need By: 2020-11-29) rack/setup/install db11[51-76] - https://phabricator.wikimedia.org/T267043 (Marostegui)
[06:51:38] (PS1) Marostegui: mariadb: Initial setup for db214[234] [puppet] - https://gerrit.wikimedia.org/r/639669 (https://phabricator.wikimedia.org/T267041)
[06:52:26] (CR) Marostegui: [C: +2] mariadb: Initial setup for db214[234] [puppet] - https://gerrit.wikimedia.org/r/639669 (https://phabricator.wikimedia.org/T267041) (owner: Marostegui)
[06:53:21] Operations, ops-codfw, DBA, DC-Ops, Patch-For-Review: (Need By: 2020-11-29) rack/setup/install db214[234] - https://phabricator.wikimedia.org/T267041 (Marostegui) I have merged the puppet changes needed for the initial installation (puppet for insetup and the partman recipe). Pending merges f...
[06:53:25] Operations, ops-codfw, DBA, DC-Ops, Patch-For-Review: (Need By: 2020-11-29) rack/setup/install db214[234] - https://phabricator.wikimedia.org/T267041 (Marostegui)
[07:12:24] (PS1) Marostegui: orchestrator.conf: Add PromotionIgnoreHostnameFilters [puppet] - https://gerrit.wikimedia.org/r/639670 (https://phabricator.wikimedia.org/T265990)
[07:25:44] Operations, ops-eqiad, netops: Network blip for mw hosts in rack C3 (eqiad) - https://phabricator.wikimedia.org/T267242 (elukey) Open→Resolved This seems auto-resolved, let's reopen in case it re-happens.
[08:00:04] Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20201106T0800)
[08:14:34] !log installing openldap security updates on stretch/buster (client-side tools/libs only, slapd updates already deployed)
[08:14:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:25:35] Operations, observability, Patch-For-Review, User-fgiunchedi: Enable CAS authentication for Grafana - https://phabricator.wikimedia.org/T262512 (MoritzMuehlenhoff) There's now: - A separate vhost grafana-rw.wikimedia.org using CAS to be used for editing dashboards and internal settings - grafana....
[08:58:44] Operations, Platform Engineering, Wikidata, serviceops, and 3 others: Upgrade memcached cluster to Debian Stretch/Buster - https://phabricator.wikimedia.org/T213089 (hashar)
[09:00:00] Operations, Wikimedia-Logstash, Patch-For-Review: Upgrade ELK Stack to version 7 - https://phabricator.wikimedia.org/T234854 (jcrespo) > I have amended the dashboards for now; please have a look. Thanks, they work now. As a minor issue, one thing I noticed is that page load in the old ones takes 3...
[09:02:13] Operations, Platform Engineering, Wikidata, serviceops, and 3 others: Upgrade memcached cluster to Debian Stretch/Buster - https://phabricator.wikimedia.org/T213089 (hashar)
[09:12:50] !log elukey@cumin1001 START - Cookbook sre.hosts.downtime
[09:12:50] !log elukey@cumin1001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
[09:12:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:12:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:13:04] !log elukey@cumin1001 START - Cookbook sre.hosts.downtime
[09:13:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:15:04] !log elukey@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[09:15:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:32:31] !log installing libsndfile security updates
[09:32:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:35:18] (PS1) JMeybohm: Add kubernetes 1.16 to the list of tested versions [deployment-charts] - https://gerrit.wikimedia.org/r/639736 (https://phabricator.wikimedia.org/T266032)
[09:35:21] (CR) David Caro: [C: +2] toolforge bastion: reduce number of wheel-of-misfortune runs [puppet] - https://gerrit.wikimedia.org/r/639641 (https://phabricator.wikimedia.org/T266300) (owner: Bstorm)
[09:38:06] (CR) jerkins-bot: [V: -1] Add kubernetes 1.16 to the list of tested versions [deployment-charts] - https://gerrit.wikimedia.org/r/639736 (https://phabricator.wikimedia.org/T266032) (owner: JMeybohm)
[09:40:46] (CR) JMeybohm: "recheck" [deployment-charts] - https://gerrit.wikimedia.org/r/639736 (https://phabricator.wikimedia.org/T266032) (owner: JMeybohm)
[09:48:10] (CR) Kormat: [C: -1] orchestrator.conf: Add PromotionIgnoreHostnameFilters (1 comment) [puppet] - https://gerrit.wikimedia.org/r/639670 (https://phabricator.wikimedia.org/T265990) (owner: Marostegui)
[09:49:45] (CR) Jbond: [C: +2] confd: pass srv_dns directly instead of loading confd::srv_dns [puppet] - https://gerrit.wikimedia.org/r/617716 (https://phabricator.wikimedia.org/T247956) (owner: Jbond)
[09:54:58] Operations, ops-eqiad, DBA: db1139 memory errors on boot 2020-08-27 - https://phabricator.wikimedia.org/T261405 (jcrespo) Resolved→Open Is it possible one of the memory stick needs reseating? ` 462 - Uncorrectable Memory Error Threshold Exceeded (Processor 2, DIMM 6). The DIMM is mapped out...
[09:57:09] !log installing spice security updates
[09:57:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:58:46] Operations, Technical-blog-posts, Traffic: 2nd part of blog post series: the evolution of Wikimedia's Content Delivery Network - https://phabricator.wikimedia.org/T266857 (ema) >>! In T266857#6607325, @srodlund wrote: > @ema just checking in on this. Do you have a draft you are currently working on?...
[10:02:04] PROBLEM - Check systemd state on elastic1063 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:05:32] Operations, ops-eqiad, Discovery-Search: Memory issue on elastic1063 caused elasticsearch to be killed - https://phabricator.wikimedia.org/T265113 (dcausse) Resolved→Open @Cmjohnson thanks for the intervention! But sadly it happened again today: ` [Fri Nov 6 09:49:40 2020] {1}[Hardware Erro...
[10:06:40] !log restarted elastic on elastic1063 (T265113)
[10:06:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:06:46] T265113: Memory issue on elastic1063 caused elasticsearch to be killed - https://phabricator.wikimedia.org/T265113
[10:07:00] RECOVERY - Check systemd state on elastic1063 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:07:34] (CR) Alexandros Kosiaris: [C: +1] Add kubernetes 1.16 to the list of tested versions [deployment-charts] - https://gerrit.wikimedia.org/r/639736 (https://phabricator.wikimedia.org/T266032) (owner: JMeybohm)
[10:12:19] (CR) Jbond: [C: +2] new module: debian [puppet] - https://gerrit.wikimedia.org/r/635356 (owner: Jbond)
[10:17:07] (PS1) Marostegui: orchestrator.conf: Decrease ReasonableMaintenanceReplicationLagSeconds [puppet] - https://gerrit.wikimedia.org/r/639739 (https://phabricator.wikimedia.org/T265990)
[10:17:36] (CR) JMeybohm: [C: +2] Add kubernetes 1.16 to the list of tested versions [deployment-charts] - https://gerrit.wikimedia.org/r/639736 (https://phabricator.wikimedia.org/T266032) (owner: JMeybohm)
[10:17:43] (PS1) Jbond: adduser: migrate away from os_version [puppet] - https://gerrit.wikimedia.org/r/639740
[10:20:14] (CR) Jbond: [C: +2] adduser: migrate away from os_version [puppet] - https://gerrit.wikimedia.org/r/639740 (owner: Jbond)
[10:20:31] (Merged) jenkins-bot: Add kubernetes 1.16 to the list of tested versions [deployment-charts] - https://gerrit.wikimedia.org/r/639736 (https://phabricator.wikimedia.org/T266032) (owner: JMeybohm)
[10:22:17] (PS1) Jbond: bacula: switch to debian::codename [puppet] - https://gerrit.wikimedia.org/r/639741
[10:23:01] (CR) jerkins-bot: [V: -1] bacula: switch to debian::codename [puppet] - https://gerrit.wikimedia.org/r/639741 (owner: Jbond)
[10:28:03] (PS16) Ayounsi: Add CSV import to ProvisionServerNetwork script [software/netbox-extras] - https://gerrit.wikimedia.org/r/635849
[10:28:05] (PS17) Ayounsi: ProvisionServerNetwork, cleanup and standardize logs format [software/netbox-extras] - https://gerrit.wikimedia.org/r/635853 (https://phabricator.wikimedia.org/T265339)
[10:29:59] (CR) Ayounsi: Add CSV import to ProvisionServerNetwork script (10 comments) [software/netbox-extras] - https://gerrit.wikimedia.org/r/635849 (owner: Ayounsi)
[10:42:42] Operations, ops-esams: Relined maintenance PA-315002 - https://phabricator.wikimedia.org/T267391 (ayounsi) p:Triage→Medium
[10:45:54] Operations, ops-esams: Relined maintenance PA-315002 - https://phabricator.wikimedia.org/T267391 (ayounsi)
[10:47:47] Operations, ops-eqiad, Analytics: analytics1046 stuck in booting - https://phabricator.wikimedia.org/T267392 (elukey)
[10:49:02] !log elukey@cumin1001 START - Cookbook sre.hosts.downtime
[10:49:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:49:27] !log elukey@cumin1001 START - Cookbook sre.hosts.downtime
[10:49:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:50:33] (CR) Giuseppe Lavagetto: [C: +1] "This will add the prefix to key requests for parsercache keys, but at the moment only one codfw host and mwdebug1001 have the feature enab" [mediawiki-config] - https://gerrit.wikimedia.org/r/636095 (https://phabricator.wikimedia.org/T264604) (owner: Aaron Schulz)
[10:52:29] !log elukey@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[10:52:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:52:48] (PS2) Jbond: bacula: switch to debian::codename [puppet] - https://gerrit.wikimedia.org/r/639741
[10:54:02] !log elukey@cumin1001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
[10:54:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:54:21] (CR) jerkins-bot: [V: -1] bacula: switch to debian::codename [puppet] - https://gerrit.wikimedia.org/r/639741 (owner: Jbond)
[11:04:35] Operations, serviceops: Upgrade the MediaWiki servers to ICU 63 - https://phabricator.wikimedia.org/T264991 (Joe) For the ICU transition it's crucially important that no machine with write access to the databases gets updated before the date of the migration. So please do not test this on the eqiad mwdeb...
[11:09:28] !log uploaded openjdk-8 8u272-b10-1~deb10u1 to buster-wikimedia/component/jdk
[11:09:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:11:04] (PS2) Marostegui: orchestrator.conf: Add PromotionIgnoreHostnameFilters [puppet] - https://gerrit.wikimedia.org/r/639670 (https://phabricator.wikimedia.org/T265990)
[11:19:15] !log elukey@cumin1001 START - Cookbook sre.hosts.downtime
[11:19:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:20:14] !log elukey@cumin1001 START - Cookbook sre.hosts.downtime
[11:20:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:20:42] Operations, netops, cloud-services-team (Kanban): Enable L3 routing on cloudsw nodes - https://phabricator.wikimedia.org/T265288 (aborrero) >>! In T265288#6607126, @nskaggs wrote: > Given the short timelines for SRE in this case, I'm ok with changing now and again later if it comes to it. I do belie...
[11:22:23] !log elukey@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[11:22:24] (PS1) Jbond: base:standard_packages: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639747 (https://phabricator.wikimedia.org/T266479)
[11:22:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:22:52] (CR) Jbond: [C: +2] pcc: update PCC cli so that it posts to the gerrit change [puppet] - https://gerrit.wikimedia.org/r/636652 (owner: Jbond)
[11:23:44] (CR) jerkins-bot: [V: -1] base:standard_packages: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639747 (https://phabricator.wikimedia.org/T266479) (owner: Jbond)
[11:24:04] (PS1) Muehlenhoff: Add missing Hiera settings for profile::java on releases* [puppet] - https://gerrit.wikimedia.org/r/639748
[11:24:14] !log elukey@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[11:24:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:25:07] (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/639748 (owner: Muehlenhoff)
[11:25:28] (PS2) Jbond: base:standard_packages: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639747 (https://phabricator.wikimedia.org/T266479)
[11:26:52] Operations, netops, cloud-services-team (Kanban): Enable L3 routing on cloudsw nodes - https://phabricator.wikimedia.org/T265288 (aborrero) For the D day: `lang=shell-session # create new subnet root@cloudcontrol1004:~# neutron subnet-create --gateway 185.15.56.241 --name cloud-instances-transport1-...
[11:28:21] (PS3) Jbond: base:standard_packages: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639747 (https://phabricator.wikimedia.org/T266479)
[11:29:41] (CR) Jbond: [V: +1] "PCC SUCCESS: https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/26342" [puppet] - https://gerrit.wikimedia.org/r/639747 (https://phabricator.wikimedia.org/T266479) (owner: Jbond)
[11:30:09] !log joining maps2005 to cassandra cluster
[11:30:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:31:05] RECOVERY - cassandra service on maps2005 is OK: OK - cassandra is active https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
[11:46:15] (PS1) Hnowlan: maps: add hieradata for new hosts [puppet] - https://gerrit.wikimedia.org/r/639749 (https://phabricator.wikimedia.org/T266820)
[11:46:42] (PS4) Jbond: base:standard_packages: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639747 (https://phabricator.wikimedia.org/T266479)
[11:47:55] !log elukey@cumin1001 START - Cookbook sre.hosts.downtime
[11:47:58] (CR) Jbond: [V: +1] "PCC SUCCESS: https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/26343" [puppet] - https://gerrit.wikimedia.org/r/639747 (https://phabricator.wikimedia.org/T266479) (owner: Jbond)
[11:47:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:49:57] !log elukey@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[11:50:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:50:13] !log elukey@cumin1001 START - Cookbook sre.hosts.downtime
[11:50:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:52:11] !log elukey@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[11:52:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:54:18] (PS5) Jbond: base:standard_packages: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639747 (https://phabricator.wikimedia.org/T266479)
[11:55:44] (CR) Jbond: [V: +1] "PCC SUCCESS: https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/26344" [puppet] - https://gerrit.wikimedia.org/r/639747 (https://phabricator.wikimedia.org/T266479) (owner: Jbond)
[11:56:19] (CR) Jbond: [V: +1] "See comments for latest PCC. Entries shown in the diff which look like `Class[Packages::Zsh]` are related to the switch from require_pack" [puppet] - https://gerrit.wikimedia.org/r/639747 (https://phabricator.wikimedia.org/T266479) (owner: Jbond)
[11:57:26] (CR) Muehlenhoff: [C: +1] "Looks good!" [puppet] - https://gerrit.wikimedia.org/r/639201 (https://phabricator.wikimedia.org/T267186) (owner: Jbond)
[12:00:15] (PS3) Jbond: bacula: switch to debian::codename [puppet] - https://gerrit.wikimedia.org/r/639741
[12:01:56] (CR) Jbond: [C: +2] bacula: switch to debian::codename [puppet] - https://gerrit.wikimedia.org/r/639741 (owner: Jbond)
[12:02:07] (CR) Muehlenhoff: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/639202 (https://phabricator.wikimedia.org/T267186) (owner: Jbond)
[12:09:51] (PS1) Muehlenhoff: Add a Hiera option to enable ICU63 component [puppet] - https://gerrit.wikimedia.org/r/639751 (https://phabricator.wikimedia.org/T264991)
[12:11:12] (CR) jerkins-bot: [V: -1] Add a Hiera option to enable ICU63 component [puppet] - https://gerrit.wikimedia.org/r/639751 (https://phabricator.wikimedia.org/T264991) (owner: Muehlenhoff)
[12:11:14] (PS1) Jbond: apt: migrate from os_version to debian::codename [puppet] - https://gerrit.wikimedia.org/r/639752
[12:12:36] (CR) jerkins-bot: [V: -1] apt: migrate from os_version to debian::codename [puppet] - https://gerrit.wikimedia.org/r/639752 (owner: Jbond)
[12:12:39] Operations, netops, cloud-services-team (Kanban): Enable L3 routing on cloudsw nodes - https://phabricator.wikimedia.org/T265288 (aborrero) @ayounsi I made a diagram based on yours to update the one in our docs (https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Neutron) Please verify these...
[12:12:56] (PS2) Muehlenhoff: Add a Hiera option to enable ICU63 component [puppet] - https://gerrit.wikimedia.org/r/639751 (https://phabricator.wikimedia.org/T264991)
[12:15:32] Operations, netops, cloud-services-team (Kanban): Enable L3 routing on cloudsw nodes - https://phabricator.wikimedia.org/T265288 (aborrero) Also, I don't see the CIDR object created in netbox for `185.15.56.240/29 `. Could you please create it? Or I can create it, whatever you prefer! https://netbox...
[12:22:26] Operations, Desktop Improvements, Product-Infrastructure-Team-Backlog, Proton, and 3 others: Connection closed while downloading PDF of articles - https://phabricator.wikimedia.org/T266373 (Jgiannelos) I can also reproduce this on prod: GETing the prod endpoint to render a pdf for the article "Do...
[12:23:10] Operations, Puppet, Patch-For-Review: Puppet Proposal to remove require_package - https://phabricator.wikimedia.org/T266479 (jbond) p:Triage→Low
[12:24:25] Operations, Puppet, User-jbond: Puppet clane up Parent task - https://phabricator.wikimedia.org/T267395 (jbond) p:Triage→Medium
[12:24:48] Operations, Puppet, Patch-For-Review: Puppet Proposal to remove require_package - https://phabricator.wikimedia.org/T266479 (jbond)
[12:24:50] Operations, Puppet, User-jbond: Puppet clane up Parent task - https://phabricator.wikimedia.org/T267395 (jbond)
[12:25:34] (CR) Muehlenhoff: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/639751 (https://phabricator.wikimedia.org/T264991) (owner: Muehlenhoff)
[12:28:40] Operations, Puppet, User-jbond: Puppet clean up Parent task - https://phabricator.wikimedia.org/T267395 (Peachey88)
[12:29:09] Operations, Puppet, User-jbond: Replace os_version with debian::codename - https://phabricator.wikimedia.org/T267396 (jbond)
[12:29:38] (PS6) Jbond: base:standard_packages: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639747 (https://phabricator.wikimedia.org/T266479)
[12:29:57] (CR) Kormat: [C: +1] orchestrator.conf: Decrease ReasonableMaintenanceReplicationLagSeconds [puppet] - https://gerrit.wikimedia.org/r/639739 (https://phabricator.wikimedia.org/T265990) (owner: Marostegui)
[12:30:26] Operations, Puppet, Patch-For-Review, User-jbond: Replace os_version with debian::codename - https://phabricator.wikimedia.org/T267396 (jbond) p:Triage→Low
[12:31:14] (CR) Marostegui: [C: +1] dbtools: Add host-to-instance (1 comment) [software] - https://gerrit.wikimedia.org/r/639470 (owner: Kormat)
[12:31:23] (CR) Marostegui: [C: +2] orchestrator.conf: Decrease ReasonableMaintenanceReplicationLagSeconds [puppet] - https://gerrit.wikimedia.org/r/639739 (https://phabricator.wikimedia.org/T265990) (owner: Marostegui)
[12:32:07] (CR) Kormat: [C: -1] orchestrator.conf: Add PromotionIgnoreHostnameFilters (2 comments) [puppet] - https://gerrit.wikimedia.org/r/639670 (https://phabricator.wikimedia.org/T265990) (owner: Marostegui)
[12:32:26] (PS1) Jbond: cassandra: replace os_version with debian::codename [puppet] - https://gerrit.wikimedia.org/r/639757 (https://phabricator.wikimedia.org/T267396)
[12:32:48] (CR) Jbond: [V: +1] "PCC SUCCESS: https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/26345" [puppet] - https://gerrit.wikimedia.org/r/639757 (https://phabricator.wikimedia.org/T267396) (owner: Jbond)
[12:33:18] (CR) jerkins-bot: [V: -1] cassandra: replace os_version with debian::codename [puppet] - https://gerrit.wikimedia.org/r/639757 (https://phabricator.wikimedia.org/T267396) (owner: Jbond)
[12:33:18] (CR) Kormat: [C: +2] dbtools: Add host-to-instance [software] - https://gerrit.wikimedia.org/r/639470 (owner: Kormat)
[12:33:38] (Merged) jenkins-bot: dbtools: Add host-to-instance [software] - https://gerrit.wikimedia.org/r/639470 (owner: Kormat)
[12:34:27] (PS2) Jbond: apt: migrate from os_version to debian::codename [puppet] - https://gerrit.wikimedia.org/r/639752
[12:34:51] Operations, Desktop Improvements, Product-Infrastructure-Team-Backlog, Proton, and 3 others: Connection closed while downloading PDF of articles - https://phabricator.wikimedia.org/T266373 (Jgiannelos) Something that could be interesting for debugging purposes is that on eg. `en.wikipedia.org` fo...
[12:35:17] PROBLEM - Kartotherian LVS codfw #page on kartotherian.svc.codfw.wmnet is CRITICAL: /osm-intl/info.json (tile service info for osm-intl) is CRITICAL: Test tile service info for osm-intl returned the unexpected status 400 (expecting: 200) https://wikitech.wikimedia.org/wiki/Maps%23Kartotherian
[12:35:33] (PS2) Jbond: cassandra: replace os_version with debian::codename [puppet] - https://gerrit.wikimedia.org/r/639757 (https://phabricator.wikimedia.org/T267396)
[12:35:50] (CR) jerkins-bot: [V: -1] apt: migrate from os_version to debian::codename [puppet] - https://gerrit.wikimedia.org/r/639752 (owner: Jbond)
[12:35:52] (PS3) Marostegui: orchestrator.conf: Add PromotionIgnoreHostnameFilters [puppet] - https://gerrit.wikimedia.org/r/639670 (https://phabricator.wikimedia.org/T265990)
[12:36:07] akosiaris: i guess you haven't merged the CR to disabling paging for maps, yet?
[12:36:08] (CR) jerkins-bot: [V: -1] cassandra: replace os_version with debian::codename [puppet] - https://gerrit.wikimedia.org/r/639757 (https://phabricator.wikimedia.org/T267396) (owner: Jbond)
[12:36:15] (CR) Marostegui: orchestrator.conf: Add PromotionIgnoreHostnameFilters (2 comments) [puppet] - https://gerrit.wikimedia.org/r/639670 (https://phabricator.wikimedia.org/T265990) (owner: Marostegui)
[12:36:20] kormat: nope, but I guess that's my cue
[12:36:33] RECOVERY - Kartotherian LVS codfw #page on kartotherian.svc.codfw.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Maps%23Kartotherian
[12:37:33] (CR) Alexandros Kosiaris: [C: +2] "Multiple +1 from people from multiple teams plus the team that will assume presumably ownership, so merging. Many thanks to all" [puppet] - https://gerrit.wikimedia.org/r/639154 (owner: Alexandros Kosiaris)
[12:38:04] (CR) Kormat: [C: -1] orchestrator.conf: Add PromotionIgnoreHostnameFilters (1 comment) [puppet] - https://gerrit.wikimedia.org/r/639670 (https://phabricator.wikimedia.org/T265990) (owner: Marostegui)
[12:38:04] I got the recovery page before the alarm page o_O
[12:38:08] (PS2) Alexandros Kosiaris: kartotherian: Don't page SREs on failure [puppet] - https://gerrit.wikimedia.org/r/639154 (https://phabricator.wikimedia.org/T267339)
[12:38:17] (CR) Alexandros Kosiaris: [V: +2 C: +2] kartotherian: Don't page SREs on failure [puppet] - https://gerrit.wikimedia.org/r/639154 (https://phabricator.wikimedia.org/T267339) (owner: Alexandros Kosiaris)
[12:38:53] sobanski: ? that's... weird
[12:39:18] It is, indeed
[12:39:25] anyway, pages are off now for this service. Let's see if there is something we can do for it however right now.
[12:41:07] network traffic has definitely increased, maybe some cassandra replication happening
[12:41:12] but otherwise it does look ok
[12:42:08] (PS4) Marostegui: orchestrator.conf: Add PromotionIgnoreHostnameFilters [puppet] - https://gerrit.wikimedia.org/r/639670 (https://phabricator.wikimedia.org/T265990)
[12:42:11] (CR) Marostegui: "Added clouddb as well as those will replace labsdb" [puppet] - https://gerrit.wikimedia.org/r/639670 (https://phabricator.wikimedia.org/T265990) (owner: Marostegui)
[12:43:10] (CR) Hashar: [C: +1] Add missing Hiera settings for profile::java on releases* [puppet] - https://gerrit.wikimedia.org/r/639748 (owner: Muehlenhoff)
[12:43:54] codfw is down a noe
[12:43:56] *node
[12:44:24] (CR) Kormat: [C: +1] orchestrator.conf: Add PromotionIgnoreHostnameFilters [puppet] - https://gerrit.wikimedia.org/r/639670 (https://phabricator.wikimedia.org/T265990) (owner: Marostegui)
[12:44:25] so it wouldn't be surprising if it gets a little overloaded. However, the errors manifesting as 400s makes no sense
[12:44:34] (CR) Marostegui: [C: +2] orchestrator.conf: Add PromotionIgnoreHostnameFilters [puppet] - https://gerrit.wikimedia.org/r/639670 (https://phabricator.wikimedia.org/T265990) (owner: Marostegui)
[12:44:50] It's not much below the capacity it's been at for weeks given that maps2002 has been terminally unwell
[12:45:12] just waiting on one CR to add a new, more powerful node to cassandra
[12:45:25] what CR?
[12:46:34] akosiaris: https://gerrit.wikimedia.org/r/c/operations/puppet/+/639749/
[12:47:29] (PS3) Jbond: cassandra: replace os_version with debian::codename [puppet] - https://gerrit.wikimedia.org/r/639757 (https://phabricator.wikimedia.org/T267396)
[12:47:40] (CR) Effie Mouzeli: [C: +1] Add a Hiera option to enable ICU63 component [puppet] - https://gerrit.wikimedia.org/r/639751 (https://phabricator.wikimedia.org/T264991) (owner: Muehlenhoff)
[12:47:54] (PS4) Hnowlan: maps: add maps100[5-8] and maps1010 [puppet] - https://gerrit.wikimedia.org/r/638125
[12:47:57] (CR) Arturo Borrero Gonzalez: [C: +1] toolforge bastion: reduce number of wheel-of-misfortune runs [puppet] - https://gerrit.wikimedia.org/r/639641 (https://phabricator.wikimedia.org/T266300) (owner: Bstorm)
[12:48:20] (CR) Muehlenhoff: [C: +2] Add missing Hiera settings for profile::java on releases* [puppet] - https://gerrit.wikimedia.org/r/639748 (owner: Muehlenhoff)
[12:49:29] (CR) Jbond: [V: +1] "PCC SUCCESS: https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/26347" [puppet] - https://gerrit.wikimedia.org/r/639757 (https://phabricator.wikimedia.org/T267396) (owner: Jbond)
[12:51:13] (CR) Hnowlan: [V: +1] "PCC SUCCESS: https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/26348" [puppet] - https://gerrit.wikimedia.org/r/638125 (owner: Hnowlan)
[12:52:10] (PS4) Jbond: cassandra: replace os_version with 
debian::codename [puppet] - 10https://gerrit.wikimedia.org/r/639757 (https://phabricator.wikimedia.org/T267396) [12:53:36] (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS: https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/26349" [puppet] - 10https://gerrit.wikimedia.org/r/639757 (https://phabricator.wikimedia.org/T267396) (owner: 10Jbond) [12:53:52] (03CR) 10Alexandros Kosiaris: [C: 03+1] maps: add hieradata for new hosts [puppet] - 10https://gerrit.wikimedia.org/r/639749 (https://phabricator.wikimedia.org/T266820) (owner: 10Hnowlan) [12:54:35] 10Operations, 10Wikidata, 10Wikidata Query Builder, 10User-Addshore: Deploy WDQS query builder to microsites - https://phabricator.wikimedia.org/T266703 (10Addshore) @Dzahn so the "querybuilder" is a separate frontend, that we would want to deploy to `query.wikidata.org/querybuilder`. Do you have any views... [12:55:58] (03PS3) 10Jbond: apt: migrate from os_version to debian::codename [puppet] - 10https://gerrit.wikimedia.org/r/639752 [12:56:41] (03CR) 10Arturo Borrero Gonzalez: [C: 04-1] openstack: Enable support for nested VMs (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/638146 (owner: 10Ahmon Dancy) [12:57:16] (03CR) 10Hnowlan: [C: 03+2] maps: add hieradata for new hosts [puppet] - 10https://gerrit.wikimedia.org/r/639749 (https://phabricator.wikimedia.org/T266820) (owner: 10Hnowlan) [13:01:28] (03CR) 10Jbond: [V: 03+1] "PCC shows noop" [puppet] - 10https://gerrit.wikimedia.org/r/639757 (https://phabricator.wikimedia.org/T267396) (owner: 10Jbond) [13:01:58] (03CR) 10Jbond: [C: 03+2] apt: migrate from os_version to debian::codename [puppet] - 10https://gerrit.wikimedia.org/r/639752 (owner: 10Jbond) [13:05:15] !log started cassandra bootstrap of maps2005 [13:05:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:08:42] (03PS1) 10Jbond: cdh: replace os_version with debian::codename [puppet] - 10https://gerrit.wikimedia.org/r/639758 
(https://phabricator.wikimedia.org/T267396) [13:09:05] (03CR) 10jerkins-bot: [V: 04-1] cdh: replace os_version with debian::codename [puppet] - 10https://gerrit.wikimedia.org/r/639758 (https://phabricator.wikimedia.org/T267396) (owner: 10Jbond) [13:09:19] RECOVERY - Check systemd state on maps2006 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [13:10:27] (03PS2) 10Jbond: cdh: replace os_version with debian::codename [puppet] - 10https://gerrit.wikimedia.org/r/639758 (https://phabricator.wikimedia.org/T267396) [13:11:36] (03CR) 10Hnowlan: [C: 03+1] cassandra: replace os_version with debian::codename [puppet] - 10https://gerrit.wikimedia.org/r/639757 (https://phabricator.wikimedia.org/T267396) (owner: 10Jbond) [13:12:44] (03PS3) 10Jbond: cdh: replace os_version with debian::codename [puppet] - 10https://gerrit.wikimedia.org/r/639758 (https://phabricator.wikimedia.org/T267396) [13:12:47] (03PS1) 10Jbond: certgen: migrate to debian::codename [puppet] - 10https://gerrit.wikimedia.org/r/639759 (https://phabricator.wikimedia.org/T267396) [13:13:05] (03PS2) 10Jbond: certgen: migrate to debian::codename [puppet] - 10https://gerrit.wikimedia.org/r/639759 (https://phabricator.wikimedia.org/T267396) [13:13:41] (03CR) 10Jbond: [V: 03+1 C: 03+2] cassandra: replace os_version with debian::codename [puppet] - 10https://gerrit.wikimedia.org/r/639757 (https://phabricator.wikimedia.org/T267396) (owner: 10Jbond) [13:14:10] (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS: https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/26352" [puppet] - 10https://gerrit.wikimedia.org/r/639758 (https://phabricator.wikimedia.org/T267396) (owner: 10Jbond) [13:14:39] (03CR) 10Jbond: [C: 03+2] certgen: migrate to debian::codename [puppet] - 10https://gerrit.wikimedia.org/r/639759 (https://phabricator.wikimedia.org/T267396) (owner: 10Jbond) [13:14:54] (03PS1) 10Marostegui: orchestrator.conf: 
PromotionIgnoreHostnameFilters fix regex [puppet] - 10https://gerrit.wikimedia.org/r/639760 [13:17:21] (03CR) 10Marostegui: [C: 03+2] orchestrator.conf: PromotionIgnoreHostnameFilters fix regex [puppet] - 10https://gerrit.wikimedia.org/r/639760 (owner: 10Marostegui) [13:18:30] 10Operations, 10DBA, 10Orchestrator, 10CAS-SSO, 10User-Kormat: orchestrator: Support SSO - https://phabricator.wikimedia.org/T266106 (10Marostegui) [13:19:07] 10Operations, 10DBA, 10Orchestrator: orchestrator: Use ssl for talking to db servers - https://phabricator.wikimedia.org/T267401 (10Kormat) [13:21:09] (03PS1) 10Jbond: codesearch: replace os_version and require_packages [puppet] - 10https://gerrit.wikimedia.org/r/639761 (https://phabricator.wikimedia.org/T266479) [13:21:34] 10Operations, 10DBA, 10Orchestrator, 10User-Kormat: orchestrator: Add service monitoring - https://phabricator.wikimedia.org/T266338 (10Marostegui) Thank you Daniel! This looks good for now, so far we are going to keep notifications disabled on the host as we are doing many changes still, some of which inv... [13:22:59] (03CR) 10Muehlenhoff: [C: 03+1] "Looks good to me! 
(But let's not merge on a Friday :-)" [puppet] - 10https://gerrit.wikimedia.org/r/639747 (https://phabricator.wikimedia.org/T266479) (owner: 10Jbond) [13:23:58] (03PS1) 10Jbond: dnsrecursore: migrate to debian::codename and ensure_packages [puppet] - 10https://gerrit.wikimedia.org/r/639762 (https://phabricator.wikimedia.org/T266479) [13:24:04] (03CR) 10Jbond: [C: 03+2] codesearch: replace os_version and require_packages [puppet] - 10https://gerrit.wikimedia.org/r/639761 (https://phabricator.wikimedia.org/T266479) (owner: 10Jbond) [13:24:38] (03CR) 10Jbond: [C: 03+2] dnsrecursore: migrate to debian::codename and ensure_packages [puppet] - 10https://gerrit.wikimedia.org/r/639762 (https://phabricator.wikimedia.org/T266479) (owner: 10Jbond) [13:25:23] 10Operations, 10DBA, 10Orchestrator: orchestrator: Use ssl for talking to db servers - https://phabricator.wikimedia.org/T267401 (10Kormat) Looking at the code, it looks like this is what happens: - if MySQLTopologyUseMixedTLS is set, check if the host 'requires' ssl - if it can auth to the db host without s... 
[13:28:05] PROBLEM - PHP opcache health on mwdebug1001 is CRITICAL: CRITICAL: opcache free space is below 50 MB https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:33:18] 10Operations, 10ops-eqiad, 10DBA: db1139 memory errors on boot (Issues continues after board change) 2020-08-27 - https://phabricator.wikimedia.org/T261405 (10jcrespo) [13:33:33] 10Operations, 10ops-eqiad, 10DBA: db1139 memory errors on boot (issue continues after board change) 2020-08-27 - https://phabricator.wikimedia.org/T261405 (10jcrespo) [13:35:41] (03PS1) 10Jbond: dumps::generation: migrate to debian::codename and ensure_packages [puppet] - 10https://gerrit.wikimedia.org/r/639763 (https://phabricator.wikimedia.org/T266479) [13:35:45] (03PS1) 10Jbond: envoproxy: migrate to debian::codename and ensure_packages [puppet] - 10https://gerrit.wikimedia.org/r/639764 (https://phabricator.wikimedia.org/T266479) [13:35:58] (03PS2) 10Jbond: envoproxy: migrate to debian::codename and ensure_packages [puppet] - 10https://gerrit.wikimedia.org/r/639764 (https://phabricator.wikimedia.org/T266479) [13:36:10] (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS: https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/26353" [puppet] - 10https://gerrit.wikimedia.org/r/639764 (https://phabricator.wikimedia.org/T266479) (owner: 10Jbond) [13:37:00] (03PS1) 10Kormat: orchestrator: Require ssl connections to db servers [puppet] - 10https://gerrit.wikimedia.org/r/639765 (https://phabricator.wikimedia.org/T267401) [13:37:18] (03PS3) 10Jbond: envoproxy: migrate to debian::codename and ensure_packages [puppet] - 10https://gerrit.wikimedia.org/r/639764 (https://phabricator.wikimedia.org/T266479) [13:37:41] (03PS2) 10Kormat: orchestrator: Require ssl connections to db servers [puppet] - 10https://gerrit.wikimedia.org/r/639765 (https://phabricator.wikimedia.org/T267401) [13:39:04] (03CR) 10Marostegui: [C: 03+1] "Fingers crossed" [puppet] - 10https://gerrit.wikimedia.org/r/639765 
(https://phabricator.wikimedia.org/T267401) (owner: 10Kormat) [13:40:26] (03PS1) 10Jbond: etc::monitoring: migrate to debian::codename and ensure_packages [puppet] - 10https://gerrit.wikimedia.org/r/639766 (https://phabricator.wikimedia.org/T266479) [13:41:17] (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS: https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/26355" [puppet] - 10https://gerrit.wikimedia.org/r/639766 (https://phabricator.wikimedia.org/T266479) (owner: 10Jbond) [13:41:57] (03CR) 10Jbond: [V: 03+1 C: 03+2] etc::monitoring: migrate to debian::codename and ensure_packages [puppet] - 10https://gerrit.wikimedia.org/r/639766 (https://phabricator.wikimedia.org/T266479) (owner: 10Jbond) [13:48:29] (03CR) 10Kormat: [C: 04-2] "Not merging yet, there are some issues in pontoon with this change." [puppet] - 10https://gerrit.wikimedia.org/r/639765 (https://phabricator.wikimedia.org/T267401) (owner: 10Kormat) [13:50:01] (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS: https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/26354" [puppet] - 10https://gerrit.wikimedia.org/r/639764 (https://phabricator.wikimedia.org/T266479) (owner: 10Jbond) [13:51:13] (03PS1) 10Jbond: geoip: migrate to debian::codename and ensure_packages [puppet] - 10https://gerrit.wikimedia.org/r/639767 (https://phabricator.wikimedia.org/T266479) [13:51:50] (03CR) 10jerkins-bot: [V: 04-1] geoip: migrate to debian::codename and ensure_packages [puppet] - 10https://gerrit.wikimedia.org/r/639767 (https://phabricator.wikimedia.org/T266479) (owner: 10Jbond) [13:52:29] (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS: https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/26356" [puppet] - 10https://gerrit.wikimedia.org/r/639767 (https://phabricator.wikimedia.org/T266479) (owner: 10Jbond) [13:53:32] (03CR) 10Jbond: [V: 03+1] "Changes in PCC are related to swapping out require_packages" [puppet] - 10https://gerrit.wikimedia.org/r/639764 
(https://phabricator.wikimedia.org/T266479) (owner: 10Jbond) [13:56:33] (03PS2) 10Jbond: geoip: migrate to debian::codename and ensure_packages [puppet] - 10https://gerrit.wikimedia.org/r/639767 (https://phabricator.wikimedia.org/T266479) [13:57:17] ACKNOWLEDGEMENT - Postgres Replication Lag on maps1001 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 62487134184 and 0 seconds Hnowlan Awaiting reinitialisation https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [13:58:35] (03PS1) 10Jbond: git::lfs: migrate to debian::codename and ensure_packages [puppet] - 10https://gerrit.wikimedia.org/r/639768 (https://phabricator.wikimedia.org/T266479) [13:58:37] (03CR) 10Jbond: [C: 03+2] geoip: migrate to debian::codename and ensure_packages [puppet] - 10https://gerrit.wikimedia.org/r/639767 (https://phabricator.wikimedia.org/T266479) (owner: 10Jbond) [13:59:53] (03CR) 10jerkins-bot: [V: 04-1] git::lfs: migrate to debian::codename and ensure_packages [puppet] - 10https://gerrit.wikimedia.org/r/639768 (https://phabricator.wikimedia.org/T266479) (owner: 10Jbond) [14:01:34] 10Operations, 10Puppet, 10Goal, 10User-jbond: Puppet clean up Parent task - https://phabricator.wikimedia.org/T267395 (10Peachey88) [14:01:35] !log elukey@cumin1001 START - Cookbook sre.hosts.downtime [14:01:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:01:48] !log elukey@cumin1001 START - Cookbook sre.hosts.downtime [14:01:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:02:42] PROBLEM - cassandra service on maps2002 is CRITICAL: CRITICAL - Expecting active but unit cassandra is failed https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [14:03:37] !log elukey@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [14:03:40] (03PS2) 10Jbond: git::lfs: migrate to debian::codename and ensure_packages [puppet] - 10https://gerrit.wikimedia.org/r/639768 
(https://phabricator.wikimedia.org/T266479) [14:03:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:04:53] ACKNOWLEDGEMENT - Check systemd state on maps2002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. Hnowlan Cassandra is unhealthy, will not be resurrected. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [14:04:53] ACKNOWLEDGEMENT - cassandra CQL 10.192.16.179:9042 on maps2002 is CRITICAL: connect to address 10.192.16.179 and port 9042: Connection refused Hnowlan Cassandra is unhealthy, will not be resurrected. https://phabricator.wikimedia.org/T93886 [14:04:53] ACKNOWLEDGEMENT - cassandra service on maps2002 is CRITICAL: CRITICAL - Expecting active but unit cassandra is failed Hnowlan Cassandra is unhealthy, will not be resurrected. https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [14:05:00] (03CR) 10jerkins-bot: [V: 04-1] git::lfs: migrate to debian::codename and ensure_packages [puppet] - 10https://gerrit.wikimedia.org/r/639768 (https://phabricator.wikimedia.org/T266479) (owner: 10Jbond) [14:05:25] !log elukey@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [14:05:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:06:05] 10Operations, 10Puppet, 10User-jbond: Puppet clean up Parent task - https://phabricator.wikimedia.org/T267395 (10Aklapper) @Peachey88: Please don't; this is too small in scope for a #goal. 
[14:09:20] (03PS1) 10Jbond: haproxy: migrate to debian::codename and ensure_packages [puppet] - 10https://gerrit.wikimedia.org/r/639769 (https://phabricator.wikimedia.org/T266479) [14:09:45] (03CR) 10Jbond: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/639769 (https://phabricator.wikimedia.org/T266479) (owner: 10Jbond) [14:11:12] (03PS3) 10Jbond: git::lfs: migrate to debian::codename and ensure_packages [puppet] - 10https://gerrit.wikimedia.org/r/639768 (https://phabricator.wikimedia.org/T266479) [14:12:24] jbond42: nice activity for a friday afternoon, refactor a ton of puppet code :D [14:12:39] elukey: ssshhhh [14:13:04] :) honestly thugh the risky ones im leaveing till monday tomerge [14:13:58] (03PS1) 10Jbond: install_server::dhcp_server: migrate to debian::codename and ensure_packages [puppet] - 10https://gerrit.wikimedia.org/r/639770 (https://phabricator.wikimedia.org/T266479) [14:14:01] jbond42: ahhahah yes yes [14:14:14] (03CR) 10Jbond: [C: 03+2] git::lfs: migrate to debian::codename and ensure_packages [puppet] - 10https://gerrit.wikimedia.org/r/639768 (https://phabricator.wikimedia.org/T266479) (owner: 10Jbond) [14:14:25] (03PS1) 10Ejegg: Special docroot for thankyou.wikipedia.org [puppet] - 10https://gerrit.wikimedia.org/r/639771 (https://phabricator.wikimedia.org/T259312) [14:14:33] 10Operations, 10Desktop Improvements, 10Product-Infrastructure-Team-Backlog, 10Proton, and 3 others: Connection closed while downloading PDF of articles - https://phabricator.wikimedia.org/T266373 (10Jgiannelos) I wrote a quick script to fetch some random pages and check if the rendered PDF is valid here... 
[14:15:01] (03PS2) 10Jbond: install_server::dhcp_server: migrate to debian::codename and ensure_packages [puppet] - 10https://gerrit.wikimedia.org/r/639770 (https://phabricator.wikimedia.org/T266479) [14:15:52] (03CR) 10jerkins-bot: [V: 04-1] Special docroot for thankyou.wikipedia.org [puppet] - 10https://gerrit.wikimedia.org/r/639771 (https://phabricator.wikimedia.org/T259312) (owner: 10Ejegg) [14:20:05] (03PS2) 10Ejegg: Special docroot for thankyou.wikipedia.org [puppet] - 10https://gerrit.wikimedia.org/r/639771 (https://phabricator.wikimedia.org/T259312) [14:21:07] (03PS3) 10Ejegg: Special docroot for thankyou.wikipedia.org [puppet] - 10https://gerrit.wikimedia.org/r/639771 (https://phabricator.wikimedia.org/T259312) [14:21:32] (03PS3) 10Jbond: install_server::dhcp_server: migrate to debian::codename and ensure_packages [puppet] - 10https://gerrit.wikimedia.org/r/639770 (https://phabricator.wikimedia.org/T266479) [14:21:34] (03PS1) 10Jbond: java: migrate to debian::codename and ensure_packages [puppet] - 10https://gerrit.wikimedia.org/r/639772 (https://phabricator.wikimedia.org/T266479) [14:21:51] (03PS2) 10Jbond: java: migrate to debian::codename and ensure_packages [puppet] - 10https://gerrit.wikimedia.org/r/639772 (https://phabricator.wikimedia.org/T266479) [14:22:01] (03PS3) 10Jbond: java: migrate to debian::codename and ensure_packages [puppet] - 10https://gerrit.wikimedia.org/r/639772 (https://phabricator.wikimedia.org/T266479) [14:22:12] (03CR) 10jerkins-bot: [V: 04-1] java: migrate to debian::codename and ensure_packages [puppet] - 10https://gerrit.wikimedia.org/r/639772 (https://phabricator.wikimedia.org/T266479) (owner: 10Jbond) [14:22:17] (03CR) 10Jbond: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/639772 (https://phabricator.wikimedia.org/T266479) (owner: 10Jbond) [14:22:33] (03CR) 10jerkins-bot: [V: 04-1] Special docroot for thankyou.wikipedia.org [puppet] - 10https://gerrit.wikimedia.org/r/639771 
(https://phabricator.wikimedia.org/T259312) (owner: 10Ejegg) [14:22:42] (03CR) 10Jbond: [C: 03+2] install_server::dhcp_server: migrate to debian::codename and ensure_packages [puppet] - 10https://gerrit.wikimedia.org/r/639770 (https://phabricator.wikimedia.org/T266479) (owner: 10Jbond) [14:24:57] !log hnowlan@cumin1001 START - Cookbook sre.hosts.downtime [14:24:58] !log hnowlan@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [14:25:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:25:02] 10Operations, 10ops-eqiad, 10Analytics: analytics1046/analytics1057 stuck in booting - https://phabricator.wikimedia.org/T267392 (10elukey) [14:25:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:28:49] (03PS1) 10Jbond: jupyterhub: migrate to debian::codename and ensure_packages [puppet] - 10https://gerrit.wikimedia.org/r/639774 (https://phabricator.wikimedia.org/T266479) [14:29:30] (03CR) 10Jbond: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/639774 (https://phabricator.wikimedia.org/T266479) (owner: 10Jbond) [14:29:56] (03PS1) 10Effie Mouzeli: mcrouter_wancache: tune onhost memcached [puppet] - 10https://gerrit.wikimedia.org/r/639775 (https://phabricator.wikimedia.org/T244340) [14:36:06] !log resyncing database on maps1001 [14:36:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:43:52] (03PS1) 10Jbond: labs_bootstrapvz: migrate to debian::codename and ensure_packages [puppet] - 10https://gerrit.wikimedia.org/r/639776 (https://phabricator.wikimedia.org/T266479) [14:44:16] (03CR) 10jerkins-bot: [V: 04-1] labs_bootstrapvz: migrate to debian::codename and ensure_packages [puppet] - 10https://gerrit.wikimedia.org/r/639776 (https://phabricator.wikimedia.org/T266479) (owner: 10Jbond) [14:44:18] (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS: https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/26359" [puppet] - 
10https://gerrit.wikimedia.org/r/639776 (https://phabricator.wikimedia.org/T266479) (owner: 10Jbond) [14:46:00] (03PS2) 10Jbond: labs_bootstrapvz: migrate to debian::codename and ensure_packages [puppet] - 10https://gerrit.wikimedia.org/r/639776 (https://phabricator.wikimedia.org/T266479) [14:46:15] !log installing wireshark security updates [14:46:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:50:15] (03CR) 10Jbond: labs_bootstrapvz: migrate to debian::codename and ensure_packages (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/639776 (https://phabricator.wikimedia.org/T266479) (owner: 10Jbond) [14:57:38] 10Operations, 10ops-eqiad, 10DC-Ops: eqiad: Server moves to free up space on 10g racks - https://phabricator.wikimedia.org/T267065 (10Jclark-ctr) @elukey Hey when you get a chance can you let me know best day i can schedule with you some movies next week? [14:57:45] (03PS1) 10Jbond: libraryupgrader: migrate to debian::codename and ensure_packages [puppet] - 10https://gerrit.wikimedia.org/r/639777 (https://phabricator.wikimedia.org/T266479) [14:59:02] (03PS1) 10David Caro: [apt::conf] Allow passing integers as value [puppet] - 10https://gerrit.wikimedia.org/r/639778 [14:59:04] (03CR) 10Welcome, new contributor!: "Thank you for making your first contribution to Wikimedia! :) To learn how to get your code changes reviewed faster and more likely to get" [puppet] - 10https://gerrit.wikimedia.org/r/639778 (owner: 10David Caro) [14:59:14] 10Operations, 10Performance-Team, 10serviceops, 10Patch-For-Review, 10User-jijiki: MediaWiki to route specific keys to /*/mw-with-onhost-tier/ - https://phabricator.wikimedia.org/T264604 (10jijiki) @aaron We will merge your patches on Monday and enable onhost memcached on an API canary host :) [15:00:59] (03PS2) 10David Caro: [apt::conf] Allow passing integers as value [puppet] - 10https://gerrit.wikimedia.org/r/639778 [15:04:11] 10Operations, 10observability: grafana email alerting broken? 
- https://phabricator.wikimedia.org/T267409 (10CDanis) [15:06:41] (03PS1) 10Effie Mouzeli: hieradata: enable ICU 63 in two hosts [puppet] - 10https://gerrit.wikimedia.org/r/639780 (https://phabricator.wikimedia.org/T264991) [15:07:07] (03CR) 10jerkins-bot: [V: 04-1] hieradata: enable ICU 63 in two hosts [puppet] - 10https://gerrit.wikimedia.org/r/639780 (https://phabricator.wikimedia.org/T264991) (owner: 10Effie Mouzeli) [15:07:43] (03CR) 10Muehlenhoff: "Let's also add mwdebug2002, will be useful for additional update tests" [puppet] - 10https://gerrit.wikimedia.org/r/639780 (https://phabricator.wikimedia.org/T264991) (owner: 10Effie Mouzeli) [15:07:51] (03PS2) 10Effie Mouzeli: hieradata: enable ICU 63 in two hosts [puppet] - 10https://gerrit.wikimedia.org/r/639780 (https://phabricator.wikimedia.org/T264991) [15:08:21] (03CR) 10jerkins-bot: [V: 04-1] hieradata: enable ICU 63 in two hosts [puppet] - 10https://gerrit.wikimedia.org/r/639780 (https://phabricator.wikimedia.org/T264991) (owner: 10Effie Mouzeli) [15:09:47] (03CR) 10David Caro: cloud vps: unattendedupgrades seems to have a type issue for cleaning (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/639586 (owner: 10Bstorm) [15:10:48] (03CR) 10Bstorm: cloud vps: unattendedupgrades seems to have a type issue for cleaning (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/639586 (owner: 10Bstorm) [15:11:13] (03PS3) 10Effie Mouzeli: hieradata: enable ICU 63 in two hosts [puppet] - 10https://gerrit.wikimedia.org/r/639780 (https://phabricator.wikimedia.org/T264991) [15:16:34] (03CR) 10Andrew Bogott: "pcc runs remove a bunch of file requires; that's expected? Looks good otherwise." 
[puppet] - 10https://gerrit.wikimedia.org/r/639776 (https://phabricator.wikimedia.org/T266479) (owner: 10Jbond) [15:21:53] (03CR) 10Jbond: "> Patch Set 2:" [puppet] - 10https://gerrit.wikimedia.org/r/639776 (https://phabricator.wikimedia.org/T266479) (owner: 10Jbond) [15:27:49] (03CR) 10Andrew Bogott: [C: 03+2] labs_bootstrapvz: migrate to debian::codename and ensure_packages [puppet] - 10https://gerrit.wikimedia.org/r/639776 (https://phabricator.wikimedia.org/T266479) (owner: 10Jbond) [15:34:27] 10Operations, 10SRE-Access-Requests, 10Trust-and-Safety: Requesting access to restricted production access and analytics-privatedata-users for Zxane Soo - https://phabricator.wikimedia.org/T267312 (10RobH) [15:36:21] 10Operations, 10SRE-Access-Requests, 10Trust-and-Safety: Requesting access to restricted production access and analytics-privatedata-users for Zxane Soo - https://phabricator.wikimedia.org/T267312 (10RobH) a:05ZS→03thcipriani Tyler, Inclusion in the 'restricted' group includes sudo rights on mwlog and m... [15:40:04] (03PS1) 10Urbanecm: Revert "Change votewiki language temporarily to fa for fawiki elections" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/639650 (https://phabricator.wikimedia.org/T262689) [15:41:00] 10Operations, 10Desktop Improvements, 10Product-Infrastructure-Team-Backlog, 10Proton, and 3 others: Connection closed while downloading PDF of articles - https://phabricator.wikimedia.org/T266373 (10akosiaris) So this is not specific to frwiki it seems. Is there perhaps some correlation between page size... 
[15:43:21] (03PS2) 10Urbanecm: Revert "Change votewiki language temporarily to fa for fawiki elections" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/639650 (https://phabricator.wikimedia.org/T262689) [15:45:14] (03PS1) 10Jbond: librenms: migrate to debian::codename and ensure_packages [puppet] - 10https://gerrit.wikimedia.org/r/639782 (https://phabricator.wikimedia.org/T266479) [15:45:17] (03PS1) 10Jbond: lxc: migrate to debian::codename and ensure_packages [puppet] - 10https://gerrit.wikimedia.org/r/639783 (https://phabricator.wikimedia.org/T266479) [15:45:41] (03PS2) 10Jbond: lxc: migrate to debian::codename and ensure_packages [puppet] - 10https://gerrit.wikimedia.org/r/639783 (https://phabricator.wikimedia.org/T266479) [15:45:54] (03CR) 10Jbond: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/639783 (https://phabricator.wikimedia.org/T266479) (owner: 10Jbond) [15:46:04] (03CR) 10Jbond: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/639782 (https://phabricator.wikimedia.org/T266479) (owner: 10Jbond) [15:48:35] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:48:52] (PS1) Ahmon Dancy: openstack: Enable support for nested VMs [puppet] - https://gerrit.wikimedia.org/r/639784
[15:49:15] (CR) jerkins-bot: [V: -1] openstack: Enable support for nested VMs [puppet] - https://gerrit.wikimedia.org/r/639784 (owner: Ahmon Dancy)
[15:50:38] (Abandoned) Ahmon Dancy: openstack: Enable support for nested VMs [puppet] - https://gerrit.wikimedia.org/r/639784 (owner: Ahmon Dancy)
[15:51:08] (PS9) Ahmon Dancy: openstack: Enable support for nested VMs [puppet] - https://gerrit.wikimedia.org/r/638146
[15:52:04] (CR) Ahmon Dancy: openstack: Enable support for nested VMs (1 comment) [puppet] - https://gerrit.wikimedia.org/r/638146 (owner: Ahmon Dancy)
[15:53:01] (PS3) Jbond: lxc: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639783 (https://phabricator.wikimedia.org/T266479)
[15:56:26] (PS1) Jbond: mariadb: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639785 (https://phabricator.wikimedia.org/T266479)
[15:56:48] (CR) jerkins-bot: [V: -1] mariadb: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639785 (https://phabricator.wikimedia.org/T266479) (owner: Jbond)
[15:56:50] (CR) Jbond: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/639785 (https://phabricator.wikimedia.org/T266479) (owner: Jbond)
[15:57:44] (PS2) Jbond: mariadb: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639785 (https://phabricator.wikimedia.org/T266479)
[16:00:17] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:00:22] Operations, ops-eqiad, DC-Ops: (Need By: TBD) rack/setup/install aqs101[0-5] - https://phabricator.wikimedia.org/T267414 (RobH)
[16:00:27] Operations, ops-eqiad, DC-Ops: (Need By: TBD) rack/setup/install aqs101[0-5] - https://phabricator.wikimedia.org/T267414 (RobH)
[16:01:01] Operations, ops-eqiad, DC-Ops: (Need By: TBD) rack/setup/install aqs101[0-5] - https://phabricator.wikimedia.org/T267414 (RobH)
[16:04:17] (PS1) Jbond: mediawiki: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639786 (https://phabricator.wikimedia.org/T266479)
[16:04:46] (CR) Jbond: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/639786 (https://phabricator.wikimedia.org/T266479) (owner: Jbond)
[16:05:34] (CR) jerkins-bot: [V: -1] mediawiki: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639786 (https://phabricator.wikimedia.org/T266479) (owner: Jbond)
[16:05:36] (CR) Bstorm: "We should, perhaps, switch my File class back to an apt::config type in this patch as well." [puppet] - https://gerrit.wikimedia.org/r/639778 (owner: David Caro)
[16:07:13] (PS2) Jbond: mediawiki: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639786 (https://phabricator.wikimedia.org/T266479)
[16:07:53] brennen: now that the pressure is off, who is a good person to talk to about logstash configuration
[16:08:29] (CR) jerkins-bot: [V: -1] mediawiki: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639786 (https://phabricator.wikimedia.org/T266479) (owner: Jbond)
[16:08:33] to wit -- i'd like to be able to get the output of the formatNum deprecation warnings, without them being logspam. will wfDebugLog still show up in logstash? i'm not clear what the minimum log level we keep is
[16:10:40] (PS1) Jbond: mjolnir: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639787 (https://phabricator.wikimedia.org/T266479)
[16:12:35] (PS3) Jbond: mediawiki: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639786 (https://phabricator.wikimedia.org/T266479)
[16:14:55] (PS1) Jbond: motd: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639788 (https://phabricator.wikimedia.org/T266479)
[16:17:29] (PS1) Jbond: nginx: drop condition for ge jessei as it matches all nodes [puppet] - https://gerrit.wikimedia.org/r/639790
[16:17:45] RECOVERY - cassandra CQL 10.192.0.155:9042 on maps2005 is OK: TCP OK - 0.032 second response time on 10.192.0.155 port 9042 https://phabricator.wikimedia.org/T93886
[16:20:36] !log hnowlan@puppetmaster1001 conftool action : set/pooled=yes:weight=10; selector: dc=codfw,cluster=maps,service=kartotherian,name=maps2005.codfw.wmnet
[16:20:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:21:58] (CR) Andrew Bogott: [C: +1] toolforge bastion: safelist shells and related procs for the killer [puppet] - https://gerrit.wikimedia.org/r/639617 (https://phabricator.wikimedia.org/T266300) (owner: Bstorm)
[16:22:39] (PS4) Bstorm: toolforge bastion: safelist shells and related procs for the killer [puppet] - https://gerrit.wikimedia.org/r/639617 (https://phabricator.wikimedia.org/T266300)
[16:23:12] Operations, ops-codfw, DC-Ops: (Need By: TBD) rack/setup/install logtash203[345] - https://phabricator.wikimedia.org/T267420 (RobH)
[16:23:29] (CR) Bstorm: [C: +2] toolforge bastion: safelist shells and related procs for the killer [puppet] - https://gerrit.wikimedia.org/r/639617 (https://phabricator.wikimedia.org/T266300) (owner: Bstorm)
[16:23:36] Operations, ops-codfw, DC-Ops: (Need By: TBD) rack/setup/install logtash203[345] - https://phabricator.wikimedia.org/T267420 (RobH)
[16:23:42] Operations, ops-codfw, DC-Ops: (Need By: TBD) rack/setup/install logtash203[345] - https://phabricator.wikimedia.org/T267420 (RobH)
[16:26:05] (PS1) Jbond: ntp: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639791 (https://phabricator.wikimedia.org/T266479)
[16:26:19] (CR) Jbond: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/639786 (https://phabricator.wikimedia.org/T266479) (owner: Jbond)
[16:27:22] (CR) Jbond: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/639791 (https://phabricator.wikimedia.org/T266479) (owner: Jbond)
[16:29:39] (CR) Zhuyifei1999: toolforge bastion: safelist shells and related procs for the killer (1 comment) [puppet] - https://gerrit.wikimedia.org/r/639617 (https://phabricator.wikimedia.org/T266300) (owner: Bstorm)
[16:29:58] (PS1) Jbond: openstack: drop redundant os_version check [puppet] - https://gerrit.wikimedia.org/r/639792
[16:30:05] (CR) Andrew Bogott: "btw, it would probably be useful if you make a phab task explaining your use case and attach this task to it" [puppet] - https://gerrit.wikimedia.org/r/638146 (owner: Ahmon Dancy)
[16:32:10] (CR) Bstorm: toolforge bastion: safelist shells and related procs for the killer (1 comment) [puppet] - https://gerrit.wikimedia.org/r/639617 (https://phabricator.wikimedia.org/T266300) (owner: Bstorm)
[16:35:09] (CR) Andrew Bogott: [C: +1] openstack: drop redundant os_version check [puppet] - https://gerrit.wikimedia.org/r/639792 (owner: Jbond)
[16:37:42] (PS1) Jbond: openstack: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639793 (https://phabricator.wikimedia.org/T266479)
[16:37:59] (CR) Jbond: [C: +2] openstack: drop redundant os_version check [puppet] - https://gerrit.wikimedia.org/r/639792 (owner: Jbond)
[16:41:13] Operations, Product-Infrastructure-Team-Backlog, Traffic, JavaScript, and 2 others: Javascript errors: Unable to add datalayers to map - https://phabricator.wikimedia.org/T267296 (hnowlan) The maps clusters in both datacentres are now stable and these URLs appear to be working now. More accurate...
[16:41:17] (CR) Andrew Bogott: [C: +1] "lgtm. The patch description mentions ensure_packages which I don't think actually appears in this patch." [puppet] - https://gerrit.wikimedia.org/r/639793 (https://phabricator.wikimedia.org/T266479) (owner: Jbond)
[16:43:16] (CR) Jbond: "> Patch Set 1: Code-Review+1" [puppet] - https://gerrit.wikimedia.org/r/639793 (https://phabricator.wikimedia.org/T266479) (owner: Jbond)
[16:44:10] !log hnowlan@cumin1001 START - Cookbook sre.hosts.downtime
[16:44:12] !log hnowlan@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[16:44:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:44:16] !log hnowlan@cumin1001 START - Cookbook sre.hosts.downtime
[16:44:17] !log hnowlan@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[16:44:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:44:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:44:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:44:39] (PS1) Jbond: res: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639794 (https://phabricator.wikimedia.org/T266479)
[16:44:54] !log hnowlan@cumin1001 START - Cookbook sre.hosts.downtime
[16:44:55] !log hnowlan@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[16:45:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:45:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:47:27] (PS1) Jbond: postgresql::server: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639795 (https://phabricator.wikimedia.org/T266479)
[16:49:54] (PS1) Bstorm: toolforge bastion: improve the killer a bit [puppet] - https://gerrit.wikimedia.org/r/639796 (https://phabricator.wikimedia.org/T266300)
[16:51:23] (CR) Bstorm: toolforge bastion: improve the killer a bit (1 comment) [puppet] - https://gerrit.wikimedia.org/r/639796 (https://phabricator.wikimedia.org/T266300) (owner: Bstorm)
[16:51:50] (CR) Andrew Bogott: [C: +1] toolforge bastion: improve the killer a bit [puppet] - https://gerrit.wikimedia.org/r/639796 (https://phabricator.wikimedia.org/T266300) (owner: Bstorm)
[16:53:04] Operations, serviceops, Patch-For-Review: Upgrade the MediaWiki servers to ICU 63 - https://phabricator.wikimedia.org/T264991 (thcipriani) >>! In T264991#6603285, @jijiki wrote: > @thcipriani we will be upgrading to ICU 63 on the 16th Nov 2020. Since we will be restarting php-fpm across the cluster t...
[16:53:43] (PS1) Jbond: profile::analytics: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639798 (https://phabricator.wikimedia.org/T266479)
[16:55:52] (CR) CDanis: [C: +1] "LGTM! Did you want to do the deploy on this one, or should I?" [mediawiki-config] - https://gerrit.wikimedia.org/r/639601 (https://phabricator.wikimedia.org/T259312) (owner: Ejegg)
[16:57:13] (PS1) Jbond: profile::backup: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639799 (https://phabricator.wikimedia.org/T266479)
[16:59:43] Operations, SRE-Access-Requests, Trust-and-Safety: Requesting access to restricted production access and analytics-privatedata-users for Zxane Soo - https://phabricator.wikimedia.org/T267312 (thcipriani) a:thcipriani→RobH >>! In T267312#6608971, @RobH wrote: > Tyler, > > Inclusion in the 're...
[16:59:45] Operations, LDAP-Access-Requests: Add msantos to wmf LDAP group - https://phabricator.wikimedia.org/T267125 (hnowlan) This was a typo on my behalf, oops. resolving.
[16:59:54] Operations, LDAP-Access-Requests: Add msantos to wmf LDAP group - https://phabricator.wikimedia.org/T267125 (hnowlan) Open→Declined
[17:00:15] Operations, SRE-Access-Requests, Trust-and-Safety: Requesting access to restricted production access and analytics-privatedata-users for Zxane Soo - https://phabricator.wikimedia.org/T267312 (RobH)
[17:00:58] (CR) Ejegg: "Thanks CDanis! If you don't mind doing the deploy I'd appreciate the help. I see the instructions at https://wikitech.wikimedia.org/wiki/H" [mediawiki-config] - https://gerrit.wikimedia.org/r/639601 (https://phabricator.wikimedia.org/T259312) (owner: Ejegg)
[17:02:16] (CR) Bstorm: [C: +2] toolforge bastion: improve the killer a bit [puppet] - https://gerrit.wikimedia.org/r/639796 (https://phabricator.wikimedia.org/T266300) (owner: Bstorm)
[17:02:57] (PS1) Jbond: profile::ceph: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639800 (https://phabricator.wikimedia.org/T266479)
[17:03:41] (CR) Muehlenhoff: "The patch per se is fine, but we don't use ntp on jessie any more (client-side systemd-timesyncd is used), so we can simply axe the code e" [puppet] - https://gerrit.wikimedia.org/r/639791 (https://phabricator.wikimedia.org/T266479) (owner: Jbond)
[17:04:10] (PS4) CDanis: Special docroot for thankyou.wikipedia.org [puppet] - https://gerrit.wikimedia.org/r/639771 (https://phabricator.wikimedia.org/T259312) (owner: Ejegg)
[17:04:23] (CR) CDanis: "fixed wmf_style being linty" [puppet] - https://gerrit.wikimedia.org/r/639771 (https://phabricator.wikimedia.org/T259312) (owner: Ejegg)
[17:04:35] jouncebot: refresh
[17:04:36] I refreshed my knowledge about deployments.
[17:04:36] jouncebot: next
[17:04:37] In 14 hour(s) and 55 minute(s): No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20201107T0800)
[17:05:09] ejegg: do you mind waiting until Monday on this?
[17:05:52] cdanis: not at all
[17:05:57] great :)
[17:06:10] we generally avoid friday deploys to the payments cluster too :)
[17:06:16] (PS1) Jbond: P:ci: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639801 (https://phabricator.wikimedia.org/T266479)
[17:06:23] yeah, it's not mega-risky but also there's ... enough crufty stuff in the apache configs I'm not eager to push on a Friday
[17:06:28] thanks!
[17:07:48] Operations, Wikimedia-Logstash, Patch-For-Review: Upgrade ELK Stack to version 7 - https://phabricator.wikimedia.org/T234854 (colewhite) >>! In T234854#6608216, @jcrespo wrote: > As a minor issue, one thing I noticed is that page load in the old ones takes 3 seconds, 26 seconds on the new one. This i...
[17:09:03] Operations, Wikimedia-Logstash, Patch-For-Review: Upgrade ELK Stack to version 7 - https://phabricator.wikimedia.org/T234854 (jcrespo) >>! In T234854#6609248, @colewhite wrote: > Unfortunately, it is the browser causing the delay. [[ https://github.com/elastic/kibana/issues/76401 | Kibana 7.10 is pu...
[17:09:16] (CR) Muehlenhoff: "In fact someone made a patch about this some months ago, but then it crept into backlog :-) https://gerrit.wikimedia.org/r/c/operations/pu" [puppet] - https://gerrit.wikimedia.org/r/639791 (https://phabricator.wikimedia.org/T266479) (owner: Jbond)
[17:09:20] (PS1) RobH: adding user zxane [puppet] - https://gerrit.wikimedia.org/r/639802 (https://phabricator.wikimedia.org/T267312)
[17:10:09] (CR) RobH: [C: +2] adding user zxane [puppet] - https://gerrit.wikimedia.org/r/639802 (https://phabricator.wikimedia.org/T267312) (owner: RobH)
[17:10:50] Operations, SRE-Access-Requests, Trust-and-Safety: Requesting access to restricted production access and analytics-privatedata-users for Zxane Soo - https://phabricator.wikimedia.org/T267312 (RobH)
[17:12:35] (PS1) Jbond: P:conf::client: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639803 (https://phabricator.wikimedia.org/T266479)
[17:12:39] (PS1) Jbond: P:cumin::master: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639804 (https://phabricator.wikimedia.org/T266479)
[17:12:59] (PS2) Jbond: P:cumin::master: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639804 (https://phabricator.wikimedia.org/T266479)
[17:14:52] Operations, SRE-Access-Requests, Trust-and-Safety: Requesting access to restricted production access and analytics-privatedata-users for Zxane Soo - https://phabricator.wikimedia.org/T267312 (RobH) Open→Resolved Both groups requested have been approved by the mangers of those shell groups, so...
[17:15:38] (CR) Muehlenhoff: P:cumin::master: migrate to debian::codename and ensure_packages (1 comment) [puppet] - https://gerrit.wikimedia.org/r/639804 (https://phabricator.wikimedia.org/T266479) (owner: Jbond)
[17:18:06] Operations, SRE-Access-Requests, Trust-and-Safety: Requesting access to restricted production access and analytics-privatedata-users for Zxane Soo - https://phabricator.wikimedia.org/T267312 (RobH) >>! In T267312#6607624, @Ottomata wrote: > Anyway, approved! My understanding is that @ZS is a full ti...
[17:18:48] (PS3) Jbond: P:cumin::master: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639804 (https://phabricator.wikimedia.org/T266479)
[17:19:59] (PS1) Jbond: P:cyberbot::exec: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639805 (https://phabricator.wikimedia.org/T266479)
[17:20:16] Operations, Traffic: Integration tests for Wikidough - https://phabricator.wikimedia.org/T267424 (ssingh)
[17:21:31] Operations, Traffic: Integration tests for Wikidough - https://phabricator.wikimedia.org/T267424 (ssingh)
[17:21:33] Operations, Traffic, Patch-For-Review: Deploy Wikidough: Experimental DNS-over-HTTPS (DoH) public resolver - https://phabricator.wikimedia.org/T252132 (ssingh)
[17:24:22] (CR) Jbond: P:cumin::master: migrate to debian::codename and ensure_packages (1 comment) [puppet] - https://gerrit.wikimedia.org/r/639804 (https://phabricator.wikimedia.org/T266479) (owner: Jbond)
[17:27:32] (PS1) Jbond: P:docker::engine: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639807 (https://phabricator.wikimedia.org/T266479)
[17:29:28] (CR) Muehlenhoff: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/639804 (https://phabricator.wikimedia.org/T266479) (owner: Jbond)
[17:30:04] (PS1) Jbond: P:java::java_8: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639808 (https://phabricator.wikimedia.org/T266479)
[17:32:20] (PS1) Jbond: P:lists: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639809 (https://phabricator.wikimedia.org/T266479)
[17:34:23] (PS1) Jbond: P:mariadb: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639810 (https://phabricator.wikimedia.org/T266479)
[17:36:56] (PS1) Cwhite: hiera: add grafana to alertmanager partners [puppet] - https://gerrit.wikimedia.org/r/639811 (https://phabricator.wikimedia.org/T267017)
[17:43:43] Operations, ops-eqiad, Analytics-Clusters, DC-Ops: (Need By: 2020-09-15) upgrade/replace memory in stat100[58] - https://phabricator.wikimedia.org/T260448 (RobH) Just got notice this was received, so it should be delivered to our cage/storage now.
[17:44:20] Operations, ops-eqiad, Analytics: analytics1046/analytics1057 stuck in booting - https://phabricator.wikimedia.org/T267392 (wiki_willy) a:Cmjohnson
[17:46:44] Operations, ops-eqiad, DC-Ops: (Need By: Q2) eqiad: Upgrades of Management Switches - https://phabricator.wikimedia.org/T259758 (wiki_willy) a:Cmjohnson→Jclark-ctr
[17:48:30] (CR) Jbond: [C: -1] "LGTM just forgot to add the updated content, also added moritz" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/639778 (owner: David Caro)
[17:49:13] (PS1) Cwhite: hiera: enable grafana smtp notifications [puppet] - https://gerrit.wikimedia.org/r/639812 (https://phabricator.wikimedia.org/T267409)
[17:50:16] (CR) Jbond: [C: +1] "LGTM 😊" [puppet] - https://gerrit.wikimedia.org/r/605071 (owner: Muehlenhoff)
[17:50:51] (CR) Jbond: "> Patch Set 1:" [puppet] - https://gerrit.wikimedia.org/r/639791 (https://phabricator.wikimedia.org/T266479) (owner: Jbond)
[17:51:10] (Abandoned) Jbond: ntp: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639791 (https://phabricator.wikimedia.org/T266479) (owner: Jbond)
[17:51:11] RECOVERY - Postgres Replication Lag on maps1001 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 152 and 5416 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[17:52:19] (CR) Huji: Revert "Change votewiki language temporarily to fa for fawiki elections" (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/639650 (https://phabricator.wikimedia.org/T262689) (owner: Urbanecm)
[17:54:14] (CR) Jbond: [C: -1] [apt::conf] Allow passing integers as value (1 comment) [puppet] - https://gerrit.wikimedia.org/r/639778 (owner: David Caro)
[17:56:03] (CR) CDanis: [C: +1] "thanks!" [puppet] - https://gerrit.wikimedia.org/r/639812 (https://phabricator.wikimedia.org/T267409) (owner: Cwhite)
[17:59:21] Operations, Android-app-Bugs, Fundraising-Backlog, Thank-You-Page, and 5 others: Deal with donatewiki Thank You page launching in apps - https://phabricator.wikimedia.org/T259312 (CDanis) Patches look ready to go -- as discussed with @Ejegg we'll get this deployed Monday.
[17:59:24] (PS1) Jbond: P:mediawiki: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639814 (https://phabricator.wikimedia.org/T266479)
[18:02:02] (PS1) Bstorm: wikireplicas: set up site.pp and hosts hiera for new servers [puppet] - https://gerrit.wikimedia.org/r/639815 (https://phabricator.wikimedia.org/T260843)
[18:02:13] (CR) Hnowlan: [C: +2] replicate osm twice a day like codfw cluster [puppet] - https://gerrit.wikimedia.org/r/638481 (owner: MSantos)
[18:03:18] (PS3) Hnowlan: replicate osm twice a day like codfw cluster [puppet] - https://gerrit.wikimedia.org/r/638481 (owner: MSantos)
[18:03:23] (PS1) Jbond: P:openstack: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639816 (https://phabricator.wikimedia.org/T266479)
[18:04:58] (PS1) Jbond: P:parsoid: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639817 (https://phabricator.wikimedia.org/T266479)
[18:07:16] (PS1) Jbond: P:phabricator: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639818 (https://phabricator.wikimedia.org/T266479)
[18:08:20] (CR) Bstorm: "I have no idea what the "right" values are for innodb_buffer_pool_size, so I admit that I just threw values in there that had been used on" [puppet] - https://gerrit.wikimedia.org/r/639815 (https://phabricator.wikimedia.org/T260843) (owner: Bstorm)
[18:08:53] (PS1) Jbond: P:proxysql: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639819 (https://phabricator.wikimedia.org/T266479)
[18:10:38] (PS1) Jbond: P:puppetmaster::common: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639820 (https://phabricator.wikimedia.org/T266479)
[18:12:08] (PS1) Jbond: P:python37: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639822 (https://phabricator.wikimedia.org/T266479)
[18:14:30] (PS1) Jbond: P:spicerack: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639823 (https://phabricator.wikimedia.org/T266479)
[18:20:20] (PS1) Jbond: P:tendril::webserver: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639824 (https://phabricator.wikimedia.org/T266479)
[18:21:56] (PS1) Jbond: P:tlsproxy::envoy: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639825 (https://phabricator.wikimedia.org/T266479)
[18:25:11] PROBLEM - Postgres Replication Lag on maps1001 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 48622144 and 7457 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[18:26:48] Operations, observability: smart-data-dump should fail loudly when it can't gather metrics - https://phabricator.wikimedia.org/T267135 (colewhite) A cursory look shows two standing problems that are related and possibly blocking: # it cannot handle a configuration where a host has both mixed RAID and sta...
[18:27:50] Operations, SRE-Access-Requests: Requesting access to production infrastructure services for jgiannelos - https://phabricator.wikimedia.org/T257187 (Mholloway) It turns out that @Jgiannelos will need access to the `deployment` group after all (or at any rate one of [[ https://github.com/wikimedia/puppet/...
[18:28:04] Operations, observability: update logging ES's template index to type the 'age' field as an integer - https://phabricator.wikimedia.org/T266906 (colewhite)
[18:28:07] Operations, Wikimedia-Logstash, observability, Patch-For-Review: Standardize the logging format - https://phabricator.wikimedia.org/T234565 (colewhite)
[18:28:26] Operations, observability: update logging ES's template index to type the 'age' field as an integer - https://phabricator.wikimedia.org/T266906 (colewhite) p:Triage→Medium
[18:40:46] (PS10) Ahmon Dancy: openstack: Enable support for nested VMs [puppet] - https://gerrit.wikimedia.org/r/638146 (https://phabricator.wikimedia.org/T267433)
[18:41:30] (CR) Ahmon Dancy: "> Patch Set 9:" [puppet] - https://gerrit.wikimedia.org/r/638146 (https://phabricator.wikimedia.org/T267433) (owner: Ahmon Dancy)
[18:48:27] (PS1) Jbond: P:toolforge: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639826 (https://phabricator.wikimedia.org/T266479)
[18:48:39] (CR) ArielGlenn: [C: +1] "Looks good and pcc showed no changes on a sample dumpsdata host." [puppet] - https://gerrit.wikimedia.org/r/639763 (https://phabricator.wikimedia.org/T266479) (owner: Jbond)
[18:49:51] (CR) jerkins-bot: [V: -1] P:toolforge: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639826 (https://phabricator.wikimedia.org/T266479) (owner: Jbond)
[18:52:43] (PS1) Jbond: P:webperf::xhgui: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639827 (https://phabricator.wikimedia.org/T266479)
[18:54:43] !log cwhite@cumin1001 conftool action : set/pooled=no; selector: name=mw1379.eqiad.wmnet
[18:54:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:56:02] (PS1) Jbond: P:wikistats::httpd: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639829 (https://phabricator.wikimedia.org/T266479)
[18:56:27] (PS1) Dwisehaupt: Add frdb1004 to monitoring [puppet] - https://gerrit.wikimedia.org/r/639830 (https://phabricator.wikimedia.org/T265086)
[18:58:19] RECOVERY - PHP7 rendering on mw1379 is OK: HTTP OK: HTTP/1.1 302 Found - 643 bytes in 0.055 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[18:58:59] RECOVERY - Apache HTTP on mw1379 is OK: HTTP OK: HTTP/1.1 302 Found - 629 bytes in 0.070 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[18:59:07] (CR) Zfilipin: [C: +1] "Looks good to me!" [phabricator/deployment] (wmf/stable) - https://gerrit.wikimedia.org/r/639293 (https://phabricator.wikimedia.org/T265463) (owner: Harriet Ayugi)
[19:00:39] (PS1) Jbond: P:wmcs: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639831 (https://phabricator.wikimedia.org/T266479)
[19:02:47] (PS1) Jbond: P:zookeeper: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639832 (https://phabricator.wikimedia.org/T266479)
[19:04:21] (PS1) Jbond: P:zuul::server: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639833 (https://phabricator.wikimedia.org/T266479)
[19:07:21] Operations, serviceops, Patch-For-Review, User-notice: Upgrade the MediaWiki servers to ICU 63 - https://phabricator.wikimedia.org/T264991 (Krinkle)
[19:07:44] Operations, serviceops, Patch-For-Review: Upgrade the MediaWiki servers to ICU 63 - https://phabricator.wikimedia.org/T264991 (Krinkle)
[19:10:21] (PS1) Ottomata: refinery - ProduceCanaryEvents - use schema.discovery.wmnet [puppet] - https://gerrit.wikimedia.org/r/639835
[19:11:54] (CR) jerkins-bot: [V: -1] refinery - ProduceCanaryEvents - use schema.discovery.wmnet [puppet] - https://gerrit.wikimedia.org/r/639835 (owner: Ottomata)
[19:12:41] (PS2) Ottomata: refinery - ProduceCanaryEvents - use schema.discovery.wmnet [puppet] - https://gerrit.wikimedia.org/r/639835
[19:19:07] Operations, Machine Learning Platform, ORES, Okapi, and 3 others: ORES redis: max number of clients reached... - https://phabricator.wikimedia.org/T263910 (calbon) @akosiaris Can you review it? I don't know enough about the nodes vs redis connection to intelligently review.
[19:19:36] (CR) Ottomata: [C: +2] refinery - ProduceCanaryEvents - use schema.discovery.wmnet [puppet] - https://gerrit.wikimedia.org/r/639835 (owner: Ottomata)
[19:26:04] (PS1) Jbond: P:prometheus: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639836 (https://phabricator.wikimedia.org/T266479)
[19:27:40] (PS1) Jbond: P:puppet_compiler::packages: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639837 (https://phabricator.wikimedia.org/T266479)
[19:34:33] (PS1) Ssingh: Initial commit of the knead-wikidough test suite [software/knead-wikidough] - https://gerrit.wikimedia.org/r/639838 (https://phabricator.wikimedia.org/T267424)
[19:36:37] (PS1) Jbond: (WIP) puppetdb/puppetmaster: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639840 (https://phabricator.wikimedia.org/T266479)
[19:38:29] (PS1) Jbond: query_service: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639841 (https://phabricator.wikimedia.org/T266479)
[19:38:55] (PS1) RLazarus: Add assert_body_regex and assert_headers_regex. [software/httpbb] - https://gerrit.wikimedia.org/r/639842
[19:40:16] (CR) Bstorm: "PCC: https://puppet-compiler.wmflabs.org/compiler1001/26363/" [puppet] - https://gerrit.wikimedia.org/r/639815 (https://phabricator.wikimedia.org/T260843) (owner: Bstorm)
[19:41:42] (PS1) Jbond: racktables: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639843 (https://phabricator.wikimedia.org/T266479)
[19:43:31] (PS1) Jbond: O:alerting_host: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639844 (https://phabricator.wikimedia.org/T266479)
[19:44:41] (PS1) Jbond: mediawiki::memcached: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639845 (https://phabricator.wikimedia.org/T266479)
[19:45:26] (PS2) Jbond: lists: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639845 (https://phabricator.wikimedia.org/T266479)
[19:50:38] (PS1) Jbond: mediawaki::memcached: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639847 (https://phabricator.wikimedia.org/T266479)
[19:51:53] Hey all - would like to get a quick security patch out for CologneBlue today, if possible: https://phabricator.wikimedia.org/T267278
[19:52:21] (CR) RLazarus: [C: +1] Add a Hiera option to enable ICU63 component [puppet] - https://gerrit.wikimedia.org/r/639751 (https://phabricator.wikimedia.org/T264991) (owner: Muehlenhoff)
[19:52:47] (PS1) Jbond: puppetmaster::standalone: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639848 (https://phabricator.wikimedia.org/T266479)
[19:53:36] (CR) SBassett: "This change is ready for review." [skins/CologneBlue] (wmf/1.35.0-wmf.16) - https://gerrit.wikimedia.org/r/639652 (https://phabricator.wikimedia.org/T267278) (owner: SBassett)
[19:53:47] Operations, ops-eqiad, DC-Ops, Maps: (Need By: TBD) rack/setup/install maps10[05-10].eqiad.wmnet - https://phabricator.wikimedia.org/T260269 (ops-monitoring-bot) Script wmf-auto-reimage was launched by robh on cumin1001.eqiad.wmnet for hosts: ` maps1009.eqiad.wmnet ` The log can be found in `/var...
[19:53:49] (CR) SBassett: "This change is ready for review." [skins/CologneBlue] (wmf/1.35.0-wmf.14) - https://gerrit.wikimedia.org/r/639653 (https://phabricator.wikimedia.org/T267278) (owner: SBassett)
[19:54:22] (CR) jerkins-bot: [V: -1] SECURITY: Fix escaping of the 'qbfind' message [skins/CologneBlue] (wmf/1.35.0-wmf.16) - https://gerrit.wikimedia.org/r/639652 (https://phabricator.wikimedia.org/T267278) (owner: SBassett)
[19:54:42] (CR) jerkins-bot: [V: -1] SECURITY: Fix escaping of the 'qbfind' message [skins/CologneBlue] (wmf/1.35.0-wmf.14) - https://gerrit.wikimedia.org/r/639653 (https://phabricator.wikimedia.org/T267278) (owner: SBassett)
[19:55:58] (PS1) Jbond: simplelamp2: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639849 (https://phabricator.wikimedia.org/T266479)
[19:57:21] (CR) CDanis: [C: +1] "LGTM thanks!" (2 comments) [software/httpbb] - https://gerrit.wikimedia.org/r/639842 (owner: RLazarus)
[19:57:34] Operations, SRE-Access-Requests: Requesting access to production infrastructure services for jgiannelos - https://phabricator.wikimedia.org/T257187 (jcrespo) @Mholloway No one on SRE will monitor a closed ticket- I only happened to notice it by pure chance. You will need to file a new ticket to process n...
[19:58:14] sbassett there were merge conflicts in the backports
[19:58:22] do you want me to fix them?
[19:58:22] yep, fixing now [19:58:33] ack [19:59:14] (03PS1) 10Jbond: simplelap: migrate to debian::codename and ensure_packages [puppet] - 10https://gerrit.wikimedia.org/r/639850 (https://phabricator.wikimedia.org/T266479) [20:00:49] (03PS1) 10Jbond: striker: migrate to debian::codename and ensure_packages [puppet] - 10https://gerrit.wikimedia.org/r/639851 (https://phabricator.wikimedia.org/T266479) [20:01:36] (03PS2) 10SBassett: SECURITY: Fix escaping of the 'qbfind' message [skins/CologneBlue] (wmf/1.35.0-wmf.16) - 10https://gerrit.wikimedia.org/r/639652 (https://phabricator.wikimedia.org/T267278) [20:02:05] (03PS2) 10RLazarus: Add assert_body_regex and assert_headers_regex. [software/httpbb] - 10https://gerrit.wikimedia.org/r/639842 [20:02:18] (03CR) 10RLazarus: [C: 03+2] Add assert_body_regex and assert_headers_regex. [software/httpbb] - 10https://gerrit.wikimedia.org/r/639842 (owner: 10RLazarus) [20:02:32] (03PS2) 10SBassett: SECURITY: Fix escaping of the 'qbfind' message [skins/CologneBlue] (wmf/1.35.0-wmf.14) - 10https://gerrit.wikimedia.org/r/639653 (https://phabricator.wikimedia.org/T267278) [20:03:12] (03CR) 10jerkins-bot: [V: 04-1] SECURITY: Fix escaping of the 'qbfind' message [skins/CologneBlue] (wmf/1.35.0-wmf.16) - 10https://gerrit.wikimedia.org/r/639652 (https://phabricator.wikimedia.org/T267278) (owner: 10SBassett) [20:03:32] (03PS1) 10Jbond: wmcs: migrate to debian::codename and ensure_packages [puppet] - 10https://gerrit.wikimedia.org/r/639852 (https://phabricator.wikimedia.org/T266479) [20:04:13] (03Merged) 10jenkins-bot: Add assert_body_regex and assert_headers_regex. 
[software/httpbb] - https://gerrit.wikimedia.org/r/639842 (owner: RLazarus)
[20:04:20] (CR) jerkins-bot: [V: -1] SECURITY: Fix escaping of the 'qbfind' message [skins/CologneBlue] (wmf/1.35.0-wmf.14) - https://gerrit.wikimedia.org/r/639653 (https://phabricator.wikimedia.org/T267278) (owner: SBassett)
[20:04:40] (PS1) Jbond: rsyslog: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639853 (https://phabricator.wikimedia.org/T266479)
[20:05:41] !log robh@cumin1001 START - Cookbook sre.hosts.downtime
[20:05:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:05:51] (CR) SBassett: "recheck" [skins/CologneBlue] (wmf/1.35.0-wmf.14) - https://gerrit.wikimedia.org/r/639653 (https://phabricator.wikimedia.org/T267278) (owner: SBassett)
[20:06:30] (CR) SBassett: "recheck" [skins/CologneBlue] (wmf/1.35.0-wmf.16) - https://gerrit.wikimedia.org/r/639652 (https://phabricator.wikimedia.org/T267278) (owner: SBassett)
[20:07:42] !log robh@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[20:07:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:08:26] (PS1) Jbond: service: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639854 (https://phabricator.wikimedia.org/T266479)
[20:11:10] (PS1) Dwisehaupt: Shift payments back to eqiad after upgrades [dns] - https://gerrit.wikimedia.org/r/639855 (https://phabricator.wikimedia.org/T265688)
[20:12:22] (PS1) Jbond: smart: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639856 (https://phabricator.wikimedia.org/T266479)
[20:12:22] (CR) Cwhite: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/475453 (https://phabricator.wikimedia.org/T204993) (owner: Alex Monk)
[20:14:24] (PS1) Jbond: sudo: migrate to debian::codename and ensure_packages [puppet] -
https://gerrit.wikimedia.org/r/639857 (https://phabricator.wikimedia.org/T266479)
[20:14:41] Operations, ops-eqiad, DC-Ops, Maps: (Need By: TBD) rack/setup/install maps10[05-10].eqiad.wmnet - https://phabricator.wikimedia.org/T260269 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['maps1009.eqiad.wmnet'] ` and were **ALL** successful.
[20:16:35] (PS1) Jbond: swift: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639858 (https://phabricator.wikimedia.org/T266479)
[20:17:52] (CR) SBassett: "Filed T267437 for CI errors." [skins/CologneBlue] (wmf/1.35.0-wmf.14) - https://gerrit.wikimedia.org/r/639653 (https://phabricator.wikimedia.org/T267278) (owner: SBassett)
[20:17:58] Operations, ops-eqiad, DC-Ops, Maps: (Need By: TBD) rack/setup/install maps10[05-10].eqiad.wmnet - https://phabricator.wikimedia.org/T260269 (RobH)
[20:18:06] (CR) SBassett: "Filed T267437 for CI errors." [skins/CologneBlue] (wmf/1.35.0-wmf.16) - https://gerrit.wikimedia.org/r/639652 (https://phabricator.wikimedia.org/T267278) (owner: SBassett)
[20:18:09] Operations, ops-eqiad, DC-Ops, Maps: (Need By: TBD) rack/setup/install maps10[05-10].eqiad.wmnet - https://phabricator.wikimedia.org/T260269 (RobH) all hosts installed and set to staged.
[20:20:54] sbassett went to look into that CI error, but now beta is giving me 502 errors
[20:21:09] did something just change?
[20:22:35] (PS1) Jbond: systemd: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639859 (https://phabricator.wikimedia.org/T266479)
[20:23:36] (CR) jerkins-bot: [V: -1] systemd: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639859 (https://phabricator.wikimedia.org/T266479) (owner: Jbond)
[20:24:30] (PS1) Jbond: tendril: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639860 (https://phabricator.wikimedia.org/T266479)
[20:25:32] (PS1) Jbond: testreduce: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639861 (https://phabricator.wikimedia.org/T266479)
[20:27:57] (PS1) Jbond: thumbor: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639862 (https://phabricator.wikimedia.org/T266479)
[20:30:16] (PS1) Jbond: ulog: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639863 (https://phabricator.wikimedia.org/T266479)
[20:30:41] (CR) jerkins-bot: [V: -1] ulog: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639863 (https://phabricator.wikimedia.org/T266479) (owner: Jbond)
[20:33:33] (PS1) Jbond: uwsgi: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639864 (https://phabricator.wikimedia.org/T266479)
[20:36:02] (PS1) Jbond: varnish: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639865 (https://phabricator.wikimedia.org/T266479)
[20:36:52] (PS3) SBassett: SECURITY: Fix escaping of the 'qbfind' message [skins/CologneBlue] (wmf/1.35.0-wmf.14) - https://gerrit.wikimedia.org/r/639653 (https://phabricator.wikimedia.org/T267278)
[20:36:54] (PS1) Reedy: SECURITY: Fix escaping of the 'qbfind' message [skins/CologneBlue] (wmf/1.36.0-wmf.14) -
https://gerrit.wikimedia.org/r/639655 (https://phabricator.wikimedia.org/T267278)
[20:37:01] (PS1) Reedy: SECURITY: Fix escaping of the 'qbfind' message [skins/CologneBlue] (wmf/1.36.0-wmf.16) - https://gerrit.wikimedia.org/r/639656 (https://phabricator.wikimedia.org/T267278)
[20:37:29] (Abandoned) Reedy: SECURITY: Fix escaping of the 'qbfind' message [skins/CologneBlue] (wmf/1.35.0-wmf.14) - https://gerrit.wikimedia.org/r/639653 (https://phabricator.wikimedia.org/T267278) (owner: SBassett)
[20:37:34] (Abandoned) Reedy: SECURITY: Fix escaping of the 'qbfind' message [skins/CologneBlue] (wmf/1.35.0-wmf.16) - https://gerrit.wikimedia.org/r/639652 (https://phabricator.wikimedia.org/T267278) (owner: SBassett)
[20:40:48] (PS1) Jbond: zuul: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639868 (https://phabricator.wikimedia.org/T266479)
[20:41:22] (CR) jerkins-bot: [V: -1] zuul: migrate to debian::codename and ensure_packages [puppet] - https://gerrit.wikimedia.org/r/639868 (https://phabricator.wikimedia.org/T266479) (owner: Jbond)
[20:43:00] (CR) SBassett: [C: +2] SECURITY: Fix escaping of the 'qbfind' message [skins/CologneBlue] (wmf/1.36.0-wmf.14) - https://gerrit.wikimedia.org/r/639655 (https://phabricator.wikimedia.org/T267278) (owner: Reedy)
[20:43:10] (CR) SBassett: [C: +2] SECURITY: Fix escaping of the 'qbfind' message [skins/CologneBlue] (wmf/1.36.0-wmf.16) - https://gerrit.wikimedia.org/r/639656 (https://phabricator.wikimedia.org/T267278) (owner: Reedy)
[20:44:27] (CR) Jgreen: [C: +2] Shift payments back to eqiad after upgrades [dns] - https://gerrit.wikimedia.org/r/639855 (https://phabricator.wikimedia.org/T265688) (owner: Dwisehaupt)
[20:48:35] (Merged) jenkins-bot: SECURITY: Fix escaping of the 'qbfind' message [skins/CologneBlue] (wmf/1.36.0-wmf.14) - https://gerrit.wikimedia.org/r/639655 (https://phabricator.wikimedia.org/T267278) (owner:
Reedy)
[20:48:37] (Merged) jenkins-bot: SECURITY: Fix escaping of the 'qbfind' message [skins/CologneBlue] (wmf/1.36.0-wmf.16) - https://gerrit.wikimedia.org/r/639656 (https://phabricator.wikimedia.org/T267278) (owner: Reedy)
[20:51:06] (CR) BryanDavis: [C: +1] Look for service.template in various code directories [software/tools-webservice] - https://gerrit.wikimedia.org/r/636993 (https://phabricator.wikimedia.org/T266692) (owner: Legoktm)
[20:56:21] Operations, ops-eqiad, DC-Ops, Maps: (Need By: TBD) rack/setup/install maps10[05-10].eqiad.wmnet - https://phabricator.wikimedia.org/T260269 (RobH) Open→Resolved
[20:56:27] !log reedy@deploy1001 Synchronized php-1.36.0-wmf.14/skins/CologneBlue/: T267278 (duration: 01m 10s)
[20:56:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:56:53] (CR) Urbanecm: Revert "Change votewiki language temporarily to fa for fawiki elections" (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/639650 (https://phabricator.wikimedia.org/T262689) (owner: Urbanecm)
[20:57:44] RECOVERY - PHP opcache health on mwdebug1001 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health
[20:57:56] !log reedy@deploy1001 Synchronized php-1.36.0-wmf.16/skins/CologneBlue/: T267278 (duration: 01m 05s)
[20:58:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:58:38] (PS3) Urbanecm: Revert "Change votewiki language temporarily to fa for fawiki elections" [mediawiki-config] - https://gerrit.wikimedia.org/r/639650 (https://phabricator.wikimedia.org/T262689)
[20:59:41] Operations, serviceops, Patch-For-Review: Upgrade the MediaWiki servers to ICU 63 - https://phabricator.wikimedia.org/T264991 (jijiki) >>! In T264991#6608410, @Joe wrote: > For the ICU transition it's crucially important that no machine with write access to the databases gets updated before the date...
[21:17:15] Operations, SRE-Access-Requests: Requesting access to production infrastructure services for jgiannelos - https://phabricator.wikimedia.org/T257187 (Mholloway) Thanks, @jcrespo. @thcipriani, do you have continuing concerns about @Jgiannelos being added to `deployment` in light of it being required to ac...
[21:22:15] Operations, Beta-Cluster-Infrastructure, Release-Engineering-Team-TODO, Traffic: Beta cluster seems to be extremely slow for logged in user during page navigation - https://phabricator.wikimedia.org/T267435 (thcipriani) I think the problem is somewhere in the varnish layers; however, I'm concedin...
[21:25:38] Operations, serviceops, Patch-For-Review: Upgrade the MediaWiki servers to ICU 63 - https://phabricator.wikimedia.org/T264991 (RLazarus) On a pointer from @joe (thanks!) I used the same basic approach as T86096#2325895 to identify wikis we'll need to touch with updateCollation.php. {F32455073} The...
[21:30:27] Operations, serviceops, CommRel-Specialists-Support (Oct-Dec-2020), User-notice: CommRel support for ICU 63 upgrade - https://phabricator.wikimedia.org/T267145 (RLazarus) Yep, that text looks good to me -- the "eight of the ten biggest Wikipedias" language is the last thing I wanted to verify, an...
[22:00:49] (CR) Volans: P:cumin::master: migrate to debian::codename and ensure_packages (1 comment) [puppet] - https://gerrit.wikimedia.org/r/639804 (https://phabricator.wikimedia.org/T266479) (owner: Jbond)
[22:03:54] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[22:15:50] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[22:29:20] !log ebernhardson@deploy1001 Started deploy [wikimedia/discovery/analytics@dc63e7e]: Update to master, primarily updates for ores weekly predictions handling
[22:29:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:29:46] !log ebernhardson@deploy1001 Finished deploy [wikimedia/discovery/analytics@dc63e7e]: Update to master, primarily updates for ores weekly predictions handling (duration: 00m 26s)
[22:29:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:39:40] !log ebernhardson@deploy1001 Started deploy [wikimedia/discovery/analytics@bfaac0f]: Update to master, primarily updates for ores weekly predictions handling
[22:39:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:40:48] !log ebernhardson@deploy1001 Finished deploy [wikimedia/discovery/analytics@bfaac0f]: Update to master, primarily updates for ores weekly predictions handling (duration: 01m 08s)
[22:40:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:42:43] (CR) Krinkle: [C: +1] Add "mcrouter-with-onhost-tier" entry to $wgObjectCaches [mediawiki-config] - https://gerrit.wikimedia.org/r/636094 (https://phabricator.wikimedia.org/T264604) (owner: Aaron Schulz)
[22:58:14] (PS1) Bstorm: toolforge k8s: upgrade docker and containerd [puppet] - https://gerrit.wikimedia.org/r/639881 (https://phabricator.wikimedia.org/T263284)
[23:02:25] (PS1) Bstorm: toolforge-k8s: AdmissionsConfiguration is GA after 1.17 [puppet] - https://gerrit.wikimedia.org/r/639883 (https://phabricator.wikimedia.org/T263284)
[23:19:12] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The system is
operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[23:23:27] Operations, ops-eqiad, Analytics-Clusters, DC-Ops: (Need By: TBD) rack/setup/install an-worker10[18-41] - https://phabricator.wikimedia.org/T260445 (wiki_willy) Just a side note - these Netbox errors should go away, once the assets are entered into Netbox: https://netbox.wikimedia.org/extras/rep...
[23:31:16] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[23:48:07] (CR) Bstorm: [C: -1] "This needs a little work. Since we generally keep the last version around, and that version (1.16) needs 18.09, a second docker definition" [puppet] - https://gerrit.wikimedia.org/r/639881 (https://phabricator.wikimedia.org/T263284) (owner: Bstorm)