[00:00:04] RoanKattouw, Niharika, and Urbanecm: Time to snap out of that daydream and deploy Evening backport window. Get on with it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20201216T0000). [00:00:04] No GERRIT patches in the queue for this window AFAICS. [00:00:14] (03CR) 10jerkins-bot: [V: 04-1] Phabricator: reformat phabbanlist for new IP banning format, remove lines that use old non-working format [puppet] - 10https://gerrit.wikimedia.org/r/649753 (https://phabricator.wikimedia.org/T270185) (owner: 10Wolfgang Kandek) [00:21:33] 10Operations, 10MW-on-K8s, 10serviceops, 10Release Pipeline (Blubber), 10Release-Engineering-Team (Pipeline): Deployment infrastructure for PHP microservices - https://phabricator.wikimedia.org/T261369 (10Krinkle) Some kind of `/deploy` repo seems needed, I think, as otherwise we would be deploying unaud... [00:36:17] (03PS3) 10Wolfgang Kandek: Phabricator: New IP banning format, remove lines in old non-working format [puppet] - 10https://gerrit.wikimedia.org/r/649753 (https://phabricator.wikimedia.org/T270185) [00:41:24] PROBLEM - Postgres Replication Lag on maps1007 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 7356794272 and 847 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [00:41:30] PROBLEM - Postgres Replication Lag on maps1005 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 7301721256 and 854 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [00:41:40] PROBLEM - Postgres Replication Lag on maps1003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 1396343920 and 79 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [00:43:16] PROBLEM - Postgres Replication Lag on maps1010 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 1451460152 and 65 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [00:43:18] PROBLEM - Postgres 
Replication Lag on maps1006 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 5798439072 and 324 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [00:43:56] PROBLEM - Postgres Replication Lag on maps1001 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 1968091808 and 92 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [00:44:16] PROBLEM - Postgres Replication Lag on maps1002 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 1109935816 and 53 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [00:44:22] PROBLEM - Postgres Replication Lag on maps1008 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 1241401384 and 58 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [00:50:36] RECOVERY - Postgres Replication Lag on maps1001 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 92584 and 221 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [00:50:54] RECOVERY - Postgres Replication Lag on maps1002 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 17344 and 240 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [00:51:02] RECOVERY - Postgres Replication Lag on maps1008 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 72976 and 247 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [00:51:34] RECOVERY - Postgres Replication Lag on maps1010 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 30696 and 280 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [00:51:36] RECOVERY - Postgres Replication Lag on maps1003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 38776 and 282 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [00:53:14] RECOVERY - Postgres Replication Lag on maps1006 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB 
template1 (host:localhost) 355200 and 380 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [00:54:46] RECOVERY - Postgres Replication Lag on maps1005 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 455416 and 472 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [00:56:18] RECOVERY - Postgres Replication Lag on maps1007 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 443304 and 564 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [00:58:14] PROBLEM - Postgres Replication Lag on maps2005 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 170143432 and 13 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [00:58:20] PROBLEM - Postgres Replication Lag on maps2006 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 341639536 and 27 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [00:59:58] RECOVERY - Postgres Replication Lag on maps2006 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 575496 and 0 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [01:03:10] PROBLEM - Postgres Replication Lag on maps2005 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 174849288 and 12 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [01:04:50] RECOVERY - Postgres Replication Lag on maps2005 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 0 and 29 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [02:09:00] 10Operations, 10Traffic, 10Services (watching), 10Sustainability (MediaWiki-MultiDC): Create HTTP verb and sticky cookie DC routing in VCL - https://phabricator.wikimedia.org/T91820 (10Krinkle) [02:11:07] 10Operations, 10MediaWiki-General, 10Performance-Team, 10serviceops-radar, and 3 others: Move MainStash out of Redis to a simpler multi-dc aware solution - 
https://phabricator.wikimedia.org/T212129 (10Krinkle) [02:12:59] 10Operations, 10MediaWiki-General, 10Performance-Team, 10serviceops-radar, and 3 others: Move MainStash out of Redis to a simpler multi-dc aware solution - https://phabricator.wikimedia.org/T212129 (10Krinkle) [04:27:46] PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 67 probes of 589 (alerts on 65) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [04:33:28] RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 53 probes of 589 (alerts on 65) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [06:24:22] 10Operations, 10MediaWiki-General, 10Performance-Team, 10serviceops-radar, and 3 others: Move MainStash out of Redis to a simpler multi-dc aware solution - https://phabricator.wikimedia.org/T212129 (10Marostegui) @Gilles @Krinkle I assume we want DB backups from this database section the same way we backup... 
[06:27:39] (CR) Elukey: "> Patch Set 1:" [homer/public] - https://gerrit.wikimedia.org/r/649706 (https://phabricator.wikimedia.org/T270196) (owner: Elukey)
[06:28:30] PROBLEM - ores on ores2009 is CRITICAL: connect to address 10.192.48.90 and port 8081: Connection refused https://wikitech.wikimedia.org/wiki/Services/Monitoring/ores
[06:29:29] this happens every morning on different ores nodes during log rotation --^
[06:33:28] and a puppet run fixes it, it seems as if the reload command (kill -HUP) simply stops uwsgi-ores
[06:34:58] RECOVERY - ores on ores2009 is OK: HTTP OK: HTTP/1.0 200 OK - 6397 bytes in 0.085 second response time https://wikitech.wikimedia.org/wiki/Services/Monitoring/ores
[06:46:37] Operations, SRE-Access-Requests, Patch-For-Review: Access to prod mysql from stat1004 - https://phabricator.wikimedia.org/T270196 (Marostegui) @gmodena just to confirm what was agreed at T268505#6660388 about throttling writes is still happening, right?
[06:53:30] also /etc/logrotate.d/ores does, for uwsgi, copytruncate + uwsgi reload, but in theory only the former would be ok no?
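A minimal sketch of the point made at 06:53:30 about copytruncate, assuming the service keeps its log open in append mode (this is an assumption for illustration; uwsgi's actual behaviour is not verified here). The `copytruncate` and `demo` helpers and the `app.log` file name are hypothetical. The sketch copies the log aside, truncates the original in place, and shows that a writer holding the old file handle keeps logging to the same path without any reload.

```python
import os
import shutil
import tempfile


def copytruncate(path):
    """Copy the log aside, then empty the original in place (same inode)."""
    rotated = path + ".1"
    shutil.copyfile(path, rotated)
    os.truncate(path, 0)
    return rotated


def demo():
    workdir = tempfile.mkdtemp()
    log_path = os.path.join(workdir, "app.log")

    # Stand-in for the long-running service: the handle stays open throughout.
    writer = open(log_path, "a")
    writer.write("before rotation\n")
    writer.flush()

    rotated = copytruncate(log_path)

    # No reload or reopen: append mode writes at the new end of the file.
    writer.write("after rotation\n")
    writer.flush()
    writer.close()

    with open(rotated) as f:
        old = f.read()
    with open(log_path) as f:
        new = f.read()
    shutil.rmtree(workdir)
    return old, new


if __name__ == "__main__":
    print(demo())
```

Under that assumption the rotated copy holds the old lines and the live file holds only the new ones, which is why a reload after copytruncate looks redundant in theory.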
[06:56:22] ah it is the standard logrotate template for uwsgi [07:17:13] (03PS1) 10Marostegui: mariadb: Add eqiad x2 hosts [puppet] - 10https://gerrit.wikimedia.org/r/649771 (https://phabricator.wikimedia.org/T269324) [07:18:19] (03CR) 10Marostegui: [C: 03+2] mariadb: Add eqiad x2 hosts [puppet] - 10https://gerrit.wikimedia.org/r/649771 (https://phabricator.wikimedia.org/T269324) (owner: 10Marostegui) [07:20:53] !log Stop mysql on db2142 to clone db1151 - T269324 [07:20:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:20:56] T269324: Productionize x2 databases - https://phabricator.wikimedia.org/T269324 [07:36:42] (03PS2) 10Ryan Kemper: [wdqs] proper selector for machines running the streaming-updater (take 2) [puppet] - 10https://gerrit.wikimedia.org/r/646621 (https://phabricator.wikimedia.org/T266986) (owner: 10DCausse) [07:44:58] (03CR) 10Ryan Kemper: [C: 03+2] [wdqs] proper selector for machines running the streaming-updater (take 2) [puppet] - 10https://gerrit.wikimedia.org/r/646621 (https://phabricator.wikimedia.org/T266986) (owner: 10DCausse) [07:55:59] !log depool wdqs1005 (catching up on lag) [07:56:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:07:34] (03PS1) 10Marostegui: wmnet: Add x2-master CNAME [dns] - 10https://gerrit.wikimedia.org/r/649818 (https://phabricator.wikimedia.org/T269324) [08:08:28] (03CR) 10Marostegui: [C: 03+2] wmnet: Add x2-master CNAME [dns] - 10https://gerrit.wikimedia.org/r/649818 (https://phabricator.wikimedia.org/T269324) (owner: 10Marostegui) [08:15:09] (03CR) 10Elukey: [C: 03+2] sre.hadoop.change-distro-from-cdh: use systemctl mask where needed [cookbooks] - 10https://gerrit.wikimedia.org/r/649655 (owner: 10Elukey) [08:15:17] (03CR) 10Elukey: sre.hadoop.change-distro-from-cdh: use systemctl mask where needed [cookbooks] - 10https://gerrit.wikimedia.org/r/649655 (owner: 10Elukey) [08:20:00] (03CR) 10Elukey: [C: 03+2] sre.hadoop.upgrade-bigtop-distro: use 
systemctl mask where needed [cookbooks] - 10https://gerrit.wikimedia.org/r/649635 (https://phabricator.wikimedia.org/T269919) (owner: 10Elukey) [08:23:14] !log acme-chief and acme-chief-api restarts for openssl upgrades (CVE-2020-1971) [08:23:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:28:55] 10Operations, 10Continuous-Integration-Infrastructure, 10serviceops, 10Release-Engineering-Team (CI & Testing services): replace doc1001.eqiad.wmnet with a buster VM - https://phabricator.wikimedia.org/T247653 (10hashar) [08:30:03] (03CR) 10Gergő Tisza: [C: 04-1] "Need to disable impact module." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649586 (https://phabricator.wikimedia.org/T266020) (owner: 10Gergő Tisza) [08:35:28] !log gehel@cumin1001 START - Cookbook sre.wdqs.data-transfer [08:35:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:45:32] (03PS1) 10Jcrespo: mariadb-backups: Setup x2 production backups [puppet] - 10https://gerrit.wikimedia.org/r/649820 (https://phabricator.wikimedia.org/T269324) [08:45:58] (03PS1) 10Lokal Profil: [WIP] Support setting to disable e-mail notification on bot edit [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649821 (https://phabricator.wikimedia.org/T262750) [08:49:30] (03CR) 10Marostegui: "I haven't got confirmation this is needed. I was going to ping you if this was needed in the end or not." [puppet] - 10https://gerrit.wikimedia.org/r/649820 (https://phabricator.wikimedia.org/T269324) (owner: 10Jcrespo) [08:53:22] (03CR) 10Jcrespo: "Cool, it was just a first approach, spending 30 seconds to see how we would go about it. 
Thanks, waiting for your check for further instru" [puppet] - 10https://gerrit.wikimedia.org/r/649820 (https://phabricator.wikimedia.org/T269324) (owner: 10Jcrespo) [08:54:14] (03CR) 10Marostegui: "> Patch Set 1:" [puppet] - 10https://gerrit.wikimedia.org/r/649820 (https://phabricator.wikimedia.org/T269324) (owner: 10Jcrespo) [08:55:26] (03CR) 10Jcrespo: "BTW, this informed that extra port is at port# + 20, so #30 doesn't collide with anything (there is not port 3310 used), but production an" [puppet] - 10https://gerrit.wikimedia.org/r/649820 (https://phabricator.wikimedia.org/T269324) (owner: 10Jcrespo) [09:12:55] (03CR) 10Gehel: [C: 04-1] "See minor comment about logging inline." (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/640571 (https://phabricator.wikimedia.org/T265526) (owner: 10Mstyles) [09:14:01] (03PS1) 10DCausse: [wdqs] add jmx_wdqs_streaming_updater prometheus job [puppet] - 10https://gerrit.wikimedia.org/r/649827 (https://phabricator.wikimedia.org/T266986) [09:15:23] 10Operations, 10serviceops: ifup@eno1.service failed on some buster hosts - https://phabricator.wikimedia.org/T270220 (10Joe) a:03jbond Per @elukey's comment, assigning to John. [09:23:44] (03PS1) 10Elukey: cumin: add kafka-test alias [puppet] - 10https://gerrit.wikimedia.org/r/649829 [09:25:09] ACKNOWLEDGEMENT - Check systemd state on mc1033 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. Effie Mouzeli T270220 https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [09:25:09] ACKNOWLEDGEMENT - Check systemd state on mc2033 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. Effie Mouzeli T270220 https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [09:25:09] ACKNOWLEDGEMENT - Check systemd state on mw1265 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
Effie Mouzeli T270220 https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [09:25:17] (03CR) 10Elukey: [C: 03+2] cumin: add kafka-test alias [puppet] - 10https://gerrit.wikimedia.org/r/649829 (owner: 10Elukey) [09:25:34] (03PS1) 10ArielGlenn: keep fewer cirrussearch dumps on the internal dumps servers [puppet] - 10https://gerrit.wikimedia.org/r/649830 [09:27:18] (03CR) 10ArielGlenn: [C: 03+2] keep fewer cirrussearch dumps on the internal dumps servers [puppet] - 10https://gerrit.wikimedia.org/r/649830 (owner: 10ArielGlenn) [09:28:27] RECOVERY - Check systemd state on cumin1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [09:28:53] ta daaan [09:29:27] !log force execution of cumin-check-aliases.service on cumin[12]001 hosts to clear alarms [09:29:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:30:39] RECOVERY - Check systemd state on cumin2001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [09:30:49] RECOVERY - Check systemd state on deneb is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [09:32:49] <_joe_> !log reset-failed for docker report jobs on deneb, failed because of a registry gateway timeout [09:32:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:33:31] 10Operations, 10Data-Persistence-Backup, 10SRE-swift-storage, 10Epic, 10Goal: WMF media storage must be adequately backed up in a remote location - https://phabricator.wikimedia.org/T262668 (10Miriam) Hi @ottomata, thanks for the ping! Getting a copy of Commons (thumbnails only would be fine) which is di... 
[09:35:34] 10Operations, 10Discovery-Search (Current work), 10Patch-For-Review: Reshard commonswiki_file elasticsearch index - https://phabricator.wikimedia.org/T260083 (10dcausse) 05Declined→03Open a:05RKemper→03None Re-opening as it's alerting again [09:39:38] 10Operations, 10Data-Persistence-Backup, 10SRE-swift-storage, 10Epic, 10Goal: WMF media storage must be adequately backed up in a remote location - https://phabricator.wikimedia.org/T262668 (10jcrespo) > thumbnails only would be fine Sadly the backups are focusing only on originals (for now). [09:42:19] 10Operations, 10Data-Persistence-Backup, 10SRE-swift-storage, 10Epic, 10Goal: WMF media storage must be adequately backed up in a remote location - https://phabricator.wikimedia.org/T262668 (10Miriam) @jcrespo originals would be great, too, I only thought of thumbnails because they generally require less... [09:43:05] PROBLEM - Check systemd state on cumin1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [09:43:38] (03CR) 10JMeybohm: [C: 03+1] "> Patch Set 2:" [deployment-charts] - 10https://gerrit.wikimedia.org/r/649604 (owner: 10Alexandros Kosiaris) [09:43:41] ufffff [09:43:56] Alias rpki matched 0 hosts [09:44:14] ah there you go, it didn't finis the run when it recovered [09:45:26] (03PS1) 10Jbond: late_command: Fix cidr prefix [puppet] - 10https://gerrit.wikimedia.org/r/649831 (https://phabricator.wikimedia.org/T270220) [09:46:15] (03CR) 10Jbond: [C: 03+2] late_command: Fix cidr prefix [puppet] - 10https://gerrit.wikimedia.org/r/649831 (https://phabricator.wikimedia.org/T270220) (owner: 10Jbond) [09:46:57] PROBLEM - Check systemd state on cumin2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [09:49:33] elukey: need a hand? 
[09:49:55] !log swift eqiad-prod: add weight to ms-be106[0-3] - T268435
[09:49:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:49:59] T268435: Add ms-be106[0-3] to swift - https://phabricator.wikimedia.org/T268435
[09:51:24] volans: hello hello, yes I was checking why the rpki alias yields issues, indeed cumin A:rpki points to zero nodes, but I am not sure why
[09:51:39] I can have a look
[09:54:17] oh gosh
[09:54:20] rpki1001 is a PKI Server (pki)
[09:54:25] jbond42 ^^^
[09:54:39] node /pki[12]001\.(eqiad|codfw)\.wmnet/ is missing a ^ at the start
[09:55:00] volans: no rpkie is a rpkie server pki is a pki server
[09:55:04] yes
[09:55:10] but site.pp
[09:55:14] one is cfssl for issueing certss
[09:55:20] oh i see
[09:55:22] matches pki
[09:55:23] to rpki
[09:55:28] because of missing ^
[09:55:33] because of missing '^'
[09:56:11] (PS1) Jbond: site.pp: fix definitions for pki servers [puppet] - https://gerrit.wikimedia.org/r/649832
[09:56:13] ^^
[09:56:22] loool
[09:56:25] how to cleanup the rpki servers after that?
[09:56:27] they now have both
[09:56:31] 20906 ? Ssl 0:03 /usr/bin/multirootca -a
[09:56:38] luckily still running
[09:56:38] 4251 ? Ssl 4636:04 /usr/bin/routinator -
[09:56:43] because puppet doesn't remove stuff
[09:56:46] ahh pain. ill take a look in a sec
[09:56:47] (CR) David Caro: [C: +1] "Waiting for Arturo, if we can unpin the better, but it looks good to me." [puppet] - https://gerrit.wikimedia.org/r/649471 (https://phabricator.wikimedia.org/T270128) (owner: Bstorm)
[09:56:56] puppet safety ++ sigh
[09:57:03] (CR) Filippo Giunchedi: [C: +2] alertmanager: rename irc contact to sre-irc [puppet] - https://gerrit.wikimedia.org/r/649646 (https://phabricator.wikimedia.org/T267018) (owner: Filippo Giunchedi)
[09:57:05] (CR) Jbond: [C: +2] site.pp: fix definitions for pki servers [puppet] - https://gerrit.wikimedia.org/r/649832 (owner: Jbond)
[09:57:11] (CR) Volans: [C: +1] "LGTM, we need to manually cleanup or reimage the rpki servers though" [puppet] - https://gerrit.wikimedia.org/r/649832 (owner: Jbond)
[09:57:13] (CR) Filippo Giunchedi: [C: +2] alertmanager: add peering@ contact/team [puppet] - https://gerrit.wikimedia.org/r/649645 (https://phabricator.wikimedia.org/T267018) (owner: Filippo Giunchedi)
[09:57:17] jbond42: let us know if you need help!
[09:57:35] jbond42: merged your change as well
[09:57:36] ack thanks but should be simple enough to clean up i think
[09:57:43] godog: Pls
[09:58:08] {{done}}
[09:58:16] thx
[09:58:35] jbond42: ok, can you take care of it? I don't know what to cleanup without checking all the manifests
[09:58:45] volans: yes will do
[09:59:00] thanks!
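The missing `^` discussed at 09:54:39 can be reproduced with a short sketch. It uses Python's `re.search` as a stand-in for an unanchored node match, which is an assumption for illustration rather than Puppet's own matching code. The pattern is the one quoted in the chat. The anchored variant, the host list, and the `matching_hosts` helper are hypothetical and are not the content of the merged patch.

```python
import re

# Pattern as quoted in the chat: no leading anchor.
UNANCHORED = re.compile(r"pki[12]001\.(eqiad|codfw)\.wmnet")

# Hypothetical corrected form with the leading ^ that was said to be missing.
ANCHORED = re.compile(r"^pki[12]001\.(eqiad|codfw)\.wmnet")

# Example host names; the rpki ones contain "pki..." as a substring.
HOSTS = [
    "pki1001.eqiad.wmnet",
    "rpki1001.eqiad.wmnet",
    "rpki2001.codfw.wmnet",
]


def matching_hosts(pattern, hosts):
    """Return the hosts where the pattern matches anywhere in the name."""
    return [host for host in hosts if pattern.search(host)]


if __name__ == "__main__":
    print("unanchored:", matching_hosts(UNANCHORED, HOSTS))
    print("anchored:  ", matching_hosts(ANCHORED, HOSTS))
```

Without the anchor the search succeeds one character into `rpki1001...`, so the rpki hosts also pick up the pki definition. With `^` only names that start with `pki` match.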
[09:59:38] np [10:00:53] !log jbond@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: REIMAGE [10:00:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:02:58] !log jbond@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: REIMAGE [10:02:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:05:20] (03CR) 10Filippo Giunchedi: [C: 03+2] smokeping: force redirect to https [puppet] - 10https://gerrit.wikimedia.org/r/647654 (owner: 10Filippo Giunchedi) [10:05:25] 10Operations, 10ops-eqiad, 10Analytics-Clusters, 10DC-Ops: (Need By: TBD) rack/setup/install an-worker10[18-41] - https://phabricator.wikimedia.org/T260445 (10elukey) @Cmjohnson what do you think about the last proposal? [10:05:29] (03CR) 10Filippo Giunchedi: [C: 03+2] alertmanager: set karma poll interval to 10s [puppet] - 10https://gerrit.wikimedia.org/r/647655 (https://phabricator.wikimedia.org/T266017) (owner: 10Filippo Giunchedi) [10:05:32] !log gehel@cumin1001 END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) [10:05:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:06:30] PROBLEM - WDQS high update lag on wdqs1003 is CRITICAL: 5283 ge 3600 https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook%23Update_lag https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen [10:10:27] 10Operations, 10netops, 10observability, 10Patch-For-Review, 10User-fgiunchedi: LibreNMS sends its alerts to Alertmanager, resulting in email notifications to network operations - https://phabricator.wikimedia.org/T267018 (10fgiunchedi) 05Open→03Resolved This is complete: all alerts from librenms flo... [10:13:12] PROBLEM - Check systemd state on rpki1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [10:14:24] RECOVERY - Check systemd state on rpki1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [10:16:56] (03PS1) 10Jbond: late_command: use correct slash [puppet] - 10https://gerrit.wikimedia.org/r/649835 (https://phabricator.wikimedia.org/T270220) [10:16:58] 10Operations, 10Platform Engineering, 10Wikidata, 10serviceops, and 4 others: Upgrade memcached cluster to Debian Buster - https://phabricator.wikimedia.org/T213089 (10jijiki) [10:18:17] (03CR) 10Jbond: [C: 03+2] late_command: use correct slash [puppet] - 10https://gerrit.wikimedia.org/r/649835 (https://phabricator.wikimedia.org/T270220) (owner: 10Jbond) [10:18:18] jbond42: one qs - is the new eno1/64 dev something that will be gradually added along with new reimages? Trying to understand how things will change [10:19:05] elukey: no it was a bug the /64 should be on the address not the interface [10:19:14] (03PS1) 10Effie Mouzeli: Swap mc1019 with mc1031 for Redis lock manager [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649836 (https://phabricator.wikimedia.org/T265643) [10:19:51] 10Operations, 10Continuous-Integration-Infrastructure: Have linters/tests results show up as comments in files on gerrit - https://phabricator.wikimedia.org/T209149 (10kostajh) @hashar @Tgr @awight I was sitting down to look at Quibble to see how we could add support for the Gerrit reporting / fix format that... [10:20:20] 10Operations, 10serviceops, 10Patch-For-Review: ifup@eno1.service failed on some buster hosts - https://phabricator.wikimedia.org/T270220 (10jbond) Noticed the following servers with this issue which i have manually fixed `... 
[10:20:24] jbond42: ahhh now I get the fix okok
[10:20:32] Operations, SRE-Access-Requests, Patch-For-Review: Access to prod mysql from stat1004 - https://phabricator.wikimedia.org/T270196 (gmodena) Hey @Marostegui thanks for checking. We'll do batch inserts and throttle writes.
[10:21:11] Operations, serviceops, Patch-For-Review: ifup@eno1.service failed on some buster hosts - https://phabricator.wikimedia.org/T270220 (jbond) for the record the following line was introduced by error > up ip addr add 2620:0:861:107:10:64:48:155 dev eno1/64 it should be the same as what puppet adds i....
[10:21:33] (PS2) Elukey: admin: remove access for user dstrine [puppet] - https://gerrit.wikimedia.org/r/645275 (https://phabricator.wikimedia.org/T268801)
[10:21:36] Operations, SRE-Access-Requests, Patch-For-Review: Access to prod mysql from stat1004 - https://phabricator.wikimedia.org/T270196 (Marostegui) Excellent - thanks for confirming. Also if you can ping me here once you've done the first batch, that'd be helpful. I would like to check our graphs to see w...
[10:22:34] Operations, SRE-tools, Patch-For-Review: spicerack/cookbook: add additional arguments IRC/SAL logging - https://phabricator.wikimedia.org/T221212 (Volans) Open→Resolved The new Spicerack class-based API for cookbooks allow now to customize the !log message from within the cookbook. The downti...
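For the `ip addr add` line quoted at 10:21:11, a small sketch of where the prefix length belongs. It builds the command string with the `/64` attached to the address instead of the device name, as described in the chat at 10:19:05. The helper name `ip_addr_add_command` is hypothetical and this is not the actual late_command template. The standard-library `ipaddress` module is used only to validate the address and prefix.

```python
import ipaddress


def ip_addr_add_command(address, prefix_len, device):
    """Build an 'ip addr add' command with the prefix on the address."""
    # Raises ValueError if the address or prefix length is malformed.
    iface = ipaddress.ip_interface(f"{address}/{prefix_len}")
    return f"ip addr add {iface.with_prefixlen} dev {device}"


if __name__ == "__main__":
    # Address and device taken from the quoted line in the log.
    print(ip_addr_add_command("2620:0:861:107:10:64:48:155", 64, "eno1"))
```

The erroneous line in the log had the form `... dev eno1/64`, where the suffix ends up as part of the device name.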
[10:30:44] !log jbond@cumin1001 START - Cookbook sre.hosts.reboot-single [10:30:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:30:47] !log reboot rpki1001 [10:30:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:31:30] (03PS1) 10Marostegui: site.pp: Clarify the situation with db1154 and db1155 [puppet] - 10https://gerrit.wikimedia.org/r/649839 (https://phabricator.wikimedia.org/T268742) [10:32:11] (03CR) 10Marostegui: [C: 03+2] site.pp: Clarify the situation with db1154 and db1155 [puppet] - 10https://gerrit.wikimedia.org/r/649839 (https://phabricator.wikimedia.org/T268742) (owner: 10Marostegui) [10:33:21] (03PS1) 10JMeybohm: Bump debian/changelog for config file changes [debs/kubernetes] (future) - 10https://gerrit.wikimedia.org/r/649841 (https://phabricator.wikimedia.org/T244335) [10:34:06] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=routinator site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [10:34:30] (03PS2) 10JMeybohm: Bump debian/changelog for config file changes [debs/kubernetes] (future) - 10https://gerrit.wikimedia.org/r/649841 (https://phabricator.wikimedia.org/T244335) [10:34:39] !log jbond@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) [10:34:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:35:01] !log jbond@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: REIMAGE [10:35:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:35:23] (03CR) 10JMeybohm: [C: 03+2] Bump debian/changelog for config file changes [debs/kubernetes] (future) - 10https://gerrit.wikimedia.org/r/649841 (https://phabricator.wikimedia.org/T244335) (owner: 10JMeybohm) [10:35:32] !log jbond@cumin1001 START - Cookbook sre.hosts.reboot-single [10:35:33] 
Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:35:38] !log reboot rpki2001 [10:35:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:35:52] ACKNOWLEDGEMENT - WDQS high update lag on wdqs1003 is CRITICAL: 4429 ge 3600 Gehel catching up on lag after data transfer https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook%23Update_lag https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen [10:37:09] !log jbond@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: REIMAGE [10:37:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:37:54] !log jbond@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) [10:37:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:37:57] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [10:41:09] (03PS1) 10JMeybohm: Fix typo (missing comma) in Build-Depends [debs/kubernetes] (future) - 10https://gerrit.wikimedia.org/r/649843 [10:41:34] (03CR) 10JMeybohm: [C: 03+2] Fix typo (missing comma) in Build-Depends [debs/kubernetes] (future) - 10https://gerrit.wikimedia.org/r/649843 (owner: 10JMeybohm) [10:43:55] volans: elukey: rpki has been fixed and cleaned up [10:44:13] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=routinator site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [10:44:48] jbond42: thanks! 
having a quick look [10:45:54] thx [10:45:57] 10Operations, 10serviceops, 10Patch-For-Review: ifup@eno1.service failed on some buster hosts - https://phabricator.wikimedia.org/T270220 (10jbond) I have applied a fix and tested this on sertest1001 and all looks good but will leave this task open for further confirmation [10:46:33] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [10:47:16] 10Operations, 10Continuous-Integration-Infrastructure: Have linters/tests results show up as comments in files on gerrit - https://phabricator.wikimedia.org/T209149 (10Tgr) We could also make a report which inherits from the default one and outputs the POST body into a file as a side effect. Note that GerritR... [10:49:14] (03PS1) 10Effie Mouzeli: hiera: reimage mc1022,mc2022 to buster [puppet] - 10https://gerrit.wikimedia.org/r/649844 (https://phabricator.wikimedia.org/T213089) [10:52:53] (03CR) 10Effie Mouzeli: [C: 03+2] hiera: reimage mc1022,mc2022 to buster [puppet] - 10https://gerrit.wikimedia.org/r/649844 (https://phabricator.wikimedia.org/T213089) (owner: 10Effie Mouzeli) [10:53:36] (03CR) 10Volans: "post-merge question" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/649829 (owner: 10Elukey) [10:55:50] (03CR) 10Giuseppe Lavagetto: [C: 03+1] Swap mc1019 with mc1031 for Redis lock manager [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649836 (https://phabricator.wikimedia.org/T265643) (owner: 10Effie Mouzeli) [10:59:26] 10Operations, 10Platform Engineering, 10Wikidata, 10serviceops, and 4 others: Upgrade memcached cluster to Debian Buster - https://phabricator.wikimedia.org/T213089 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by jiji on cumin1001.eqiad.wmnet for hosts: ` mc1022.eqiad.wmnet ` The log can be... 
[10:59:44] 10Operations, 10Platform Engineering, 10Wikidata, 10serviceops, and 4 others: Upgrade memcached cluster to Debian Buster - https://phabricator.wikimedia.org/T213089 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by jiji on cumin1001.eqiad.wmnet for hosts: ` mc2022.codfw.wmnet ` The log can be... [11:01:58] (03CR) 10Elukey: cumin: add kafka-test alias (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/649829 (owner: 10Elukey) [11:02:25] (03PS1) 10Filippo Giunchedi: pontoon: lock hiera output file [puppet] - 10https://gerrit.wikimedia.org/r/649847 [11:02:36] RECOVERY - Check systemd state on mc1033 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [11:02:45] elukey: I don't care personally, I know that mor.itz and jbond42 tend to use those generic aliases for security upgrades [11:02:52] RECOVERY - Check systemd state on mw1265 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [11:02:58] RECOVERY - Check systemd state on mc2033 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [11:03:02] but maybe not need as they go to the generic $DC aliases [11:03:22] ack ack [11:03:39] I added Moritz to the CR as well, so he can comment when he is back [11:04:37] (03CR) 10Alexandros Kosiaris: [C: 03+1] "As far as the chart goes, +1 from me. I even tested it locally and although it took a long time to be ready, it finally succeded." 
[deployment-charts] - 10https://gerrit.wikimedia.org/r/640571 (https://phabricator.wikimedia.org/T265526) (owner: 10Mstyles) [11:10:09] !log stopping and restarting dbstore1004 to mitigate (short term) T270112 [11:10:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:10:13] T270112: mariadb on dbstore hosts, and specifically dbstore1004, possible memory leaking - https://phabricator.wikimedia.org/T270112 [11:10:18] ^ elukey [11:10:42] ack [11:13:14] !log jiji@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on mc1022.eqiad.wmnet with reason: REIMAGE [11:13:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:13:55] jouncebot: now [11:13:55] No deployments scheduled for the next 0 hour(s) and 46 minute(s) [11:14:32] (03CR) 10Effie Mouzeli: [C: 03+2] Swap mc1019 with mc1031 for Redis lock manager [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649836 (https://phabricator.wikimedia.org/T265643) (owner: 10Effie Mouzeli) [11:14:42] !log jbond@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on pki2001.codfw.wmnet with reason: REIMAGE [11:14:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:15:10] !log jiji@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on mc2022.codfw.wmnet with reason: REIMAGE [11:15:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:15:16] !log jiji@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1022.eqiad.wmnet with reason: REIMAGE [11:15:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:15:24] (03Merged) 10jenkins-bot: Swap mc1019 with mc1031 for Redis lock manager [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649836 (https://phabricator.wikimedia.org/T265643) (owner: 10Effie Mouzeli) [11:17:25] !log jiji@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2022.codfw.wmnet with reason: 
REIMAGE [11:17:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:17:36] (03PS1) 10Marostegui: install_server: Do not reimage db214[234] [puppet] - 10https://gerrit.wikimedia.org/r/649852 (https://phabricator.wikimedia.org/T269324) [11:17:38] (03CR) 10Jbond: [C: 03+1] "LGTM i think the original qmail still uses ANY queries however i think most distributions have patched away from this and have if they had" [puppet] - 10https://gerrit.wikimedia.org/r/649674 (https://phabricator.wikimedia.org/T252132) (owner: 10Ssingh) [11:18:31] (03CR) 10Jbond: [C: 03+1] "LGTM thx" [puppet] - 10https://gerrit.wikimedia.org/r/649679 (owner: 10Dzahn) [11:19:16] !log jiji@deploy1001 Synchronized wmf-config/ProductionServices.php: Swap mc1019 with mc1031 for Redis lock manager - T265643 (duration: 01m 17s) [11:19:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:19:20] !log jbond@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki2001.codfw.wmnet with reason: REIMAGE [11:19:21] T265643: Upgrade MediaWiki's Redis cluster to Debian Buster - https://phabricator.wikimedia.org/T265643 [11:19:30] (03CR) 10Marostegui: [C: 03+2] install_server: Do not reimage db214[234] [puppet] - 10https://gerrit.wikimedia.org/r/649852 (https://phabricator.wikimedia.org/T269324) (owner: 10Marostegui) [11:19:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:19:39] (03CR) 10Jbond: [C: 03+1] "lgtm" [puppet] - 10https://gerrit.wikimedia.org/r/649707 (owner: 10Dzahn) [11:21:51] (03CR) 10Jbond: [C: 03+2] "LGTM will merge" [puppet] - 10https://gerrit.wikimedia.org/r/649681 (owner: 10Dzahn) [11:22:19] (03CR) 10Jbond: [C: 03+2] "lgtm will merge" [puppet] - 10https://gerrit.wikimedia.org/r/649680 (owner: 10Dzahn) [11:22:40] (03CR) 10Jbond: [C: 03+2] "lgtm will merge" [puppet] - 10https://gerrit.wikimedia.org/r/649683 (owner: 10Dzahn) [11:23:03] (03CR) 10Jbond: [V: 03+2 C: 03+2] puppetmaster: 
remove absented puppet-wildcardsign [puppet] - 10https://gerrit.wikimedia.org/r/649683 (owner: 10Dzahn) [11:27:03] (03PS2) 10Jbond: puppetmaster: remove absented ganglia-gen and sshknowngen [puppet] - 10https://gerrit.wikimedia.org/r/649681 (owner: 10Dzahn) [11:27:45] (03PS2) 10Jbond: puppetmaster: remove absented puppet-wildcardsign [puppet] - 10https://gerrit.wikimedia.org/r/649683 (owner: 10Dzahn) [11:27:53] (03CR) 10Jbond: [C: 03+2] puppetmaster: remove absented ganglia-gen and sshknowngen [puppet] - 10https://gerrit.wikimedia.org/r/649681 (owner: 10Dzahn) [11:27:58] (03CR) 10Jbond: [V: 03+2 C: 03+2] puppetmaster: remove absented puppet-wildcardsign [puppet] - 10https://gerrit.wikimedia.org/r/649683 (owner: 10Dzahn) [11:32:20] (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] "LGTM." [puppet] - 10https://gerrit.wikimedia.org/r/649471 (https://phabricator.wikimedia.org/T270128) (owner: 10Bstorm) [11:32:57] PROBLEM - rpki grafana alert on alert1001 is CRITICAL: CRITICAL: RPKI ( https://grafana.wikimedia.org/d/UwUa77GZk/rpki ) is alerting: eqiad rsync status alert. https://wikitech.wikimedia.org/wiki/RPKI%23Grafana_alerts https://grafana.wikimedia.org/d/UwUa77GZk/ [11:33:37] PROBLEM - Swift https backend on ms-fe1008 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Swift [11:34:43] RECOVERY - Swift https backend on ms-fe1008 is OK: HTTP OK: HTTP/1.1 200 OK - 391 bytes in 0.304 second response time https://wikitech.wikimedia.org/wiki/Swift [11:34:47] 10Operations, 10Platform Engineering, 10Wikidata, 10serviceops, and 4 others: Upgrade memcached cluster to Debian Buster - https://phabricator.wikimedia.org/T213089 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mc2022.codfw.wmnet'] ` and were **ALL** successful. [11:35:03] (03CR) 10Volans: "Thanks for the migration! 
Few minor things inline" (039 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/649653 (https://phabricator.wikimedia.org/T269925) (owner: 10Elukey) [11:35:45] (03CR) 10Volans: [C: 03+1] "LGTM fwiw, I'm missing all the specific context for the need of masking/unmasking :)" [cookbooks] - 10https://gerrit.wikimedia.org/r/649655 (owner: 10Elukey) [11:42:22] (03PS1) 10KartikMistry: Add Wikidocumentaries campaign for ContentTranslation [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649854 (https://phabricator.wikimedia.org/T269875) [11:44:38] 10Operations, 10Platform Engineering, 10Wikidata, 10serviceops, and 4 others: Upgrade memcached cluster to Debian Buster - https://phabricator.wikimedia.org/T213089 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mc1022.eqiad.wmnet'] ` and were **ALL** successful. [11:45:21] (03CR) 10David Caro: [C: 03+2] toolforge: upgrade sssd patch version [puppet] - 10https://gerrit.wikimedia.org/r/649471 (https://phabricator.wikimedia.org/T270128) (owner: 10Bstorm) [11:46:15] PROBLEM - Swift https backend on ms-fe1007 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Swift [11:47:57] PROBLEM - PyBal backends health check on lvs1016 is CRITICAL: PYBAL CRITICAL - CRITICAL - swift_80: Servers ms-fe1005.eqiad.wmnet, ms-fe1007.eqiad.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal [11:48:59] RECOVERY - WDQS high update lag on wdqs1003 is OK: (C)3600 ge (W)1200 ge 1176 https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook%23Update_lag https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen [11:49:19] RECOVERY - Swift https backend on ms-fe1007 is OK: HTTP OK: HTTP/1.1 200 OK - 391 bytes in 0.012 second response time https://wikitech.wikimedia.org/wiki/Swift [11:49:33] RECOVERY - PyBal backends health check on lvs1016 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal 
[11:50:23] (03PS1) 10Marostegui: mariadb: Add db1154 as a temp sanitarium [puppet] - 10https://gerrit.wikimedia.org/r/649856 (https://phabricator.wikimedia.org/T268742) [11:51:44] (03CR) 10Marostegui: [C: 03+2] mariadb: Add db1154 as a temp sanitarium [puppet] - 10https://gerrit.wikimedia.org/r/649856 (https://phabricator.wikimedia.org/T268742) (owner: 10Marostegui) [11:52:42] !log Stop s1, s3, s5 and s8 on db1124 to copy it to db1154 (this will generate lag on wikireplicas) T268742 [11:52:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:52:46] T268742: Test upgrading sanitarium hosts to Buster + 10.4 - https://phabricator.wikimedia.org/T268742 [11:55:40] (03CR) 10Nikerabbit: "These no longer need to be optimized?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649610 (https://phabricator.wikimedia.org/T267776) (owner: 10Urbanecm) [11:56:53] (03CR) 10Urbanecm: "> Patch Set 2:" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649610 (https://phabricator.wikimedia.org/T267776) (owner: 10Urbanecm) [12:00:04] Amir1, Lucas_WMDE, awight, and Urbanecm: I, the Bot under the Fountain, allow thee, The Deployer, to do European mid-day backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20201216T1200). [12:00:04] cormacparle: A patch you scheduled for European mid-day backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [12:00:17] * cormacparle__ waves [12:00:18] I can deploy today! 
[12:00:39] (03CR) 10Urbanecm: [C: 03+2] Remove license mapping for search for labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649617 (https://phabricator.wikimedia.org/T257938) (owner: 10Cparle) [12:01:30] (03Merged) 10jenkins-bot: Remove license mapping for search for labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649617 (https://phabricator.wikimedia.org/T257938) (owner: 10Cparle) [12:04:42] (03CR) 10Volans: "Much nicer, couple of suggestions inline." (035 comments) [software/pywmflib] - 10https://gerrit.wikimedia.org/r/649426 (owner: 10Elukey) [12:05:23] cormacparle__: should be deployed to beta automatically soon :) [12:05:55] thank you! [12:06:06] 10Operations, 10Puppet, 10SRE-tools, 10User-jbond: PKI server don't reimage cleanly - https://phabricator.wikimedia.org/T270269 (10jbond) p:05Triage→03Medium [12:06:12] no problem [12:18:30] 10Operations, 10MW-on-K8s, 10serviceops, 10Release Pipeline (Blubber), 10Release-Engineering-Team (Pipeline): Deployment infrastructure for PHP microservices - https://phabricator.wikimedia.org/T261369 (10akosiaris) >>! In T261369#6693707, @Krinkle wrote: > Some kind of `/deploy` repo seems needed, I thi... 
[12:18:50] (03PS1) 10Urbanecm: MetaContactPages: Remove licenseabuse contact page [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649862 (https://phabricator.wikimedia.org/T269781) [12:18:52] (03CR) 10Urbanecm: [C: 03+2] MetaContactPages: Remove licenseabuse contact page [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649862 (https://phabricator.wikimedia.org/T269781) (owner: 10Urbanecm) [12:19:41] (03Merged) 10jenkins-bot: MetaContactPages: Remove licenseabuse contact page [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649862 (https://phabricator.wikimedia.org/T269781) (owner: 10Urbanecm) [12:20:38] !log imported kubernetes 1.16.15-2 into component/kubernetes-future stretch-wikimedia [12:20:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:20:56] !log hnowlan@puppetmaster1001 conftool action : set/weight=5; selector: dc=eqiad,cluster=appserver,service=apache2,name=mw1265.eqiad.wmnet [12:20:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:21:22] !log hnowlan@puppetmaster1001 conftool action : set/weight=5; selector: dc=eqiad,cluster=appserver,service=nginx,name=mw1265.eqiad.wmnet [12:21:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:21:27] !log urbanecm@deploy1001 Synchronized wmf-config/MetaContactPages.php: 0c651a6adc2d07b4163fba47109a5070884e7f54: MetaContactPages: Remove licenseabuse contact page (T269781) (duration: 01m 03s) [12:21:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:21:30] T269781: WMF Trademark Abuse notification Meta-Wiki form errors due to no valid email (recipient account is globally locked) - https://phabricator.wikimedia.org/T269781 [12:26:54] (03PS1) 10Jbond: sre.puppet.renew-cert: add support for allow_alt_names [cookbooks] - 10https://gerrit.wikimedia.org/r/649863 [12:28:03] (03PS1) 10Andrew-WMDE: Add a job for TemplateData metrics aggregation [puppet] - 
10https://gerrit.wikimedia.org/r/649864 (https://phabricator.wikimedia.org/T270246) [12:29:19] (03CR) 10jerkins-bot: [V: 04-1] sre.puppet.renew-cert: add support for allow_alt_names [cookbooks] - 10https://gerrit.wikimedia.org/r/649863 (owner: 10Jbond) [12:29:23] (03PS1) 10Jbond: puppet: update get_certificate_metadata so the pattern is more specific [software/spicerack] - 10https://gerrit.wikimedia.org/r/649865 [12:29:57] (03CR) 10Andrew-WMDE: [C: 04-1] "Wait for I493641324ff5d9121ad7583efdc5b4d7dc57ce17 to be merged" [puppet] - 10https://gerrit.wikimedia.org/r/649864 (https://phabricator.wikimedia.org/T270246) (owner: 10Andrew-WMDE) [12:31:23] (03PS2) 10Jbond: sre.puppet.renew-cert: add support for allow_alt_names [cookbooks] - 10https://gerrit.wikimedia.org/r/649863 [12:35:00] RECOVERY - Ensure local MW versions match expected deployment on mw1265 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [12:38:00] PROBLEM - Postgres Replication Lag on maps1001 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 258969624 and 18 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [12:40:40] RECOVERY - Postgres Replication Lag on maps1001 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 1671176 and 0 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [12:45:20] PROBLEM - Postgres Replication Lag on maps2010 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 1236724208 and 53 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [12:46:30] (03CR) 10Andrew-WMDE: Add a job for some visualeditor metrics aggregation (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/649660 (https://phabricator.wikimedia.org/T262209) (owner: 10Awight) [12:46:31] !log hnowlan@puppetmaster1001 conftool action : set/pooled=yes; selector: name=mw1265.eqiad.wmnet [12:46:33] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:46:44] RECOVERY - Postgres Replication Lag on maps2010 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 65144 and 69 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [12:47:54] !log hnowlan@puppetmaster1001 conftool action : set/pooled=yes; selector: dc=eqiad,cluster=maps,service=kartotherian-ssl,name=maps1006.eqiad.wmnet [12:47:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:47:58] !log hnowlan@puppetmaster1001 conftool action : set/pooled=yes; selector: dc=eqiad,cluster=maps,service=kartotherian-ssl,name=maps1007.eqiad.wmnet [12:48:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:48:02] !log hnowlan@puppetmaster1001 conftool action : set/pooled=yes; selector: dc=eqiad,cluster=maps,service=kartotherian-ssl,name=maps1008.eqiad.wmnet [12:48:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:48:06] !log hnowlan@puppetmaster1001 conftool action : set/pooled=yes; selector: dc=eqiad,cluster=maps,service=kartotherian-ssl,name=maps1010.eqiad.wmnet [12:48:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:01:32] (03PS1) 10Jbond: phabricator: update RemoteIPInternalProxy with correct IP adrdess [puppet] - 10https://gerrit.wikimedia.org/r/649872 (https://phabricator.wikimedia.org/T270185) [13:02:09] (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS: https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/27151/console" [puppet] - 10https://gerrit.wikimedia.org/r/649872 (https://phabricator.wikimedia.org/T270185) (owner: 10Jbond) [13:02:51] (03PS2) 10Jbond: phabricator: update RemoteIPInternalProxy with correct IP adrdess [puppet] - 10https://gerrit.wikimedia.org/r/649872 (https://phabricator.wikimedia.org/T270185) [13:03:45] (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS: 
https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/27152/console" [puppet] - 10https://gerrit.wikimedia.org/r/649872 (https://phabricator.wikimedia.org/T270185) (owner: 10Jbond) [13:04:48] !log hnowlan@puppetmaster1001 conftool action : set/pooled=yes; selector: dc=codfw,cluster=maps,service=kartotherian-ssl,name=maps2005.codfw.wmnet [13:04:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:05:03] !log hnowlan@puppetmaster1001 conftool action : set/pooled=yes; selector: dc=codfw,cluster=maps,service=kartotherian-ssl,name=maps2006.codfw.wmnet [13:05:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:05:34] !log hnowlan@puppetmaster1001 conftool action : set/pooled=yes; selector: dc=codfw,cluster=maps,service=kartotherian-ssl,name=maps2008.codfw.wmnet [13:05:38] !log hnowlan@puppetmaster1001 conftool action : set/pooled=yes; selector: dc=codfw,cluster=maps,service=kartotherian-ssl,name=maps2009.codfw.wmnet [13:05:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:05:43] !log hnowlan@puppetmaster1001 conftool action : set/pooled=yes; selector: dc=codfw,cluster=maps,service=kartotherian-ssl,name=maps2010.codfw.wmnet [13:05:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:05:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:10:16] (03PS1) 10ArielGlenn: add wikitch to mediawiki import sources [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649874 (https://phabricator.wikimedia.org/T270284) [13:16:42] (03CR) 10Ssingh: [V: 03+2] "Thanks for the review!" 
[puppet] - 10https://gerrit.wikimedia.org/r/649674 (https://phabricator.wikimedia.org/T252132) (owner: 10Ssingh) [13:16:53] (03CR) 10Ssingh: [V: 03+2 C: 03+2] dnsdist: respond to qtype=ANY queries with NOTIMP [puppet] - 10https://gerrit.wikimedia.org/r/649674 (https://phabricator.wikimedia.org/T252132) (owner: 10Ssingh) [13:22:36] RECOVERY - mediawiki-installation DSH group on mw1265 is OK: OK https://wikitech.wikimedia.org/wiki/Monitoring/check_dsh_groups [13:33:14] (03CR) 10Alexandros Kosiaris: "+1, but these will creep back in eventually, unless e.g. something in CI stops it from happening" [deployment-charts] - 10https://gerrit.wikimedia.org/r/649628 (owner: 10JMeybohm) [13:35:31] (03CR) 10Elukey: Port the Spicerack interactive module (032 comments) [software/pywmflib] - 10https://gerrit.wikimedia.org/r/649426 (owner: 10Elukey) [13:40:45] 10Operations, 10netops, 10observability, 10User-fgiunchedi: LibreNMS sends its alerts to Alertmanager, resulting in email notifications to network operations - https://phabricator.wikimedia.org/T267018 (10CDanis) So I guess we need a separate task for paging and the check_librenms deprecation? 
[13:43:21] (03PS6) 10Elukey: Port the Spicerack interactive module [software/pywmflib] - 10https://gerrit.wikimedia.org/r/649426 [13:44:17] (03CR) 10Awight: Add a job for some visualeditor metrics aggregation (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/649660 (https://phabricator.wikimedia.org/T262209) (owner: 10Awight) [13:47:06] 10Operations, 10MediaWiki-Docker, 10serviceops, 10User-zeljkofilipin: docker pull from docker-registry fails with `ERROR: missing or empty Content-Length header` - https://phabricator.wikimedia.org/T270270 (10hashar) [13:52:38] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_citoid_cluster_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [13:54:06] jouncebot: now [13:54:06] No deployments scheduled for the next 5 hour(s) and 5 minute(s) [13:54:12] (03PS1) 10Urbanecm: wawikisource: Add author NS [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649879 (https://phabricator.wikimedia.org/T269431) [13:54:14] (03PS1) 10Urbanecm: wawikisource: Translate NS_PROJECT [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649880 (https://phabricator.wikimedia.org/T269431) [13:54:16] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [13:54:24] (03CR) 10Urbanecm: [C: 03+2] wawikisource: Add author NS [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649879 (https://phabricator.wikimedia.org/T269431) (owner: 10Urbanecm) [13:55:22] (03Merged) 10jenkins-bot: wawikisource: Add author NS [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649879 (https://phabricator.wikimedia.org/T269431) (owner: 10Urbanecm) [13:56:07] (03CR) 10Urbanecm: [C: 03+2] wawikisource: Translate NS_PROJECT [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649880 (https://phabricator.wikimedia.org/T269431) (owner: 10Urbanecm) [13:56:18] (03CR) 10Volans: [C: 03+1] "LGTM" [cookbooks] - 10https://gerrit.wikimedia.org/r/649863 (owner: 10Jbond) [13:57:02] (03Merged) 10jenkins-bot: wawikisource: Translate NS_PROJECT [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649880 (https://phabricator.wikimedia.org/T269431) (owner: 10Urbanecm) [13:57:08] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: 80dc9890c0349fd788a0c4c1aa335b307d3909e9: wawikisource: Add author NS (T269431) (duration: 01m 02s) [13:57:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:57:15] T269431: Create Wikisource Walloon - https://phabricator.wikimedia.org/T269431 [13:58:24] (03PS1) 10Urbanecm: wawikisource: Add English aliases for Author NS [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649881 [13:58:26] (03CR) 10Urbanecm: [C: 03+2] wawikisource: Add English aliases for Author NS [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649881 (owner: 10Urbanecm) [13:59:07] (03PS2) 10Urbanecm: wawikisource: Add English aliases for Author NS [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649881 (https://phabricator.wikimedia.org/T269431) [13:59:14] (03CR) 10Urbanecm: [C: 03+2] wawikisource: Add English aliases for Author NS [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/649881 (https://phabricator.wikimedia.org/T269431) (owner: 10Urbanecm) [13:59:50] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: ebfd84b658413fa0af2e6a62ddcce14d95246f0a: wawikisource: Translate NS_PROJECT (T269431) (duration: 01m 01s) [13:59:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:00:07] (03Merged) 10jenkins-bot: wawikisource: Add English aliases for Author NS [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649881 (https://phabricator.wikimedia.org/T269431) (owner: 10Urbanecm) [14:01:44] (03CR) 10Volans: [C: 03+1] "Oh, good catch, thanks for the fix!" [software/spicerack] - 10https://gerrit.wikimedia.org/r/649865 (owner: 10Jbond) [14:02:12] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: 5c9c6c1a311635328efde9e65190035b151efd88: wawikisource: Add English aliases for Author NS (T269431) (duration: 01m 02s) [14:02:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:02:18] T269431: Create Wikisource Walloon - https://phabricator.wikimedia.org/T269431 [14:06:27] (03PS1) 10Jbond: P:phabricator: migrate banlist to abuse-networks [puppet] - 10https://gerrit.wikimedia.org/r/649882 (https://phabricator.wikimedia.org/T270285) [14:06:32] (03CR) 10Jbond: [C: 03+2] puppet: update get_certificate_metadata so the pattern is more specific [software/spicerack] - 10https://gerrit.wikimedia.org/r/649865 (owner: 10Jbond) [14:06:54] (03CR) 10Jbond: [C: 03+2] sre.puppet.renew-cert: add support for allow_alt_names [cookbooks] - 10https://gerrit.wikimedia.org/r/649863 (owner: 10Jbond) [14:07:05] (03CR) 10Alexandros Kosiaris: [C: 04-1] admin_ng Update/Fix PodSecurityPolicies (033 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/649629 (https://phabricator.wikimedia.org/T228967) (owner: 10JMeybohm) [14:08:36] 10Operations, 10MediaWiki-Docker, 10serviceops, 10User-zeljkofilipin: docker pull from docker-registry 
fails with `ERROR: missing or empty Content-Length header` - https://phabricator.wikimedia.org/T270270 (10kostajh) ` lang=bash $ docker pull docker-registry.wikimedia.org/releng/node10-test-browser Using... [14:08:42] (03PS2) 10Jbond: P:phabricator: migrate banlist to abuse-networks [puppet] - 10https://gerrit.wikimedia.org/r/649882 (https://phabricator.wikimedia.org/T270285) [14:09:02] 10Operations, 10Technical-blog-posts, 10Traffic: 3rd part of blog post series: the evolution of Wikimedia's Content Delivery Network - https://phabricator.wikimedia.org/T270074 (10ema) p:05Triage→03Medium [14:10:41] (03Merged) 10jenkins-bot: puppet: update get_certificate_metadata so the pattern is more specific [software/spicerack] - 10https://gerrit.wikimedia.org/r/649865 (owner: 10Jbond) [14:10:43] (03Merged) 10jenkins-bot: sre.puppet.renew-cert: add support for allow_alt_names [cookbooks] - 10https://gerrit.wikimedia.org/r/649863 (owner: 10Jbond) [14:11:15] 10Operations, 10Technical-blog-posts, 10Traffic: 3rd part of blog post series: the evolution of Wikimedia's Content Delivery Network - https://phabricator.wikimedia.org/T270074 (10ema) >>! In T270074#6689578, @srodlund wrote: > @ema Awesome! Let me know when your first draft is ready. Looking forward to read... [14:12:33] (03CR) 10JMeybohm: admin_ng Update/Fix PodSecurityPolicies (032 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/649629 (https://phabricator.wikimedia.org/T228967) (owner: 10JMeybohm) [14:12:37] (03CR) 10Volans: [C: 03+1] "LGTM!" (032 comments) [software/pywmflib] - 10https://gerrit.wikimedia.org/r/649426 (owner: 10Elukey) [14:13:17] 10Operations, 10MediaWiki-Docker, 10serviceops, 10User-zeljkofilipin: docker pull from docker-registry fails with `ERROR: missing or empty Content-Length header` - https://phabricator.wikimedia.org/T270270 (10kostajh) Are we using harbor (was scanning T209271)? If so, perhaps this is the same issue as http... 
[14:17:31] 10Operations, 10MediaWiki-Docker, 10serviceops, 10User-zeljkofilipin: docker pull from docker-registry fails with `ERROR: missing or empty Content-Length header` - https://phabricator.wikimedia.org/T270270 (10kostajh) > I can not reproduce it consistently. @zeljkofilipin so eventually it worked for you?... [14:19:30] (03PS1) 10Jbond: network: add phabricator context for abuse nets [puppet] - 10https://gerrit.wikimedia.org/r/649886 [14:20:38] (03CR) 10Jbond: [V: 03+2 C: 03+2] network: add phabricator context for abuse nets [puppet] - 10https://gerrit.wikimedia.org/r/649886 (owner: 10Jbond) [14:21:22] (03PS3) 10Jbond: phabricator: update RemoteIPInternalProxy with correct IP adrdess [puppet] - 10https://gerrit.wikimedia.org/r/649872 (https://phabricator.wikimedia.org/T270185) [14:21:33] (03PS3) 10Jbond: P:phabricator: migrate banlist to abuse-networks [puppet] - 10https://gerrit.wikimedia.org/r/649882 (https://phabricator.wikimedia.org/T270285) [14:22:18] (03CR) 10Alexandros Kosiaris: [C: 04-1] admin_ng Update/Fix PodSecurityPolicies (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/649629 (https://phabricator.wikimedia.org/T228967) (owner: 10JMeybohm) [14:23:55] (03PS4) 10Jbond: P:phabricator: migrate banlist to abuse-networks [puppet] - 10https://gerrit.wikimedia.org/r/649882 (https://phabricator.wikimedia.org/T270285) [14:26:51] (03PS5) 10Jbond: P:phabricator: migrate banlist to abuse-networks [puppet] - 10https://gerrit.wikimedia.org/r/649882 (https://phabricator.wikimedia.org/T270285) [14:27:23] (03CR) 10Ottomata: "> I am not super happy about analytics becoming a "part" of a production pipeline" [homer/public] - 10https://gerrit.wikimedia.org/r/649706 (https://phabricator.wikimedia.org/T270196) (owner: 10Elukey) [14:27:53] (03CR) 10Jbond: [V: 03+1] "PCC SUCCESS: https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/27156/console" [puppet] - 10https://gerrit.wikimedia.org/r/649882 
(https://phabricator.wikimedia.org/T270285) (owner: 10Jbond) [14:31:09] (03PS2) 10Elukey: sre.hadoop.change-distro-from-cdh: move to class API [cookbooks] - 10https://gerrit.wikimedia.org/r/649653 (https://phabricator.wikimedia.org/T269925) [14:31:25] (03CR) 10JMeybohm: admin_ng Update/Fix PodSecurityPolicies (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/649629 (https://phabricator.wikimedia.org/T228967) (owner: 10JMeybohm) [14:31:27] (03Abandoned) 10Elukey: sre.hadoop.change-distro-from-cdh: use systemctl mask where needed [cookbooks] - 10https://gerrit.wikimedia.org/r/649655 (owner: 10Elukey) [14:31:41] (03CR) 10Elukey: sre.hadoop.change-distro-from-cdh: move to class API (036 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/649653 (https://phabricator.wikimedia.org/T269925) (owner: 10Elukey) [14:33:15] (03CR) 10Alexandros Kosiaris: [V: 03+1 C: 03+2] "Thanks. PCC is happy, only staging in codfw will be altered, merging." [puppet] - 10https://gerrit.wikimedia.org/r/649601 (owner: 10Alexandros Kosiaris) [14:34:35] (03CR) 10Jbond: [V: 03+1] P:phabricator: migrate banlist to abuse-networks (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/649882 (https://phabricator.wikimedia.org/T270285) (owner: 10Jbond) [14:44:05] 10Operations, 10Research, 10Wikimedia-Mailing-lists: No admin response for many months for research-internal listserv - https://phabricator.wikimedia.org/T270213 (10Isaac) Thanks for the quick response @Dzahn ! I emailed the list to ask for nominations and @Ladsgroup graciously volunteered so we will be the... [14:45:39] !log akosiaris@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' . [14:45:39] !log akosiaris@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'production' . 
[14:45:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:45:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:46:08] (03PS1) 10CDanis: no-op: sort lines for dbctl instances [puppet] - 10https://gerrit.wikimedia.org/r/649889 [14:46:10] (03PS1) 10CDanis: conftool/dbctl: add x2 section & hosts for it [puppet] - 10https://gerrit.wikimedia.org/r/649890 (https://phabricator.wikimedia.org/T269324) [14:46:21] !log deploy all services to staging codfw cluster [14:46:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:46:24] !log akosiaris@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'sessionstore' for release 'production' . [14:46:24] !log akosiaris@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'sessionstore' for release 'staging' . [14:46:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:46:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:53:05] (03CR) 10Marostegui: [C: 03+1] "thanks!" [puppet] - 10https://gerrit.wikimedia.org/r/649889 (owner: 10CDanis) [14:54:01] (03CR) 10Marostegui: "is this all that is needed? + dbctl config commit? (I would like to document it so we don't have to bother you again in the future)" [puppet] - 10https://gerrit.wikimedia.org/r/649890 (https://phabricator.wikimedia.org/T269324) (owner: 10CDanis) [14:54:11] !log gehel@cumin1001 START - Cookbook sre.wdqs.data-transfer [14:54:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:55:13] (03CR) 10CDanis: "> Patch Set 1:" [puppet] - 10https://gerrit.wikimedia.org/r/649890 (https://phabricator.wikimedia.org/T269324) (owner: 10CDanis) [14:56:33] !log akosiaris@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' . 
[14:56:33] !log akosiaris@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' . [14:56:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:56:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:56:55] (03PS4) 10Giuseppe Lavagetto: php: make env variables play nice with blubber [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/643737 [14:59:32] (03CR) 10Marostegui: "> Patch Set 1:" [puppet] - 10https://gerrit.wikimedia.org/r/649890 (https://phabricator.wikimedia.org/T269324) (owner: 10CDanis) [14:59:37] (03CR) 10Marostegui: [C: 03+1] conftool/dbctl: add x2 section & hosts for it [puppet] - 10https://gerrit.wikimedia.org/r/649890 (https://phabricator.wikimedia.org/T269324) (owner: 10CDanis) [15:01:42] 10Operations, 10Platform Engineering, 10Wikidata, 10serviceops, and 4 others: Upgrade memcached cluster to Debian Buster - https://phabricator.wikimedia.org/T213089 (10jijiki) [15:04:45] (03PS1) 10Andrew Bogott: Cinder: allow query filtering by volume_id [puppet] - 10https://gerrit.wikimedia.org/r/649891 (https://phabricator.wikimedia.org/T269511) [15:05:37] (03CR) 10Andrew Bogott: [C: 03+2] Cinder: allow query filtering by volume_id [puppet] - 10https://gerrit.wikimedia.org/r/649891 (https://phabricator.wikimedia.org/T269511) (owner: 10Andrew Bogott) [15:05:39] (03CR) 10Nikerabbit: [C: 03+1] Add Wikidocumentaries campaign for ContentTranslation [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649854 (https://phabricator.wikimedia.org/T269875) (owner: 10KartikMistry) [15:07:27] !log akosiaris@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'production' . [15:07:27] !log akosiaris@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' . 
[15:07:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:07:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:09:15] (03CR) 10Volans: [C: 03+1] "LGTM thanks for the fixes, the remaining comments are optional, up to you." [cookbooks] - 10https://gerrit.wikimedia.org/r/649653 (https://phabricator.wikimedia.org/T269925) (owner: 10Elukey) [15:09:19] !log akosiaris@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'canary' . [15:09:19] !log akosiaris@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' . [15:09:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:09:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:09:26] (03PS1) 10Bstorm: toolsdb: set the rsync script to be the whole data dir [puppet] - 10https://gerrit.wikimedia.org/r/649893 (https://phabricator.wikimedia.org/T266587) [15:10:14] !log akosiaris@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'production' . [15:10:14] !log akosiaris@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' . [15:10:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:10:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:10:45] !log akosiaris@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' . [15:10:45] !log akosiaris@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' . [15:10:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:10:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:11:20] !log akosiaris@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'echostore' for release 'production' . 
[15:11:20] !log akosiaris@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'echostore' for release 'staging' . [15:11:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:11:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:12:17] (03CR) 10Elukey: [C: 03+2] sre.hadoop.change-distro-from-cdh: move to class API [cookbooks] - 10https://gerrit.wikimedia.org/r/649653 (https://phabricator.wikimedia.org/T269925) (owner: 10Elukey) [15:12:22] (03PS3) 10Elukey: sre.hadoop.change-distro-from-cdh: move to class API [cookbooks] - 10https://gerrit.wikimedia.org/r/649653 (https://phabricator.wikimedia.org/T269925) [15:13:16] (03PS1) 10Razzi: kafka: add remaining nodes to kafka test cluster [puppet] - 10https://gerrit.wikimedia.org/r/649894 (https://phabricator.wikimedia.org/T268202) [15:13:18] (03PS1) 10Andrew Bogott: Cinder: allow even more query filtering by volume_id [puppet] - 10https://gerrit.wikimedia.org/r/649895 (https://phabricator.wikimedia.org/T269511) [15:14:11] (03CR) 10Andrew Bogott: [C: 03+2] Cinder: allow even more query filtering by volume_id [puppet] - 10https://gerrit.wikimedia.org/r/649895 (https://phabricator.wikimedia.org/T269511) (owner: 10Andrew Bogott) [15:15:04] 10Operations, 10MediaWiki-Docker, 10serviceops, 10User-zeljkofilipin: docker pull from docker-registry fails with `ERROR: missing or empty Content-Length header` - https://phabricator.wikimedia.org/T270270 (10zeljkofilipin) @kostajh it was failing consistently on my macos 11 machine, working fine on my mac... [15:18:41] (03CR) 10Nikerabbit: "> Good point, I totally forgot to look into that. 
Can you document at https://wikitech.wikimedia.org/wiki/Wikimedia_site_requests how to o" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649610 (https://phabricator.wikimedia.org/T267776) (owner: 10Urbanecm) [15:21:45] !log akosiaris@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' . [15:21:45] !log akosiaris@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' . [15:21:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:21:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:28:17] (03PS1) 10Alexandros Kosiaris: staging-codfw: Add IP address range to cassandras [puppet] - 10https://gerrit.wikimedia.org/r/649897 [15:30:28] (03PS1) 10Elukey: sre.hadoop.change-distro-from-cdh: use systemctl mask/unmask where needed [cookbooks] - 10https://gerrit.wikimedia.org/r/649898 (https://phabricator.wikimedia.org/T269919) [15:30:41] (03CR) 10CDanis: [C: 03+2] conftool/dbctl: add x2 section & hosts for it [puppet] - 10https://gerrit.wikimedia.org/r/649890 (https://phabricator.wikimedia.org/T269324) (owner: 10CDanis) [15:30:47] (03CR) 10CDanis: [C: 03+2] no-op: sort lines for dbctl instances [puppet] - 10https://gerrit.wikimedia.org/r/649889 (owner: 10CDanis) [15:34:27] (03CR) 10Elukey: [C: 03+2] sre.hadoop.change-distro-from-cdh: use systemctl mask/unmask where needed [cookbooks] - 10https://gerrit.wikimedia.org/r/649898 (https://phabricator.wikimedia.org/T269919) (owner: 10Elukey) [15:36:26] !log akosiaris@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'production' . [15:36:27] !log akosiaris@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' . 
[15:36:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:36:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:36:58] !log akosiaris@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'apertium' for release 'production' . [15:36:58] !log akosiaris@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'apertium' for release 'plain' . [15:36:58] !log akosiaris@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'apertium' for release 'staging' . [15:36:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:37:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:37:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:37:24] !log akosiaris@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'production' . [15:37:24] !log akosiaris@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' . [15:37:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:37:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:37:36] 10Operations, 10Release-Engineering-Team-TODO, 10serviceops, 10Patch-For-Review, and 2 others: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (10hnowlan) mw1265 is now reimaged and pooled with weight 5 (as opposed to its previous 25) [15:37:51] !log akosiaris@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'staging' . [15:37:51] !log akosiaris@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'production' . 
[15:37:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:37:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:38:15] !log akosiaris@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'recommendation-api' for release 'production' . [15:38:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:38:27] 10Operations, 10Release-Engineering-Team-TODO, 10serviceops, 10Patch-For-Review, and 2 others: Upgrade MediaWiki clusters to Debian Buster (debian 10) - https://phabricator.wikimedia.org/T245757 (10jijiki) [15:40:22] !log akosiaris@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' . [15:40:22] !log akosiaris@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'production' . [15:40:22] !log akosiaris@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' . [15:40:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:40:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:40:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:41:10] !log akosiaris@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' . [15:41:10] !log akosiaris@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' . 
[15:41:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:41:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:42:06] (03PS1) 10DCausse: Revert "[cirrus] setup perfield builder A/B test on spaceless languages" [extensions/WikimediaEvents] (wmf/1.36.0-wmf.22) - 10https://gerrit.wikimedia.org/r/649705 [15:42:26] !log akosiaris@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' . [15:42:26] !log akosiaris@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' . [15:42:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:42:28] (03CR) 10Awight: Add a job for CodeMirror metrics aggregation (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/649661 (https://phabricator.wikimedia.org/T267902) (owner: 10Awight) [15:42:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:42:42] (03PS1) 10DCausse: Revert "[cirrus] setup perfield builder A/B test on spaceless languages" [extensions/WikimediaEvents] (wmf/1.36.0-wmf.21) - 10https://gerrit.wikimedia.org/r/649906 [15:42:55] !log akosiaris@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'canary' . [15:42:55] !log akosiaris@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' . [15:42:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:42:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:43:44] !log akosiaris@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' . [15:43:44] !log akosiaris@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' . 
[15:43:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:43:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:44:01] (03CR) 10JMeybohm: [C: 03+2] "> Patch Set 1:" [deployment-charts] - 10https://gerrit.wikimedia.org/r/649628 (owner: 10JMeybohm) [15:45:18] (03Merged) 10jenkins-bot: admin_ng: Fix indentation in various places [deployment-charts] - 10https://gerrit.wikimedia.org/r/649628 (owner: 10JMeybohm) [15:45:30] (03CR) 10Elukey: "> Patch Set 1:" [homer/public] - 10https://gerrit.wikimedia.org/r/649706 (https://phabricator.wikimedia.org/T270196) (owner: 10Elukey) [15:47:07] (03PS2) 10JMeybohm: admin_ng Update/Fix PodSecurityPolicies [deployment-charts] - 10https://gerrit.wikimedia.org/r/649629 (https://phabricator.wikimedia.org/T228967) [15:47:14] (03CR) 10jerkins-bot: [V: 04-1] admin_ng Update/Fix PodSecurityPolicies [deployment-charts] - 10https://gerrit.wikimedia.org/r/649629 (https://phabricator.wikimedia.org/T228967) (owner: 10JMeybohm) [15:47:18] (03PS1) 10Effie Mouzeli: Swap Redis Lock hosts with upgraded ones [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649901 (https://phabricator.wikimedia.org/T265643) [15:47:33] (03CR) 10JMeybohm: admin_ng Update/Fix PodSecurityPolicies (033 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/649629 (https://phabricator.wikimedia.org/T228967) (owner: 10JMeybohm) [15:47:50] PROBLEM - Logstash Elasticsearch indexing errors #o11y on alert1001 is CRITICAL: 8.358 ge 8 https://wikitech.wikimedia.org/wiki/Logstash%23Indexing_errors https://logstash.wikimedia.org/goto/1cee1f1b5d4e6c5e06edb3353a2a4b83 https://grafana.wikimedia.org/dashboard/db/logstash [15:47:52] (03PS3) 10JMeybohm: admin_ng Update/Fix PodSecurityPolicies [deployment-charts] - 10https://gerrit.wikimedia.org/r/649629 (https://phabricator.wikimedia.org/T228967) [15:51:05] (03CR) 10Giuseppe Lavagetto: [C: 03+1] Swap Redis Lock hosts with upgraded ones [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/649901 (https://phabricator.wikimedia.org/T265643) (owner: 10Effie Mouzeli) [15:51:42] (03CR) 10Elukey: [C: 03+2] Port the Spicerack interactive module [software/pywmflib] - 10https://gerrit.wikimedia.org/r/649426 (owner: 10Elukey) [15:51:57] (03PS4) 10JMeybohm: admin_ng Update/Fix PodSecurityPolicies [deployment-charts] - 10https://gerrit.wikimedia.org/r/649629 (https://phabricator.wikimedia.org/T228967) [15:53:48] jouncebot: now [15:53:48] No deployments scheduled for the next 3 hour(s) and 6 minute(s) [15:53:55] !log akosiaris@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' . [15:54:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:54:17] (03CR) 10Ladsgroup: [C: 03+1] Swap Redis Lock hosts with upgraded ones [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649901 (https://phabricator.wikimedia.org/T265643) (owner: 10Effie Mouzeli) [15:54:54] 10Operations, 10serviceops: ifup@eno1.service failed on some buster hosts - https://phabricator.wikimedia.org/T270220 (10RLazarus) Oh wow, I filed this and went to bed, love to wake up and see it fully handled. :) Thanks all! [15:55:35] !log akosiaris@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'production' . [15:55:36] !log akosiaris@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'staging' . 
[15:55:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:55:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:56:36] (03CR) 10Giuseppe Lavagetto: [V: 03+2 C: 03+2] php: make env variables play nice with blubber [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/643737 (owner: 10Giuseppe Lavagetto) [15:59:14] (03CR) 10Effie Mouzeli: [C: 03+2] Swap Redis Lock hosts with upgraded ones [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649901 (https://phabricator.wikimedia.org/T265643) (owner: 10Effie Mouzeli) [16:00:12] (03Merged) 10jenkins-bot: Swap Redis Lock hosts with upgraded ones [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649901 (https://phabricator.wikimedia.org/T265643) (owner: 10Effie Mouzeli) [16:00:13] 10Operations, 10SRE-Access-Requests: Requesting access to deployment group for STran - https://phabricator.wikimedia.org/T270125 (10thcipriani) >>! In T270125#6692335, @RLazarus wrote: > @thcipriani Can you please also comment, approving for the deployment group on behalf of releng? Approved! [16:01:47] (03CR) 10Razzi: "> Patch Set 1:" [puppet] - 10https://gerrit.wikimedia.org/r/645120 (owner: 10Elukey) [16:03:21] (03PS2) 10Elukey: icinga: add users razzi/Razzi and elukey to run commands [puppet] - 10https://gerrit.wikimedia.org/r/645120 [16:03:29] !log jiji@deploy1001 Synchronized wmf-config/ProductionServices.php: Swap Redis lock managers with upgraded ones - T265643 (duration: 01m 03s) [16:03:31] (03CR) 10Mforns: [C: 03+1] "LGTM!" 
(031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/649661 (https://phabricator.wikimedia.org/T267902) (owner: 10Awight) [16:03:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:03:33] T265643: Upgrade MediaWiki's Redis cluster to Debian Buster - https://phabricator.wikimedia.org/T265643 [16:06:15] (03CR) 10Elukey: [C: 03+2] icinga: add users razzi/Razzi and elukey to run commands [puppet] - 10https://gerrit.wikimedia.org/r/645120 (owner: 10Elukey) [16:06:56] (03CR) 10Elukey: [V: 03+1] "PCC SUCCESS: https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/27157/console" [puppet] - 10https://gerrit.wikimedia.org/r/649661 (https://phabricator.wikimedia.org/T267902) (owner: 10Awight) [16:07:12] (03CR) 10Elukey: [V: 03+1 C: 03+2] Add a job for CodeMirror metrics aggregation [puppet] - 10https://gerrit.wikimedia.org/r/649661 (https://phabricator.wikimedia.org/T267902) (owner: 10Awight) [16:08:18] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=atlas_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [16:08:42] (03PS1) 10Effie Mouzeli: hiera: upgrade mc1019, mc2019 to buster [puppet] - 10https://gerrit.wikimedia.org/r/649904 (https://phabricator.wikimedia.org/T213089) [16:09:36] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [16:10:56] 10Operations, 10Platform Engineering, 10Wikidata, 10serviceops, and 4 others: Upgrade memcached cluster to Debian Buster - https://phabricator.wikimedia.org/T213089 (10jijiki) [16:13:05] (03CR) 10Gmodena: "> Patch Set 1:" [homer/public] - 10https://gerrit.wikimedia.org/r/649706 (https://phabricator.wikimedia.org/T270196) (owner: 10Elukey) [16:18:22] (03CR) 10RLazarus: [C: 03+1] hiera: upgrade mc1019, mc2019 to buster [puppet] - 10https://gerrit.wikimedia.org/r/649904 (https://phabricator.wikimedia.org/T213089) (owner: 10Effie Mouzeli) [16:20:30] (03CR) 10Mforns: "Looking good to me, but as Andrew-WMDE mentions, let's wait for queries to be deployed." [puppet] - 10https://gerrit.wikimedia.org/r/649864 (https://phabricator.wikimedia.org/T270246) (owner: 10Andrew-WMDE) [16:21:32] (03CR) 10Effie Mouzeli: [C: 03+2] hiera: upgrade mc1019, mc2019 to buster [puppet] - 10https://gerrit.wikimedia.org/r/649904 (https://phabricator.wikimedia.org/T213089) (owner: 10Effie Mouzeli) [16:21:40] (03CR) 10Mforns: "LGTM! Let's wait until queries are deployed, then we can merge!" [puppet] - 10https://gerrit.wikimedia.org/r/649660 (https://phabricator.wikimedia.org/T262209) (owner: 10Awight) [16:22:05] (03CR) 10Alexandros Kosiaris: [C: 03+2] staging-codfw: Add IP address range to cassandras [puppet] - 10https://gerrit.wikimedia.org/r/649897 (owner: 10Alexandros Kosiaris) [16:22:32] 10Operations, 10Platform Engineering, 10Wikidata, 10serviceops, and 4 others: Upgrade memcached cluster to Debian Buster - https://phabricator.wikimedia.org/T213089 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by jiji on cumin1001.eqiad.wmnet for hosts: ` mc1019.eqiad.wmnet ` The log can be... 
[16:22:44] 10Operations, 10Platform Engineering, 10Wikidata, 10serviceops, and 4 others: Upgrade memcached cluster to Debian Buster - https://phabricator.wikimedia.org/T213089 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by jiji on cumin1001.eqiad.wmnet for hosts: ` mc2019.codfw.wmnet ` The log can be... [16:23:31] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to releasers-wikibase for toan - https://phabricator.wikimedia.org/T269777 (10KFrancis) @jbond I am confirming the completed NDA! Please feel free to proceed! Thanks! [16:23:47] !log gehel@cumin1001 END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) [16:23:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:24:57] 10Operations, 10LDAP-Access-Requests: NDA for Superset Request from WMDE Employee - Mohammed Sadat - https://phabricator.wikimedia.org/T269843 (10KFrancis) @jbond I am confirming the completed NDA. Please feel free to move forward on this request. Thanks! [16:27:36] PROBLEM - WDQS high update lag on wdqs1003 is CRITICAL: 5295 ge 3600 https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook%23Update_lag https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen [16:27:45] (03CR) 10DannyS712: [C: 03+1] add wikitch to mediawiki import sources [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649874 (https://phabricator.wikimedia.org/T270284) (owner: 10ArielGlenn) [16:29:36] 10Operations, 10LDAP-Access-Requests: NDA for Superset Request from WMDE Employee - Mohammed Sadat - https://phabricator.wikimedia.org/T269843 (10KFrancis) @RLazarus In case you didn't see... please proceed with this request. I have the completed NDA. Thanks! [16:32:14] !log akosiaris@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'sessionstore' for release 'production' . 
[16:32:14] !log akosiaris@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'sessionstore' for release 'staging' . [16:32:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:32:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:32:52] !log akosiaris@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'echostore' for release 'production' . [16:32:52] !log akosiaris@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'echostore' for release 'staging' . [16:32:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:32:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:36:18] !log jiji@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on mc1019.eqiad.wmnet with reason: REIMAGE [16:36:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:38:20] !log jiji@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1019.eqiad.wmnet with reason: REIMAGE [16:38:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:38:39] (03PS1) 10RLazarus: admin: Add masssly to ldap_only_users. 
[puppet] - 10https://gerrit.wikimedia.org/r/649929 (https://phabricator.wikimedia.org/T269843) [16:38:43] !log jiji@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on mc2019.codfw.wmnet with reason: REIMAGE [16:38:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:40:46] !log jiji@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2019.codfw.wmnet with reason: REIMAGE [16:40:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:46:02] (03PS1) 10PipelineBot: blubberoid: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/649931 [16:47:48] (03PS3) 10Jdlrobson: wgMinervaCountErrors config was removed [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649434 (https://phabricator.wikimedia.org/T266359) [16:49:33] 10Operations, 10Platform Engineering, 10Wikidata, 10serviceops, and 4 others: Upgrade memcached cluster to Debian Buster - https://phabricator.wikimedia.org/T213089 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mc1019.eqiad.wmnet'] ` and were **ALL** successful. [16:56:33] (03PS4) 10RLazarus: admin: add toan user and add to wikibase-releasers group [puppet] - 10https://gerrit.wikimedia.org/r/647662 (https://phabricator.wikimedia.org/T269777) (owner: 10Jbond) [16:57:35] (03PS1) 10Jbond: (WIP) first pass at qurying downtime from icinga [puppet] - 10https://gerrit.wikimedia.org/r/649933 (https://phabricator.wikimedia.org/T268211) [16:58:25] (03CR) 10RLazarus: [C: 03+2] "Merging this now that the NDA is sorted." 
[puppet] - 10https://gerrit.wikimedia.org/r/647662 (https://phabricator.wikimedia.org/T269777) (owner: 10Jbond) [16:59:06] (03CR) 10jerkins-bot: [V: 04-1] (WIP) first pass at qurying downtime from icinga [puppet] - 10https://gerrit.wikimedia.org/r/649933 (https://phabricator.wikimedia.org/T268211) (owner: 10Jbond) [17:00:09] (03CR) 10Bstorm: [C: 03+2] toolsdb: set the rsync script to be the whole data dir [puppet] - 10https://gerrit.wikimedia.org/r/649893 (https://phabricator.wikimedia.org/T266587) (owner: 10Bstorm) [17:00:29] 10Operations, 10Platform Engineering, 10Wikidata, 10serviceops, and 4 others: Upgrade memcached cluster to Debian Buster - https://phabricator.wikimedia.org/T213089 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mc2019.codfw.wmnet'] ` and were **ALL** successful. [17:00:50] (03CR) 10Jbond: "@volans early input would be useful" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/649933 (https://phabricator.wikimedia.org/T268211) (owner: 10Jbond) [17:03:41] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to releasers-wikibase for toan - https://phabricator.wikimedia.org/T269777 (10RLazarus) 05Open→03Resolved a:03RLazarus @KFrancis Thank you! @toan Your releasers-wikibase access should be taken care of (may take up to 30 min to... [17:04:00] 10Operations, 10SRE-Access-Requests: Requesting access to Analytics Data for toan - https://phabricator.wikimedia.org/T269678 (10RLazarus) [17:11:43] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [17:12:37] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=atlas_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [17:14:05] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [17:14:48] 10Operations, 10SRE-Access-Requests: Requesting access to Analytics Data for toan - https://phabricator.wikimedia.org/T269678 (10RLazarus) Already in the nda group: ` rzl@mwmaint1002:~$ ldapsearch -x cn=nda | grep toan member: uid=toan,ou=people,dc=wikimedia,dc=org ` Kerberos principal created: ` rzl@krb100... [17:15:32] 10Operations, 10serviceops, 10User-jijiki: Upgrade memcached to version 1.6.x - https://phabricator.wikimedia.org/T270315 (10jijiki) [17:15:44] 10Operations, 10MediaWiki-Docker, 10serviceops, 10User-zeljkofilipin: docker pull from docker-registry fails with `ERROR: missing or empty Content-Length header` - https://phabricator.wikimedia.org/T270270 (10JMeybohm) >>! In T270270#6695364, @kostajh wrote: > Are we using harbor (was scanning T209271)? If... 
[17:15:51] 10Operations, 10serviceops, 10User-jijiki: Upgrade memcached to version 1.6.x - https://phabricator.wikimedia.org/T270315 (10jijiki) [17:15:53] 10Operations, 10serviceops, 10Patch-For-Review: Upgrade and improve our application object caching service (memcached) - https://phabricator.wikimedia.org/T244852 (10jijiki) [17:17:47] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [17:17:49] 10Operations, 10serviceops, 10Patch-For-Review: Upgrade and improve our application object caching service (memcached) - https://phabricator.wikimedia.org/T244852 (10jijiki) [17:19:31] (03PS1) 10RLazarus: admin: Add toan to analytics-{wmde,privatedata}-users; add krb principal [puppet] - 10https://gerrit.wikimedia.org/r/649938 (https://phabricator.wikimedia.org/T269678) [17:19:33] (03CR) 10Ottomata: [C: 03+1] "Indeed, and we don't have any other great way to support this use case right now. Not blocking this, just expressing concern!" 
[homer/public] - 10https://gerrit.wikimedia.org/r/649706 (https://phabricator.wikimedia.org/T270196) (owner: 10Elukey) [17:29:36] 10Operations, 10serviceops: Test and deploy mcrouter 0.41 - https://phabricator.wikimedia.org/T244476 (10jijiki) 05Open→03Declined [17:29:38] 10Operations, 10serviceops, 10Patch-For-Review: Upgrade and improve our application object caching service (memcached) - https://phabricator.wikimedia.org/T244852 (10jijiki) [17:29:41] 10Operations, 10serviceops: Test and deploy mcrouter 0.41 - https://phabricator.wikimedia.org/T244476 (10jijiki) For reasons explained in T251574#6148741, going with 0.41 is our only option, closing this task [17:30:39] 10Operations, 10serviceops: Recurrent TX bw saturation for mediawiki memcached shards - https://phabricator.wikimedia.org/T258679 (10jijiki) 05Open→03Resolved a:03jijiki I am closing this since there are not immediate actionables :) [17:30:42] (03PS2) 10RLazarus: admin: Add mosa to ldap_only_users. [puppet] - 10https://gerrit.wikimedia.org/r/649929 (https://phabricator.wikimedia.org/T269843) [17:30:44] 10Operations, 10serviceops, 10Patch-For-Review: Upgrade and improve our application object caching service (memcached) - https://phabricator.wikimedia.org/T244852 (10jijiki) [17:31:16] (03CR) 10Thcipriani: [C: 03+1] Phabricator: New IP banning format, remove lines in old non-working format (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/649753 (https://phabricator.wikimedia.org/T270185) (owner: 10Wolfgang Kandek) [17:31:18] (03CR) 10RLazarus: "Thanks Sukhbir for pointing out I had the wrong username!" [puppet] - 10https://gerrit.wikimedia.org/r/649929 (https://phabricator.wikimedia.org/T269843) (owner: 10RLazarus) [17:32:27] (03CR) 10Ssingh: [C: 03+1] "+1, uid matches, NDA completed in T269777." [puppet] - 10https://gerrit.wikimedia.org/r/649938 (https://phabricator.wikimedia.org/T269678) (owner: 10RLazarus) [17:33:41] (03CR) 10Ssingh: [C: 03+2] admin: Add mosa to ldap_only_users. 
[puppet] - 10https://gerrit.wikimedia.org/r/649929 (https://phabricator.wikimedia.org/T269843) (owner: 10RLazarus) [17:34:43] (03CR) 10RLazarus: [C: 03+2] admin: Add toan to analytics-{wmde,privatedata}-users; add krb principal [puppet] - 10https://gerrit.wikimedia.org/r/649938 (https://phabricator.wikimedia.org/T269678) (owner: 10RLazarus) [17:34:59] (03PS3) 10RLazarus: admin: Add mosa to ldap_only_users. [puppet] - 10https://gerrit.wikimedia.org/r/649929 (https://phabricator.wikimedia.org/T269843) [17:37:42] (03PS1) 10Ejegg: Disable CentralNotice on API portal [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649942 (https://phabricator.wikimedia.org/T270308) [17:38:06] 10Operations, 10MediaWiki-Docker, 10serviceops, 10User-zeljkofilipin: docker pull from docker-registry fails with `ERROR: missing or empty Content-Length header` - https://phabricator.wikimedia.org/T270270 (10JMeybohm) Could you please check if you see any additional errors/hints in the docker daemon logs?... [17:38:39] (03CR) 10Jforrester: [C: 03+1] Disable CentralNotice on API portal [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649942 (https://phabricator.wikimedia.org/T270308) (owner: 10Ejegg) [17:38:49] (03CR) 10jerkins-bot: [V: 04-1] Disable CentralNotice on API portal [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649942 (https://phabricator.wikimedia.org/T270308) (owner: 10Ejegg) [17:40:09] (03PS2) 10Ejegg: Disable CentralNotice on API portal [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649942 (https://phabricator.wikimedia.org/T270308) [17:41:01] 10Operations, 10LDAP-Access-Requests, 10Patch-For-Review: NDA for Superset Request from WMDE Employee - Mohammed Sadat - https://phabricator.wikimedia.org/T269843 (10RLazarus) 05Open→03Resolved a:03RLazarus @KFrancis Thanks! @Mohammed_Sadat_WMDE You've been added to the nda LDAP group, and should now... 
[17:41:41] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to Analytics Data for toan - https://phabricator.wikimedia.org/T269678 (10RLazarus) 05Open→03Resolved a:03RLazarus @toan You should be all set! The access group changes might take up to 30 min to roll out everywhere. Let me know... [17:41:44] 10Operations, 10WVUI: Import npm 6.14.8 to buster dist. on apt.wikimedia.org - https://phabricator.wikimedia.org/T270321 (10nnikkhoui) [17:51:30] 10Operations: launch Klaxon: manual paging app for trusted users to escalate urgent issues to SRE - https://phabricator.wikimedia.org/T270324 (10CDanis) [17:52:54] !log uploading python-thumbor-wikimedia_2.9-1 to stretch-wikimedia/component/thumbor [17:52:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:55:28] 10Operations, 10observability: API key production 'wikimedia' VictorOps environment - https://phabricator.wikimedia.org/T270325 (10CDanis) [17:56:21] (03CR) 10Jforrester: [C: 03+1] Disable CentralNotice on API portal [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649942 (https://phabricator.wikimedia.org/T270308) (owner: 10Ejegg) [17:59:23] 10Operations, 10observability: API key for the production 'wikimedia' VictorOps environment - https://phabricator.wikimedia.org/T270325 (10CDanis) [18:01:41] PROBLEM - WDQS high update lag on wdqs1003 is CRITICAL: 3614 ge 3600 https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook%23Update_lag https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen [18:09:18] 10Operations, 10Research, 10Wikimedia-Mailing-lists: No admin response for many months for research-internal listserv - https://phabricator.wikimedia.org/T270213 (10Dzahn) @Isaac @Ladsgroup I replaced the admin addresses and then ran a shell command to reset the password to something random and mail it to th... 
[18:22:23] 10Operations, 10Privacy Engineering, 10WMF-Legal, 10Privacy: Consider moving policy.wikimedia.org away from WordPress.com - https://phabricator.wikimedia.org/T132104 (10Dzahn) This ticket from 2016 is still stalled and it seems the reason is nobody really feels responsible for it. Looking at the tags we ha... [18:24:03] RECOVERY - WDQS high update lag on wdqs1003 is OK: (C)3600 ge (W)1200 ge 1160 https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook%23Update_lag https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen [18:26:16] 10Operations, 10Privacy Engineering, 10WMF-Legal, 10Privacy: Consider moving policy.wikimedia.org away from WordPress.com - https://phabricator.wikimedia.org/T132104 (10Dzahn) also T132103#6610462 where Legal says that it is consistent with a "non-wiki privacy policy". Personally I still think it's kind o... [18:29:14] 10Operations, 10Research, 10Wikimedia-Mailing-lists: No admin response for many months for research-internal listserv - https://phabricator.wikimedia.org/T270213 (10Isaac) 05Open→03Resolved a:03Isaac Received -- thanks @Dzahn ! [18:31:20] 10Operations, 10SRE-Access-Requests: Requesting access to deployment group for STran - https://phabricator.wikimedia.org/T270125 (10RLazarus) [18:32:14] 10Operations, 10Data-Persistence-Backup, 10SRE-swift-storage, 10Goal, 10Patch-For-Review: Prepare a proof of concept of the minimum setup capable of backup and recover testwiki media files - https://phabricator.wikimedia.org/T264189 (10jcrespo) I've done a simulation of a backup of enwiki images to test... 
[18:32:49] (03PS10) 10Jcrespo: [WIP] We continue with swift listing and download tests for media backups [software/wmfbackups] - 10https://gerrit.wikimedia.org/r/643980 [18:33:20] (03CR) 10jerkins-bot: [V: 04-1] [WIP] We continue with swift listing and download tests for media backups [software/wmfbackups] - 10https://gerrit.wikimedia.org/r/643980 (owner: 10Jcrespo) [18:38:34] (03PS11) 10Jcrespo: [WIP] We continue with swift listing and download tests for media backups [software/wmfbackups] - 10https://gerrit.wikimedia.org/r/643980 [18:46:49] RECOVERY - rpki grafana alert on alert1001 is OK: OK: RPKI ( https://grafana.wikimedia.org/d/UwUa77GZk/rpki ) is not alerting. https://wikitech.wikimedia.org/wiki/RPKI%23Grafana_alerts https://grafana.wikimedia.org/d/UwUa77GZk/ [18:48:42] 10Operations, 10Privacy Engineering, 10WMF-Legal, 10Privacy: Consider moving policy.wikimedia.org away from WordPress.com - https://phabricator.wikimedia.org/T132104 (10JFishback_WMF) Hey @Dzahn: >>! "Privacy" (not a team but topic based?), Yes, just means a Privacy issue is probably implicated. >>! "Priv... 
[18:50:25] (03PS2) 10Tchanders: extension-list: Add IPInfo extension [mediawiki-config] - 10https://gerrit.wikimedia.org/r/644548 (https://phabricator.wikimedia.org/T260599) [18:50:27] (03PS2) 10Tchanders: Add IPInfo config to InitialiseSettings.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/644549 (https://phabricator.wikimedia.org/T260599) [18:50:28] (03PS3) 10Tchanders: Add IPInfo extension config to InitialiseSettings-labs.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/644550 (https://phabricator.wikimedia.org/T260599) [18:50:31] (03PS3) 10Tchanders: Load IPInfo extension in CommonSettings.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/644551 (https://phabricator.wikimedia.org/T260599) [18:51:14] (03CR) 10jerkins-bot: [V: 04-1] Add IPInfo extension config to InitialiseSettings-labs.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/644550 (https://phabricator.wikimedia.org/T260599) (owner: 10Tchanders) [18:51:27] (03CR) 10jerkins-bot: [V: 04-1] Load IPInfo extension in CommonSettings.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/644551 (https://phabricator.wikimedia.org/T260599) (owner: 10Tchanders) [18:53:12] 10Operations, 10Privacy Engineering, 10WMF-Legal, 10Privacy: Consider moving policy.wikimedia.org away from WordPress.com - https://phabricator.wikimedia.org/T132104 (10Dzahn) Thanks for the feedback @JFishback_WMF , cheers! Yes, we are on the same page here, this needs to be decided by management talking... 
[18:55:28] (03PS3) 10Awight: Add a job for some visualeditor metrics aggregation [puppet] - 10https://gerrit.wikimedia.org/r/649660 (https://phabricator.wikimedia.org/T262209) [18:55:45] (03PS4) 10Tchanders: Add IPInfo extension config to InitialiseSettings-labs.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/644550 (https://phabricator.wikimedia.org/T260599) [18:55:47] (03PS4) 10Tchanders: Load IPInfo extension in CommonSettings.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/644551 (https://phabricator.wikimedia.org/T260599) [18:56:09] (03CR) 10Awight: [C: 03+1] "The job is merged now :-)" [puppet] - 10https://gerrit.wikimedia.org/r/649660 (https://phabricator.wikimedia.org/T262209) (owner: 10Awight) [18:57:29] 10Operations, 10SRE-Access-Requests: Requesting access to deployment group for STran - https://phabricator.wikimedia.org/T270125 (10aezell) Approved! [18:57:33] 10Operations, 10observability: API key for the production 'wikimedia' VictorOps environment - https://phabricator.wikimedia.org/T270325 (10Volans) For now anything works I guess, but in the long run I can see more automation will be integrated with VO APIs and IMHO it would be nice to have a system user that w... [19:00:04] marxarelli and longma: That opportune time is upon us again. Time for a Train log triage with CPT deploy. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20201216T1900). [19:00:04] RoanKattouw, Niharika, and Urbanecm: That opportune time is upon us again. Time for a Morning backport window deploy. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20201216T1900). [19:00:04] dcausse: A patch you scheduled for Morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. 
[19:01:06] o/ [19:01:12] I can deploy [19:02:11] (03CR) 10DCausse: [C: 03+2] Revert "[cirrus] setup perfield builder A/B test on spaceless languages" [extensions/WikimediaEvents] (wmf/1.36.0-wmf.21) - 10https://gerrit.wikimedia.org/r/649906 (owner: 10DCausse) [19:02:18] o/ and here :) but for some reason not in the list (?!) [19:02:30] (03PS1) 10Ahmon Dancy: Fix link to pywmflib Gerrit project [software/pywmflib] - 10https://gerrit.wikimedia.org/r/649950 [19:03:08] dcausse: i was in the wrong window.. https://wikitech.wikimedia.org/w/index.php?title=Deployments&type=revision&diff=1891743&oldid=1891732 [19:03:11] Jdlrobson: I don't see your patch? [19:03:29] (03CR) 10Volans: [C: 03+1] "LGTM, optional comment inline. Thanks for the addition" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/649421 (owner: 10RLazarus) [19:03:51] this one: https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/649434/ [19:04:21] Jdlrobson: ok shipping [19:04:45] (03CR) 10DCausse: [C: 03+2] wgMinervaCountErrors config was removed [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649434 (https://phabricator.wikimedia.org/T266359) (owner: 10Jdlrobson) [19:05:39] (03Merged) 10jenkins-bot: wgMinervaCountErrors config was removed [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649434 (https://phabricator.wikimedia.org/T266359) (owner: 10Jdlrobson) [19:06:07] thx dcausse [19:06:11] just doing some holiday cleanup [19:07:32] (03CR) 10RLazarus: [C: 03+2] "Thanks!" 
(031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/649421 (owner: 10RLazarus) [19:07:43] (03Merged) 10jenkins-bot: Revert "[cirrus] setup perfield builder A/B test on spaceless languages" [extensions/WikimediaEvents] (wmf/1.36.0-wmf.21) - 10https://gerrit.wikimedia.org/r/649906 (owner: 10DCausse) [19:09:14] (03PS3) 10Bstorm: partman: build a recipe to re-image nfs servers [puppet] - 10https://gerrit.wikimedia.org/r/647815 (https://phabricator.wikimedia.org/T266199) [19:09:30] !log dcausse@deploy1001 Synchronized wmf-config/InitialiseSettings.php: T266359: wgMinervaCountErrors config was removed (duration: 01m 03s) [19:09:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:09:34] T266359: Stop counting errors in Minerva - https://phabricator.wikimedia.org/T266359 [19:09:50] Jdlrobson: done [19:11:02] 10Operations, 10observability: API key for the production 'wikimedia' VictorOps environment - https://phabricator.wikimedia.org/T270325 (10herron) 05Open→03Resolved An API key for klaxxon (discussed via IRC, that's what this is going to be used for, see linked task as well) has been created and added to th... [19:11:04] 10Operations: launch Klaxon: manual paging app for trusted users to escalate urgent issues to SRE - https://phabricator.wikimedia.org/T270324 (10herron) [19:12:41] 10Operations, 10Domains, 10Okapi, 10Traffic: Okapi Domains - https://phabricator.wikimedia.org/T269686 (10Protsack.stephan) Yep, you are totally right no MediaWiki wiks underneath. Those will point to different IP addresses. We'll just allocate those and provide them to you. 
[19:12:57] (03CR) 10DCausse: [C: 03+2] Revert "[cirrus] setup perfield builder A/B test on spaceless languages" [extensions/WikimediaEvents] (wmf/1.36.0-wmf.22) - 10https://gerrit.wikimedia.org/r/649705 (owner: 10DCausse) [19:15:00] 10Operations, 10observability: API key for the production 'wikimedia' VictorOps environment - https://phabricator.wikimedia.org/T270325 (10CDanis) Great, thanks! I'll put this in puppet-private once I have the other puppetization ready :) [19:17:41] (03Merged) 10jenkins-bot: Revert "[cirrus] setup perfield builder A/B test on spaceless languages" [extensions/WikimediaEvents] (wmf/1.36.0-wmf.22) - 10https://gerrit.wikimedia.org/r/649705 (owner: 10DCausse) [19:18:12] !log dcausse@deploy1001 Synchronized php-1.36.0-wmf.21/extensions/WikimediaEvents/: T266027: Revert [cirrus] setup perfield builder A/B test on spaceless languages (duration: 01m 03s) [19:18:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:18:15] T266027: Test perfield_builder on spaceless languages - https://phabricator.wikimedia.org/T266027 [19:19:07] (03CR) 10Volans: "Thanks for the patch, good catch. One comment inline." 
(031 comment) [software/pywmflib] - 10https://gerrit.wikimedia.org/r/649950 (owner: 10Ahmon Dancy) [19:20:53] (03CR) 10Ahmon Dancy: Fix link to pywmflib Gerrit project (031 comment) [software/pywmflib] - 10https://gerrit.wikimedia.org/r/649950 (owner: 10Ahmon Dancy) [19:21:33] (03PS2) 10Ahmon Dancy: Fix link to pywmflib Gerrit project [software/pywmflib] - 10https://gerrit.wikimedia.org/r/649950 [19:21:33] !log dcausse@deploy1001 Synchronized php-1.36.0-wmf.22/extensions/WikimediaEvents/: T266027: Revert [cirrus] setup perfield builder A/B test on spaceless languages (duration: 01m 03s) [19:21:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:22:06] (03CR) 10Ahmon Dancy: Fix link to pywmflib Gerrit project (031 comment) [software/pywmflib] - 10https://gerrit.wikimedia.org/r/649950 (owner: 10Ahmon Dancy) [19:23:27] !log Morning backport window deploy done [19:23:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:25:49] (03PS1) 10Legoktm: admin: Add legoktm to ops, use new Yubikey SSH key [puppet] - 10https://gerrit.wikimedia.org/r/649951 [19:30:00] (03CR) 10RLazarus: [C: 03+2] "Verified it's the Real Kunal, or at least the same person who's been turning up to meetings, with a quick video chat." [puppet] - 10https://gerrit.wikimedia.org/r/649951 (owner: 10Legoktm) [19:36:41] (03CR) 10Volans: [C: 03+2] "LGTM, thanks for the patch." [software/pywmflib] - 10https://gerrit.wikimedia.org/r/649950 (owner: 10Ahmon Dancy) [19:38:01] (03Merged) 10jenkins-bot: Fix link to pywmflib Gerrit project [software/pywmflib] - 10https://gerrit.wikimedia.org/r/649950 (owner: 10Ahmon Dancy) [19:39:01] 10Operations, 10fundraising-tech-ops, 10netops, 10Patch-For-Review: Manage frack switches with Netbox - https://phabricator.wikimedia.org/T268802 (10Dwisehaupt) @jbond Thanks. I pulled the patch in and it runs great in the VM. ` (vb)dallas@frpm1001:~$ sudo facter -p lldp { enp0s3 => { neighbors => [... 
[19:45:16] 10Operations, 10Technical-blog-posts, 10Traffic: 3rd part of blog post series: the evolution of Wikimedia's Content Delivery Network - https://phabricator.wikimedia.org/T270074 (10srodlund) I looked at the doc and was able to copy edit it! If you are able to go through and accept changes in the next day, I'm... [19:54:33] 10Operations, 10DNS, 10Okapi, 10Traffic: Okapi Domains - https://phabricator.wikimedia.org/T269686 (10Reedy) Can I ask how you're going to do HTTPS on them? HSTS is enabled for wikimedia.org, and for subdomains, so HTTPS will be required. ` $ curl -I https://www.wikimedia.org | grep -i Strict strict-tran... [19:57:59] (03PS1) 10Legoktm: icinga: Add legoktm to various permissions [puppet] - 10https://gerrit.wikimedia.org/r/649954 [20:00:04] marxarelli and longma: Time to snap out of that daydream and deploy Mediawiki train - American Version. Get on with it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20201216T2000). [20:00:39] (03CR) 10RLazarus: [C: 03+1] icinga: Add legoktm to various permissions [puppet] - 10https://gerrit.wikimedia.org/r/649954 (owner: 10Legoktm) [20:01:55] longma: o/ [20:02:03] rolling momentarily [20:02:04] o/ [20:04:20] (03PS1) 10Dduvall: group1 wikis to 1.36.0-wmf.22 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649955 [20:04:37] (03CR) 10Dduvall: [C: 03+2] group1 wikis to 1.36.0-wmf.22 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649955 (owner: 10Dduvall) [20:04:53] (03Merged) 10jenkins-bot: group1 wikis to 1.36.0-wmf.22 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649955 (owner: 10Dduvall) [20:05:05] yolo-lo, merry christmas [20:05:29] !log dduvall@deploy1001 rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.22 [20:05:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:05:46] !log added myself to the ops LDAP group [20:05:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:06:30] 
!log dduvall@deploy1001 Synchronized php: group1 wikis to 1.36.0-wmf.22 (duration: 01m 01s) [20:06:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:06:42] (03CR) 10Legoktm: [C: 03+2] icinga: Add legoktm to various permissions [puppet] - 10https://gerrit.wikimedia.org/r/649954 (owner: 10Legoktm) [20:08:50] so far, so good [20:10:00] ya [20:11:27] there was a solitary "Exception from line 29 of /srv/mediawiki/php-1.36.0-wmf.22/extensions/MassMessage/includes/Content/MassMessageListDiffEngine.php: Cannot diff content types other than MassMessageListContent" right as apaches started syncing, but that appears to be known and i'm guessing coincidental that it occurred during the sync [20:11:49] nothing else [20:12:07] !log group1 to 1.36.0-wmf.22 complete. no new errors or concerning rates (refs T267415) [20:12:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:12:11] T267415: 1.36.0-wmf.22 deployment blockers - https://phabricator.wikimedia.org/T267415 [20:12:18] T269403 might be that bug marxarelli [20:12:19] T269403: ParameterTypeException when `diff`ing invalid MassMessageListContent - https://phabricator.wikimedia.org/T269403 [20:12:43] ah, ok. 
i also found T265524 [20:12:43] T265524: Cannot diff content types other than MassMessageListContent - https://phabricator.wikimedia.org/T265524 [20:12:54] possible dupe [20:13:23] just similar - one is a diff to another content type, the other is diffing broken content [20:14:04] got it [20:27:43] (03PS1) 10Cwhite: profile: add normalize_level filter script [puppet] - 10https://gerrit.wikimedia.org/r/649956 (https://phabricator.wikimedia.org/T234565) [20:29:34] (03CR) 10jerkins-bot: [V: 04-1] profile: add normalize_level filter script [puppet] - 10https://gerrit.wikimedia.org/r/649956 (https://phabricator.wikimedia.org/T234565) (owner: 10Cwhite) [20:30:11] PROBLEM - Check systemd state on stat1007 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [20:35:13] 10Operations, 10DNS, 10Okapi, 10Traffic: Okapi Domains - https://phabricator.wikimedia.org/T269686 (10Protsack.stephan) I was thinking of https://certbot.eff.org/ or something like that for HTTPS. But we are flexible in that manner so if there are some preferences we can definitely work things out. Let me... [20:37:22] (03PS2) 10Cwhite: profile: add normalize_level filter script [puppet] - 10https://gerrit.wikimedia.org/r/649956 (https://phabricator.wikimedia.org/T234565) [20:41:15] RECOVERY - Check systemd state on stat1007 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [20:45:09] PROBLEM - Long running screen/tmux on maps1006 is CRITICAL: CRIT: Long running SCREEN process. (user: root PID: 14948, 1742321s 1728000s). https://wikitech.wikimedia.org/wiki/Monitoring/Long_running_screens [20:45:38] 👀 [20:45:42] hnowlan maybe?
^ [20:48:32] (03PS1) 10Jeena Huneidi: Add mw-cli to the releases server [puppet] - 10https://gerrit.wikimedia.org/r/649958 (https://phabricator.wikimedia.org/T250241) [21:00:04] chrisalbon and accraze: Dear deployers, time to do the Services – Graphoid / ORES deploy. Don't look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20201216T2100). [21:05:54] (03CR) 10Volans: [C: 04-1] "Is there a puppet compiler run to see? See comments inline" (036 comments) [puppet] - 10https://gerrit.wikimedia.org/r/643354 (https://phabricator.wikimedia.org/T266488) (owner: 10CRusnov) [21:11:55] (03PS2) 10Holger Knust: WIP: Add new watchlist job [dumps] - 10https://gerrit.wikimedia.org/r/625895 (https://phabricator.wikimedia.org/T51133) [21:12:36] (03CR) 10jerkins-bot: [V: 04-1] WIP: Add new watchlist job [dumps] - 10https://gerrit.wikimedia.org/r/625895 (https://phabricator.wikimedia.org/T51133) (owner: 10Holger Knust) [21:14:08] (03CR) 10Dzahn: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1003/27158/" [puppet] - 10https://gerrit.wikimedia.org/r/649679 (owner: 10Dzahn) [21:14:25] 10Operations, 10DNS, 10Okapi, 10Traffic: Okapi Domains - https://phabricator.wikimedia.org/T269686 (10BBlack) There's probably a lot of context missing here, although we can gather some from https://www.mediawiki.org/wiki/Okapi and https://meta.wikimedia.org/wiki/Okapi . Perhaps we could get a primer on wh... [21:19:43] 10Operations, 10MediaWiki-Docker, 10serviceops, 10User-zeljkofilipin: docker pull from docker-registry fails with `ERROR: missing or empty Content-Length header` - https://phabricator.wikimedia.org/T270270 (10kostajh) >>! In T270270#6696236, @JMeybohm wrote: > Could you please check if you see any addition...
[21:20:37] (03PS22) 10CRusnov: netbox: Adjust settings for supporting Netbox 2.9 series [puppet] - 10https://gerrit.wikimedia.org/r/643354 (https://phabricator.wikimedia.org/T266488) [21:20:39] (03CR) 10CRusnov: netbox: Adjust settings for supporting Netbox 2.9 series (036 comments) [puppet] - 10https://gerrit.wikimedia.org/r/643354 (https://phabricator.wikimedia.org/T266488) (owner: 10CRusnov) [21:20:43] (03PS3) 10CRusnov: netbox: Add only non-2.8 compatible setting for Netbox [puppet] - 10https://gerrit.wikimedia.org/r/649436 (https://phabricator.wikimedia.org/T266488) [21:23:11] (03CR) 10jerkins-bot: [V: 04-1] netbox: Adjust settings for supporting Netbox 2.9 series [puppet] - 10https://gerrit.wikimedia.org/r/643354 (https://phabricator.wikimedia.org/T266488) (owner: 10CRusnov) [21:24:34] 10Operations, 10MediaWiki-Docker, 10serviceops, 10User-zeljkofilipin: docker pull from docker-registry fails with `ERROR: missing or empty Content-Length header` - https://phabricator.wikimedia.org/T270270 (10kostajh) I can reproduce the error on macOS 11.0.1 + Docker for Mac 3.0.1 (Docker version 20.10.0)... [21:25:07] (03CR) 10CRusnov: "thanks! here's PCC output" [puppet] - 10https://gerrit.wikimedia.org/r/643354 (https://phabricator.wikimedia.org/T266488) (owner: 10CRusnov) [21:25:35] (03CR) 10Dzahn: "still noop on alert1001" [puppet] - 10https://gerrit.wikimedia.org/r/649679 (owner: 10Dzahn) [21:28:45] 10Operations, 10MediaWiki-Docker, 10serviceops, 10User-zeljkofilipin: docker pull from docker-registry fails with `ERROR: missing or empty Content-Length header` - https://phabricator.wikimedia.org/T270270 (10kostajh) I filed https://github.com/docker/for-mac/issues/5143 in case it's an issue with Docker f... 
[21:30:25] (03CR) 10Dzahn: [C: 03+1] "looks alright in compiler: https://puppet-compiler.wmflabs.org/compiler1002/27161/releases1002.eqiad.wmnet/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/649958 (https://phabricator.wikimedia.org/T250241) (owner: 10Jeena Huneidi) [21:31:12] (03CR) 10Volans: "Apart the CI failure, why those are only in the old catalog in the compiler?" [puppet] - 10https://gerrit.wikimedia.org/r/643354 (https://phabricator.wikimedia.org/T266488) (owner: 10CRusnov) [21:50:05] (03PS3) 10Ejegg: Disable CentralNotice on API portal [mediawiki-config] - 10https://gerrit.wikimedia.org/r/649942 (https://phabricator.wikimedia.org/T270308) [22:01:43] (03PS2) 10Dzahn: mediawiki/jobrunner: create beta role, remove hiera lookup [puppet] - 10https://gerrit.wikimedia.org/r/649707 (https://phabricator.wikimedia.org/T209953) [22:02:58] (03CR) 10Dzahn: [V: 03+1] "I checked in Horizon in deployment-prep for the puppet prefix "deployment-jobrunner" and adjusted the role it uses to "role::beta::mediawi" [puppet] - 10https://gerrit.wikimedia.org/r/649707 (https://phabricator.wikimedia.org/T209953) (owner: 10Dzahn) [22:38:19] (03CR) 10Jeena Huneidi: "> Patch Set 1: Code-Review+1" [puppet] - 10https://gerrit.wikimedia.org/r/649958 (https://phabricator.wikimedia.org/T250241) (owner: 10Jeena Huneidi) [22:44:23] (03CR) 10Dzahn: [V: 03+1 C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1001/27162/" [puppet] - 10https://gerrit.wikimedia.org/r/649707 (https://phabricator.wikimedia.org/T209953) (owner: 10Dzahn) [22:47:31] (03CR) 10Dzahn: "noop on prod jobrunner mw1307" [puppet] - 10https://gerrit.wikimedia.org/r/649707 (https://phabricator.wikimedia.org/T209953) (owner: 10Dzahn) [22:54:21] 10Operations, 10Traffic: Image fails to load with CORS violation - https://phabricator.wikimedia.org/T270209 (10Aklapper) > I get an error message I don't, being logged in, on Firefox 84. Which browser and browser version is this about? 
[22:55:37] 10Operations: Image fails to load with CORS violation - https://phabricator.wikimedia.org/T270209 (10Aklapper) [22:59:34] (03CR) 10Dzahn: [C: 03+1] "> Oh, sorry! Where/how do I submit a request?" [puppet] - 10https://gerrit.wikimedia.org/r/649958 (https://phabricator.wikimedia.org/T250241) (owner: 10Jeena Huneidi) [23:00:27] (03CR) 10CRusnov: "> Patch Set 22:" [puppet] - 10https://gerrit.wikimedia.org/r/643354 (https://phabricator.wikimedia.org/T266488) (owner: 10CRusnov) [23:00:29] (03CR) 10CRusnov: "> Patch Set 22:" [puppet] - 10https://gerrit.wikimedia.org/r/643354 (https://phabricator.wikimedia.org/T266488) (owner: 10CRusnov) [23:00:50] (03CR) 10CRusnov: "> Patch Set 22:" [puppet] - 10https://gerrit.wikimedia.org/r/643354 (https://phabricator.wikimedia.org/T266488) (owner: 10CRusnov) [23:02:32] (03CR) 10Jeena Huneidi: "> Patch Set 1:" [puppet] - 10https://gerrit.wikimedia.org/r/649958 (https://phabricator.wikimedia.org/T250241) (owner: 10Jeena Huneidi) [23:03:17] (03PS23) 10CRusnov: netbox: Adjust settings for supporting Netbox 2.9 series [puppet] - 10https://gerrit.wikimedia.org/r/643354 (https://phabricator.wikimedia.org/T266488) [23:03:19] (03PS4) 10CRusnov: netbox: Add only non-2.8 compatible setting for Netbox [puppet] - 10https://gerrit.wikimedia.org/r/649436 (https://phabricator.wikimedia.org/T266488) [23:07:39] (03CR) 10Dzahn: "I don't know how IPs ended up on abuse-networks in the past. Is that about just Phabricator or about other services as well? Do we really " [puppet] - 10https://gerrit.wikimedia.org/r/649882 (https://phabricator.wikimedia.org/T270285) (owner: 10Jbond) [23:11:32] (03CR) 10Dzahn: [C: 03+1] "2017 seems a reasonably long time ago to remove bans. Maybe we should set a policy about expiration of banned IPs.. 
I wonder if there is o" [puppet] - 10https://gerrit.wikimedia.org/r/649753 (https://phabricator.wikimedia.org/T270185) (owner: 10Wolfgang Kandek) [23:12:58] (03CR) 10Dzahn: [C: 03+1] "Oh yea, and of course as the commit message says these are in "non-working format" anyways. So yea. Needs checking after removing them thou" [puppet] - 10https://gerrit.wikimedia.org/r/649753 (https://phabricator.wikimedia.org/T270185) (owner: 10Wolfgang Kandek) [23:21:01] (03CR) 10Volans: [C: 03+1] "As said before I'm not familiar with the redis bits. The change looks sane for me, compiler output seems good too, I didn't test it though" [puppet] - 10https://gerrit.wikimedia.org/r/643354 (https://phabricator.wikimedia.org/T266488) (owner: 10CRusnov) [23:21:48] (03CR) 10Volans: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/649436 (https://phabricator.wikimedia.org/T266488) (owner: 10CRusnov) [23:24:33] 10Operations: Image fails to load with CORS violation - https://phabricator.wikimedia.org/T270209 (10RoySmith) I'm still getting it. I can reproduce on any of: Chrome Version 87.0.4280.88 (Official Build) (x86_64) (same result in both a regular or an incognito window) Safari Version 14.0.2 (16610.3.7.1.9) Fire... [23:27:56] 10Operations, 10SRE-Access-Requests: Requesting access to releases1002/2002 for jhuneidi, brennen - https://phabricator.wikimedia.org/T270350 (10jeena) [23:28:57] (03CR) 10Jeena Huneidi: "I created this task for the access request: https://phabricator.wikimedia.org/T270350" [puppet] - 10https://gerrit.wikimedia.org/r/649958 (https://phabricator.wikimedia.org/T250241) (owner: 10Jeena Huneidi) [23:33:43] (03CR) 10Dzahn: [C: 03+1] "Thank you, we will make sure this doesn't take too long. I'll check the boxes that can already be checked now."
[puppet] - 10https://gerrit.wikimedia.org/r/649958 (https://phabricator.wikimedia.org/T250241) (owner: 10Jeena Huneidi) [23:34:41] (03CR) 10Bstorm: [C: 03+2] wikireplicas: close the connection object for maintain-meta_p [puppet] - 10https://gerrit.wikimedia.org/r/649475 (https://phabricator.wikimedia.org/T269620) (owner: 10Bstorm) [23:35:47] 10Operations, 10SRE-Access-Requests: Requesting access to releases1002/2002 for jhuneidi, brennen - https://phabricator.wikimedia.org/T270350 (10Dzahn) [23:36:51] (03PS1) 10RLazarus: admin: Promote stran from ldap_only_users to users; add to deployment [puppet] - 10https://gerrit.wikimedia.org/r/649992 (https://phabricator.wikimedia.org/T270125) [23:41:07] 10Operations, 10DNS, 10Okapi, 10Traffic: Create three Okapi sub-domains (okapi*.wikimedia.org) - https://phabricator.wikimedia.org/T269686 (10Aklapper) [23:41:33] (03CR) 10Dzahn: [V: 03+1 C: 03+1] "matches LDAP and corp-LDAP and has approvals" [puppet] - 10https://gerrit.wikimedia.org/r/649992 (https://phabricator.wikimedia.org/T270125) (owner: 10RLazarus) [23:43:11] (03CR) 10RLazarus: [C: 03+2] admin: Promote stran from ldap_only_users to users; add to deployment [puppet] - 10https://gerrit.wikimedia.org/r/649992 (https://phabricator.wikimedia.org/T270125) (owner: 10RLazarus) [23:49:19] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to deployment group for STran - https://phabricator.wikimedia.org/T270125 (10RLazarus) [23:49:55] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to deployment group for STran - https://phabricator.wikimedia.org/T270125 (10RLazarus) 05Open→03Resolved @thcipriani @aezell Thanks! @STran You're all set -- give it up to 30 minutes for the change to be deployed everywhere, and... 
[23:51:27] (03PS2) 10Dzahn: Add mw-cli to the releases server [puppet] - 10https://gerrit.wikimedia.org/r/649958 (https://phabricator.wikimedia.org/T250241) (owner: 10Jeena Huneidi) [23:56:40] !log bootstrapped meta_p database for the new s7 replicas T269427 [23:56:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:56:44] T269427: Prepare and check storage layer for eowikivoyage - https://phabricator.wikimedia.org/T269427 [23:57:43] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to releases1002/2002 for jhuneidi, brennen - https://phabricator.wikimedia.org/T270350 (10RLazarus) @thcipriani Based on the context in T250241 I'm guessing this already has your blessing (creating a `releasers-mwcli` group containing...