[00:05:22] Operations, Product-Analytics, Wikimedia-General-or-Unknown, Readers-Web-Backlog (Needs Product Owner Decisions), SEO: Yoruba Language Wikipedia not being indexed by search engines - https://phabricator.wikimedia.org/T236241 (Jdlrobson) a:Jdlrobson→None
[00:07:44] Operations, serviceops, Patch-For-Review: move 20 new codfw parsoid servers (parse2*) into production - https://phabricator.wikimedia.org/T247441 (Dzahn) p:Triage→Medium
[00:16:24] Operations, Keyholder: After arming a new key in keyholder, the identity file path does not show up - https://phabricator.wikimedia.org/T257329 (Dzahn) see related issues from the past: T154943#3066450 https://gerrit.wikimedia.org/r/c/operations/puppet/+/312947
[00:23:00] (PS1) BryanDavis: toolforge: enable delete API for docker-registry [puppet] - https://gerrit.wikimedia.org/r/610191
[00:30:05] (CR) BryanDavis: "PCC output: https://puppet-compiler.wmflabs.org/compiler1002/23753/" [puppet] - https://gerrit.wikimedia.org/r/610191 (owner: BryanDavis)
[00:36:39] (CR) BryanDavis: "I'm hoping to use something like https://raw.githubusercontent.com/byrnedo/docker-reg-tool/master/docker_reg_tool to actually delete the o" [puppet] - https://gerrit.wikimedia.org/r/610191 (owner: BryanDavis)
[00:38:17] (PS1) Bstorm: paws-prometheus: add dummy value for the paws-k8s pk [labs/private] - https://gerrit.wikimedia.org/r/610192 (https://phabricator.wikimedia.org/T256361)
[00:39:33] (PS1) Dzahn: releases::mediawiki: support rsyncing files to multiple secondaries [puppet] - https://gerrit.wikimedia.org/r/610193 (https://phabricator.wikimedia.org/T247652)
[00:40:31] (CR) Bstorm: [V: +2 C: +2] paws-prometheus: add dummy value for the paws-k8s pk [labs/private] - https://gerrit.wikimedia.org/r/610192 (https://phabricator.wikimedia.org/T256361) (owner: Bstorm)
[00:40:49] (CR) jerkins-bot: [V: -1] releases::mediawiki: support rsyncing files to multiple secondaries [puppet] - https://gerrit.wikimedia.org/r/610193 (https://phabricator.wikimedia.org/T247652) (owner: Dzahn)
[00:43:31] (PS2) Dzahn: releases::mediawiki: support rsyncing files to multiple secondaries [puppet] - https://gerrit.wikimedia.org/r/610193 (https://phabricator.wikimedia.org/T247652)
[00:47:39] Operations, netops: automatically sample from all FPCs on core routers - https://phabricator.wikimedia.org/T257392 (CDanis)
[00:47:55] Operations, netops: automatically sample from all FPCs on core routers - https://phabricator.wikimedia.org/T257392 (CDanis)
[00:48:20] Operations, netops: automatically sample from all FPCs on core routers - https://phabricator.wikimedia.org/T257392 (CDanis) p:Triage→Low
[00:48:27] Operations, netops: automatically sample from all FPCs on core routers - https://phabricator.wikimedia.org/T257392 (CDanis) a:CDanis→None
[00:51:42] Operations, ops-codfw, netops: (Need by: End of July-2020 ) codfw:rack/setup/new management switches - https://phabricator.wikimedia.org/T253154 (Papaul)
[01:14:54] (CR) Dzahn: [C: -1] "duplicate declaration https://puppet-compiler.wmflabs.org/compiler1001/23754/" [puppet] - https://gerrit.wikimedia.org/r/610193 (https://phabricator.wikimedia.org/T247652) (owner: Dzahn)
[01:16:10] (PS3) Dzahn: releases::mediawiki: support rsyncing files to multiple secondaries [puppet] - https://gerrit.wikimedia.org/r/610193 (https://phabricator.wikimedia.org/T247652)
[01:17:20] (CR) Reedy: Gerrit: Add ed25519 and ecdsa ssh host keys (1 comment) [puppet] - https://gerrit.wikimedia.org/r/556270 (https://phabricator.wikimedia.org/T240266) (owner: Paladox)
[01:30:42] (CR) Dzahn: Gerrit: Add ed25519 and ecdsa ssh host keys (1 comment) [puppet] - https://gerrit.wikimedia.org/r/556270 (https://phabricator.wikimedia.org/T240266) (owner: Paladox)
[01:31:34] (CR) Dzahn: Gerrit: Add ed25519 and ecdsa ssh host keys (2 comments) [puppet] - https://gerrit.wikimedia.org/r/556270 (https://phabricator.wikimedia.org/T240266) (owner: Paladox)
[01:33:44] (CR) Dzahn: [C: +1] "https://puppet-compiler.wmflabs.org/compiler1003/23755/" [puppet] - https://gerrit.wikimedia.org/r/610193 (https://phabricator.wikimedia.org/T247652) (owner: Dzahn)
[01:40:38] (PS1) Dzahn: releases: also sync blubber,parsoid,reprepro files to multiple servers [puppet] - https://gerrit.wikimedia.org/r/610195 (https://phabricator.wikimedia.org/T247652)
[01:50:29] Operations, Analytics-Radar, Traffic, Privacy: Add request_id to webrequest logs as well as other event records ingested into Hadoop - https://phabricator.wikimedia.org/T113817 (Ottomata) :)
[03:32:47] !log andrew@deploy1001 Started deploy [horizon/deploy@505819d]: further fixes for proxy editing --bug 610130
[03:32:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:36:31] !log andrew@deploy1001 Finished deploy [horizon/deploy@505819d]: further fixes for proxy editing --bug 610130 (duration: 03m 44s)
[03:36:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:53:52] Operations, OpenRefine, Traffic, serviceops, and 2 others: Clients failing API login due to dependence on "Set-Cookie" header name casing - https://phabricator.wikimedia.org/T249680 (Antigng) Just to mention that apache httpd does a camel casing by default when proxying back from an http2-talking...
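T249680 concerns clients that look up `Set-Cookie` with an exact-case match. HTTP header field names are case-insensitive (RFC 7230, section 3.2). HTTP/2 requires them to be sent in lowercase (RFC 7540, section 8.1.2), so a proxy that sits between an HTTP/2 hop and the client may relay a different casing. The following is a minimal client-side sketch that uses only the Python standard library and assumes the raw header block is available as bytes. `http.client.parse_headers` returns an `HTTPMessage`, and lookups on that object ignore case. The helper names are hypothetical and are not taken from any affected client.

```python
from __future__ import annotations

import io
from http.client import parse_headers


def set_cookie_values(raw_headers: bytes) -> list[str]:
    """Return all Set-Cookie values, whatever casing the server or proxy used."""
    # parse_headers reads header lines up to the first blank line and returns an
    # http.client.HTTPMessage; get_all() matches field names case-insensitively.
    message = parse_headers(io.BytesIO(raw_headers))
    return message.get_all("Set-Cookie", [])


def naive_lookup(headers: dict[str, str]) -> str | None:
    """The fragile pattern: an exact-case dict lookup misses 'set-cookie'."""
    return headers.get("Set-Cookie")


if __name__ == "__main__":
    # Hypothetical response headers, lowercased/uppercased as a proxy might relay them.
    raw = (b"set-cookie: session=abc\r\n"
           b"SET-COOKIE: token=xyz\r\n"
           b"Content-Type: text/html\r\n\r\n")
    print(set_cookie_values(raw))                       # both cookies are found
    print(naive_lookup({"set-cookie": "session=abc"}))  # None: the reported failure mode
```

The main point is that cookies should never be keyed off a case-sensitive dictionary. `naive_lookup` reproduces the failure mode described in the task.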
[04:33:56] Operations, Graphoid, serviceops, Core Platform Team (Icebox), MW-1.35-notes (1.35.0-wmf.34; 2020-05-26): Undeploy graphoid - https://phabricator.wikimedia.org/T242855 (Jseddon)
[04:46:16] (CR) DannyS712: [C: +1] "LGTM" [mediawiki-config] - https://gerrit.wikimedia.org/r/609990 (https://phabricator.wikimedia.org/T257106) (owner: MarcoAurelio)
[06:14:19] (PS1) Giuseppe Lavagetto: restbase: re-switch to proton-http [puppet] - https://gerrit.wikimedia.org/r/610202
[06:16:14] (CR) Giuseppe Lavagetto: [V: +2 C: +2] restbase: re-switch to proton-http [puppet] - https://gerrit.wikimedia.org/r/610202 (owner: Giuseppe Lavagetto)
[06:18:44] <_joe_> !log rolling restart of restbase to pick up the proton url change
[06:18:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:28:10] PROBLEM - Check size of conntrack table on kubernetes1001 is CRITICAL: connect to address 10.64.0.121 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack
[06:29:36] PROBLEM - Check systemd state on kubernetes1001 is CRITICAL: connect to address 10.64.0.121 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:30:22] PROBLEM - puppet last run on kubernetes1001 is CRITICAL: connect to address 10.64.0.121 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[06:33:42] PROBLEM - MD RAID on kubernetes1001 is CRITICAL: connect to address 10.64.0.121 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[06:36:03] !log Deploy schema change on s2 primary master db1122 T238966
[06:36:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:36:08] T238966: Apply updates for MCR, actor migration, and content migration, to production wikis. - https://phabricator.wikimedia.org/T238966
[06:36:24] PROBLEM - DPKG on kubernetes1001 is CRITICAL: connect to address 10.64.0.121 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[06:40:04] PROBLEM - configured eth on kubernetes1001 is CRITICAL: CHECK_NRPE: Error - Could not connect to 10.64.0.121: Connection reset by peer https://wikitech.wikimedia.org/wiki/Monitoring/check_eth
[06:40:18] (CR) Ayounsi: [C: +1] "lgtm!" [puppet] - https://gerrit.wikimedia.org/r/610030 (https://phabricator.wikimedia.org/T256958) (owner: Jbond)
[06:40:40] PROBLEM - Check systemd state on kubernetes1001 is CRITICAL: CHECK_NRPE: Error - Could not connect to 10.64.0.121: Connection reset by peer https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:41:30] PROBLEM - Disk space on kubernetes1001 is CRITICAL: CHECK_NRPE: Error - Could not connect to 10.64.0.121: Connection reset by peer https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=kubernetes1001&var-datasource=eqiad+prometheus/ops
[06:42:46] PROBLEM - Check the NTP synchronisation status of timesyncd on kubernetes1001 is CRITICAL: connect to address 10.64.0.121 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/NTP
[06:43:30] PROBLEM - dhclient process on kubernetes1001 is CRITICAL: connect to address 10.64.0.121 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_dhclient
[06:43:55] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1138', diff saved to https://phabricator.wikimedia.org/P11795 and previous config saved to /var/cache/conftool/dbconfig/20200708-064354-marostegui.json
[06:43:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:44:05] (CR) Ema: "> Patch Set 1: Code-Review+1" [puppet] - https://gerrit.wikimedia.org/r/610027 (owner: Ema)
[06:45:18] PROBLEM - Work requests waiting in Zuul Gearman server on contint2001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [150.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[06:47:42] !log start topology changes on m1 T256717
[06:47:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:47:47] T256717: db1097 (m1 master) crashed due to memory issues. - https://phabricator.wikimedia.org/T256717
[06:50:00] PROBLEM - Check whether ferm is active by checking the default input chain on kubernetes1001 is CRITICAL: connect to address 10.64.0.121 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[06:50:25] Operations, MediaWiki-extensions-Score, Security-Team, Wikimedia-General-or-Unknown, and 3 others: Extension:Score / Lilypond is disabled on all wikis - https://phabricator.wikimedia.org/T257066 (ArielGlenn)
[06:54:14] PROBLEM - puppet last run on kubernetes1003 is CRITICAL: connect to address 10.64.32.23 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[06:54:14] PROBLEM - DPKG on kubernetes1003 is CRITICAL: connect to address 10.64.32.23 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[06:54:16] PROBLEM - MD RAID on kubernetes1003 is CRITICAL: connect to address 10.64.32.23 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[06:54:44] PROBLEM - Check size of conntrack table on kubernetes1003 is CRITICAL: connect to address 10.64.32.23 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack
[06:54:44] PROBLEM - Check systemd state on kubernetes1003 is CRITICAL: connect to address 10.64.32.23 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:00:14] PROBLEM - SSH on kubernetes1003 is CRITICAL: Server answer: https://wikitech.wikimedia.org/wiki/SSH/monitoring
[07:02:06] RECOVERY - SSH on kubernetes1003 is OK: SSH OK - OpenSSH_7.4p1 Debian-10+deb9u7 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[07:02:08] both kubernetes nodes are still marked Ready and happily serving containers ...
[07:02:54] PROBLEM - Check whether ferm is active by checking the default input chain on kubernetes1003 is CRITICAL: connect to address 10.64.32.23 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[07:04:02] PROBLEM - configured eth on kubernetes1003 is CRITICAL: connect to address 10.64.32.23 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_eth
[07:04:04] !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db1138', diff saved to https://phabricator.wikimedia.org/P11796 and previous config saved to /var/cache/conftool/dbconfig/20200708-070403-marostegui.json
[07:04:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:04:32] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1141', diff saved to https://phabricator.wikimedia.org/P11797 and previous config saved to /var/cache/conftool/dbconfig/20200708-070432-marostegui.json
[07:04:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:05:42] PROBLEM - Disk space on kubernetes1003 is CRITICAL: connect to address 10.64.32.23 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=kubernetes1003&var-datasource=eqiad+prometheus/ops
[07:08:05] (PS2) Marostegui: mariadb: Promote db1080 to m1 master [puppet] - https://gerrit.wikimedia.org/r/610010 (https://phabricator.wikimedia.org/T256717)
[07:08:18] RECOVERY - Check systemd state on kubernetes1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:08:42] RECOVERY - Check size of conntrack table on kubernetes1001 is OK: OK: nf_conntrack is 0 % full https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack
[07:08:46] (CR) Marostegui: mariadb: Promote db1080 to m1 master [puppet] - https://gerrit.wikimedia.org/r/610010 (https://phabricator.wikimedia.org/T256717) (owner: Marostegui)
[07:08:50] (CR) Marostegui: [C: +2] mariadb: Promote db1080 to m1 master [puppet] - https://gerrit.wikimedia.org/r/610010 (https://phabricator.wikimedia.org/T256717) (owner: Marostegui)
[07:09:21] !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db1141', diff saved to https://phabricator.wikimedia.org/P11798 and previous config saved to /var/cache/conftool/dbconfig/20200708-070921-marostegui.json
[07:09:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:09:32] RECOVERY - Check size of conntrack table on kubernetes1003 is OK: OK: nf_conntrack is 0 % full https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack
[07:09:32] RECOVERY - Check systemd state on kubernetes1003 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:10:36] Operations, DBA, Patch-For-Review: db1097 (m1 master) crashed due to memory issues. - https://phabricator.wikimedia.org/T256717 (Marostegui) pre failover steps done
[07:10:54] RECOVERY - configured eth on kubernetes1001 is OK: OK - interfaces up https://wikitech.wikimedia.org/wiki/Monitoring/check_eth
[07:11:20] RECOVERY - puppet last run on kubernetes1001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[07:11:46] Operations, serviceops, Performance-Team (Radar), User-Elukey: mcrouter codfw proxies sometimes lead to TKOs - https://phabricator.wikimedia.org/T227265 (elukey) As described in T255511 we should think about adding a gutter pool for mw2* proxies to better handle TKOs when they happen.
[07:11:48] RECOVERY - puppet last run on kubernetes1003 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[07:13:36] RECOVERY - Check the NTP synchronisation status of timesyncd on kubernetes1001 is OK: OK: synced at Wed 2020-07-08 07:13:35 UTC. https://wikitech.wikimedia.org/wiki/NTP
[07:14:18] RECOVERY - dhclient process on kubernetes1001 is OK: PROCS OK: 0 processes with command name dhclient https://wikitech.wikimedia.org/wiki/Monitoring/check_dhclient
[07:15:58] RECOVERY - MD RAID on kubernetes1003 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[07:17:04] RECOVERY - MD RAID on kubernetes1001 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[07:17:43] Operations, Analytics-Clusters, netops, Patch-For-Review: Move netflow data to Eventgate Analytics - https://phabricator.wikimedia.org/T248865 (elukey) Some high level issues that came up while talking about netflow on eventgate: * `pmacct` (the daemon that collects netflow data from routers) se...
[07:18:34] <_joe_> jayme: any idea what's going on?
[07:19:29] _joe_: not yet. nrpe died on both nodes
[07:19:32] Jul 08 06:24:29 kubernetes1001 systemd[1]: nagios-nrpe-server.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
[07:19:34] Jul 08 06:24:29 kubernetes1001 systemd[1]: nagios-nrpe-server.service: Failed to fork: Resource temporarily unavailable
[07:19:49] <_joe_> so OOM?
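The service reported `Failed to fork: Resource temporarily unavailable`, which is the error text for `EAGAIN`. fork(2) on Linux fails with `EAGAIN` when a process or thread limit is reached. That can be the caller's `RLIMIT_NPROC`, `kernel.threads-max`, `kernel.pid_max`, or a cgroup `pids.max`, which systemd sets from a unit's `TasksMax=`. It fails with `ENOMEM` when the kernel cannot allocate the new process. The error is therefore consistent with both the "ulimit" theory and the "starvation of resources" theory. Separately, `status=2/INVALIDARGUMENT` is just systemd's LSB-style name for exit status 2; what that status means is up to the NRPE daemon. Below is a small Python sketch, assuming a Linux host, that reports the process limit and classifies a failed fork. It is a diagnostic illustration, not code from NRPE or systemd.

```python
from __future__ import annotations

import errno
import os
import resource


def describe_fork_error(err: OSError) -> str:
    """Map a failed fork() to the usual causes listed in fork(2) on Linux."""
    if err.errno == errno.EAGAIN:
        return ("EAGAIN (Resource temporarily unavailable): a process/thread limit was hit, "
                "e.g. RLIMIT_NPROC, kernel.threads-max, kernel.pid_max or a cgroup pids.max")
    if err.errno == errno.ENOMEM:
        return "ENOMEM: the kernel could not allocate memory for the new process"
    return f"{errno.errorcode.get(err.errno, err.errno)}: {err.strerror}"


def nproc_limits() -> tuple[int, int]:
    """Return this process's (soft, hard) RLIMIT_NPROC; RLIM_INFINITY means unlimited."""
    return resource.getrlimit(resource.RLIMIT_NPROC)


def try_fork() -> str:
    """Fork once and report the outcome instead of raising."""
    try:
        pid = os.fork()
    except OSError as err:
        return describe_fork_error(err)
    if pid == 0:
        os._exit(0)  # child: exit at once, skipping cleanup handlers and buffer flushes
    os.waitpid(pid, 0)  # parent: reap the child so it does not linger as a zombie
    return "fork OK"


if __name__ == "__main__":
    soft, hard = nproc_limits()
    print(f"RLIMIT_NPROC soft={soft} hard={hard}")
    print(try_fork())
```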
[07:20:05] no reason for and no sign for it
[07:20:50] RECOVERY - Check whether ferm is active by checking the default input chain on kubernetes1001 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[07:20:52] <_joe_> yeah and no reason at all for this
[07:22:45] (CR) Muehlenhoff: [C: +2] Remove obsolete apt::pin for librdkafka1 on eventlogging [puppet] - https://gerrit.wikimedia.org/r/610120 (https://phabricator.wikimedia.org/T256877) (owner: Muehlenhoff)
[07:23:10] RECOVERY - Disk space on kubernetes1001 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=kubernetes1001&var-datasource=eqiad+prometheus/ops
[07:24:24] Operations, Patch-For-Review: Handle sunset of stretch-backports - https://phabricator.wikimedia.org/T256877 (MoritzMuehlenhoff)
[07:24:31] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1142', diff saved to https://phabricator.wikimedia.org/P11799 and previous config saved to /var/cache/conftool/dbconfig/20200708-072431-marostegui.json
[07:24:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:25:04] RECOVERY - DPKG on kubernetes1003 is OK: All packages OK https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[07:26:32] RECOVERY - Disk space on kubernetes1003 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=kubernetes1003&var-datasource=eqiad+prometheus/ops
[07:27:41] _joe_: hmm...don't know what exitcode 2 means for NRPE but this "Failed to fork" (only on kubernetes1001) is weird as well
[07:28:29] Operations, Graphoid, serviceops, Core Platform Team (Icebox), MW-1.35-notes (1.35.0-wmf.34; 2020-05-26): Undeploy graphoid for phase 1 wiki's - https://phabricator.wikimedia.org/T257402 (Jseddon)
[07:28:47] maybe it was hitting some ulimit
[07:28:57] <_joe_> failed to fork is usually due to starvation of resources
[07:29:03] <_joe_> usually memory
[07:29:48] <_joe_> anyways, if it doesn't repeat (which I think will happen) we can disregard for now
[07:30:05] (CR) Muehlenhoff: [C: +2] pcc: Also recommend jenkinsapi Debian package [puppet] - https://gerrit.wikimedia.org/r/598704 (owner: Muehlenhoff)
[07:30:12] !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db1142', diff saved to https://phabricator.wikimedia.org/P11800 and previous config saved to /var/cache/conftool/dbconfig/20200708-073011-marostegui.json
[07:30:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:30:37] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1143', diff saved to https://phabricator.wikimedia.org/P11801 and previous config saved to /var/cache/conftool/dbconfig/20200708-073037-marostegui.json
[07:30:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:30:56] yeah (usually memory), but in this case *shrug*
[07:31:11] Operations, Analytics-Clusters, procurement: RAM expansion for an-master100[1,2] nodes - https://phabricator.wikimedia.org/T257403 (elukey)
[07:33:44] RECOVERY - Check whether ferm is active by checking the default input chain on kubernetes1003 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[07:34:54] RECOVERY - configured eth on kubernetes1003 is OK: OK - interfaces up https://wikitech.wikimedia.org/wiki/Monitoring/check_eth
[07:35:48] !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db1143', diff saved to https://phabricator.wikimedia.org/P11802 and previous config saved to /var/cache/conftool/dbconfig/20200708-073548-marostegui.json
[07:35:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:38:06] RECOVERY - DPKG on kubernetes1001 is OK: All packages OK https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[07:38:27] (CR) Jcrespo: "For the db removal, I think Manuel or Stephen should be the ones being briefed, as it affects m2 database service, rather than backups. Th" [puppet] - https://gerrit.wikimedia.org/r/609884 (https://phabricator.wikimedia.org/T239151) (owner: Dzahn)
[07:40:41] (CR) Marostegui: "> Patch Set 1:" [puppet] - https://gerrit.wikimedia.org/r/609884 (https://phabricator.wikimedia.org/T239151) (owner: Dzahn)
[07:45:35] !log installing PHP 7.3 security updates
[07:45:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:47:02] !log akosiaris@cumin1001 START - Cookbook sre.hosts.downtime
[07:47:02] !log akosiaris@cumin1001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
[07:47:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:47:05] !log akosiaris@cumin1001 START - Cookbook sre.hosts.downtime
[07:47:06] !log akosiaris@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[07:47:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:47:10] Operations, ops-eqiad: upgrade memory in ganeti100[5-8].eqiad.wmnet - https://phabricator.wikimedia.org/T244530 (ops-monitoring-bot) Icinga downtime for 2 days, 0:00:00 set by akosiaris@cumin1001 on 1 host(s) and their services with reason: Memory upgrade ` ganeti1007.eqiad.wmnet `
[07:47:11] !log akosiaris@cumin1001 START - Cookbook sre.hosts.downtime
[07:47:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:47:12] !log akosiaris@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[07:47:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:47:16] Operations, ops-eqiad: upgrade memory in ganeti100[5-8].eqiad.wmnet - https://phabricator.wikimedia.org/T244530 (ops-monitoring-bot) Icinga downtime for 2 days, 0:00:00 set by akosiaris@cumin1001 on 1 host(s) and their services with reason: Memory upgrade ` etcd1003.eqiad.wmnet `
[07:47:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:47:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:47:40] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[07:48:39] Operations, ops-eqiad: upgrade memory in ganeti100[5-8].eqiad.wmnet - https://phabricator.wikimedia.org/T244530 (akosiaris)
[07:48:54] !log stop bacula-director on backup1001 in preparation for m1 switchover T256717
[07:48:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:48:59] T256717: db1097 (m1 master) crashed due to memory issues. - https://phabricator.wikimedia.org/T256717
[07:49:17] !log jmm@cumin2001 START - Cookbook sre.hosts.reboot-single
[07:49:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:49:32] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[07:49:32] Operations, ops-eqiad: upgrade memory in ganeti100[5-8].eqiad.wmnet - https://phabricator.wikimedia.org/T244530 (akosiaris) @Jclark-ctr Yes! Took a while but all migrations are done, host has been downtimed for 48H and has been powered off
[07:49:39] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1144:3314', diff saved to https://phabricator.wikimedia.org/P11803 and previous config saved to /var/cache/conftool/dbconfig/20200708-074939-marostegui.json
[07:49:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:50:15] bacula stop will make some alerts (that I am downtiming) but also some metrics collections fail
[07:51:40] !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
[07:51:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:53:18] Operations, Graphoid, serviceops, Core Platform Team (Icebox): Undeploy graphoid for phase 1 wiki's - https://phabricator.wikimedia.org/T257402 (Jseddon)
[07:55:28] (PS1) Seddon: Bug: T257402 Undeploy graphoid for phase 1 wiki's Change-Id: I49fd5c5d264cc4e87d827f0d565d2abc06aea604 [mediawiki-config] - https://gerrit.wikimedia.org/r/610234 (https://phabricator.wikimedia.org/T257402)
[07:57:20] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=bacula site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[07:57:57] !log reimaging es2020 to buster T257284
[07:58:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:58:02] T257284: Upgrade es4 to debian buster + mariadb 10.4 - https://phabricator.wikimedia.org/T257284
[07:58:50] Operations, ops-eqiad, DC-Ops, decommission-hardware, and 2 others: decom cloudvirt1015 - https://phabricator.wikimedia.org/T257366 (Peachey88)
[07:58:58] (PS1) Kormat: install_server: Switch es2020 to buster [puppet] - https://gerrit.wikimedia.org/r/610235 (https://phabricator.wikimedia.org/T257284)
[07:59:17] jobs reduced availability on icinga1001 is CRITICAL: job=bacula site=eqiad is me
[07:59:32] bacula is temporarilly stopped, will come back after maintenance
[08:00:04] marostegui, jynus, and akosiarios: #bothumor My software never has bugs. It just develops random features. Rise for m1 database master failover. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200708T0800).
[08:00:22] akosiaris: sorry for the typo above! ^
[08:00:24] Operations, Desktop Improvements, Traffic, Performance-Team (Radar): Turn off CDN cache for up to one week on several wikis for desktop improvements deployment - https://phabricator.wikimedia.org/T256750 (ema) As @Krinkle pointed out, instead of turning off caching altogether we can invalidate th...
[08:00:28] jouncebot: you have a typo there. But I like not be pinged
[08:00:28] jynus akosiaris let's proceed?
[08:00:42] go for it marostegui
[08:00:48] yes
[08:00:50] Operations, Desktop Improvements, Traffic, Performance-Team (Radar): CDN cache revalidation on several wikis for desktop improvements deployment - https://phabricator.wikimedia.org/T256750 (ema)
[08:00:51] ok!
[08:00:53] !log Failover m1 from db1097 to db1080 - T256717
[08:00:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:00:57] T256717: db1097 (m1 master) crashed due to memory issues. - https://phabricator.wikimedia.org/T256717
[08:01:07] zarcillo/tendril should work this time
[08:01:29] [ERROR] console - Error: Connection lost: The server closed the connection.
[08:01:34] perfect, it restarted just fine
[08:01:43] all done
[08:01:55] wow, how many people are reconnecting to the pads
[08:02:20] etherpad working for me
[08:02:27] took a while to work back
[08:02:38] maybe because overload of simultaneous connections?
[08:02:44] librenms works too
[08:02:47] it didn't error out on db
[08:02:58] I did not have to do anything this time around
[08:03:10] did you change anything from last time?
[08:03:18] rt works too
[08:03:33] jynus: nope
[08:03:35] marostegui: confirms both tendril and zarcillo worked?
[08:04:04] akosiaris: could I ask you to document on the misc section to say that "etharpad should reconnect automatically, if it doesn't restart the service"?
[08:04:06] I am guessing last time the window was longer and etherpad was restarted enough times for systemd to give up
[08:04:14] ah, makes sense
[08:04:19] that should be documented
[08:04:30] jynus: sure, where do you want me to add it?
[08:04:35] jynus: confirmed
[08:04:41] etherpad wikitech page?
[08:04:47] or some other one?
[08:04:50] https://wikitech.wikimedia.org/wiki/MariaDB/misc
[08:04:54] ah, nice!
[08:05:17] maybe link that from the ethperpad page, I think it is interesting
[08:05:32] I will update bacula
[08:06:05] I will start bacula again, if no one objects
[08:06:06] Operations, Analytics-Clusters, procurement: RAM expansion for an-master100[1,2] nodes - https://phabricator.wikimedia.org/T257403 (Peachey88) @elukey This should be moved from {S1} to {S4}.
[08:06:09] jynus: if you run a quick bacula job just to confirm, that'd be good too
[08:06:13] ok
[08:06:18] (CR) Filippo Giunchedi: [C: +1] "LGTM, thanks!" [puppet] - https://gerrit.wikimedia.org/r/610135 (https://phabricator.wikimedia.org/T234854) (owner: Herron)
[08:06:43] one second as to stop bacula I needed to stop puppet
[08:07:29] jynus: https://wikitech.wikimedia.org/w/index.php?title=MariaDB/misc&diff=1872777&oldid=1870951 done. I 'll update the etherpad page as well
[08:07:59] thank, you akosiaris
[08:08:19] think that the more you document that, the more is likely someone else will take over its maintance :-D
[08:08:19] thanks akosiaris
[08:09:06] (CR) Marostegui: [C: +1] install_server: Switch es2020 to buster [puppet] - https://gerrit.wikimedia.org/r/610235 (https://phabricator.wikimedia.org/T257284) (owner: Kormat)
[08:09:22] gerrit1001.wikimedia.org-Hourly-Sun-production-gerrit-repo-data is running
[08:09:31] jynus: good, thanks
[08:09:38] 243006 Incr 2,159 114.3 M OK 08-Jul-20 08:09 gerrit1001.wikimedia.org-Hourly-Sun-production-gerrit-repo-data
[08:09:44] (CR) Kormat: [C: +2] install_server: Switch es2020 to buster [puppet] - https://gerrit.wikimedia.org/r/610235 (https://phabricator.wikimedia.org/T257284) (owner: Kormat)
[08:09:47] finised correctly
[08:09:50] nice!
[08:09:50] *finished
[08:10:39] I am going to reschedule the offsite job, as I think it got fully cancelled
[08:11:58] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[08:12:00] anything else to check ?
[08:12:33] did zarcillo and tendril steps work?
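The guess that etherpad "was restarted enough times for systemd to give up" fits systemd's start rate limiting. If a unit is started more than `StartLimitBurst=` times within `StartLimitIntervalSec=`, systemd refuses further starts, logs "Start request repeated too quickly", and leaves the unit failed. The defaults come from `DefaultStartLimitBurst=` (5) and `DefaultStartLimitIntervalSec=` (10s). The Python sketch below only simulates a simplified fixed-window version of that rule, using made-up restart times. It is not systemd code.

```python
from __future__ import annotations


def simulate_start_limit(start_times: list[float], burst: int = 5,
                         interval: float = 10.0) -> list[bool]:
    """Return, for each start attempt (seconds, ascending), whether it is allowed.

    Simplified model of systemd's start rate limiting: a window opens at the
    first attempt, at most `burst` starts are admitted until more than
    `interval` seconds have passed since the window opened, and then a new
    window opens.
    """
    allowed: list[bool] = []
    window_start: float | None = None
    count = 0
    for t in start_times:
        if window_start is None or t - window_start > interval:
            window_start, count = t, 0  # open a fresh window
        if count < burst:
            count += 1
            allowed.append(True)
        else:
            allowed.append(False)  # "Start request repeated too quickly"
    return allowed


if __name__ == "__main__":
    # Hypothetical: the service crashes and is restarted once per second while
    # its database is unreachable during a failover.
    attempts = [float(s) for s in range(8)]
    print(simulate_start_limit(attempts))  # five starts allowed, then refused
```

A longer failover window makes a crash-restart loop more likely to exhaust the burst. Once a unit has tripped the limit, `systemctl reset-failed <unit>` flushes the counter, so a manual start does not have to wait for the interval to pass.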
[08:12:56] (PS1) Marostegui: db1097: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/610236 (https://phabricator.wikimedia.org/T256717)
[08:13:44] you may have answered already, but I may have missed the answer with so many things ongoing, sorry
[08:14:07] I think you said yes
[08:14:24] jynus: yeah, everything else has been checked
[08:14:33] and yes, zarcillo and tendril updated ok too
[08:14:34] sorry, so many convos and check at the same time
[08:14:40] no worries at all
[08:14:50] cool
[08:14:53] (CR) Marostegui: [C: +2] db1097: Disable notifications [puppet] - https://gerrit.wikimedia.org/r/610236 (https://phabricator.wikimedia.org/T256717) (owner: Marostegui)
[08:15:20] !log kormat@cumin1001 dbctl commit (dc=all): 'Depool es2020 for reimaging T257284', diff saved to https://phabricator.wikimedia.org/P11804 and previous config saved to /var/cache/conftool/dbconfig/20200708-081519-kormat.json
[08:15:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:15:25] T257284: Upgrade es4 to debian buster + mariadb 10.4 - https://phabricator.wikimedia.org/T257284
[08:16:48] !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P11805 and previous config saved to /var/cache/conftool/dbconfig/20200708-081647-marostegui.json
[08:16:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:18:46] Operations, DBA: decommission db1097.eqiad.wmnet - https://phabricator.wikimedia.org/T257406 (Marostegui)
[08:19:43] Operations, DBA, Patch-For-Review: db1097 (m1 master) crashed due to memory issues. - https://phabricator.wikimedia.org/T256717 (Marostegui) Open→Resolved a:Marostegui All done - the decommissioning on db1097 will be tracked at T257406 Thanks Jaime and Alex for supporting this maintenance!
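Each `dbctl commit` in this log reports where the previous configuration was saved, using names such as `/var/cache/conftool/dbconfig/20200708-081519-kormat.json`. These names appear to follow a `YYYYMMDD-HHMMSS-<user>.json` pattern, and the timestamp matches the time of the log line. The Python sketch below assumes that pattern, which is inferred from these lines only and not from conftool documentation. It finds the newest saved config written before a given moment, for example the state in place before a particular depool. The function names are hypothetical.

```python
from __future__ import annotations

import re
from datetime import datetime
from pathlib import PurePosixPath

# Pattern inferred from the log lines above; an assumption, not a documented format.
BACKUP_NAME = re.compile(r"^(?P<stamp>\d{8}-\d{6})-(?P<user>[A-Za-z0-9_.-]+)\.json$")


def parse_backup_name(path: str) -> tuple[datetime, str]:
    """Split a saved-config path into (timestamp, user)."""
    name = PurePosixPath(path).name
    match = BACKUP_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected backup file name: {name}")
    stamp = datetime.strptime(match["stamp"], "%Y%m%d-%H%M%S")
    return stamp, match["user"]


def latest_before(paths: list[str], cutoff: datetime) -> str | None:
    """Return the newest backup saved strictly before `cutoff`, or None."""
    candidates = [(parse_backup_name(p)[0], p) for p in paths]
    earlier = [c for c in candidates if c[0] < cutoff]
    return max(earlier)[1] if earlier else None
```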
[08:20:40] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1146:3314', diff saved to https://phabricator.wikimedia.org/P11806 and previous config saved to /var/cache/conftool/dbconfig/20200708-082040-marostegui.json
[08:20:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:25:14] (PS1) Marostegui: production-m3.sql: Remove dbproxy1003 grants [puppet] - https://gerrit.wikimedia.org/r/610237 (https://phabricator.wikimedia.org/T231280)
[08:26:24] !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db1146:3314', diff saved to https://phabricator.wikimedia.org/P11807 and previous config saved to /var/cache/conftool/dbconfig/20200708-082624-marostegui.json
[08:26:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:28:50] !log Remove dbproxy1003 grants from misc hosts T231280
[08:28:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:28:55] T231280: Remove grants for the old dbproxy hosts from the misc databases - https://phabricator.wikimedia.org/T231280
[08:29:07] Operations, Icinga, observability: move icinga contacts file to public repo - https://phabricator.wikimedia.org/T164238 (fgiunchedi) Open→Declined The plan is to move all contacts for paging to VO over time so this task will be moot indeed, I'm going to boldy decline it but feel free to reopen!
[08:29:49] Operations, serviceops, wikitech.wikimedia.org: Install php-ldap on all MW appservers - https://phabricator.wikimedia.org/T237889 (Joe) So one of my concerns is actually about the parent task, I'll comment there.
[08:32:22] Operations, Phatality, observability: Deploying "Phatality" plugin for Kibana invokes oom-killer on logstash::collector nodes - https://phabricator.wikimedia.org/T237706 (fgiunchedi) Thanks @mmodell for the update! We're in the process of switching to ELK7 and I think it'll be worth to try deploying...
[08:38:57] !log Upgraded docker.io on contint1001 and contint2001
[08:39:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:39:35] RECOVERY - Work requests waiting in Zuul Gearman server on contint2001 is OK: OK: Less than 100.00% above the threshold [90.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[08:40:29] !log upgrading docker on remaining buster hosts
[08:40:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:41:00] (CR) Marostegui: [C: +2] production-m3.sql: Remove dbproxy1003 grants [puppet] - https://gerrit.wikimedia.org/r/610237 (https://phabricator.wikimedia.org/T231280) (owner: Marostegui)
[08:42:27] !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db1074', diff saved to https://phabricator.wikimedia.org/P11808 and previous config saved to /var/cache/conftool/dbconfig/20200708-084227-marostegui.json
[08:42:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:44:12] (CR) Alexandros Kosiaris: "Oh, dammit. I 'll try and see how much of backport packages are being used in our images" [puppet] - https://gerrit.wikimedia.org/r/610050 (https://phabricator.wikimedia.org/T256877) (owner: Muehlenhoff)
[08:44:28] Operations, Wikimedia-Mailing-lists: Create secondary mailinglist for german arbcom - https://phabricator.wikimedia.org/T256306 (jcrespo) Hi, sorry for the delay on the response. Normally a new list creation request takes very little time to be processed. However, because this being an arbcom list reque...
[08:45:08] (PS1) Marostegui: report_users: Remove dbproxy1003's IP [software] - https://gerrit.wikimedia.org/r/610239 (https://phabricator.wikimedia.org/T256216)
[08:45:36] Operations, Wikimedia-Mailing-lists: Create secondary mailinglist for german arbcom - https://phabricator.wikimedia.org/T256306 (jcrespo) p:Triage→High
[08:45:58] (CR) Marostegui: [C: +2] report_users: Remove dbproxy1003's IP [software] - https://gerrit.wikimedia.org/r/610239 (https://phabricator.wikimedia.org/T256216) (owner: Marostegui)
[08:46:30] Operations, ops-eqiad, DBA, decommission-hardware, Patch-For-Review: Decommission dbproxy1003.eqiad.wmnet - https://phabricator.wikimedia.org/T256216 (Marostegui)
[08:47:05] (CR) Alexandros Kosiaris: [C: +2] "Thanks for explaining and providing stats about those components in the commit message!" [puppet] - https://gerrit.wikimedia.org/r/610049 (https://phabricator.wikimedia.org/T257327) (owner: Muehlenhoff)
[08:47:55] (PS1) Kormat: install_server: more generic matching for db hosts [puppet] - https://gerrit.wikimedia.org/r/610240
[08:48:39] (PS1) Marostegui: mariadb: Remove puppet references for dbproxy1003 [puppet] - https://gerrit.wikimedia.org/r/610241 (https://phabricator.wikimedia.org/T256216)
[08:49:28] (CR) Jcrespo: [C: +1] "+1 If tested fully with cumin that it doesn't match other hosts." [puppet] - https://gerrit.wikimedia.org/r/610240 (owner: Kormat)
[08:50:26] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1147', diff saved to https://phabricator.wikimedia.org/P11809 and previous config saved to /var/cache/conftool/dbconfig/20200708-085024-marostegui.json
[08:50:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:52:39] (CR) Jbond: [C: +2] graphite: add graphite host as a global [puppet] - https://gerrit.wikimedia.org/r/610035 (owner: Jbond)
[08:53:34] (CR) Muehlenhoff: "You could do that with debmonitor, by selecting the respective image and then applying "bpo" as the filter (which applies to the package a" [puppet] - https://gerrit.wikimedia.org/r/610050 (https://phabricator.wikimedia.org/T256877) (owner: Muehlenhoff)
[08:54:16] !log marostegui@cumin1001 START - Cookbook sre.hosts.decommission
[08:54:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:54:33] !log marostegui@cumin1001 END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
[08:54:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:54:55] !log marostegui@cumin1001 START - Cookbook sre.hosts.decommission
[08:54:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:55:31] (PS5) Jbond: profile::cassandra::single_instance: update to graphite_hosts global [puppet] - https://gerrit.wikimedia.org/r/610036
[08:55:44] !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
[08:55:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:55:50] Operations, ops-eqiad, DBA, decommission-hardware, Patch-For-Review: Decommission dbproxy1003.eqiad.wmnet - https://phabricator.wikimedia.org/T256216 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by marostegui@cumin1001 for hosts: `dbproxy1003.eqiad.wmnet` - dbproxy1003.eqiad...
[08:56:05] (CR) Marostegui: [C: +2] mariadb: Remove puppet references for dbproxy1003 [puppet] - https://gerrit.wikimedia.org/r/610241 (https://phabricator.wikimedia.org/T256216) (owner: Marostegui)
[08:56:34] (PS5) Jbond: statsite::instance: fix style violation in define [puppet] - https://gerrit.wikimedia.org/r/610045
[08:57:16] (CR) Jbond: [C: +2] profile::cassandra::single_instance: update to graphite_hosts global [puppet] - https://gerrit.wikimedia.org/r/610036 (owner: Jbond)
[08:57:17] Operations, ops-eqiad, DBA, decommission-hardware, Patch-For-Review: Decommission dbproxy1003.eqiad.wmnet - https://phabricator.wikimedia.org/T256216 (Marostegui)
[08:57:44] Operations, ops-eqiad, Discovery-Search (Current work), Patch-For-Review: (Need by: 2020-04-02) rack/setup/install relforge100[34] - https://phabricator.wikimedia.org/T241791 (Gehel) @RKemper a few pointers for your investigation: * you can use `install_console $fqdn_of_server` to connect to the...
[08:57:46] !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db1147', diff saved to https://phabricator.wikimedia.org/P11810 and previous config saved to /var/cache/conftool/dbconfig/20200708-085745-marostegui.json
[08:57:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:59:30] (PS1) Marostegui: wmnet: Remove dbproxy1003 DNS [dns] - https://gerrit.wikimedia.org/r/610248 (https://phabricator.wikimedia.org/T256216)
[08:59:34] (CR) Jbond: [C: +2] profile::statistics::explorer::misc_jobs: add graphite_host global [puppet] - https://gerrit.wikimedia.org/r/610039 (owner: Jbond)
[09:03:44] (CR) Kormat: "> Patch Set 1: Code-Review+1" [puppet] - https://gerrit.wikimedia.org/r/610240 (owner: Kormat)
[09:05:17] (PS2) Jforrester: VisualEditor: Explicitly set visualeditor-enable to 0 when non-default [mediawiki-config] - https://gerrit.wikimedia.org/r/610156 (https://phabricator.wikimedia.org/T248343) (owner: C. Scott Ananian)
[09:11:17] (CR) Kormat: "Compared against zarcillo:" [puppet] - https://gerrit.wikimedia.org/r/610240 (owner: Kormat)
[09:13:27] Operations, Scap, Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO (Release-Engineering-Team-TODO (2020-07-01 to 2020-09-30 (Q1))): scap configuration in puppet defaults to forge the git repo name with 'mediawiki/services/' - https://phabricator.wikimedia.org/T257413 (...
[09:13:41] Operations, Scap, Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO (Release-Engineering-Team-TODO (2020-07-01 to 2020-09-30 (Q1))): scap configuration in puppet defaults to forge the git repo name with 'mediawiki/services/' - https://phabricator.wikimedia.org/T257413 (...
[09:15:58] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1148', diff saved to https://phabricator.wikimedia.org/P11811 and previous config saved to /var/cache/conftool/dbconfig/20200708-091557-marostegui.json
[09:16:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:17:33] (CR) Marostegui: [C: +2] wmnet: Remove dbproxy1003 DNS [dns] - https://gerrit.wikimedia.org/r/610248 (https://phabricator.wikimedia.org/T256216) (owner: Marostegui)
[09:17:35] (CR) Muehlenhoff: [C: +2] Switch Graphite to CAS-only [puppet] - https://gerrit.wikimedia.org/r/609400 (owner: Muehlenhoff)
[09:22:36] (PS1) Ema: cloud: add ats-be mapping rules for traffic-cache-atsupload [puppet] - https://gerrit.wikimedia.org/r/610250
[09:23:35] Operations, ops-eqiad, decommission-hardware: Decommission dbproxy1003.eqiad.wmnet - https://phabricator.wikimedia.org/T256216 (Marostegui) a:Marostegui→Jclark-ctr
[09:23:51] Operations, ops-eqiad, DC-Ops, decommission-hardware: Decommission dbproxy1003.eqiad.wmnet - https://phabricator.wikimedia.org/T256216 (Marostegui) Host ready for #dc-ops
[09:26:28] !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db1148', diff saved to https://phabricator.wikimedia.org/P11812 and previous config saved to /var/cache/conftool/dbconfig/20200708-092627-marostegui.json
[09:26:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:26:51] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1149', diff saved to https://phabricator.wikimedia.org/P11813 and previous config saved to /var/cache/conftool/dbconfig/20200708-092650-marostegui.json
[09:26:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:27:06] (PS1) Jbond: graphite_url: use hiera instead of lookup as we still use hiera3 [puppet] - https://gerrit.wikimedia.org/r/610252
[09:27:48] (CR) Muehlenhoff: [C: +1] "LGTM, let's give it a shot" [puppet] - https://gerrit.wikimedia.org/r/610252 (owner: Jbond)
[09:28:55] (CR) Kormat: [C: +2] install_server: more generic matching for db hosts [puppet] - https://gerrit.wikimedia.org/r/610240 (owner: Kormat)
[09:28:57] (CR) Jbond: [C: +2] graphite_url: use hiera instead of lookup as we still use hiera3 [puppet] - https://gerrit.wikimedia.org/r/610252 (owner: Jbond)
[09:29:22] kormat: you happy for me to merge
[09:29:32] jbond42: yes, please :)
[09:30:05] merged
[09:30:19] ty!
[09:32:48] PROBLEM - BGP status on cr2-eqord is CRITICAL: BGP CRITICAL - AS2914/IPv4: Active - NTT https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[09:34:59] (CR) Ema: [C: +2] cloud: add ats-be mapping rules for traffic-cache-atsupload [puppet] - https://gerrit.wikimedia.org/r/610250 (owner: Ema)
[09:35:56] (PS1) Hashar: Explicitly mentions the repository in scap::sources [puppet] - https://gerrit.wikimedia.org/r/610254 (https://phabricator.wikimedia.org/T257413)
[09:36:19] (CR) jerkins-bot: [V: -1] Explicitly mentions the repository in scap::sources [puppet] - https://gerrit.wikimedia.org/r/610254 (https://phabricator.wikimedia.org/T257413) (owner: Hashar)
[09:37:10] (CR) Hashar: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/610254 (https://phabricator.wikimedia.org/T257413) (owner: Hashar)
[09:38:30] Operations, SRE-Access-Requests, Patch-For-Review: Requesting access to production infrastructure services for jgiannelos - https://phabricator.wikimedia.org/T257187 (jcrespo)
[09:39:03] (PS1) Muehlenhoff: Remove cas-puppetboard from caches [puppet] - https://gerrit.wikimedia.org/r/610255
[09:40:57] (PS2) Hashar: Explicitly mentions the repository in scap::sources [puppet] - https://gerrit.wikimedia.org/r/610254 (https://phabricator.wikimedia.org/T257413)
[09:41:37] (CR) Hashar: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/610254 (https://phabricator.wikimedia.org/T257413) (owner: Hashar)
[09:42:11] Operations, SRE-Access-Requests, Patch-For-Review: Requesting access to production infrastructure services for jgiannelos - https://phabricator.wikimedia.org/T257187 (jcrespo) May I aks you to update the list of requested groups on the description? `deploy-service` was approved to proceed at T257187...
[09:42:13] (PS1) Giuseppe Lavagetto: restbase: add new service-proxy endpoints [puppet] - https://gerrit.wikimedia.org/r/610256 (https://phabricator.wikimedia.org/T255133)
[09:42:15] (PS1) Giuseppe Lavagetto: restbase: use envoy for contacting MediaWiki, parsoid [puppet] - https://gerrit.wikimedia.org/r/610257 (https://phabricator.wikimedia.org/T255133)
[09:42:37] (CR) jerkins-bot: [V: -1] restbase: add new service-proxy endpoints [puppet] - https://gerrit.wikimedia.org/r/610256 (https://phabricator.wikimedia.org/T255133) (owner: Giuseppe Lavagetto)
[09:44:31] Operations, DBA, CAS-SSO, Patch-For-Review, User-jbond: Request new database for idp-test.wikimedia.org - https://phabricator.wikimedia.org/T256120 (Marostegui) @jbond the emtpy `cas_staging` database has been created on m1. You need to point your application to: `m1-master.eqiad.wmnet` and...
[09:44:37] (CR) Ema: [C: +2] varnish: simplify rate limiting for cache_upload [puppet] - https://gerrit.wikimedia.org/r/610027 (owner: Ema)
[09:45:39] !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db1149', diff saved to https://phabricator.wikimedia.org/P11814 and previous config saved to /var/cache/conftool/dbconfig/20200708-094539-marostegui.json
[09:45:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:47:33] (PS3) Hashar: Explicitly mentions the repository in scap::sources [puppet] - https://gerrit.wikimedia.org/r/610254 (https://phabricator.wikimedia.org/T257413)
[09:47:49] (CR) Hashar: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/610254 (https://phabricator.wikimedia.org/T257413) (owner: Hashar)
[09:48:53] (CR) Elukey: "Looks good to me, I left a note but it is not blocking. Please run pcc again just to be sure that nothing unexpected is raised, after that" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/609783 (owner: Jbond)
[09:50:46] !log elukey@cumin1001 START - Cookbook sre.hadoop.stop-cluster
[09:50:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:51:02] (CR) Muehlenhoff: java: update java.security to support specifying different EDG's (1 comment) [puppet] - https://gerrit.wikimedia.org/r/609783 (owner: Jbond)
[09:52:03] (PS1) Marostegui: production-m1.sql: Add cas grants [puppet] - https://gerrit.wikimedia.org/r/610259 (https://phabricator.wikimedia.org/T256120)
[09:52:05] (CR) Elukey: [C: +1] java: update java.security to support specifying different EDG's (1 comment) [puppet] - https://gerrit.wikimedia.org/r/609783 (owner: Jbond)
[09:53:05] (PS2) Marostegui: production-m1.sql: Add cas grants [puppet] - https://gerrit.wikimedia.org/r/610259 (https://phabricator.wikimedia.org/T256120)
[09:53:26] (PS2) Muehlenhoff: Stop installing git-lfs from stretch-backports [puppet] - https://gerrit.wikimedia.org/r/610015 (https://phabricator.wikimedia.org/T256877)
[09:56:00] !log kormat@cumin2001 START - Cookbook sre.hosts.downtime
[09:56:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:57:43] !log elukey@cumin1001 END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0)
[09:57:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:58:14] (Abandoned) Ema: 5.1.3-1wm16: add discard patches [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/608606 (https://phabricator.wikimedia.org/T236754) (owner: Ema)
[09:58:35] !log kormat@cumin2001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[09:58:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:59:12] PROBLEM - k8s API server requests latencies on argon is CRITICAL: instance=10.64.32.133 verb=PATCH https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[10:03:10] (CR) Jbond: [C: +1] "LGTM thx" [puppet] - https://gerrit.wikimedia.org/r/610259 (https://phabricator.wikimedia.org/T256120) (owner: Marostegui)
[10:03:10] (CR) Marostegui: [C: +2] production-m1.sql: Add cas grants [puppet] - https://gerrit.wikimedia.org/r/610259 (https://phabricator.wikimedia.org/T256120) (owner: Marostegui)
[10:03:12] (CR) Ema: [C: +1] Remove cas-puppetboard from caches [puppet] - https://gerrit.wikimedia.org/r/610255 (owner: Muehlenhoff)
[10:03:14] (CR) Arturo Borrero Gonzalez: [C: +1] paws: add project to our prometheus alert-manager system [puppet] - https://gerrit.wikimedia.org/r/610175 (https://phabricator.wikimedia.org/T256361) (owner: Bstorm)
[10:03:16] (CR) Arturo Borrero Gonzalez: [C: +2] Change toolforge error pages to use toolforge logo instead of toollabs logo [puppet] - https://gerrit.wikimedia.org/r/610026 (owner: Majavah)
[10:03:21] Operations, Wikimedia-Mailing-lists: "Uncaught bounce notification" from Yahoo and AOL - https://phabricator.wikimedia.org/T257241 (jcrespo) > Is there any way for us to configure bounce message detection ourselves I really don't see any useful configuration https://www.gnu.org/software/mailman/mailman-...
[10:04:53] (CR) Jbond: [C: +2] java: update java.security to support specifying different EDG's [puppet] - https://gerrit.wikimedia.org/r/609783 (owner: Jbond)
[10:05:08] Operations, Wikimedia-Mailing-lists: "Uncaught bounce notification" from Yahoo and AOL - https://phabricator.wikimedia.org/T257241 (jcrespo) p:Triage→Medium
[10:06:53] (CR) Jbond: [C: +2] profile::idp: enable java::Security [puppet] - https://gerrit.wikimedia.org/r/609784 (owner: Jbond)
[10:08:58] RECOVERY - k8s API server requests latencies on argon is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[10:10:51] (PS1) Jbond: java: use content not source on file resource [puppet] - https://gerrit.wikimedia.org/r/610262
[10:11:15] (CR) Jbond: [V: +2 C: +2] java: use content not source on file resource [puppet] - https://gerrit.wikimedia.org/r/610262 (owner: Jbond)
[10:13:14] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1084', diff saved to https://phabricator.wikimedia.org/P11815 and previous config saved to /var/cache/conftool/dbconfig/20200708-101313-marostegui.json
[10:13:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:16:38] Operations, Keyholder: After arming a new key in keyholder, the identity file path does not show up - https://phabricator.wikimedia.org/T257329 (jcrespo) @Dzahn Nothing broke in fact, but the comment looks odd and consusing, so I would like to regenerate for this, and also to change the default size of t...
[10:17:52] Operations, Keyholder: After arming a new key in keyholder, the identity file path does not show up - https://phabricator.wikimedia.org/T257329 (jcrespo) I wonder if we should stop using the path, or just a proper identifier like netbox does.
[10:19:10] (PS2) Giuseppe Lavagetto: restbase: add new service-proxy endpoints [puppet] - https://gerrit.wikimedia.org/r/610256 (https://phabricator.wikimedia.org/T255133)
[10:19:12] (PS2) Giuseppe Lavagetto: restbase: use envoy for contacting MediaWiki, parsoid [puppet] - https://gerrit.wikimedia.org/r/610257 (https://phabricator.wikimedia.org/T255133)
[10:20:02] (PS1) Jbond: idp: update database to use m1 [puppet] - https://gerrit.wikimedia.org/r/610264 (https://phabricator.wikimedia.org/T256120)
[10:20:12] (PS2) Muehlenhoff: Remove cas-puppetboard from caches [puppet] - https://gerrit.wikimedia.org/r/610255 (https://phabricator.wikimedia.org/T238924)
[10:20:45] (CR) Jbond: [C: +2] idp: update database to use m1 [puppet] - https://gerrit.wikimedia.org/r/610264 (https://phabricator.wikimedia.org/T256120) (owner: Jbond)
[10:20:55] (CR) Hashar: [C: +1] "https://puppet-compiler.wmflabs.org/compiler1002/474/deploy1001.eqiad.wmnet/index.html" [puppet] - https://gerrit.wikimedia.org/r/610254 (https://phabricator.wikimedia.org/T257413) (owner: Hashar)
[10:22:22] (CR) Arturo Borrero Gonzalez: [C: +1] "LGTM!" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/610189 (https://phabricator.wikimedia.org/T256361) (owner: Bstorm)
[10:25:01] !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db1084', diff saved to https://phabricator.wikimedia.org/P11816 and previous config saved to /var/cache/conftool/dbconfig/20200708-102500-marostegui.json
[10:25:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:25:54] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1121', diff saved to https://phabricator.wikimedia.org/P11817 and previous config saved to /var/cache/conftool/dbconfig/20200708-102553-marostegui.json
[10:25:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:27:33] (CR) Muehlenhoff: [C: +2] Remove cas-puppetboard from caches [puppet] - https://gerrit.wikimedia.org/r/610255 (https://phabricator.wikimedia.org/T238924) (owner: Muehlenhoff)
[10:27:35] (CR) Giuseppe Lavagetto: [C: +2] "https://puppet-compiler.wmflabs.org/compiler1002/23759/" [puppet] - https://gerrit.wikimedia.org/r/610256 (https://phabricator.wikimedia.org/T255133) (owner: Giuseppe Lavagetto)
[10:28:14] <_joe_> moritzm: can I merge your patch?
[10:30:12] (PS1) Lucas Werkmeister (WMDE): DNM: Load WikibaseClient using extension registration in beta [mediawiki-config] - https://gerrit.wikimedia.org/r/610265
[10:30:52] (CR) Lucas Werkmeister (WMDE): [C: -2] "DNM" [mediawiki-config] - https://gerrit.wikimedia.org/r/610265 (owner: Lucas Werkmeister (WMDE))
[10:31:03] (CR) Jforrester: "🤩" [mediawiki-config] - https://gerrit.wikimedia.org/r/610265 (owner: Lucas Werkmeister (WMDE))
[10:32:30] (CR) Lucas Werkmeister (WMDE): [C: -2] "> Patch Set 1:" [mediawiki-config] - https://gerrit.wikimedia.org/r/610265 (owner: Lucas Werkmeister (WMDE))
[10:32:35] ack, please do
[10:33:47] jouncebot: next
[10:33:48] In 0 hour(s) and 26 minute(s): European mid-day backport window(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200708T1100)
[10:35:39] (CR) Lucas Werkmeister (WMDE): [C: -2] DNM: Load WikibaseClient using extension registration in beta (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/610265 (owner: Lucas Werkmeister (WMDE))
[10:35:59] (PS3) Alexandros Kosiaris: proton: Add upload-lb IPs to calico configs [deployment-charts] - https://gerrit.wikimedia.org/r/610058
[10:36:01] (PS1) Alexandros Kosiaris: calico: Add text-lb.{eqsin,esams,ulsfo} IPs [deployment-charts] - https://gerrit.wikimedia.org/r/610266
[10:36:09] Operations, Scap, Patch-For-Review, Release-Engineering-Team (Deployment services), Release-Engineering-Team-TODO (Release-Engineering-Team-TODO (2020-07-01 to 2020-09-30 (Q1))): scap configuration in puppet defaults to forge the git repo name with 'med... - https://phabricator.wikimedia.org/T257413
[10:36:29] (PS1) Hashar: scap::sources stop assuming mediawiki/services as a prefix [puppet] - https://gerrit.wikimedia.org/r/610267 (https://phabricator.wikimedia.org/T257413)
[10:37:06] (CR) Hashar: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/610267 (https://phabricator.wikimedia.org/T257413) (owner: Hashar)
[10:37:25] (PS1) Majavah: Add nature.com to commonswiki wgCopyUploadDomains [mediawiki-config] - https://gerrit.wikimedia.org/r/610268 (https://phabricator.wikimedia.org/T254342)
[10:37:44] (CR) jerkins-bot: [V: -1] scap::sources stop assuming mediawiki/services as a prefix [puppet] - https://gerrit.wikimedia.org/r/610267 (https://phabricator.wikimedia.org/T257413) (owner: Hashar)
[10:38:57] (PS4) MarcoAurelio: [arwiki] Grant 'patrolmarks' to all [mediawiki-config] - https://gerrit.wikimedia.org/r/609990 (https://phabricator.wikimedia.org/T257106)
[10:39:22] (PS3) MarcoAurelio: [hiwikibooks] Translate sitename for hi.wikibooks [mediawiki-config] - https://gerrit.wikimedia.org/r/609991 (https://phabricator.wikimedia.org/T256587)
[10:40:31] (PS2) Hashar: scap::sources stop assuming mediawiki/services as a prefix [puppet] - https://gerrit.wikimedia.org/r/610267 (https://phabricator.wikimedia.org/T257413)
[10:41:09] (PS2) MarcoAurelio: Undeploy graphoid for phase 1 wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/610234 (https://phabricator.wikimedia.org/T257402) (owner: Seddon)
[10:43:19] (CR) Jforrester: DNM: Load WikibaseClient using extension registration in beta (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/610265 (owner: Lucas Werkmeister (WMDE))
[10:43:42] Operations, Patch-For-Review: serve our production ssh known_hosts file over public HTTPS - https://phabricator.wikimedia.org/T257219 (jcrespo) p:Triage→Medium Feel free to alter the priority, only setting it so to remove it from untriaged tickets list.
[10:43:48] (CR) Jforrester: "> Patch Set 1:" [mediawiki-config] - https://gerrit.wikimedia.org/r/610265 (owner: Lucas Werkmeister (WMDE))
[10:44:27] Operations, Prod-Kubernetes, serviceops, Kubernetes, Patch-For-Review: Move proton to use TLS only - https://phabricator.wikimedia.org/T255877 (jcrespo) p:Triage→Medium
[10:44:34] (CR) Hashar: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/610267 (https://phabricator.wikimedia.org/T257413) (owner: Hashar)
[10:44:37] Operations, observability, Patch-For-Review: Leverage Grafana annotations to show events in graphs - https://phabricator.wikimedia.org/T222826 (akosiaris)
[10:44:42] (CR) MarcoAurelio: [C: -1] Undeploy graphoid for phase 1 wikis (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/610234 (https://phabricator.wikimedia.org/T257402) (owner: Seddon)
[10:44:59] (PS3) Giuseppe Lavagetto: restbase: use envoy for contacting MediaWiki, parsoid [puppet] - https://gerrit.wikimedia.org/r/610257 (https://phabricator.wikimedia.org/T255133)
[10:45:01] (PS1) Giuseppe Lavagetto: services_proxy: fix the mwapi-async listener [puppet] - https://gerrit.wikimedia.org/r/610269
[10:45:10] !log installing json-c security updates
[10:45:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:45:48] Operations, Keyholder: After arming a new key in keyholder, the identity file path does not show up - https://phabricator.wikimedia.org/T257329 (jcrespo) p:Triage→Low a:jcrespo
[10:45:53] (CR) Lucas Werkmeister (WMDE): [C: -2] DNM: Load WikibaseClient using extension registration in beta (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/610265 (owner: Lucas Werkmeister (WMDE))
[10:46:18] (CR) Alexandros Kosiaris: [C: +2] calico: Add text-lb.{eqsin,esams,ulsfo} IPs [deployment-charts] - https://gerrit.wikimedia.org/r/610266 (owner: Alexandros Kosiaris)
[10:47:16] (PS1) Muehlenhoff: Add library hint for json-c [puppet] - https://gerrit.wikimedia.org/r/610270
[10:47:21] (Merged) jenkins-bot: calico: Add text-lb.{eqsin,esams,ulsfo} IPs [deployment-charts] - https://gerrit.wikimedia.org/r/610266 (owner: Alexandros Kosiaris)
[10:48:27] (CR) Alexandros Kosiaris: [C: +2] proton: Add upload-lb IPs to calico configs [deployment-charts] - https://gerrit.wikimedia.org/r/610058 (owner: Alexandros Kosiaris)
[10:49:29] (Merged) jenkins-bot: proton: Add upload-lb IPs to calico configs [deployment-charts] - https://gerrit.wikimedia.org/r/610058 (owner: Alexandros Kosiaris)
[10:49:54] (CR) Giuseppe Lavagetto: [C: +2] "https://puppet-compiler.wmflabs.org/compiler1002/23760/" [puppet] - https://gerrit.wikimedia.org/r/610269 (owner: Giuseppe Lavagetto)
[10:50:25] Operations: Upload git 2.20 package from stretch-backports to component/git - https://phabricator.wikimedia.org/T257308 (jcrespo) Before touching this, I would like Moritz CC'd to ok this request.
[10:50:26] (PS1) Kormat: Revert "es2020: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/610092
[10:50:41] !log akosiaris@deploy1001 helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
[10:50:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:50:58] !log apply calico egress policies
[10:51:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:51:12] !log akosiaris@deploy1001 helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
[10:51:14] (CR) Kormat: [C: +2] Revert "es2020: Disable notifications" [puppet] - https://gerrit.wikimedia.org/r/610092 (owner: Kormat)
[10:51:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:51:23] !log akosiaris@deploy1001 helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
[10:51:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:52:21] (CR) Hashar: [C: +1] "https://puppet-compiler.wmflabs.org/compiler1003/476/deploy1001.eqiad.wmnet/index.html" [puppet] - https://gerrit.wikimedia.org/r/610267 (https://phabricator.wikimedia.org/T257413) (owner: Hashar)
[10:57:50] (PS3) Seddon: Undeploy graphoid for phase 1 wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/610234 (https://phabricator.wikimedia.org/T257402)
[11:00:04] Amir1, Lucas_WMDE, awight, and Urbanecm: It is that lovely time of the day again! You are hereby commanded to deploy European mid-day backport window(Max 6 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200708T1100).
[11:00:04] awight, tgr, hauskatze, and Majavah: A patch you scheduled for European mid-day backport window(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[11:00:12] here
[11:00:56] meow
[11:01:01] (CR) CDanis: [C: +1] varnish: apply 'public_clouds_shutdown' to all requests [puppet] - https://gerrit.wikimedia.org/r/610031 (owner: Ema)
[11:01:28] here
[11:01:55] I can deploy.
[11:02:00] by the skin of my teeth
[11:02:14] How much skin do people have on their teeth, anyway?
[11:02:15] actually got an urgent thing to do, I'll deploy my patch at us morning window if I can't make it back on time
[11:02:42] (CR) Awight: [C: +2] "BACON" [mediawiki-config] - https://gerrit.wikimedia.org/r/610056 (https://phabricator.wikimedia.org/T257306) (owner: Awight)
[11:03:26] Majavah: If you want to hand off testing & validation to someone else who knows your patch, feel free.
[11:03:32] (Merged) jenkins-bot: Provision WMDE TeWü survey for prototype 1 [mediawiki-config] - https://gerrit.wikimedia.org/r/610056 (https://phabricator.wikimedia.org/T257306) (owner: Awight)
[11:05:33] awight: it's a commons upload-by-url domain, should be fairly simple
[11:05:46] !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db1121', diff saved to https://phabricator.wikimedia.org/P11818 and previous config saved to /var/cache/conftool/dbconfig/20200708-110546-marostegui.json
[11:05:52] !log akosiaris@deploy1001 helmfile [STAGING] Ran 'sync' command on namespace 'proton' for release 'production' .
[11:05:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:05:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:06:04] !log awight@deploy1001 Synchronized wmf-config/InitialiseSettings-labs.php: BACON: [[gerrit:610056|Provision WMDE TeWü survey for prototype 1 (T257306)]], file 1/2 (duration: 01m 16s)
[11:06:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:06:09] T257306: Create QuickSurvey for Prototype 1 update - https://phabricator.wikimedia.org/T257306
[11:06:35] my patch is beta only. just needs a merge and a rebase in mw-staging.
[11:06:55] tgr: +1 I'll get that now
[11:07:10] (CR) Awight: [C: +2] "BACON" [mediawiki-config] - https://gerrit.wikimedia.org/r/608892 (https://phabricator.wikimedia.org/T256828) (owner: Gergő Tisza)
[11:07:21] !log awight@deploy1001 Synchronized wmf-config/InitialiseSettings.php: BACON: [[gerrit:610056|Provision WMDE TeWü survey for prototype 1 (T257306)]], file 2/2 (duration: 01m 03s)
[11:07:22] (CR) jerkins-bot: [V: -1] Remove old incorrect GrowthExperiments survey config from beta kowiki [mediawiki-config] - https://gerrit.wikimedia.org/r/608892 (https://phabricator.wikimedia.org/T256828) (owner: Gergő Tisza)
[11:07:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:08:06] (PS2) Arturo Borrero Gonzalez: toolforge: urlproxy: drop support for the legacy routing scheme [puppet] - https://gerrit.wikimedia.org/r/610029 (https://phabricator.wikimedia.org/T234617)
[11:08:43] (PS2) Awight: Remove old incorrect GrowthExperiments survey config from beta kowiki [mediawiki-config] - https://gerrit.wikimedia.org/r/608892 (https://phabricator.wikimedia.org/T256828) (owner: Gergő Tisza)
[11:08:57] (CR) Awight: "PS 2: manual rebase" [mediawiki-config] - https://gerrit.wikimedia.org/r/608892 (https://phabricator.wikimedia.org/T256828) (owner: Gergő Tisza)
[11:09:04] (CR) Awight: [C: +2] "BACON" [mediawiki-config] - https://gerrit.wikimedia.org/r/608892 (https://phabricator.wikimedia.org/T256828) (owner: Gergő Tisza)
[11:09:47] (Merged) jenkins-bot: Remove old incorrect GrowthExperiments survey config from beta kowiki [mediawiki-config] - https://gerrit.wikimedia.org/r/608892 (https://phabricator.wikimedia.org/T256828) (owner: Gergő Tisza)
[11:10:23] hauskatze: Can I deploy your config patches, or would you rather self-deploy?
[11:11:02] awight: I can't self-deploy :-)
[11:11:17] ah gotcha
[11:13:28] (PS5) Awight: [arwiki] Grant 'patrolmarks' to all [mediawiki-config] - https://gerrit.wikimedia.org/r/609990 (https://phabricator.wikimedia.org/T257106) (owner: MarcoAurelio)
[11:13:44] (CR) Awight: [C: +2] "BACON. Look safe, community consensus obtained." [mediawiki-config] - https://gerrit.wikimedia.org/r/609990 (https://phabricator.wikimedia.org/T257106) (owner: MarcoAurelio)
[11:13:48] PROBLEM - puppet last run on idp-test1001 is CRITICAL: CRITICAL: Puppet last ran 2 days ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[11:14:05] (CR) Arturo Borrero Gonzalez: [C: +2] toolforge: urlproxy: drop support for the legacy routing scheme [puppet] - https://gerrit.wikimedia.org/r/610029 (https://phabricator.wikimedia.org/T234617) (owner: Arturo Borrero Gonzalez)
[11:14:36] (Merged) jenkins-bot: [arwiki] Grant 'patrolmarks' to all [mediawiki-config] - https://gerrit.wikimedia.org/r/609990 (https://phabricator.wikimedia.org/T257106) (owner: MarcoAurelio)
[11:15:28] hauskatze: patrolmarks should be ready to test on mwdebug1001
[11:15:54] testing
[11:16:06] !log akosiaris@deploy1001 helmfile [EQIAD] Ran 'sync' command on namespace 'proton' for release 'production' .
[11:16:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:16:55] awight: looks good to me
[11:16:57] arwiki one
[11:17:09] patrolmarks applies on Special:ListGroupRights as expected
[11:17:25] Great, and I see the "!" on Special:RecentChanges
[11:18:11] !log installing libgcrypt20 security updates
[11:18:11] chachi :)
[11:18:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:18:34] (PS4) Awight: [hiwikibooks] Translate sitename for hi.wikibooks [mediawiki-config] - https://gerrit.wikimedia.org/r/609991 (https://phabricator.wikimedia.org/T256587) (owner: MarcoAurelio)
[11:18:40] (CR) Awight: [C: +2] "BACON" [mediawiki-config] - https://gerrit.wikimedia.org/r/609991 (https://phabricator.wikimedia.org/T256587) (owner: MarcoAurelio)
[11:18:40] !log akosiaris@deploy1001 helmfile [CODFW] Ran 'sync' command on namespace 'proton' for release 'production' .
[11:18:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:19:02] !log awight@deploy1001 Synchronized wmf-config/InitialiseSettings.php: BACON: [[gerrit:609990|[arwiki] Grant 'patrolmarks' to all (T257106)]] (duration: 01m 04s)
[11:19:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:19:07] T257106: Add patrolmarks right to (all) group on Arabic Wikipedia - https://phabricator.wikimedia.org/T257106
[11:19:12] RECOVERY - puppet last run on idp-test1001 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[11:19:47] (Merged) jenkins-bot: [hiwikibooks] Translate sitename for hi.wikibooks [mediawiki-config] - https://gerrit.wikimedia.org/r/609991 (https://phabricator.wikimedia.org/T256587) (owner: MarcoAurelio)
[11:20:31] hauskatze: hiwikibooks rename is on mwdebug1001
[11:20:38] checking
[11:20:52] I guess I can pull the setting via the API and see if it renamed properly
[11:21:00] ty
[11:21:03] UI changes takes a bit of time due to caching
[11:21:11] ah that explains it
[11:22:24] awight: yup https://hi.wikibooks.org/wiki/%E0%A4%B5%E0%A4%BF%E0%A4%B6%E0%A5%87%E0%A4%B7:ApiSandbox#action=query&format=json&meta=siteinfo
[11:22:35] sitename changes in mwdebug as expected
[11:23:16] oh, and in the web too
[11:24:11] awight: looks good to me
[11:24:31] * Majavah is back
[11:24:34] hauskatze: ack
[11:24:43] Majavah: good timing :-)
[11:25:26] (PS2) Awight: Add nature.com to commonswiki wgCopyUploadDomains [mediawiki-config] - https://gerrit.wikimedia.org/r/610268 (https://phabricator.wikimedia.org/T254342) (owner: Majavah)
[11:25:28] (CR) Awight: [C: +2] "BACON" [mediawiki-config] - https://gerrit.wikimedia.org/r/610268 (https://phabricator.wikimedia.org/T254342) (owner: Majavah)
[11:26:01] !log awight@deploy1001 Synchronized wmf-config/InitialiseSettings.php: BACON: [[gerrit:609991|[hiwikibooks] Translate sitename for hi.wikibooks (T256587)]] (duration: 01m 03s)
[11:26:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:26:05] T256587: Change project name for Hindi Wikibooks - https://phabricator.wikimedia.org/T256587
[11:26:16] (Merged) jenkins-bot: Add nature.com to commonswiki wgCopyUploadDomains [mediawiki-config] - https://gerrit.wikimedia.org/r/610268 (https://phabricator.wikimedia.org/T254342) (owner: Majavah)
[11:26:25] I'm not autopatrolled on commons, so testing is "doesn't cause syntax errors" type
[11:26:33] thanks awight <3
[11:26:47] hauskatze: Thanks for writing the patches!
[11:27:21] if everything were so simple :-)
[11:27:29] :-D
[11:29:00] !log installing freetype security updates
[11:29:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:29:11] Majavah: Well, it's on mwdebug1001. I'm fine with going ahead to production.
[11:29:15] (PS1) Jbond: mariadb::misc: Add ipt6ables rules for idp_test and cleanup [puppet] - https://gerrit.wikimedia.org/r/610275
[11:29:55] awight: mwdebug1001 seems ok, should be ok to sync
[11:30:35] (CR) jerkins-bot: [V: -1] mariadb::misc: Add ipt6ables rules for idp_test and cleanup [puppet] - https://gerrit.wikimedia.org/r/610275 (owner: Jbond)
[11:31:24] !log awight@deploy1001 Synchronized wmf-config/InitialiseSettings.php: BACON: [[gerrit:610268|Add nature.com to commonswiki wgCopyUploadDomains (T254342)]] (duration: 01m 03s)
[11:31:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:31:29] T254342: Add nature.com to the wgCopyUploadsDomains whitelist of Wikimedia Commons - https://phabricator.wikimedia.org/T254342
[11:31:38] (PS2) Jbond: mariadb::misc: Add ipt6ables rules for idp_test and cleanup [puppet] - https://gerrit.wikimedia.org/r/610275 (https://phabricator.wikimedia.org/T256120)
[11:31:52] thank you awight
[11:32:53] (CR) Awight: [C: +2] "BACON. Not the fault of this patch, but I've rarely seen configuration begging so hard for a boolean toggle switch." [mediawiki-config] - https://gerrit.wikimedia.org/r/610234 (https://phabricator.wikimedia.org/T257402) (owner: Seddon)
[11:33:01] (PS4) Awight: Undeploy graphoid for phase 1 wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/610234 (https://phabricator.wikimedia.org/T257402) (owner: Seddon)
[11:33:05] (CR) Awight: [C: +2] "BACON" [mediawiki-config] - https://gerrit.wikimedia.org/r/610234 (https://phabricator.wikimedia.org/T257402) (owner: Seddon)
[11:33:20] Majavah: Any time!
[11:33:53] (Merged) jenkins-bot: Undeploy graphoid for phase 1 wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/610234 (https://phabricator.wikimedia.org/T257402) (owner: Seddon)
[11:35:49] seddon: graphoid disablement is ready to test on mwdebug1001
[11:36:07] (CR) Muehlenhoff: [C: +2] Add library hint for json-c [puppet] - https://gerrit.wikimedia.org/r/610270 (owner: Muehlenhoff)
[11:36:32] awight: testing
[11:40:15] awight: testing complete. all good. You may proceed
[11:40:24] seddon: Thanks :-)
[11:40:31] lunch, see you
[11:41:41] !log awight@deploy1001 Synchronized wmf-config/InitialiseSettings.php: BACON: [[gerrit:610234|Undeploy graphoid for phase 1 wikis (T257402)]] (duration: 01m 03s)
[11:41:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:41:47] T257402: Undeploy graphoid for phase 1 wiki's - https://phabricator.wikimedia.org/T257402
[11:42:03] !log EU BACON complete
[11:42:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:44:14] thanks awight! :)
[11:45:45] seddon: I'm sure there's a hydroelectric dam in the Appalachians breathing a sigh of relief, about now.
[11:49:34] Operations: Upload git 2.20 package from stretch-backports to component/git - https://phabricator.wikimedia.org/T257308 (MoritzMuehlenhoff) What is meant to include this, only CI images or also anything in prod?
[11:58:11] <3 BACON
[12:01:26] Operations: Upload git 2.20 package from stretch-backports to component/git - https://phabricator.wikimedia.org/T257308 (hashar) For all production machines as well, that will benefit scap deployed repositories and make it slightly faster.
[12:03:34] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_restbase_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[12:05:26] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[12:05:44] (CR) Filippo Giunchedi: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/610045 (owner: Jbond)
[12:06:52] (PS1) Ayounsi: Renumber NTT links in eqiad/eqord [homer/public] - https://gerrit.wikimedia.org/r/610278 (https://phabricator.wikimedia.org/T254877)
[12:07:04] (PS9) Jbond: SSHFP: add a text file with the SSHFB of all hosts [puppet] - https://gerrit.wikimedia.org/r/609796 (https://phabricator.wikimedia.org/T257219)
[12:07:07] Operations, Product-Infrastructure-Team-Backlog, Push-Notification-Service: [EPIC] Deploy push-notifications service to production - https://phabricator.wikimedia.org/T256237 (MSantos)
[12:08:04] (CR) Ayounsi: [C: +2] Renumber NTT links in eqiad/eqord [homer/public] - https://gerrit.wikimedia.org/r/610278 (https://phabricator.wikimedia.org/T254877) (owner: Ayounsi)
[12:11:51] (PS10) Jbond: SSHFP: add a text file with the SSHFB of all hosts [puppet] - https://gerrit.wikimedia.org/r/609796 (https://phabricator.wikimedia.org/T257219)
[12:15:48] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[12:16:24] (PS11) Jbond: SSHFP: add a text file with the SSHFB of all hosts [puppet] - https://gerrit.wikimedia.org/r/609796 (https://phabricator.wikimedia.org/T257219)
[12:17:40] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[12:22:01] (CR) Muehlenhoff: [C: +1] "LGTM" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/610275 (https://phabricator.wikimedia.org/T256120) (owner: Jbond)
[12:22:34] (CR) Ayounsi: [C: +2] Netflow: send as little options templates as possible [homer/public] - https://gerrit.wikimedia.org/r/609426 (https://phabricator.wikimedia.org/T240658) (owner: Ayounsi)
[12:23:03] (Merged) jenkins-bot: Netflow: send as little options templates as possible [homer/public] - https://gerrit.wikimedia.org/r/609426 (https://phabricator.wikimedia.org/T240658) (owner: Ayounsi)
[12:24:24] (PS3) Jbond: mariadb::misc: Add iptables rules for idp_test and cleanup [puppet] - https://gerrit.wikimedia.org/r/610275 (https://phabricator.wikimedia.org/T256120)
[12:24:42] (CR) Jbond: mariadb::misc: Add iptables rules for idp_test and cleanup (1 comment) [puppet] - https://gerrit.wikimedia.org/r/610275 (https://phabricator.wikimedia.org/T256120) (owner: Jbond)
[12:25:23] Operations: Upload git 2.20 package from stretch-backports to component/git - https://phabricator.wikimedia.org/T257308 (MoritzMuehlenhoff) Is there a measurable performance benefit of protocol-2 over the default git? We can do that, but it comes at a cost as we need to backport future git vulnerabilties to...
[12:27:04] (CR) Marostegui: [C: +1] mariadb::misc: Add iptables rules for idp_test and cleanup [puppet] - https://gerrit.wikimedia.org/r/610275 (https://phabricator.wikimedia.org/T256120) (owner: Jbond)
[12:27:10] moritzm: the idea of backporting _vulnerabilities_ is amusing me greatly (i know that's not what you meant :)
[12:29:34] (PS12) Jbond: SSHFP: add a text file with the SSHFB of all hosts [puppet] - https://gerrit.wikimedia.org/r/609796 (https://phabricator.wikimedia.org/T257219)
[12:29:53] (CR) Jbond: [C: +2] mariadb::misc: Add iptables rules for idp_test and cleanup [puppet] - https://gerrit.wikimedia.org/r/610275 (https://phabricator.wikimedia.org/T256120) (owner: Jbond)
[12:30:06] Operations: Upload git 2.20 package from stretch-backports to component/git - https://phabricator.wikimedia.org/T257308 (CDanis) >>! In T257308#6289071, @MoritzMuehlenhoff wrote: > Is there a measurable performance benefit of protocol-2 over the default git? We can do that, but it comes at a cost as we need...
[12:32:44] PROBLEM - Postgres Replication Lag on maps2001 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 305333856 and 10 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[12:33:02] PROBLEM - Postgres Replication Lag on maps2003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 33796472 and 1 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[12:33:56] (PS1) Kormat: mysql: Add unit tests. [software/spicerack] - https://gerrit.wikimedia.org/r/610282 (https://phabricator.wikimedia.org/T255409)
[12:34:34] RECOVERY - Postgres Replication Lag on maps2001 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 77512 and 71 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[12:34:47] Operations: Upload git 2.20 package from stretch-backports to component/git - https://phabricator.wikimedia.org/T257308 (MoritzMuehlenhoff) Ok, fair enough! Let's go ahead with component/git, then. Importing the last stretch-backports isn't good idea, though, it's quite outdated (Sat, 05 Jan 2019 14:12:21 -...
[12:34:52] RECOVERY - Postgres Replication Lag on maps2003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 157248 and 90 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[12:34:56] kormat: oh well :-)
[12:36:45] (CR) jerkins-bot: [V: -1] mysql: Add unit tests. [software/spicerack] - https://gerrit.wikimedia.org/r/610282 (https://phabricator.wikimedia.org/T255409) (owner: Kormat)
[12:38:25] Operations, netops, Patch-For-Review: fastnetmon spamming /var/log on netflow hosts leading to disk saturation - https://phabricator.wikimedia.org/T240658 (ayounsi) Stalled→Resolved a:ayounsi Confirmed that the logs are now quieter, from ~42/min to ~4/10min right now in esams. Which I th...
[12:41:41] !log Deploy schema change on s7 codfw, lag is expected
[12:41:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:48:22] (CR) Jbond: "Have now updated to create the following files" [puppet] - https://gerrit.wikimedia.org/r/609796 (https://phabricator.wikimedia.org/T257219) (owner: Jbond)
[12:51:49] (PS1) Jbond: apereo_cas: disable TLS hostname verification [puppet] - https://gerrit.wikimedia.org/r/610284 (https://phabricator.wikimedia.org/T256120)
[12:52:55] (PS2) Ema: varnish: Facebook temporary experiment is permanent [puppet] - https://gerrit.wikimedia.org/r/610028 (https://phabricator.wikimedia.org/T192688)
[12:53:29] (PS1) Elukey: Set BigTop for the Hadoop test cluster [puppet] - https://gerrit.wikimedia.org/r/610285
[12:53:37] (CR) Muehlenhoff: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/610284 (https://phabricator.wikimedia.org/T256120) (owner: Jbond)
[12:53:51] (CR) Marostegui: [C: +1] apereo_cas: disable TLS hostname verification [puppet] - https://gerrit.wikimedia.org/r/610284 (https://phabricator.wikimedia.org/T256120) (owner: Jbond)
[12:54:13] (CR) Ema: [C: +2] varnish: Facebook temporary experiment is permanent [puppet] - https://gerrit.wikimedia.org/r/610028 (https://phabricator.wikimedia.org/T192688) (owner: Ema)
[12:55:02] (CR) Jbond: [C: +2] apereo_cas: disable TLS hostname verification [puppet] - https://gerrit.wikimedia.org/r/610284 (https://phabricator.wikimedia.org/T256120) (owner: Jbond)
[12:56:13] (CR) Elukey: [C: +2] Set BigTop for the Hadoop test cluster [puppet] - https://gerrit.wikimedia.org/r/610285 (owner: Elukey)
[13:00:04] twentyafterfour and James_F: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for Mediawiki train - American+European Version (secondary timeslot). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200708T1300).
[13:00:25] I think there's nothing to do for the train.
[13:00:37] !log elukey@cumin1001 START - Cookbook sre.hadoop.stop-cluster
[13:00:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:03:23] (PS3) Hashar: ci: remove Apache config for nightlies [puppet] - https://gerrit.wikimedia.org/r/607075
[13:03:41] (PS4) Hashar: ci: switch integration.wikimedia.org to scap DocumentRoot [puppet] - https://gerrit.wikimedia.org/r/607076 (https://phabricator.wikimedia.org/T149924)
[13:03:57] Operations: Upload git 2.20 package from stretch-backports to component/git - https://phabricator.wikimedia.org/T257308 (jcrespo) > I'll do a quick backport of the current version in buster-security. I will then not touch this and leave it to your discretion.
[13:06:58] (CR) Hashar: [C: +1] "/srv/deployment/integration/docroot is now populated by scap and successfully got deployed on contint2001 / contint1001 at /srv/deployment" [puppet] - https://gerrit.wikimedia.org/r/607076 (https://phabricator.wikimedia.org/T149924) (owner: Hashar)
[13:07:18] (PS2) Hashar: contint: move Apache config to flat file [puppet] - https://gerrit.wikimedia.org/r/607524
[13:07:24] !log elukey@cumin1001 END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0)
[13:07:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:07:31] (PS2) Hashar: doc: move Apache config to flat file [puppet] - https://gerrit.wikimedia.org/r/607525
[13:07:48] Operations: Upload git 2.20 package from stretch-backports to component/git - https://phabricator.wikimedia.org/T257308 (MoritzMuehlenhoff) >>! In T257308#6289196, @jcrespo wrote: >> I'll do a quick backport of the current version in buster-security. > > I will then not touch this and leave it to your discr...
[13:08:17] (CR) Hashar: [C: +1] contint: move Apache config to flat file [puppet] - https://gerrit.wikimedia.org/r/607524 (owner: Hashar)
[13:08:35] (CR) Hashar: [C: +1] doc: move Apache config to flat file [puppet] - https://gerrit.wikimedia.org/r/607525 (owner: Hashar)
[13:08:37] (CR) Jbond: "> Patch Set 5:" [puppet] - https://gerrit.wikimedia.org/r/610045 (owner: Jbond)
[13:10:57] !log elukey@cumin1001 START - Cookbook sre.hadoop.change-distro
[13:11:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:14:36] (CR) Ema: [C: +2] varnish: apply 'public_clouds_shutdown' to all requests [puppet] - https://gerrit.wikimedia.org/r/610031 (owner: Ema)
[13:18:34] Operations, Analytics-Clusters, netops, Patch-For-Review: Move netflow data to Eventgate Analytics - https://phabricator.wikimedia.org/T248865 (Ottomata) > doesn't support emitting HTTP POST to an endpoint (like eventgate). Well if it doesn't support HTTP POST then you won't be moving it to Event...
[13:18:38] (CR) Hashar: [C: +1] "Note the Gerrit database connection parameters have been entirely removed with https://gerrit.wikimedia.org/r/c/operations/puppet/+/606549" [puppet] - https://gerrit.wikimedia.org/r/609884 (https://phabricator.wikimedia.org/T239151) (owner: Dzahn)
[13:21:18] (PS1) Muehlenhoff: Add component/git for stretch-wikimedia [puppet] - https://gerrit.wikimedia.org/r/610289 (https://phabricator.wikimedia.org/T257308)
[13:22:33] (PS11) Jbond: profile::librenms: update to use lookup instead of hiera call [puppet] - https://gerrit.wikimedia.org/r/610018 (https://phabricator.wikimedia.org/T256958)
[13:22:43] (PS10) Jbond: librenms: add support for apereo cas [puppet] - https://gerrit.wikimedia.org/r/610030 (https://phabricator.wikimedia.org/T256958)
[13:23:32] Operations, Patch-For-Review: Upload git 2.20 package from stretch-backports to component/git - https://phabricator.wikimedia.org/T257308 (hashar) >>! In T257308#6289071, @MoritzMuehlenhoff wrote: > Is there a measurable performance benefit of protocol-2 over the default git? We can do that, but it comes...
[13:26:17] (CR) Muehlenhoff: [C: +2] Add component/git for stretch-wikimedia [puppet] - https://gerrit.wikimedia.org/r/610289 (https://phabricator.wikimedia.org/T257308) (owner: Muehlenhoff)
[13:31:26] !log imported git 2.20.1-2+deb10u3~wmf1 for stretch-wikimedia component/git T257308
[13:31:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:31:32] T257308: Upload git 2.20 package from stretch-backports to component/git - https://phabricator.wikimedia.org/T257308
[13:31:36] !log replacing ssh key for ci_docroot at deploy1001
[13:31:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:32:14] Operations, Patch-For-Review: Upload git 2.20 package from stretch-backports to component/git - https://phabricator.wikimedia.org/T257308 (MoritzMuehlenhoff) Open→Resolved a:MoritzMuehlenhoff 2.20.1-2+deb10u3~wmf1 built and imported to component/git
[13:32:16] Operations, Patch-For-Review: Handle sunset of stretch-backports - https://phabricator.wikimedia.org/T256877 (MoritzMuehlenhoff)
[13:32:45] key arming will complaing for some seconds
[13:33:53] Operations, Patch-For-Review: Handle sunset of stretch-backports - https://phabricator.wikimedia.org/T256877 (MoritzMuehlenhoff)
[13:39:02] !log elukey@cumin1001 END (PASS) - Cookbook sre.hadoop.change-distro (exit_code=0)
[13:39:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:39:06] * elukey dances
[13:47:31] RECOVERY - Keyholder SSH agent on deploy2001 is OK: OK: Keyholder is armed with all configured keys. https://wikitech.wikimedia.org/wiki/Keyholder
[13:49:43] Operations, Patch-For-Review: Handle sunset of stretch-backports - https://phabricator.wikimedia.org/T256877 (hashar)
[13:50:00] (PS1) Jbond: librenms: update librenms to use apereo_cas SSO [puppet] - https://gerrit.wikimedia.org/r/610291 (https://phabricator.wikimedia.org/T256958)
[13:53:22] Operations, Patch-For-Review: Handle sunset of stretch-backports - https://phabricator.wikimedia.org/T256877 (MoritzMuehlenhoff) >>! In T256877#6284764, @aborrero wrote: > @MoritzMuehlenhoff I'm now thinking this is going to happen with every single debian release (archival of the backports repo). > > P...
[13:57:23] Operations, SRE-Access-Requests, Patch-For-Review: Requesting access to production infrastructure services for jgiannelos - https://phabricator.wikimedia.org/T257187 (Mholloway)
[13:58:24] Operations, SRE-Access-Requests, Patch-For-Review: Requesting access to production infrastructure services for jgiannelos - https://phabricator.wikimedia.org/T257187 (Mholloway) @jcrespo Done.
[13:59:41] !log elukey@cumin1001 START - Cookbook sre.hadoop.stop-cluster
[13:59:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:59:46] !log elukey@cumin1001 END (ERROR) - Cookbook sre.hadoop.stop-cluster (exit_code=97)
[13:59:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:00:23] (PS1) Elukey: Revert "Set BigTop for the Hadoop test cluster" [puppet] - https://gerrit.wikimedia.org/r/610093
[14:01:22] (CR) Elukey: [C: +2] Revert "Set BigTop for the Hadoop test cluster" [puppet] - https://gerrit.wikimedia.org/r/610093 (owner: Elukey)
[14:03:25] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[14:04:21] PROBLEM - PHP7 rendering on mw1346 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 869 bytes in 0.134 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[14:04:21] !log rebooting idp-test1001 for kernel update
[14:04:24] (PS11) Jbond: librenms: add support for apereo cas [puppet] - https://gerrit.wikimedia.org/r/610030 (https://phabricator.wikimedia.org/T256958)
[14:04:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:04:27] !log jmm@cumin2001 START - Cookbook sre.hosts.reboot-single
[14:04:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:04:59] !log elukey@cumin1001 START - Cookbook sre.hadoop.stop-cluster
[14:05:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:06:48] (PS2) Jbond: librenms: update librenms to use apereo_cas SSO [puppet] - https://gerrit.wikimedia.org/r/610291 (https://phabricator.wikimedia.org/T256958)
[14:06:52] !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
[14:06:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:11:26] (CR) RLazarus: [C: +1] annualreport: update redirect from 2018 to 2019 report [puppet] - https://gerrit.wikimedia.org/r/609888 (https://phabricator.wikimedia.org/T257257) (owner: Dzahn)
[14:11:48] !log elukey@cumin1001 END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0)
[14:11:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:12:06] (PS12) Jbond: librenms: add support for apereo cas [puppet] - https://gerrit.wikimedia.org/r/610030 (https://phabricator.wikimedia.org/T256958)
[14:12:11] !log elukey@cumin1001 START - Cookbook sre.hadoop.change-distro
[14:12:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:12:42] <_joe_> !log depooling mw1346
[14:12:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:14:58] (CR) Ottomata: [C: +2] "I'm going to merge this, but unless there is some urgency I won't build and deploy a new deb package. Instead, we'll wait until the next " [debs/spark2] (debian) - https://gerrit.wikimedia.org/r/602386 (https://phabricator.wikimedia.org/T229347) (owner: Ottomata)
[14:15:29] !log switch icinga authentication to CAS SSO
[14:15:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:16:10] (CR) Jbond: [C: +2] icinga: switch icinga to use apereo cas for authentication [puppet] - https://gerrit.wikimedia.org/r/608305 (https://phabricator.wikimedia.org/T251513) (owner: Jbond)
[14:16:31] (CR) Ottomata: [C: +2] Default PYSPARK_PYTHON to exact versioned python executable used on driver. (1 comment) [debs/spark2] (debian) - https://gerrit.wikimedia.org/r/602386 (https://phabricator.wikimedia.org/T229347) (owner: Ottomata)
[14:24:25] Operations, Keyholder: After arming a new key in keyholder, the identity file path does not show up - https://phabricator.wikimedia.org/T257329 (jcrespo) Open→Resolved I have documented the procedure to avoid confusion in the future. Also the identity addition: https://wikitech.wikimedia.org/wiki...
[14:24:52] <_joe_> !log php7adm /opcache-free on mw1346
[14:24:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:25:10] RECOVERY - PHP7 rendering on mw1346 is OK: HTTP OK: HTTP/1.1 200 OK - 82426 bytes in 0.266 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[14:25:20] <_joe_> there it is, sigh
[14:25:30] <_joe_> we really, really need to stop revalidating opcache
[14:25:31] T253673 then?
[14:25:31] T253673: Avoid php-opcache corruption in WMF production - https://phabricator.wikimedia.org/T253673
[14:25:50] <_joe_> yes
[14:25:56] <_joe_> please report the whole stack trace
[14:26:28] <_joe_> !log repooling mw1346
[14:26:28] I will paste it there, ok and you cleared the opcache to "solve" it, right?
[14:26:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:28:22] (PS1) Muehlenhoff: Make debmonitor point to the main Icinga vhost now that CAS is the default [puppet] - https://gerrit.wikimedia.org/r/610298
[14:31:07] !log installing gdk-pixbuf security updates
[14:31:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:31:51] !log elukey@cumin1001 END (FAIL) - Cookbook sre.hadoop.change-distro (exit_code=99)
[14:31:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:31:56] (PS3) Jbond: librenms: update librenms to use apereo_cas SSO [puppet] - https://gerrit.wikimedia.org/r/610291 (https://phabricator.wikimedia.org/T256958)
[14:35:54] (PS1) Jbond: icinga: permissions add both cases for Vgutierrez [puppet] - https://gerrit.wikimedia.org/r/610299 (https://phabricator.wikimedia.org/T256656)
[14:36:18] vgutierrez: ^^^ to for the icinga workaround
[14:37:05] (CR) Vgutierrez: [C: +1] icinga: permissions add both cases for Vgutierrez [puppet] - https://gerrit.wikimedia.org/r/610299 (https://phabricator.wikimedia.org/T256656) (owner: Jbond)
[14:37:15] (CR) Jbond: [C: +2] icinga: permissions add both cases for Vgutierrez [puppet] - https://gerrit.wikimedia.org/r/610299 (https://phabricator.wikimedia.org/T256656) (owner: Jbond)
[14:37:16] jbond42: <3
[14:37:32] :)
[14:38:56] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[14:46:24] !log installing isc-dhcp security updates
[14:46:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:46:55] !log jmm@cumin2001 START - Cookbook sre.hosts.reboot-single
[14:46:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:48:19] !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
[14:48:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:52:42] (PS1) Kormat: Pin to a working version of prospector [software/spicerack] - https://gerrit.wikimedia.org/r/610301
[14:59:23] (PS8) MSantos: charts for push-notification service [deployment-charts] - https://gerrit.wikimedia.org/r/602390 (https://phabricator.wikimedia.org/T250491)
[15:00:20] (CR) jerkins-bot: [V: -1] charts for push-notification service [deployment-charts] - https://gerrit.wikimedia.org/r/602390 (https://phabricator.wikimedia.org/T250491) (owner: MSantos)
[15:01:16] (CR) Jbond: "PCC: https://puppet-compiler.wmflabs.org/compiler1001/23772/netmon2001.wikimedia.org/index.html" [puppet] - https://gerrit.wikimedia.org/r/610291 (https://phabricator.wikimedia.org/T256958) (owner: Jbond)
[15:02:39] (PS1) RLazarus: Disable IPv6 tests, which fail in the build container. [debs/envoyproxy] - https://gerrit.wikimedia.org/r/610303
[15:03:18] Operations, Product-Infrastructure-Team-Backlog, Push-Notification-Service, Services, Service-deployment-requests: New Service Request: Wikimedia push notification service - https://phabricator.wikimedia.org/T250452 (MSantos)
[15:03:19] Operations, Product-Infrastructure-Team-Backlog, Push-Notification-Service, serviceops: Deploy push-notifications service to Kubernetes - https://phabricator.wikimedia.org/T256973 (MSantos)
[15:03:38] Operations, Product-Infrastructure-Team-Backlog, Push-Notification-Service, Services, Service-deployment-requests: New Service Request: Wikimedia push notification service - https://phabricator.wikimedia.org/T250452 (MSantos)
[15:04:20] !log jmm@cumin2001 START - Cookbook sre.hosts.reboot-single
[15:04:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:04:31] (CR) Giuseppe Lavagetto: [C: +1] Disable IPv6 tests, which fail in the build container. [debs/envoyproxy] - https://gerrit.wikimedia.org/r/610303 (owner: RLazarus)
[15:04:36] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[15:06:22] !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
[15:06:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:06:50] (CR) RLazarus: [C: +2] Disable IPv6 tests, which fail in the build container. [debs/envoyproxy] - https://gerrit.wikimedia.org/r/610303 (owner: RLazarus)
[15:10:16] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[15:10:52] Operations, Product-Infrastructure-Team-Backlog, Push-Notification-Service, Services, Service-deployment-requests: New Service Request: Wikimedia push notification service - https://phabricator.wikimedia.org/T250452 (MSantos)
[15:12:41] !log rebooting people1002 (people.wikimedia.org) for kernel security update
[15:12:44] !log jmm@cumin2001 START - Cookbook sre.hosts.reboot-single
[15:12:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:12:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:13:24] Operations, Product-Infrastructure-Team-Backlog, Push-Notification-Service: [EPIC] Deploy push-notifications service to production - https://phabricator.wikimedia.org/T256237 (MSantos)
[15:13:29] Operations, Product-Infrastructure-Team-Backlog, Push-Notification-Service, Services, Service-deployment-requests: New Service Request: Wikimedia push notification service - https://phabricator.wikimedia.org/T250452 (MSantos)
[15:15:29] Operations, SRE-Access-Requests, Patch-For-Review: Requesting access to production infrastructure services for jgiannelos - https://phabricator.wikimedia.org/T257187 (jcrespo) The rest of the groups seem reasonable and within the approval of @dcipoletti, so scheduling this for deploy.
[15:16:39] !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
[15:16:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:18:01] Operations, Analytics: Grant user knissen access to Hue - https://phabricator.wikimedia.org/T257466 (kai.nissen)
[15:18:17] (CR) Jbond: [C: +1] Make debmonitor point to the main Icinga vhost now that CAS is the default [puppet] - https://gerrit.wikimedia.org/r/610298 (owner: Muehlenhoff)
[15:19:39] (PS2) Kormat: mysql: Add unit tests. [software/spicerack] - https://gerrit.wikimedia.org/r/610282 (https://phabricator.wikimedia.org/T255409)
[15:24:05] Operations, Analytics: Grant user knissen access to Hue - https://phabricator.wikimedia.org/T257466 (elukey) Hi! We are trying to move users away from Hue if possible, what is your use case? Have you tried, by any chance, https://superset.wikimedia.org/superset/sqllab ?
[15:27:17] (CR) Ppchelko: [C: +1] "I'll deploy this today in the config deploy window and then merge the EventBus change." [mediawiki-config] - https://gerrit.wikimedia.org/r/610160 (https://phabricator.wikimedia.org/T229863) (owner: Ottomata)
[15:27:21] (PS2) Ppchelko: Add wgEventServiceDefault to refactor EventBus event stream config [mediawiki-config] - https://gerrit.wikimedia.org/r/610160 (https://phabricator.wikimedia.org/T229863) (owner: Ottomata)
[15:32:54] (PS1) Ebernhardson: Deploy analytics refinery to airflow instances [puppet] - https://gerrit.wikimedia.org/r/610310
[15:36:11] (CR) Jbond: "> Patch Set 5:" [puppet] - https://gerrit.wikimedia.org/r/610045 (owner: Jbond)
[15:39:47] (CR) Elukey: Deploy analytics refinery to airflow instances (1 comment) [puppet] - https://gerrit.wikimedia.org/r/610310 (owner: Ebernhardson)
[15:40:03] Operations, SRE-Access-Requests, Patch-For-Review: Requesting access to analytics-privatedata-users and nda groups for edtadros - https://phabricator.wikimedia.org/T256435 (Edtadros) @Nuria I will have @Jrbranaa answer that since he is dealing with my contract now.
[15:58:01] (PS1) Arturo Borrero Gonzalez: toolforge: legacy-redirector: introduce the Access-Control-Allow-Origin header [puppet] - https://gerrit.wikimedia.org/r/610314 (https://phabricator.wikimedia.org/T257469)
[15:58:36] (PS2) Ebernhardson: Deploy analytics refinery to airflow instances [puppet] - https://gerrit.wikimedia.org/r/610310
[15:58:39] (CR) Ebernhardson: Deploy analytics refinery to airflow instances (1 comment) [puppet] - https://gerrit.wikimedia.org/r/610310 (owner: Ebernhardson)
[16:04:27] (CR) Ottomata: [C: +1] Stop installing git-lfs from stretch-backports [puppet] - https://gerrit.wikimedia.org/r/610015 (https://phabricator.wikimedia.org/T256877) (owner: Muehlenhoff)
[16:05:37] (PS2) Arturo Borrero Gonzalez: toolforge: legacy-redirector: introduce the Access-Control-Allow-Origin header [puppet] - https://gerrit.wikimedia.org/r/610314 (https://phabricator.wikimedia.org/T257469)
[16:08:31] (CR) BryanDavis: [C: +1] toolforge: legacy-redirector: introduce the Access-Control-Allow-Origin header [puppet] - https://gerrit.wikimedia.org/r/610314 (https://phabricator.wikimedia.org/T257469) (owner: Arturo Borrero Gonzalez)
[16:11:02] Operations, Wikimedia-General-or-Unknown: Periodically run purgeExpiredBlocks.php maintenance script - https://phabricator.wikimedia.org/T257473 (Reedy)
[16:11:34] (CR) Arturo Borrero Gonzalez: [C: +2] toolforge: legacy-redirector: introduce the Access-Control-Allow-Origin header [puppet] - https://gerrit.wikimedia.org/r/610314 (https://phabricator.wikimedia.org/T257469) (owner: Arturo Borrero Gonzalez)
[16:11:37] Operations, Analytics: Grant user knissen access to Hue - https://phabricator.wikimedia.org/T257466 (kai.nissen) My use case is to request data from event.WMDEBanner* tables. I'm using beeline right now, but find it hard to read at times. We're planning to request ingestion of Hive data to Druid for the...
[16:13:25] Operations, Wikimedia-General-or-Unknown: Periodically run purgeExpiredBlocks.php maintenance script - https://phabricator.wikimedia.org/T257473 (Reedy) Open→Stalled Stalling until next weeks train has been run and the script is then on all WMF deployed MW branches
[16:16:08] Operations, ops-eqiad, DC-Ops, netops, cloud-services-team (Hardware): (Need By: 2020-06-12) rack/setup/install WMCS 10G switches - https://phabricator.wikimedia.org/T251632 (wiki_willy) @Cmjohnson - I chatted with Arzhel a bit earlier today, and he's going to get these dedicated 10g switches...
[16:16:13] Operations, Analytics: Grant user knissen access to Hue - https://phabricator.wikimedia.org/T257466 (elukey) >>! In T257466#6290096, @kai.nissen wrote: > My use case is to request data from event.WMDEBanner* tables. I'm using beeline right now, but find it hard to read at times. > > We're planning to re...
[16:17:29] Operations, serviceops, Wikimedia-production-error: PHP7 corruption: Method call executed on unrelated object (also: Call to undefined method) - https://phabricator.wikimedia.org/T245183 (Krinkle) Copying here so that this task remains complete for analysis of the problem, separate from T253673 which...
[16:19:12] (CR) Elukey: Deploy analytics refinery to airflow instances (1 comment) [puppet] - https://gerrit.wikimedia.org/r/610310 (owner: Ebernhardson)
[16:23:50] Operations, ops-eqiad, DC-Ops, cloud-services-team (Hardware): (Need By: 2020-06-20) rack/setup/install cloudcephosd10[04-15].wikimedia.org - https://phabricator.wikimedia.org/T251619 (Cmjohnson) I have updated the switch port descriptions but have not set any vlans.
[16:27:29] !log cmjohnson@cumin1001 START - Cookbook sre.dns.netbox
[16:27:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:28:29] (CR) Giuseppe Lavagetto: [C: +2] "https://puppet-compiler.wmflabs.org/compiler1001/23773/restbase1025.eqiad.wmnet/index.html" [puppet] - https://gerrit.wikimedia.org/r/610257 (https://phabricator.wikimedia.org/T255133) (owner: Giuseppe Lavagetto)
[16:32:25] Operations, ops-eqiad, DC-Ops: (Due By: 2020-07-17) rack/setup/install - https://phabricator.wikimedia.org/T255520 (Cmjohnson) +an-test-worker1001 1H IN A 10.65.0.68 +an-test-worker1002 1H IN A 10.65.0.69 +an-test-worker1003...
[16:32:29] !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[16:32:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:33:56] Operations, ops-eqiad, DC-Ops: (Due By: 2020-07-17) rack/setup/install - https://phabricator.wikimedia.org/T255520 (Cmjohnson)
[16:34:14] Operations, Analytics: Grant user knissen access to Hue - https://phabricator.wikimedia.org/T257466 (kai.nissen) Great, that's all I need! Sorry, I missed the documentation update. Does that mean, I can also just add `event.wmdebanner*` tables as described in [Analytics/Systems/Superset#Druid_datasource...
[16:35:26] Operations, ops-eqiad, DC-Ops: (Due By: 2020-07-17) rack/setup/install - https://phabricator.wikimedia.org/T255520 (Cmjohnson)
[16:37:01] Operations, Analytics: Grant user knissen access to Hue - https://phabricator.wikimedia.org/T257466 (elukey) >>! In T257466#6290255, @kai.nissen wrote: > Great, that's all I need! Sorry, I missed the documentation update. > > Does that mean, I can also just add `event.wmdebanner*` tables as described in...
[16:37:51] !log cmjohnson@cumin1001 START - Cookbook sre.dns.netbox
[16:37:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:38:45] (CR) Elukey: [C: +2] Deploy analytics refinery to airflow instances [puppet] - https://gerrit.wikimedia.org/r/610310 (owner: Ebernhardson)
[16:39:53] Operations, ops-eqiad, DC-Ops: (Due By: 2020-07-02) rack/setup/install 3 lightweight hadoop nodes - https://phabricator.wikimedia.org/T255518 (Cmjohnson) +an-test-coord1001 1H IN A 10.65.0.73 +an-test-master1001 1H IN A 10.65.0.71 +an-test-master1002...
[16:40:16] !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[16:40:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:40:23] <_joe_> !log restarting restbase on restbase2010 to route calls to mediawiki, parsoid via envoy
[16:40:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:41:46] Operations, ops-eqiad, DC-Ops: (Due By: 2020-07-02) rack/setup/install 3 lightweight hadoop nodes - https://phabricator.wikimedia.org/T255518 (Cmjohnson)
[16:42:16] Operations, ops-eqiad, netops: (Need by: 2019-09-30) upgrade msw1-eqiad from EX4200 to EX4300 - https://phabricator.wikimedia.org/T225121 (Cmjohnson)
[16:43:20] (PS1) Elukey: role::druid::test_analytics::worker: silence icinga notifications [puppet] - https://gerrit.wikimedia.org/r/610321
[16:44:17] Operations, ops-eqiad, netops: (Need by: 2019-09-30) upgrade msw1-eqiad from EX4200 to EX4300 - https://phabricator.wikimedia.org/T225121 (Cmjohnson) Open→Resolved netbox updated, old switch removed. updated cable id's. Resolving
[16:45:01] Operations, ops-eqiad: Audit msw1-eqiad cables - https://phabricator.wikimedia.org/T245188 (Cmjohnson) Open→Resolved Audit is complete, labels were updated in netbox
[16:46:46] (CR) Elukey: [C: +2] role::druid::test_analytics::worker: silence icinga notifications [puppet] - https://gerrit.wikimedia.org/r/610321 (owner: Elukey)
[16:50:13] (PS1) Elukey: Set BigTop for the Hadoop test cluster [puppet] - https://gerrit.wikimedia.org/r/610322
[16:57:44] <_joe_> !log restarting restbase across the fleet to transition to using envoy
[16:57:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:59:49] (CR) Elukey: [C: +2] Set BigTop for the Hadoop test cluster [puppet] - https://gerrit.wikimedia.org/r/610322 (owner: Elukey)
[17:02:17] Operations, ops-eqiad, netops: (Need by: 2019-09-30) upgrade msw1-eqiad from EX4200 to EX4300 - https://phabricator.wikimedia.org/T225121 (ayounsi) Resolved→Open I think there is something wrong: https://netbox.wikimedia.org/dcim/devices/50/ is the old one but still have all the cables https...
[17:08:08] !log elukey@cumin1001 START - Cookbook sre.hadoop.stop-cluster
[17:08:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:16:19] !log elukey@cumin1001 END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0)
[17:16:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:16:47] !log elukey@cumin1001 START - Cookbook sre.hadoop.change-distro
[17:16:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:17:57] Operations, Analytics: Grant user knissen access to Hue - https://phabricator.wikimedia.org/T257466 (kai.nissen) Open→Resolved a:kai.nissen Yes, it works fine! I was already going beyond and trying > The documentation that you pointed out is related to druid datasources, so basically after h...
[17:18:07] Operations, Analytics: Grant user knissen access to Hue - https://phabricator.wikimedia.org/T257466 (kai.nissen) a:kai.nissen→None
[17:21:45] (PS1) BryanDavis: wmcs: Migrate wmcs-novastats-flavorreport to py3 [puppet] - https://gerrit.wikimedia.org/r/610326
[17:22:24] (PS8) DCausse: [wcqs] gui custom config [puppet] - https://gerrit.wikimedia.org/r/606297 (https://phabricator.wikimedia.org/T251514) (owner: Mstyles)
[17:23:40] (CR) jerkins-bot: [V: -1] [wcqs] gui custom config [puppet] - https://gerrit.wikimedia.org/r/606297 (https://phabricator.wikimedia.org/T251514) (owner: Mstyles)
[17:24:08] (CR) Andrew Bogott: [C: +2] wmcs: Migrate wmcs-novastats-flavorreport to py3 [puppet] - https://gerrit.wikimedia.org/r/610326 (owner: BryanDavis)
[17:26:14] (CR) Muehlenhoff: [C: +2] Make debmonitor point to the main Icinga vhost now that CAS is the default [puppet] - https://gerrit.wikimedia.org/r/610298 (owner: Muehlenhoff)
[17:26:37] (PS9) DCausse: [wcqs] gui custom config [puppet] - https://gerrit.wikimedia.org/r/606297 (https://phabricator.wikimedia.org/T251514) (owner: Mstyles)
[17:31:49] (CR) Bstorm: [C: +2] paws: add project to our prometheus alert-manager system [puppet] - https://gerrit.wikimedia.org/r/610175 (https://phabricator.wikimedia.org/T256361) (owner: Bstorm)
[17:34:59] !log elukey@cumin1001 END (FAIL) - Cookbook sre.hadoop.change-distro (exit_code=99)
[17:35:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:40:00] (CR) Bstorm: "> Patch Set 2: Code-Review+1" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/610189 (https://phabricator.wikimedia.org/T256361) (owner: Bstorm)
[17:40:28] Operations, Wikimedia-Mailing-lists: Create secondary mailinglist for german arbcom - https://phabricator.wikimedia.org/T256306 (Dzahn) Hi @Luke081515 so the reason jcrespo said this is because the English list arbcom-l used to have a special login on top of the normal mailman auth to access the list arc...
[17:40:54] (CR) Bstorm: "I was figuring the prometheus dashboard will let me know :-D" [puppet] - https://gerrit.wikimedia.org/r/610189 (https://phabricator.wikimedia.org/T256361) (owner: Bstorm)
[18:00:04] twentyafterfour and James_F: (Dis)respected human, time to deploy Train log triage with CPT (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200708T1800). Please do the needful.
[18:00:04] RoanKattouw, Niharika, and Urbanecm: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for Morning backport window(Max 6 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200708T1800).
[18:00:04] Jdlrobson and Pchelolo: A patch you scheduled for Morning backport window(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[18:00:12] I'll do it
[18:00:22] (PS8) Ryan Kemper: Scale largest shards to be closer to 30GB [mediawiki-config] - https://gerrit.wikimedia.org/r/608965 (https://phabricator.wikimedia.org/T256928)
[18:01:19] (CR) Ppchelko: [C: +2] Add wgEventServiceDefault to refactor EventBus event stream config [mediawiki-config] - https://gerrit.wikimedia.org/r/610160 (https://phabricator.wikimedia.org/T229863) (owner: Ottomata)
[18:02:19] (Merged) jenkins-bot: Add wgEventServiceDefault to refactor EventBus event stream config [mediawiki-config] - https://gerrit.wikimedia.org/r/610160 (https://phabricator.wikimedia.org/T229863) (owner: Ottomata)
[18:04:45] !log ppchelko@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Add wgEventServiceDefault to refactor EventBus event stream config gerrit:610160 T229863 (duration: 01m 04s)
[18:04:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:04:50] T229863: Refactor EventBus mediawiki configuration - https://phabricator.wikimedia.org/T229863
[18:08:15] hm.. the logstash check failed, but seems it was an unlucky coincidence
[18:09:14] Operations, Wikimedia-Mailing-lists: Create secondary mailinglist for german arbcom - https://phabricator.wikimedia.org/T256306 (Luke081515) Hi @Dzahn yes, that's correct, a login like the one for arbcomde-l would be enough for us :)
[18:11:28] !log ppchelko@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Add wgEventServiceDefault to refactor EventBus event stream config gerrit:610160 T229863, IS.php (duration: 01m 03s)
[18:11:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:11:33] T229863: Refactor EventBus mediawiki configuration - https://phabricator.wikimedia.org/T229863
[18:12:03] (CR) Ppchelko: [C: +2] Disable HTCP purging everywhere [mediawiki-config] - https://gerrit.wikimedia.org/r/607593 (https://phabricator.wikimedia.org/T250781) (owner: Ppchelko)
[18:12:13] Operations, Wikimedia-Mailing-lists: Create secondary mailinglist for german arbcom - https://phabricator.wikimedia.org/T256306 (Dzahn) @Luke081515 The list has been created with you as the initial admin. You should have received mail with a randomly generated password. Please do set the list descriptio...
[18:12:51] (Merged) jenkins-bot: Disable HTCP purging everywhere [mediawiki-config] - https://gerrit.wikimedia.org/r/607593 (https://phabricator.wikimedia.org/T250781) (owner: Ppchelko)
[18:14:46] Operations, Wikimedia-Mailing-lists: Create secondary mailinglist for german arbcom - https://phabricator.wikimedia.org/T256306 (Dzahn) Open→Resolved a:Dzahn
[18:15:50] Operations, Mail, observability, User-MoritzMuehlenhoff: Fix paniclog alert to only sent mails once - https://phabricator.wikimedia.org/T257016 (Dzahn)
[18:16:54] (PS1) Elukey: sre.hadoop: add logging and more backup actions [cookbooks] - https://gerrit.wikimedia.org/r/610336 (https://phabricator.wikimedia.org/T244499)
[18:17:20] !log ppchelko@deploy1001 Synchronized wmf-config/reverse-proxy.php: Disable HTCP purging everywhere gerrit:607593 T250781 reverse-proxy.php (duration: 01m 04s)
[18:17:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:17:25] T250781: Move all purge traffic to kafka - https://phabricator.wikimedia.org/T250781
[18:17:38] Operations, SRE-tools: Improve sre.hosts.decommission - https://phabricator.wikimedia.org/T257297 (Dzahn)
[18:17:59] (PS2) Elukey: sre.hadoop: add logging and more backup actions [cookbooks] - https://gerrit.wikimedia.org/r/610336 (https://phabricator.wikimedia.org/T244499)
[18:18:05] Operations, SRE-tools: Improve sre.hosts.decommission (additionally find host yaml files) - https://phabricator.wikimedia.org/T257297 (Dzahn)
[18:18:41] !log ppchelko@deploy1001 Synchronized wmf-config/wikitech.php: Disable HTCP purging everywhere gerrit:607593 T250781 wikitech.php (duration: 01m 04s)
[18:18:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:19:59] (CR) Dzahn: [C: +2] Create sysop-it.wikipedia.org [dns] - https://gerrit.wikimedia.org/r/609551 (https://phabricator.wikimedia.org/T256545) (owner: Urbanecm)
[18:20:05] (PS2) Dzahn: Create sysop-it.wikipedia.org [dns] - https://gerrit.wikimedia.org/r/609551 (https://phabricator.wikimedia.org/T256545) (owner: Urbanecm)
[18:20:20] !log ppchelko@deploy1001 Synchronized wmf-config/CommonSettings.php: Disable HTCP purging everywhere gerrit:607593 T250781 CS.php (duration: 01m 03s)
[18:20:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:20:47] (CR) Elukey: [C: +2] sre.hadoop: add logging and more backup actions [cookbooks] - https://gerrit.wikimedia.org/r/610336 (https://phabricator.wikimedia.org/T244499) (owner: Elukey)
[18:22:58] (PS2) Ppchelko: Cleanup: remove temporary wmgDisableHTCP variable [mediawiki-config] - https://gerrit.wikimedia.org/r/607596 (https://phabricator.wikimedia.org/T250781)
[18:23:36] (CR) Ppchelko: [C: +2] Cleanup: remove temporary wmgDisableHTCP variable [mediawiki-config] - https://gerrit.wikimedia.org/r/607596 (https://phabricator.wikimedia.org/T250781) (owner: Ppchelko)
[18:24:22] (Merged) jenkins-bot: Cleanup: remove temporary wmgDisableHTCP variable [mediawiki-config] - https://gerrit.wikimedia.org/r/607596 (https://phabricator.wikimedia.org/T250781) (owner: Ppchelko)
[18:27:12] !log ppchelko@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Cleanup: remove temporary wmgDisableHTCP variable gerrit:607596 T250781 IS.php (duration: 01m 01s)
[18:27:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:27:17] T250781: Move all purge traffic to kafka - https://phabricator.wikimedia.org/T250781
[18:28:03] Operations, Traffic, Core Platform Team Workboards (Clinic Duty Team), MW-1.35-notes (1.35.0-wmf.39; 2020-06-30), Patch-For-Review: Move all purge traffic to kafka - https://phabricator.wikimedia.org/T250781 (Pchelolo) Open→Resolved a:Pchelolo After deploying the latest config cha...
[18:28:13] Operations, Traffic, Core Platform Team Workboards (Clinic Duty Team): Move wikitech purges to kafka - https://phabricator.wikimedia.org/T254828 (Pchelolo) Open→Resolved a:Pchelolo
[18:28:15] Operations, Traffic, Core Platform Team Workboards (Clinic Duty Team), MW-1.35-notes (1.35.0-wmf.39; 2020-06-30), Patch-For-Review: Move all purge traffic to kafka - https://phabricator.wikimedia.org/T250781 (Pchelolo)
[18:28:53] Operations, serviceops, Wikimedia-production-error: PHP7 corruption: Method call executed on unrelated object (also: Call to undefined method) - https://phabricator.wikimedia.org/T245183 (mmodell) Happened twice recently on host: `wtp1040` timestamp: `2020-07-08T18:06:14` reqid: `1e314c17-ae0b-48...
[18:28:56] I see Jon Robson had a config change scheduled, but he doesn't seem to be here and I have no idea what that change does, so I'm not going to deploy it
[18:29:58] (CR) Dzahn: "Also the gerrit db password has been removed from the private repo and since the Gerrit services were restarted. So it seems kind of impos" [puppet] - https://gerrit.wikimedia.org/r/609884 (https://phabricator.wikimedia.org/T239151) (owner: Dzahn)
[18:33:37] Pchelolo: hey, are you still deploying? :-)
[18:33:48] Urbanecm_: I'm done with mine
[18:33:52] thanks!
[18:35:06] (PS2) Urbanecm: Change bnwiki logo [mediawiki-config] - https://gerrit.wikimedia.org/r/607852 (https://phabricator.wikimedia.org/T255328)
[18:36:01] (CR) Urbanecm: [C: +2] Change bnwiki logo [mediawiki-config] - https://gerrit.wikimedia.org/r/607852 (https://phabricator.wikimedia.org/T255328) (owner: Urbanecm)
[18:36:42] (Merged) jenkins-bot: Change bnwiki logo [mediawiki-config] - https://gerrit.wikimedia.org/r/607852 (https://phabricator.wikimedia.org/T255328) (owner: Urbanecm)
[18:37:25] (CR) Bstorm: "Juuuust in case, ran the compiler: https://puppet-compiler.wmflabs.org/compiler1002/23774/tools-prometheus-03.tools.eqiad.wmflabs/index.ht" [puppet] - https://gerrit.wikimedia.org/r/610189 (https://phabricator.wikimedia.org/T256361) (owner: Bstorm)
[18:37:48] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[18:38:13] (CR) Bstorm: [C: +2] "Also updated the security groups. Should work now, I think, but I'll make sure after merge." [puppet] - https://gerrit.wikimedia.org/r/610189 (https://phabricator.wikimedia.org/T256361) (owner: Bstorm)
[18:38:15] * Urbanecm_ halts deploying
[18:39:42] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[18:39:42] (PS1) C. Scott Ananian: Make GC in PHP 7.2 configurable in Parsoid, but don't change production [mediawiki-config] - https://gerrit.wikimedia.org/r/610341 (https://phabricator.wikimedia.org/T257462)
[18:42:48] seems like a temporary spike, continuing (it touches static files only anyway)
[18:45:20] (CR) Dzahn: [C: +2] releases::mediawiki: support rsyncing files to multiple secondaries [puppet] - https://gerrit.wikimedia.org/r/610193 (https://phabricator.wikimedia.org/T247652) (owner: Dzahn)
[18:45:40] (PS4) Dzahn: releases::mediawiki: support rsyncing files to multiple secondaries [puppet] - https://gerrit.wikimedia.org/r/610193 (https://phabricator.wikimedia.org/T247652)
[18:46:01] !log urbanecm@deploy1001 Synchronized static/images/project-logos/: f42cdf2: Change bnwiki logo (T255328) (duration: 01m 04s)
[18:46:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:46:06] T255328: Update logo for Bengali Wikipedia - https://phabricator.wikimedia.org/T255328
[18:46:28] (CR) Bstorm: "Turns out that https://hub.paws.wmcloud.org/hub/metrics is "not a valid hostname". I did that wrong. Reverting this in order to fix promet" [puppet] - https://gerrit.wikimedia.org/r/610189 (https://phabricator.wikimedia.org/T256361) (owner: Bstorm)
[18:46:50] (PS1) Bstorm: Revert "tools-prometheus: set up prometheus to get paws metrics" [puppet] - https://gerrit.wikimedia.org/r/610100
[18:48:17] (CR) Bstorm: [C: +2] Revert "tools-prometheus: set up prometheus to get paws metrics" [puppet] - https://gerrit.wikimedia.org/r/610100 (owner: Bstorm)
[18:50:00] (PS1) Urbanecm: Add scan-bugs.org to $wgCopyUploadsDomains [mediawiki-config] - https://gerrit.wikimedia.org/r/610346 (https://phabricator.wikimedia.org/T256569)
[18:50:32] (CR) Urbanecm: [C: +2] "B&C" [mediawiki-config] - https://gerrit.wikimedia.org/r/610346 (https://phabricator.wikimedia.org/T256569) (owner: Urbanecm)
[18:51:25] (Merged) jenkins-bot: Add scan-bugs.org to $wgCopyUploadsDomains [mediawiki-config] - https://gerrit.wikimedia.org/r/610346 (https://phabricator.wikimedia.org/T256569) (owner: Urbanecm)
[18:53:00] Operations, SRE-Access-Requests: Requesting access to analytics-privatedata-users for mooeypoo - https://phabricator.wikimedia.org/T257502 (Mooeypoo)
[18:54:39] (PS1) Urbanecm: Add *.nga.gov to the wgCopyUploadsDomains allowlist of Wikimedia Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/610348 (https://phabricator.wikimedia.org/T256518)
[18:54:52] (CR) Urbanecm: [C: +2] Add *.nga.gov to the wgCopyUploadsDomains allowlist of Wikimedia Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/610348 (https://phabricator.wikimedia.org/T256518) (owner: Urbanecm)
[18:55:16] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: 2e5943ddb30e08607a9ffb6ed05a042e8367e2e1: Add scan-bugs.org to $wgCopyUploadsDomains (T256569) (duration: 01m 04s)
[18:55:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:55:21] T256569: Add scan-bugs.org to $wgCopyUploadsDomains - https://phabricator.wikimedia.org/T256569
[18:55:41] (Merged) jenkins-bot: Add *.nga.gov to the wgCopyUploadsDomains allowlist of Wikimedia Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/610348 (https://phabricator.wikimedia.org/T256518) (owner: Urbanecm)
[18:58:05] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: 091442cf035a6d76f1211291afbb3193c513595d: Add *.nga.gov to the wgCopyUploadsDomains allowlist of Wikimedia Commons (T256518) (duration: 01m 04s)
[18:58:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:58:11] T256518: Add *.nga.gov to the wgCopyUploadsDomains allowlist of Wikimedia Commons - https://phabricator.wikimedia.org/T256518
[18:59:02] * Urbanecm_ done
[19:00:04] twentyafterfour and James_F: #bothumor My software never has bugs. It just develops random features. Rise for Mediawiki train - American+European Version. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200708T1900).
[19:10:03] Operations, SRE-Access-Requests: Requesting access to analytics-privatedata-users for mooeypoo - https://phabricator.wikimedia.org/T257502 (kchapman) I approve this :)
[19:12:21] (PS1) 20after4: group1 wikis to 1.35.0-wmf.40 refs T256668 [mediawiki-config] - https://gerrit.wikimedia.org/r/610350
[19:12:23] (CR) 20after4: [C: +2] group1 wikis to 1.35.0-wmf.40 refs T256668 [mediawiki-config] - https://gerrit.wikimedia.org/r/610350 (owner: 20after4)
[19:13:17] (Merged) jenkins-bot: group1 wikis to 1.35.0-wmf.40 refs T256668 [mediawiki-config] - https://gerrit.wikimedia.org/r/610350 (owner: 20after4)
[19:17:54] !log twentyafterfour@deploy1001 rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.40 refs T256668
[19:17:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:18:00] T256668: 1.35.0-wmf.40 deployment blockers - https://phabricator.wikimedia.org/T256668
[19:18:58] !log twentyafterfour@deploy1001 Synchronized php: group1 wikis to 1.35.0-wmf.40 refs T256668 (duration: 01m 04s)
[19:19:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:20:06] No apparent change in error rate with wmf.40 on group1 - refs T256668
[19:38:02] finally
[19:56:41] Operations, Fundraising-Backlog, MediaWiki-extensions-CentralNotice, Traffic, and 2 others: Implement redirect for hide banner cookie issue - https://phabricator.wikimedia.org/T251780 (MBeat33) p:Lowest→High Post meeting followup, to request a Cookie-Setting-Thing (CST) URL that Donor Ser...
[19:57:29] (CR) Dzahn: "> Patch Set 5:" [puppet] - https://gerrit.wikimedia.org/r/606549 (https://phabricator.wikimedia.org/T254158) (owner: Dzahn)
[20:00:04] halfak and accraze: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for Services – Graphoid / ORES . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200708T2000).
[19:10:03] Operations, SRE-Access-Requests: Requesting access to analytics-privatedata-users for mooeypoo - https://phabricator.wikimedia.org/T257502 (kchapman) I approve this :)
[19:12:21] (PS1) 20after4: group1 wikis to 1.35.0-wmf.40 refs T256668 [mediawiki-config] - https://gerrit.wikimedia.org/r/610350
[19:12:23] (CR) 20after4: [C: +2] group1 wikis to 1.35.0-wmf.40 refs T256668 [mediawiki-config] - https://gerrit.wikimedia.org/r/610350 (owner: 20after4)
[19:13:17] (Merged) jenkins-bot: group1 wikis to 1.35.0-wmf.40 refs T256668 [mediawiki-config] - https://gerrit.wikimedia.org/r/610350 (owner: 20after4)
[19:17:54] !log twentyafterfour@deploy1001 rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.40 refs T256668
[19:17:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:18:00] T256668: 1.35.0-wmf.40 deployment blockers - https://phabricator.wikimedia.org/T256668
[19:18:58] !log twentyafterfour@deploy1001 Synchronized php: group1 wikis to 1.35.0-wmf.40 refs T256668 (duration: 01m 04s)
[19:19:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:20:06] No apparent change in error rate with wmf.40 on group1 - refs T256668
[19:38:02] finally
[19:56:41] Operations, Fundraising-Backlog, MediaWiki-extensions-CentralNotice, Traffic, and 2 others: Implement redirect for hide banner cookie issue - https://phabricator.wikimedia.org/T251780 (MBeat33) p:Lowest→High Post meeting followup, to request a Cookie-Setting-Thing (CST) URL that Donor Ser...
[19:57:29] (CR) Dzahn: "> Patch Set 5:" [puppet] - https://gerrit.wikimedia.org/r/606549 (https://phabricator.wikimedia.org/T254158) (owner: Dzahn)
[20:00:04] halfak and accraze: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for Services – Graphoid / ORES . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200708T2000).
[20:00:34] (CR) Dzahn: "> > I guess we should remove grants and/or rename its tables to make sure nothing really uses" [puppet] - https://gerrit.wikimedia.org/r/606549 (https://phabricator.wikimedia.org/T254158) (owner: Dzahn)
[20:02:30] (CR) Dzahn: "we can either first close the firewall or first remove the grants. both should achieve the same thing to ensure nothing is using it. renam" [puppet] - https://gerrit.wikimedia.org/r/609884 (https://phabricator.wikimedia.org/T239151) (owner: Dzahn)
[20:06:52] Operations, Fundraising-Backlog, MediaWiki-extensions-CentralNotice, Traffic, and 2 others: Implement redirect for hide banner cookie issue - https://phabricator.wikimedia.org/T251780 (TSkaff) @MBeat33 I just emailed Peter to ask what he thinks about the complexity of this; I could have just ping...
[20:08:47] !log start of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T256012)
[20:08:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:08:53] T256012: Add Wikidata support for shnwiktionary - https://phabricator.wikimedia.org/T256012
[20:24:31] (PS10) DCausse: [wcqs] gui custom config [puppet] - https://gerrit.wikimedia.org/r/606297 (https://phabricator.wikimedia.org/T251514) (owner: Mstyles)
[20:27:36] !log end of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T256012)
[20:27:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:27:41] T256012: Add Wikidata support for shnwiktionary - https://phabricator.wikimedia.org/T256012
[20:28:18] (CR) Ebernhardson: "> Patch Set 7:" [mediawiki-config] - https://gerrit.wikimedia.org/r/608965 (https://phabricator.wikimedia.org/T256928) (owner: Ryan Kemper)
[20:40:19] (PS1) Andrew Bogott: Add dummy passwords for profile::openstack::::galera::prometheus_db_pass [labs/private] - https://gerrit.wikimedia.org/r/610364
[20:40:22] (PS1) Andrew Bogott: wmcs galera: rework use of prometheus-mysqld-exporter [puppet] - https://gerrit.wikimedia.org/r/610365 (https://phabricator.wikimedia.org/T242455)
[20:40:30] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[20:41:07] (CR) Andrew Bogott: [V: +2 C: +2] Add dummy passwords for profile::openstack::::galera::prometheus_db_pass [labs/private] - https://gerrit.wikimedia.org/r/610364 (owner: Andrew Bogott)
[20:41:32] (CR) jerkins-bot: [V: -1] wmcs galera: rework use of prometheus-mysqld-exporter [puppet] - https://gerrit.wikimedia.org/r/610365 (https://phabricator.wikimedia.org/T242455) (owner: Andrew Bogott)
[20:42:20] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[20:44:32] (PS2) Andrew Bogott: wmcs galera: rework use of prometheus-mysqld-exporter [puppet] - https://gerrit.wikimedia.org/r/610365 (https://phabricator.wikimedia.org/T242455)
[20:45:08] (CR) Ryan Kemper: "> Patch Set 8:" [mediawiki-config] - https://gerrit.wikimedia.org/r/608965 (https://phabricator.wikimedia.org/T256928) (owner: Ryan Kemper)
[20:45:43] (CR) jerkins-bot: [V: -1] wmcs galera: rework use of prometheus-mysqld-exporter [puppet] - https://gerrit.wikimedia.org/r/610365 (https://phabricator.wikimedia.org/T242455) (owner: Andrew Bogott)
[20:47:43] (PS1) Andrew Bogott: Add more passwords for profile::openstack::::galera::prometheus_db_pass [labs/private] - https://gerrit.wikimedia.org/r/610366
[20:49:35] (CR) Andrew Bogott: [V: +2 C: +2] Add more passwords for profile::openstack::::galera::prometheus_db_pass [labs/private] - https://gerrit.wikimedia.org/r/610366 (owner: Andrew Bogott)
[20:53:46] (CR) Andrew Bogott: [V: +2 C: +2] wmcs galera: rework use of prometheus-mysqld-exporter [puppet] - https://gerrit.wikimedia.org/r/610365 (https://phabricator.wikimedia.org/T242455) (owner: Andrew Bogott)
[21:02:37] (PS5) Dzahn: releases::mediawiki: support rsyncing files to multiple secondaries [puppet] - https://gerrit.wikimedia.org/r/610193 (https://phabricator.wikimedia.org/T247652)
[21:23:02] Operations, SRE-Access-Requests: Requesting access to centralauth database for Jennifer Wang - https://phabricator.wikimedia.org/T255836 (jwang) Hi, I only need read only permission. I have already been in analytics-privatedata-users group. But I failed to ssh deployment.eqiad.wmnet. Not sure what's mis...
[21:39:53] (PS1) Andrew Bogott: openstack::db::project_grants: remove unused db_host arg [puppet] - https://gerrit.wikimedia.org/r/610376
[21:39:55] (PS1) Andrew Bogott: openstack::db::project_grants: make privs configurable [puppet] - https://gerrit.wikimedia.org/r/610377
[21:39:57] (PS1) Andrew Bogott: galera: generate grants script for prometheus access [puppet] - https://gerrit.wikimedia.org/r/610378
[21:41:27] (CR) jerkins-bot: [V: -1] galera: generate grants script for prometheus access [puppet] - https://gerrit.wikimedia.org/r/610378 (owner: Andrew Bogott)
[21:42:15] (CR) Andrew Bogott: [C: +2] openstack::db::project_grants: remove unused db_host arg [puppet] - https://gerrit.wikimedia.org/r/610376 (owner: Andrew Bogott)
[21:42:25] (CR) Andrew Bogott: [C: +2] openstack::db::project_grants: make privs configurable [puppet] - https://gerrit.wikimedia.org/r/610377 (owner: Andrew Bogott)
[21:43:59] (PS2) Andrew Bogott: galera: generate grants script for prometheus access [puppet] - https://gerrit.wikimedia.org/r/610378
[21:45:09] (CR) jerkins-bot: [V: -1] galera: generate grants script for prometheus access [puppet] - https://gerrit.wikimedia.org/r/610378 (owner: Andrew Bogott)
[21:46:17] (PS3) Andrew Bogott: galera: generate grants script for prometheus access [puppet] - https://gerrit.wikimedia.org/r/610378
[21:47:28] (CR) jerkins-bot: [V: -1] galera: generate grants script for prometheus access [puppet] - https://gerrit.wikimedia.org/r/610378 (owner: Andrew Bogott)
[21:52:14] (PS4) Andrew Bogott: galera: generate grants script for prometheus access [puppet] - https://gerrit.wikimedia.org/r/610378
[21:53:37] (PS1) Bstorm: tools-prometheus: set up prometheus to get paws metrics [puppet] - https://gerrit.wikimedia.org/r/610383 (https://phabricator.wikimedia.org/T256361)
[21:55:48] !log rsyncing releases files from releases1001 to releases2002 and releases1002. deleting files from releases2002 not existing on releases1002 to make them mirrors ( T247652_
[21:55:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:56:54] !log deleting files from releases2001 that are not existing on releases1001 to make them mirrors. rsync with --delete and the command from quickdatacopy class (T247652)
[21:56:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:56:58] T247652: replace backends for releases.wikimedia.org with buster VMs - https://phabricator.wikimedia.org/T247652
[21:58:21] (CR) Bstorm: [C: +2] tools-prometheus: set up prometheus to get paws metrics [puppet] - https://gerrit.wikimedia.org/r/610383 (https://phabricator.wikimedia.org/T256361) (owner: Bstorm)
[21:59:43] (PS5) Andrew Bogott: galera: generate grants script for prometheus access [puppet] - https://gerrit.wikimedia.org/r/610378
[22:01:32] (CR) Andrew Bogott: [C: +2] galera: generate grants script for prometheus access [puppet] - https://gerrit.wikimedia.org/r/610378 (owner: Andrew Bogott)
[22:05:54] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[22:07:48] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[22:08:33] (PS1) Dzahn: rsync::quickdatacopy: add optional parameter to let rsync --delete files [puppet] - https://gerrit.wikimedia.org/r/610389 (https://phabricator.wikimedia.org/T247652)
[22:09:44] (CR) jerkins-bot: [V: -1] rsync::quickdatacopy: add optional parameter to let rsync --delete files [puppet] - https://gerrit.wikimedia.org/r/610389 (https://phabricator.wikimedia.org/T247652) (owner: Dzahn)
[22:10:37] (PS2) Dzahn: rsync::quickdatacopy: add optional parameter to let rsync --delete files [puppet] - https://gerrit.wikimedia.org/r/610389 (https://phabricator.wikimedia.org/T247652)
[22:11:48] (CR) jerkins-bot: [V: -1] rsync::quickdatacopy: add optional parameter to let rsync --delete files [puppet] - https://gerrit.wikimedia.org/r/610389 (https://phabricator.wikimedia.org/T247652) (owner: Dzahn)
[22:25:22] (PS1) Bstorm: tools-prometheus: fix a typo in the config that repeated an entry [puppet] - https://gerrit.wikimedia.org/r/610394 (https://phabricator.wikimedia.org/T256361)
[22:27:48] (CR) Bstorm: [C: +2] tools-prometheus: fix a typo in the config that repeated an entry [puppet] - https://gerrit.wikimedia.org/r/610394 (https://phabricator.wikimedia.org/T256361) (owner: Bstorm)
[22:35:17] (PS1) Ebernhardson: query_service: Move journal ownership to blazegraph class [puppet] - https://gerrit.wikimedia.org/r/610401
[22:35:44] (PS1) Dzahn: releases: remove duplicate rsync code from blubber and parsoid classes [puppet] - https://gerrit.wikimedia.org/r/610402
[22:35:46] (PS1) Dzahn: releases: move rsync code for all releases from mediawiki to common [puppet] - https://gerrit.wikimedia.org/r/610403
[22:35:48] (PS1) Dzahn: releases: move more common code out of the mediawiki class [puppet] - https://gerrit.wikimedia.org/r/610404
[22:35:50] (PS1) Dzahn: releases: switch reprepro file sync to support multiple destinations [puppet] - https://gerrit.wikimedia.org/r/610405 (https://phabricator.wikimedia.org/T247652)
[22:35:52] (PS1) Dzahn: releases: sync MediaWiki security patches to multiple servers [puppet] - https://gerrit.wikimedia.org/r/610406
[22:37:02] (Abandoned) Dzahn: releases: also sync blubber,parsoid,reprepro files to multiple servers [puppet] - https://gerrit.wikimedia.org/r/610195 (https://phabricator.wikimedia.org/T247652) (owner: Dzahn)
[22:37:17] (CR) jerkins-bot: [V: -1] releases: move more common code out of the mediawiki class [puppet] - https://gerrit.wikimedia.org/r/610404 (owner: Dzahn)
[22:43:28] Operations, Keyholder: After arming a new key in keyholder, the identity file path does not show up - https://phabricator.wikimedia.org/T257329 (Dzahn) Thanks for https://wikitech.wikimedia.org/w/index.php?title=Keyholder&type=revision&diff=1872867&oldid=1871121 :)
[22:48:16] Operations, Epic, Goal: automatically collect network error reports from users' browsers - https://phabricator.wikimedia.org/T257527 (CDanis)
[22:51:47] (PS1) Andrew Bogott: wmcs galera: adjust firewall for prometheus access [puppet] - https://gerrit.wikimedia.org/r/610410
[22:53:23] (CR) Andrew Bogott: [C: +2] wmcs galera: adjust firewall for prometheus access [puppet] - https://gerrit.wikimedia.org/r/610410 (owner: Andrew Bogott)
[23:00:04] RoanKattouw, Niharika, and Urbanecm: (Dis)respected human, time to deploy Evening backport window(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200708T2300). Please do the needful.
[23:00:40] (PS2) Ebernhardson: query_service: Remove more hardcoding of wdqs [puppet] - https://gerrit.wikimedia.org/r/610401
[23:06:17] (CR) Ebernhardson: "These changes were all required to get wcqs-beta-01 to complete a full puppet run without errors." [puppet] - https://gerrit.wikimedia.org/r/610401 (owner: Ebernhardson)
[23:09:04] (PS1) Andrew Bogott: Revert "wmcs galera: adjust firewall for prometheus access" [puppet] - https://gerrit.wikimedia.org/r/610101
[23:10:32] (CR) Andrew Bogott: [C: +2] Revert "wmcs galera: adjust firewall for prometheus access" [puppet] - https://gerrit.wikimedia.org/r/610101 (owner: Andrew Bogott)
[23:28:04] Operations, Traffic, observability: Collect client network errors, deprecation, intervention and crash reports - https://phabricator.wikimedia.org/T207860 (CDanis)
[23:28:07] Operations, Epic, Goal: automatically collect network error reports from users' browsers - https://phabricator.wikimedia.org/T257527 (CDanis)
[23:36:18] Operations, SRE-Access-Requests: Requesting access to centralauth database for Jennifer Wang - https://phabricator.wikimedia.org/T255836 (Niharika) @jcrespo You got it right. Jennifer needs access to the analytics copy. To give a little more context here, `centralauth` has a `localuser` table that has a...
[23:36:27] (PS1) Andrew Bogott: Prometheus: gather db stats from wmcs galera db hosts [puppet] - https://gerrit.wikimedia.org/r/610420
[23:37:40] (CR) jerkins-bot: [V: -1] Prometheus: gather db stats from wmcs galera db hosts [puppet] - https://gerrit.wikimedia.org/r/610420 (owner: Andrew Bogott)
[23:39:32] (PS1) Nray: Enable limited-width layout for "Latest Vector" on Beta [mediawiki-config] - https://gerrit.wikimedia.org/r/610424 (https://phabricator.wikimedia.org/T246420)
[23:40:16] (PS2) Andrew Bogott: Prometheus: gather db stats from wmcs galera db hosts [puppet] - https://gerrit.wikimedia.org/r/610420
[23:40:21] (PS2) Nray: Enable limited-width layout for "Latest Vector" on Beta [mediawiki-config] - https://gerrit.wikimedia.org/r/610424 (https://phabricator.wikimedia.org/T246420)
[23:41:28] (CR) jerkins-bot: [V: -1] Prometheus: gather db stats from wmcs galera db hosts [puppet] - https://gerrit.wikimedia.org/r/610420 (owner: Andrew Bogott)
[23:43:12] (PS3) Andrew Bogott: Prometheus: gather db stats from wmcs galera db hosts [puppet] - https://gerrit.wikimedia.org/r/610420