[00:00:04] twentyafterfour: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for Phabricator update. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200827T0000).
[00:04:32] PROBLEM - PHP7 rendering on mwdebug1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[00:05:02] (PS1) Huji: Adding $wgCheckUserLogSuccessfulBotLogins [extensions/CheckUser] (wmf/1.35.0-wmf.5) - https://gerrit.wikimedia.org/r/622631 (https://phabricator.wikimedia.org/T253802)
[00:06:05] (CR) Huji: "Of note, it said cherry-pick failed due to merge conflict. That has never happened to me in the past. I will try to find a way to fix it b" [extensions/CheckUser] (wmf/1.35.0-wmf.5) - https://gerrit.wikimedia.org/r/622631 (https://phabricator.wikimedia.org/T253802) (owner: Huji)
[00:06:24] RECOVERY - PHP7 rendering on mwdebug1001 is OK: HTTP OK: HTTP/1.1 302 Found - 649 bytes in 5.868 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering
[00:06:25] (CR) jerkins-bot: [V: -1] Adding $wgCheckUserLogSuccessfulBotLogins [extensions/CheckUser] (wmf/1.35.0-wmf.5) - https://gerrit.wikimedia.org/r/622631 (https://phabricator.wikimedia.org/T253802) (owner: Huji)
[00:07:30] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=pdu_sentry4 site=eqsin https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[00:09:28] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[00:21:28] (CR) Dzahn: Initial configuration for jawikivoyage (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/622689 (https://phabricator.wikimedia.org/T260320) (owner: Ladsgroup)
[00:30:16] (PS2) Huji: Adding $wgCheckUserLogSuccessfulBotLogins [extensions/CheckUser] (wmf/1.35.0-wmf.5) - https://gerrit.wikimedia.org/r/622631 (https://phabricator.wikimedia.org/T253802)
[00:41:17] (PS1) Huji: Adding $wgCheckUserLogSuccessfulBotLogins [extensions/CheckUser] (wmf/1.36.0-wmf.5) - https://gerrit.wikimedia.org/r/622693 (https://phabricator.wikimedia.org/T253802)
[00:42:56] (Abandoned) Huji: Adding $wgCheckUserLogSuccessfulBotLogins [extensions/CheckUser] (wmf/1.35.0-wmf.5) - https://gerrit.wikimedia.org/r/622631 (https://phabricator.wikimedia.org/T253802) (owner: Huji)
[00:44:21] (CR) Huji: [C: -1] "Will abandon it momentarily." [extensions/CheckUser] (wmf/1.36.0-wmf.5) - https://gerrit.wikimedia.org/r/622693 (https://phabricator.wikimedia.org/T253802) (owner: Huji)
[00:54:07] (Abandoned) Huji: Adding $wgCheckUserLogSuccessfulBotLogins [extensions/CheckUser] (wmf/1.36.0-wmf.5) - https://gerrit.wikimedia.org/r/622693 (https://phabricator.wikimedia.org/T253802) (owner: Huji)
[00:56:32] Operations, TechCom-RFC, serviceops, Platform Team Workboards (Clinic Duty Team): RFC: PHP microservice for containerized shell execution - https://phabricator.wikimedia.org/T260330 (tstarling)
[01:03:29] Operations, Release-Engineering-Team, Scap, serviceops, Platform Team Workboards (Clinic Duty Team): Deployment infrastructure for PHP microservices - https://phabricator.wikimedia.org/T261369 (tstarling)
[01:07:41] Operations, TechCom-RFC, serviceops, Platform Team Workboards (Clinic Duty Team): RFC: PHP microservice for containerized shell execution - https://phabricator.wikimedia.org/T260330 (tstarling) Task description edit: * Changed the file API again as discussed * Stopped describing BoxedCommand as a...
[01:15:32] PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=205 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[01:17:30] RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[01:22:19] (PS1) Dzahn: DHCP: set pxelinux.pathprefix to not use http for install5001 [puppet] - https://gerrit.wikimedia.org/r/622696 (https://phabricator.wikimedia.org/T254157)
[01:23:00] (CR) Dzahn: [C: +2] DHCP: set pxelinux.pathprefix to not use http for install5001 [puppet] - https://gerrit.wikimedia.org/r/622696 (https://phabricator.wikimedia.org/T254157) (owner: Dzahn)
[02:03:20] !log shutting down install3001,install4001,install5001 VMs (no OS yet, but please also don't delete, debugging in progress, shutting them down until I continue on T254157)
[02:03:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:03:25] T254157: esams,ulsfo,eqsin: one VM request each for install_servers - https://phabricator.wikimedia.org/T254157
[03:18:52] (PS1) Catrope: GrowthExperiments: Assign all homepage users to variant A [mediawiki-config] - https://gerrit.wikimedia.org/r/622698
[04:01:24] !log ryankemper@puppetmaster1001 conftool action : set/pooled=yes; selector: name=cloudelastic1005.wikimedia.org
[04:01:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:03:56] !log ryankemper@puppetmaster1001 conftool action : set/pooled=yes:weight=10; selector: name=cloudelastic1005.wikimedia.org
[04:03:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:04:36] !log ryankemper@puppetmaster1001 conftool action : set/pooled=yes:weight=10; selector: name=cloudelastic1006.wikimedia.org
[04:04:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:07:55] Operations, ops-eqiad, Patch-For-Review: (Need by: TDB) rack/setup/install cloudelastic100[56] - https://phabricator.wikimedia.org/T249062 (RKemper)
[04:53:30] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1074', diff saved to https://phabricator.wikimedia.org/P12374 and previous config saved to /var/cache/conftool/dbconfig/20200827-045329-marostegui.json
[04:53:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:53:55] !log Stop db1074 and db2107 in sync to fix drifts on s2 change_tag - T260042
[04:53:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:53:58] T260042: Compare a few tables per section before the switchover - https://phabricator.wikimedia.org/T260042
[05:07:27] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1085', diff saved to https://phabricator.wikimedia.org/P12375 and previous config saved to /var/cache/conftool/dbconfig/20200827-050727-marostegui.json
[05:07:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:07:55] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1134 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12376 and previous config saved to /var/cache/conftool/dbconfig/20200827-050754-marostegui.json
[05:07:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:10:04] PROBLEM - MariaDB Replica Lag: s1 on db2094 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 56337.35 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[05:10:12] ^ expired downtime
[05:12:26] (PS1) Marostegui: db1134: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/622703
[05:12:55] (CR) Marostegui: [C: +2] db1134: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/622703 (owner: Marostegui)
[05:15:47] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1134 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12377 and previous config saved to /var/cache/conftool/dbconfig/20200827-051546-marostegui.json
[05:15:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:16:09] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1079', diff saved to https://phabricator.wikimedia.org/P12378 and previous config saved to /var/cache/conftool/dbconfig/20200827-051609-marostegui.json
[05:16:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:24:14] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1074', diff saved to https://phabricator.wikimedia.org/P12379 and previous config saved to /var/cache/conftool/dbconfig/20200827-052413-marostegui.json
[05:24:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:28:18] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1085', diff saved to https://phabricator.wikimedia.org/P12380 and previous config saved to /var/cache/conftool/dbconfig/20200827-052818-marostegui.json
[05:28:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:29:26] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1134', diff saved to https://phabricator.wikimedia.org/P12381 and previous config saved to /var/cache/conftool/dbconfig/20200827-052925-marostegui.json
[05:29:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:31:01] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1074, db1085, db1079', diff saved to https://phabricator.wikimedia.org/P12382 and previous config saved to /var/cache/conftool/dbconfig/20200827-053100-marostegui.json
[05:31:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:35:10] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1074, db1085, db1079', diff saved to https://phabricator.wikimedia.org/P12383 and previous config saved to /var/cache/conftool/dbconfig/20200827-053509-marostegui.json
[05:35:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:35:58] !log marostegui@cumin1001 dbctl commit (dc=all): 'Fully repool db1134 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12384 and previous config saved to /var/cache/conftool/dbconfig/20200827-053558-marostegui.json
[05:36:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:38:14] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1082', diff saved to https://phabricator.wikimedia.org/P12385 and previous config saved to /var/cache/conftool/dbconfig/20200827-053814-marostegui.json
[05:38:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:41:14] !log marostegui@cumin1001 dbctl commit (dc=all): 'Fully repool db1074 db1085 db1078', diff saved to https://phabricator.wikimedia.org/P12386 and previous config saved to /var/cache/conftool/dbconfig/20200827-054114-marostegui.json
[05:41:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:42:59] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1079', diff saved to https://phabricator.wikimedia.org/P12387 and previous config saved to /var/cache/conftool/dbconfig/20200827-054259-marostegui.json
[05:43:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:51:04] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1082', diff saved to https://phabricator.wikimedia.org/P12388 and previous config saved to /var/cache/conftool/dbconfig/20200827-055104-marostegui.json
[05:51:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:51:27] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1079', diff saved to https://phabricator.wikimedia.org/P12389 and previous config saved to /var/cache/conftool/dbconfig/20200827-055126-marostegui.json
[05:51:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:55:22] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1079,db1082', diff saved to https://phabricator.wikimedia.org/P12390 and previous config saved to /var/cache/conftool/dbconfig/20200827-055522-marostegui.json
[05:55:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:58:16] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1079,db1082', diff saved to https://phabricator.wikimedia.org/P12391 and previous config saved to /var/cache/conftool/dbconfig/20200827-055815-marostegui.json
[05:58:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:06:52] !log marostegui@cumin1001 dbctl commit (dc=all): 'Fully repool db1079,db1082', diff saved to https://phabricator.wikimedia.org/P12392 and previous config saved to /var/cache/conftool/dbconfig/20200827-060652-marostegui.json
[06:06:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:08:22] (CR) Urbanecm: "no reason for -2, this can get out" [mediawiki-config] - https://gerrit.wikimedia.org/r/599492 (https://phabricator.wikimedia.org/T253802) (owner: Huji)
[06:08:33] (CR) Urbanecm: [C: +1] "LGTM" [mediawiki-config] - https://gerrit.wikimedia.org/r/599492 (https://phabricator.wikimedia.org/T253802) (owner: Huji)
[06:12:21] (CR) Elukey: [C: +2] Remove schema[12]00[12] from their LVS endpoint configs [puppet] - https://gerrit.wikimedia.org/r/622587 (https://phabricator.wikimedia.org/T255026) (owner: Elukey)
[06:19:56] Operations, ops-eqiad, DC-Ops, Patch-For-Review: (Due By: 2020-07-02) rack/setup/install 3 lightweight hadoop nodes - https://phabricator.wikimedia.org/T255518 (ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts: ` ['an-test-master1001.eqiad.wmne...
[06:31:50] !log elukey@cumin1001 START - Cookbook sre.hosts.downtime
[06:31:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:33:51] Operations, ops-eqiad, DC-Ops, Patch-For-Review: (Due By: 2020-07-02) rack/setup/install 3 lightweight hadoop nodes - https://phabricator.wikimedia.org/T255518 (ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts: ` ['an-test-master1002.eqiad.wmne...
[06:34:02] !log elukey@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[06:34:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:34:31] Operations, ops-eqiad, DC-Ops, Patch-For-Review: (Due By: 2020-07-02) rack/setup/install 3 lightweight hadoop nodes - https://phabricator.wikimedia.org/T255518 (ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts: ` ['an-test-coord1001.eqiad.wmnet...
[06:36:30] (PS2) Giuseppe Lavagetto: Correctly treat fixtures files for new-style deployments. [deployment-charts] - https://gerrit.wikimedia.org/r/622581
[06:36:32] (PS2) Giuseppe Lavagetto: Refresh the documentation of the helmfile.d/services [deployment-charts] - https://gerrit.wikimedia.org/r/622582 (https://phabricator.wikimedia.org/T258572)
[06:36:34] (PS2) Giuseppe Lavagetto: Add an helper script for the conversion to the new layout. [deployment-charts] - https://gerrit.wikimedia.org/r/622583 (https://phabricator.wikimedia.org/T258572)
[06:36:36] (PS2) Giuseppe Lavagetto: Convert termbox to the new layout using the convert script [deployment-charts] - https://gerrit.wikimedia.org/r/622584 (https://phabricator.wikimedia.org/T258572)
[06:36:38] (PS2) Giuseppe Lavagetto: Convert citoid to new layout using the conversion script [deployment-charts] - https://gerrit.wikimedia.org/r/622585 (https://phabricator.wikimedia.org/T258572)
[06:38:57] (CR) jerkins-bot: [V: -1] Convert termbox to the new layout using the convert script [deployment-charts] - https://gerrit.wikimedia.org/r/622584 (https://phabricator.wikimedia.org/T258572) (owner: Giuseppe Lavagetto)
[06:39:25] (CR) jerkins-bot: [V: -1] Refresh the documentation of the helmfile.d/services [deployment-charts] - https://gerrit.wikimedia.org/r/622582 (https://phabricator.wikimedia.org/T258572) (owner: Giuseppe Lavagetto)
[06:39:27] (CR) jerkins-bot: [V: -1] Convert citoid to new layout using the conversion script [deployment-charts] - https://gerrit.wikimedia.org/r/622585 (https://phabricator.wikimedia.org/T258572) (owner: Giuseppe Lavagetto)
[06:45:47] !log elukey@cumin1001 START - Cookbook sre.hosts.downtime
[06:45:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:46:24] !log elukey@cumin1001 START - Cookbook sre.hosts.downtime
[06:46:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:47:34] (PS1) Elukey: Add terms for eventgate endpoints to the analyitcs-in4 filter [homer/public] - https://gerrit.wikimedia.org/r/622705 (https://phabricator.wikimedia.org/T261356)
[06:49:10] !log elukey@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[06:49:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:50:47] !log elukey@cumin1001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
[06:50:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:52:59] Operations, ops-eqiad, DC-Ops, Patch-For-Review: (Due By: 2020-07-02) rack/setup/install 3 lightweight hadoop nodes - https://phabricator.wikimedia.org/T255518 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-test-master1001.eqiad.wmnet'] ` and were **ALL** successful.
[06:53:13] (CR) Muehlenhoff: [C: -1] "The patch is fine, but we still have plenty of jessie hosts, which would break, so -1 for now." [puppet] - https://gerrit.wikimedia.org/r/621365 (owner: Dzahn)
[07:02:21] PROBLEM - OSPF status on cr1-codfw is CRITICAL: OSPFv2: 5/6 UP : OSPFv3: 5/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[07:03:01] (PS1) QChris: Add .gitreview [software/wmfbackups] - https://gerrit.wikimedia.org/r/622706
[07:03:03] (CR) QChris: [V: +2 C: +2] Add .gitreview [software/wmfbackups] - https://gerrit.wikimedia.org/r/622706 (owner: QChris)
[07:03:19] PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 241, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[07:04:24] (CR) Ayounsi: Add terms for eventgate endpoints to the analyitcs-in4 filter (1 comment) [homer/public] - https://gerrit.wikimedia.org/r/622705 (https://phabricator.wikimedia.org/T261356) (owner: Elukey)
[07:04:49] RECOVERY - MariaDB Replica Lag: s1 on db2094 is OK: OK slave_sql_lag Replication lag: 0.00 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
[07:04:53] (CR) Elukey: Add terms for eventgate endpoints to the analyitcs-in4 filter (1 comment) [homer/public] - https://gerrit.wikimedia.org/r/622705 (https://phabricator.wikimedia.org/T261356) (owner: Elukey)
[07:05:04] (Abandoned) QChris: Add .gitreview [software/wmfbackups] - https://gerrit.wikimedia.org/r/622706 (owner: QChris)
[07:05:44] (PS2) Elukey: Add terms for eventgate endpoints to the analyitcs-in4 filter [homer/public] - https://gerrit.wikimedia.org/r/622705 (https://phabricator.wikimedia.org/T261356)
[07:06:58] Operations, ops-eqiad, DC-Ops, Patch-For-Review: (Due By: 2020-07-02) rack/setup/install 3 lightweight hadoop nodes - https://phabricator.wikimedia.org/T255518 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-test-master1002.eqiad.wmnet'] ` and were **ALL** successful.
[07:09:53] Operations, ops-eqiad, DC-Ops, Patch-For-Review: (Due By: 2020-07-02) rack/setup/install 3 lightweight hadoop nodes - https://phabricator.wikimedia.org/T255518 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-test-coord1001.eqiad.wmnet'] ` and were **ALL** successful.
[07:10:03] (CR) Ayounsi: [C: +1] Add terms for eventgate endpoints to the analyitcs-in4 filter [homer/public] - https://gerrit.wikimedia.org/r/622705 (https://phabricator.wikimedia.org/T261356) (owner: Elukey)
[07:11:31] RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 243, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[07:15:47] RECOVERY - OSPF status on cr1-codfw is OK: OSPFv2: 6/6 UP : OSPFv3: 6/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[07:16:03] !log installing ghostscript security updates on stretch
[07:16:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:21:34] Operations, ops-eqiad, DC-Ops, Patch-For-Review: (Due By: 2020-07-02) rack/setup/install 3 lightweight hadoop nodes - https://phabricator.wikimedia.org/T255518 (elukey) Looks good! ` elukey@an-test-coord1001:~$ df -h Filesystem Size Used Avail Use% Mounted on udev 6...
[07:24:28] (CR) Ayounsi: [C: +2] Add security log {} stanza [homer/public] - https://gerrit.wikimedia.org/r/619995 (owner: Ayounsi)
[07:24:51] (CR) Ayounsi: [C: +2] Homer: add pfw support [puppet] - https://gerrit.wikimedia.org/r/619992 (owner: Ayounsi)
[07:24:53] (Merged) jenkins-bot: Add security log {} stanza [homer/public] - https://gerrit.wikimedia.org/r/619995 (owner: Ayounsi)
[07:35:06] !log Move pc2010 under pc2007 T243373
[07:35:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:35:10] T243373: Enable DB replication codfw -> eqiad before the switchover - https://phabricator.wikimedia.org/T243373
[07:39:07] (Abandoned) Elukey: role::analytics_cluster::coordinator: set hive.parquet.use-column-names to true [puppet] - https://gerrit.wikimedia.org/r/622607 (https://phabricator.wikimedia.org/T261261) (owner: Elukey)
[07:39:47] (PS6) Elukey: Move oozie server to an-scheduler1001 [puppet] - https://gerrit.wikimedia.org/r/618339 (https://phabricator.wikimedia.org/T257412)
[07:43:18] PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=205 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[07:45:06] RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[08:01:52] !log manual cleanup of stale wdqs deploy crontab on wdqs1009
[08:01:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:06:59] (PS1) Kormat: mariadb: Use hiera for shard+role for all profiles. [puppet] - https://gerrit.wikimedia.org/r/622743
[08:10:50] (PS1) Elukey: Remove term es from analytics-in4 [homer/public] - https://gerrit.wikimedia.org/r/622744
[08:11:13] (PS2) Kormat: mariadb: Use hiera for shard+role for all profiles. [puppet] - https://gerrit.wikimedia.org/r/622743
[08:11:31] (PS1) Effie Mouzeli: mediawiki::php::restarts: Allow disabling of php-fpm restarts [puppet] - https://gerrit.wikimedia.org/r/622745 (https://phabricator.wikimedia.org/T261167)
[08:13:21] (PS1) Elukey: Remove term labstore1003 from analytics-in4 [homer/public] - https://gerrit.wikimedia.org/r/622746
[08:13:22] !log Enable replication codfw -> eqiad on pc1 T243373
[08:13:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:13:27] T243373: Enable DB replication codfw -> eqiad before the switchover - https://phabricator.wikimedia.org/T243373
[08:13:50] (CR) Ayounsi: [C: +1] Remove term es from analytics-in4 [homer/public] - https://gerrit.wikimedia.org/r/622744 (owner: Elukey)
[08:14:23] (CR) Ayounsi: [C: +1] Remove term labstore1003 from analytics-in4 [homer/public] - https://gerrit.wikimedia.org/r/622746 (owner: Elukey)
[08:17:06] (PS2) Effie Mouzeli: mediawiki::php::restarts: Allow disabling of php-fpm restarts [puppet] - https://gerrit.wikimedia.org/r/622745 (https://phabricator.wikimedia.org/T261167)
[08:18:08] (CR) jerkins-bot: [V: -1] mediawiki::php::restarts: Allow disabling of php-fpm restarts [puppet] - https://gerrit.wikimedia.org/r/622745 (https://phabricator.wikimedia.org/T261167) (owner: Effie Mouzeli)
[08:22:38] (PS1) Vgutierrez: ATS: Honour disable_dns_resolution [puppet] - https://gerrit.wikimedia.org/r/622748
[08:24:28] (PS2) Vgutierrez: ATS: Honour disable_dns_resolution [puppet] - https://gerrit.wikimedia.org/r/622748
[08:28:05] (CR) Vgutierrez: "pcc is happy: https://puppet-compiler.wmflabs.org/compiler1001/24724/cp3050.esams.wmnet/index.html" [puppet] - https://gerrit.wikimedia.org/r/622748 (owner: Vgutierrez)
[08:28:44] (CR) Filippo Giunchedi: [C: +1] "LGTM, I think in beta too there are no jessie hosts either with graphite but please double check" [puppet] - https://gerrit.wikimedia.org/r/621364 (owner: Dzahn)
[08:33:35] (PS2) Filippo Giunchedi: alertmanager: assign AM-specific active_host/partners variables [puppet] - https://gerrit.wikimedia.org/r/622608 (https://phabricator.wikimedia.org/T258948)
[08:34:17] (PS4) Vgutierrez: Update debian/control [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/621693 (https://phabricator.wikimedia.org/T260702)
[08:34:19] (PS3) Vgutierrez: Release 6.0.6-1wm1 [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/621694 (https://phabricator.wikimedia.org/T260702)
[08:34:48] (CR) jerkins-bot: [V: -1] Update debian/control [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/621693 (https://phabricator.wikimedia.org/T260702) (owner: Vgutierrez)
[08:34:50] (CR) jerkins-bot: [V: -1] Release 6.0.6-1wm1 [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/621694 (https://phabricator.wikimedia.org/T260702) (owner: Vgutierrez)
[08:35:27] (PS3) Effie Mouzeli: mediawiki::php::restarts: Allow disabling of php-fpm restarts [puppet] - https://gerrit.wikimedia.org/r/622745 (https://phabricator.wikimedia.org/T261167)
[08:36:22] (CR) Filippo Giunchedi: [C: +2] alertmanager: assign AM-specific active_host/partners variables [puppet] - https://gerrit.wikimedia.org/r/622608 (https://phabricator.wikimedia.org/T258948) (owner: Filippo Giunchedi)
[08:41:35] (CR) Giuseppe Lavagetto: [C: +1] "Sorry I missed that patch 😄" [cookbooks] - https://gerrit.wikimedia.org/r/621304 (owner: RLazarus)
[08:44:32] !log enabling replication from pc2008 to pc1008 (pc2) T243373
[08:44:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:44:37] T243373: Enable DB replication codfw -> eqiad before the switchover - https://phabricator.wikimedia.org/T243373
[08:52:33] (Abandoned) DCausse: [wdqs] attempt to fix updated entity id logs [puppet] - https://gerrit.wikimedia.org/r/551890 (owner: DCausse)
[08:53:40] !log enabling replication from pc2009 to pc1009 (pc3) T243373
[08:53:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:53:44] T243373: Enable DB replication codfw -> eqiad before the switchover - https://phabricator.wikimedia.org/T243373
[08:54:00] (CR) JMeybohm: [C: +1] Correctly treat fixtures files for new-style deployments. [deployment-charts] - https://gerrit.wikimedia.org/r/622581 (owner: Giuseppe Lavagetto)
[08:55:39] (CR) Filippo Giunchedi: [C: +1] profile: re-enable grafana rsync codfw->eqiad [puppet] - https://gerrit.wikimedia.org/r/622610 (https://phabricator.wikimedia.org/T259143) (owner: Cwhite)
[08:55:43] (CR) Filippo Giunchedi: [C: +2] profile: re-enable grafana rsync codfw->eqiad [puppet] - https://gerrit.wikimedia.org/r/622610 (https://phabricator.wikimedia.org/T259143) (owner: Cwhite)
[09:03:40] (CR) Lucas Werkmeister (WMDE): "I think this needs to be split up so that it can be deployed safely:" [mediawiki-config] - https://gerrit.wikimedia.org/r/622612 (https://phabricator.wikimedia.org/T258060) (owner: Itamar Givon)
[09:07:08] !log enabling replication from db2090 to db1081 (s4) T243373
[09:07:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:07:12] T243373: Enable DB replication codfw -> eqiad before the switchover - https://phabricator.wikimedia.org/T243373
[09:15:07] !log enabling replication from db2079 to db1109 (s8) T243373
[09:15:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:15:11] T243373: Enable DB replication codfw -> eqiad before the switchover - https://phabricator.wikimedia.org/T243373
[09:20:54] !log enabling replication from db2105 to db1123 (s3) T243373
[09:20:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:20:58] T243373: Enable DB replication codfw -> eqiad before the switchover - https://phabricator.wikimedia.org/T243373
[09:24:45] (CR) JMeybohm: [C: -1] Refresh the documentation of the helmfile.d/services (6 comments) [deployment-charts] - https://gerrit.wikimedia.org/r/622582 (https://phabricator.wikimedia.org/T258572) (owner: Giuseppe Lavagetto)
[09:30:13] (PS1) Elukey: Remove old schema[12]00[12] from puppet [puppet] - https://gerrit.wikimedia.org/r/622754 (https://phabricator.wikimedia.org/T255026)
[09:36:33] (CR) Elukey: [C: +2] "Looks good! https://puppet-compiler.wmflabs.org/compiler1001/24726/" [puppet] - https://gerrit.wikimedia.org/r/622754 (https://phabricator.wikimedia.org/T255026) (owner: Elukey)
[09:39:27] !log elukey@cumin1001 START - Cookbook sre.hosts.decommission
[09:39:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:40:52] Operations, ops-eqiad, DC-Ops: (Need By:2020-08-17) label/setup/install pki1001 - https://phabricator.wikimedia.org/T259826 (jbond) @Cmjohnson The standard partman recipe (modules/install_server/files/autoinstall/partman/standard.cfg) with raid10-4dev.cfg is fine. thanks
[09:40:53] !log elukey@cumin1001 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
[09:40:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:41:04] !log elukey@cumin1001 START - Cookbook sre.hosts.decommission
[09:41:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:41:21] (PS3) Kormat: mariadb: Use hiera for shard+role for all profiles. [puppet] - https://gerrit.wikimedia.org/r/622743 (https://phabricator.wikimedia.org/T256972)
[09:42:56] !log elukey@cumin1001 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
[09:42:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:43:02] !log decommissioning vms schema[12]00[12] (replaced previously by schema[12]00[34] buster vms)
[09:43:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:43:23] !log elukey@cumin1001 START - Cookbook sre.hosts.decommission
[09:43:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:43:55] (CR) Jbond: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/621759 (owner: Dzahn)
[09:43:59] (PS3) Giuseppe Lavagetto: Correctly treat fixtures files for new-style deployments. [deployment-charts] - https://gerrit.wikimedia.org/r/622581
[09:44:01] (PS3) Giuseppe Lavagetto: Refresh the documentation of the helmfile.d/services [deployment-charts] - https://gerrit.wikimedia.org/r/622582 (https://phabricator.wikimedia.org/T258572)
[09:44:03] (PS3) Giuseppe Lavagetto: Add an helper script for the conversion to the new layout. [deployment-charts] - https://gerrit.wikimedia.org/r/622583 (https://phabricator.wikimedia.org/T258572)
[09:44:05] (PS3) Giuseppe Lavagetto: Convert termbox to the new layout using the convert script [deployment-charts] - https://gerrit.wikimedia.org/r/622584 (https://phabricator.wikimedia.org/T258572)
[09:44:07] (PS3) Giuseppe Lavagetto: Convert citoid to new layout using the conversion script [deployment-charts] - https://gerrit.wikimedia.org/r/622585 (https://phabricator.wikimedia.org/T258572)
[09:44:09] (CR) Kormat: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/622743 (https://phabricator.wikimedia.org/T256972) (owner: Kormat)
[09:44:10] !log elukey@cumin1001 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
[09:44:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:44:19] !log elukey@cumin1001 START - Cookbook sre.hosts.decommission
[09:44:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:45:05] !log elukey@cumin1001 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
[09:45:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:47:03] !log hnowlan@deploy1001 helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
[09:47:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:48:43] (03PS1) 10Elukey: Remove schema[12]00[12] records (VMs decommissioned) [dns] - 10https://gerrit.wikimedia.org/r/622755 (https://phabricator.wikimedia.org/T255026) [09:50:55] PROBLEM - Apache HTTP on mwdebug1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [09:51:04] (03CR) 10Filippo Giunchedi: [C: 03+1] "LGTM, thank you Daniel!" [puppet] - 10https://gerrit.wikimedia.org/r/621759 (owner: 10Dzahn) [09:51:29] !log hnowlan@deploy1001 helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' . [09:51:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:51:56] RECOVERY - Apache HTTP on mwdebug1001 is OK: HTTP OK: HTTP/1.1 302 Found - 634 bytes in 7.356 second response time https://wikitech.wikimedia.org/wiki/Application_servers [09:52:10] (03PS4) 10Kormat: mariadb: Use hiera for shard+role for all profiles. [puppet] - 10https://gerrit.wikimedia.org/r/622743 (https://phabricator.wikimedia.org/T256972) [09:52:41] PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=205 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET [09:53:05] (03CR) 10Kormat: "PCC run for mariadb::misc hosts: https://puppet-compiler.wmflabs.org/compiler1001/24728/" [puppet] - 10https://gerrit.wikimedia.org/r/622743 (https://phabricator.wikimedia.org/T256972) (owner: 10Kormat) [09:53:45] RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET [09:54:51] !log installing Java security updates on IDP* hosts [09:54:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:55:20] mmhh I got a report of no connectivity from 80.116.95.117 -> http://paste.debian.net/hidden/ffa5591b/ [09:55:28] cc XioNoX [09:56:08] something seems funky in CF network [09:56:34] indeed [09:57:01] godog: I guess their resolve to esams? [09:57:04] they* [09:57:20] XioNoX: I think so, I'm confirming [09:57:28] yes to esams [09:57:55] (03CR) 10JMeybohm: [C: 03+1] Correctly treat fixtures files for new-style deployments. [deployment-charts] - 10https://gerrit.wikimedia.org/r/622581 (owner: 10Giuseppe Lavagetto) [09:58:24] (03CR) 10Giuseppe Lavagetto: [C: 03+2] Correctly treat fixtures files for new-style deployments. [deployment-charts] - 10https://gerrit.wikimedia.org/r/622581 (owner: 10Giuseppe Lavagetto) [09:58:50] godog: can you try the same command? Ideally I can find someone else going through CF for whom it's working [09:59:19] add -4 if you also have v6 connectivity [09:59:20] sure, will do now [10:00:04] mvolz: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) Services – Citoid / Zotero deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200827T1000). [10:00:32] sadly I don't, I'm also not going through CF from home I think, http://paste.debian.net/hidden/4512afc2/ [10:00:51] (03Merged) 10jenkins-bot: Correctly treat fixtures files for new-style deployments. [deployment-charts] - 10https://gerrit.wikimedia.org/r/622581 (owner: 10Giuseppe Lavagetto) [10:01:15] indeed, thx [10:01:50] godog: can you ask the user to run the same command to text-lb.eqiad.wikimedia.org ? 
[10:02:08] sure [10:03:38] !log ayounsi@cumin1001 START - Cookbook sre.network.cf [10:03:39] !log ayounsi@cumin1001 END (PASS) - Cookbook sre.network.cf (exit_code=0) [10:03:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:03:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:04:31] (just checking the advertisement status ^) [10:09:57] XioNoX: to eqiad -> http://paste.debian.net/hidden/d2adc4df/ [10:10:18] (03CR) 10Muehlenhoff: [C: 03+2] Set U2F token expiry to 3650 on the production IDPs [puppet] - 10https://gerrit.wikimedia.org/r/622324 (https://phabricator.wikimedia.org/T258029) (owner: 10Muehlenhoff) [10:10:24] in the meantime I ran a ssl cert atlas probe from italy, https://atlas.ripe.net/measurements/26858670/#!probes [10:10:57] and traceroute -> https://atlas.ripe.net/measurements/26858959/#!probes [10:11:12] godog: I emailed CF, CCed noc [10:11:35] 10Operations: FY2020-2021 Q1 codfw -> eqiad switchback - https://phabricator.wikimedia.org/T243318 (10Marostegui) [10:11:52] XioNoX: nice, thank you! appreciate it [10:19:27] !log ayounsi@cumin1001 START - Cookbook sre.network.cf [10:19:28] !log ayounsi@cumin1001 END (PASS) - Cookbook sre.network.cf (exit_code=0) [10:19:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:19:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:19:56] (03PS1) 10Effie Mouzeli: mediawiki::php::restarts: Allow disabling of php-fpm restarts [puppet] - 10https://gerrit.wikimedia.org/r/622761 (https://phabricator.wikimedia.org/T261167) [10:20:55] (03CR) 10Marostegui: mariadb: Use hiera for shard+role for all profiles. 
(033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/622743 (https://phabricator.wikimedia.org/T256972) (owner: 10Kormat) [10:21:05] (03PS2) 10Mvolz: Update zotero to 2020-08-07-190051-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/621686 [10:21:32] (03PS2) 10Effie Mouzeli: mediawiki::php::restarts: Allow disabling of php-fpm restarts [puppet] - 10https://gerrit.wikimedia.org/r/622761 (https://phabricator.wikimedia.org/T261167) [10:23:33] (03PS1) 10Effie Mouzeli: hiera: disable php-fpm restarts on mwdebug [puppet] - 10https://gerrit.wikimedia.org/r/622762 (https://phabricator.wikimedia.org/T253673) [10:23:50] (03CR) 10Mvolz: [C: 03+2] Update zotero to 2020-08-07-190051-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/621686 (owner: 10Mvolz) [10:23:57] !log enabling replication from es2021 to es1021 (es4) T243373 [10:24:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:24:01] T243373: Enable DB replication codfw -> eqiad before the switchover - https://phabricator.wikimedia.org/T243373 [10:26:11] (03Merged) 10jenkins-bot: Update zotero to 2020-08-07-190051-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/621686 (owner: 10Mvolz) [10:26:31] (03Abandoned) 10Effie Mouzeli: mediawiki::php::restarts: Allow disabling of php-fpm restarts [puppet] - 10https://gerrit.wikimedia.org/r/622745 (https://phabricator.wikimedia.org/T261167) (owner: 10Effie Mouzeli) [10:28:14] (03PS3) 10Effie Mouzeli: mediawiki::php::restarts: Allow disabling of php-fpm restarts [puppet] - 10https://gerrit.wikimedia.org/r/622761 (https://phabricator.wikimedia.org/T261167) [10:28:41] !log mvolz@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' . 
[10:28:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:30:37] !log enabling replication from es2023 to es1024 (es5) T243373 [10:30:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:30:41] T243373: Enable DB replication codfw -> eqiad before the switchover - https://phabricator.wikimedia.org/T243373 [10:31:14] 10Operations, 10SRE-Access-Requests: Request for access to analytics-privatedata-users - https://phabricator.wikimedia.org/T260450 (10Cparle) thanks Nuria [10:32:22] (03Abandoned) 10Effie Mouzeli: hiera: disable php-fpm restarts on mwdebug [puppet] - 10https://gerrit.wikimedia.org/r/622762 (https://phabricator.wikimedia.org/T253673) (owner: 10Effie Mouzeli) [10:33:04] (03CR) 10Giuseppe Lavagetto: Refresh the documentation of the helmfile.d/services (036 comments) [deployment-charts] - 10https://gerrit.wikimedia.org/r/622582 (https://phabricator.wikimedia.org/T258572) (owner: 10Giuseppe Lavagetto) [10:33:25] (03PS4) 10Giuseppe Lavagetto: Refresh the documentation of the helmfile.d/services [deployment-charts] - 10https://gerrit.wikimedia.org/r/622582 (https://phabricator.wikimedia.org/T258572) [10:34:10] (03PS1) 10Effie Mouzeli: hiera: disable php-fpm restarts on mwdebug [puppet] - 10https://gerrit.wikimedia.org/r/622765 (https://phabricator.wikimedia.org/T253673) [10:35:55] (03CR) 10jerkins-bot: [V: 04-1] Refresh the documentation of the helmfile.d/services [deployment-charts] - 10https://gerrit.wikimedia.org/r/622582 (https://phabricator.wikimedia.org/T258572) (owner: 10Giuseppe Lavagetto) [10:37:10] (03CR) 10Giuseppe Lavagetto: "recheck" [deployment-charts] - 10https://gerrit.wikimedia.org/r/622582 (https://phabricator.wikimedia.org/T258572) (owner: 10Giuseppe Lavagetto) [10:40:21] (03PS5) 10Kormat: mariadb: Use hiera for shard+role for all profiles. 
[puppet] - 10https://gerrit.wikimedia.org/r/622743 (https://phabricator.wikimedia.org/T256972) [10:40:47] (03CR) 10Effie Mouzeli: [V: 03+1] "PCC NOOPs as expected:" [puppet] - 10https://gerrit.wikimedia.org/r/622761 (https://phabricator.wikimedia.org/T261167) (owner: 10Effie Mouzeli) [10:42:04] (03CR) 10Effie Mouzeli: "Changes only on mwdebug1001: https://puppet-compiler.wmflabs.org/compiler1002/24734/" [puppet] - 10https://gerrit.wikimedia.org/r/622765 (https://phabricator.wikimedia.org/T253673) (owner: 10Effie Mouzeli) [10:42:19] (03CR) 10Marostegui: mariadb: Use hiera for shard+role for all profiles. (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/622743 (https://phabricator.wikimedia.org/T256972) (owner: 10Kormat) [10:46:43] (03PS2) 10Filippo Giunchedi: pontoon: latest additions to observability stack [puppet] - 10https://gerrit.wikimedia.org/r/622568 [10:47:59] !log mvolz@deploy1001 helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' . [10:48:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:48:48] (03CR) 10Filippo Giunchedi: [C: 03+2] pontoon: latest additions to observability stack [puppet] - 10https://gerrit.wikimedia.org/r/622568 (owner: 10Filippo Giunchedi) [10:51:17] !log enabling replication from db2123 to db1100 (s5) T243373 [10:51:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:51:21] T243373: Enable DB replication codfw -> eqiad before the switchover - https://phabricator.wikimedia.org/T243373 [10:51:27] (03CR) 10Giuseppe Lavagetto: "If you set the opcache limit to zero, it should never restart anyways, correct?" 
[puppet] - 10https://gerrit.wikimedia.org/r/622761 (https://phabricator.wikimedia.org/T261167) (owner: 10Effie Mouzeli) [10:53:22] (03PS4) 10Filippo Giunchedi: grafana: set root_url to fix dashboard redirects [puppet] - 10https://gerrit.wikimedia.org/r/622594 (https://phabricator.wikimedia.org/T261184) [10:53:51] (03CR) 10Filippo Giunchedi: [C: 03+2] grafana: set root_url to fix dashboard redirects [puppet] - 10https://gerrit.wikimedia.org/r/622594 (https://phabricator.wikimedia.org/T261184) (owner: 10Filippo Giunchedi) [10:54:59] (03CR) 10Jbond: "See inline, sounds like the script always needs to exists however if ensure == absent and opcache_limit > 0, then the script will be remov" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/622761 (https://phabricator.wikimedia.org/T261167) (owner: 10Effie Mouzeli) [10:56:02] !log bounce grafana to apply new settings [10:56:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:57:24] 10Operations, 10observability, 10Patch-For-Review: Grafana link redirecting to port :3000 - https://phabricator.wikimedia.org/T261184 (10fgiunchedi) 05Open→03Resolved a:03fgiunchedi Can confirm that this is fixed now, https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&f... [10:57:53] !log mvolz@deploy1001 helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' . [10:57:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:59:39] (03CR) 10Effie Mouzeli: [V: 03+1] "> Patch Set 3:" [puppet] - 10https://gerrit.wikimedia.org/r/622761 (https://phabricator.wikimedia.org/T261167) (owner: 10Effie Mouzeli) [11:00:04] Amir1, Lucas_WMDE, awight, and Urbanecm: That opportune time is upon us again. Time for a European mid-day backport window deploy. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200827T1100). 
[11:00:05] Nikerabbit: A patch you scheduled for European mid-day backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [11:00:16] I can deploy today! [11:00:40] (unless Nikerabbit wants to self-service?) [11:01:30] Urbanecm: please go ahead unless you don't want to do it :D [11:01:46] I'm totally fine doing so, just saying you don't have to use my services ;) [11:01:54] (03PS2) 10Urbanecm: Add $wgTranslateMessageNamespaces[] = NS_MEDIAWIKI; for commonswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622473 (https://phabricator.wikimedia.org/T131300) (owner: 10Nikerabbit) [11:01:59] (03CR) 10Urbanecm: [C: 03+2] Add $wgTranslateMessageNamespaces[] = NS_MEDIAWIKI; for commonswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622473 (https://phabricator.wikimedia.org/T131300) (owner: 10Nikerabbit) [11:02:43] (03Merged) 10jenkins-bot: Add $wgTranslateMessageNamespaces[] = NS_MEDIAWIKI; for commonswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622473 (https://phabricator.wikimedia.org/T131300) (owner: 10Nikerabbit) [11:03:15] Oh merging to mediawiki-config is that fast? [11:03:31] Nikerabbit: definitely faster than mediawiki/core :) [11:03:37] (03PS1) 10Marostegui: db2135: Enable notifications [puppet] - 10https://gerrit.wikimedia.org/r/622792 [11:03:53] Nikerabbit: pulled to mwdebug1002 for testing, can you have a look, please? [11:03:58] barely enough time to fetch a cup of $beverage [11:04:07] Urbanecm: yes I will attempt to test it [11:04:20] (03CR) 10Marostegui: [C: 03+2] db2135: Enable notifications [puppet] - 10https://gerrit.wikimedia.org/r/622792 (owner: 10Marostegui) [11:04:35] 10Operations, 10observability, 10Graphite, 10audits-data-retention: graphite-web logs are not rotated - https://phabricator.wikimedia.org/T86546 (10fgiunchedi) 05Open→03Resolved a:03fgiunchedi Thanks for following up @Dzahn ! 
Yes resolving because this is fixed now: ` root@graphite1004:/var/log/grap... [11:05:33] (03CR) 10Effie Mouzeli: [V: 03+1] "> Patch Set 3:" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/622761 (https://phabricator.wikimedia.org/T261167) (owner: 10Effie Mouzeli) [11:06:29] (03PS4) 10Effie Mouzeli: mediawiki::php::restarts: Allow disabling of php-fpm restarts [puppet] - 10https://gerrit.wikimedia.org/r/622761 (https://phabricator.wikimedia.org/T261167) [11:07:14] Urbanecm: it works [11:07:26] excellent, syncing [11:07:33] (03PS2) 10Effie Mouzeli: hiera: disable php-fpm restarts on mwdebug [puppet] - 10https://gerrit.wikimedia.org/r/622765 (https://phabricator.wikimedia.org/T253673) [11:08:40] (03CR) 10Kosta Harlan: [C: 03+1] GrowthExperiments: Assign all homepage users to variant A [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622698 (owner: 10Catrope) [11:09:16] !log urbanecm@deploy1001 Synchronized wmf-config/CommonSettings.php: 34994d39f92b23934929c66f3e15aa332683e746: Add $wgTranslateMessageNamespaces[] = NS_MEDIAWIKI; for commonswiki (T131300) (duration: 01m 03s) [11:09:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:09:21] T131300: JavaScript error when translation helper fails - https://phabricator.wikimedia.org/T131300 [11:09:23] Nikerabbit: here you go :) [11:10:22] Urbanecm: 5/5 would recommend to a friend [11:10:30] hehe :) [11:12:08] !log EU B&C done [11:12:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:12:26] (03CR) 10Effie Mouzeli: "PCC as expected https://puppet-compiler.wmflabs.org/compiler1003/24738/" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/622761 (https://phabricator.wikimedia.org/T261167) (owner: 10Effie Mouzeli) [11:15:34] (03PS1) 10Hnowlan: api-gateway: Make JWT issuer configurable. 
[deployment-charts] - 10https://gerrit.wikimedia.org/r/622794 (https://phabricator.wikimedia.org/T235277) [11:16:01] (03CR) 10Jbond: "looks fine to me but not familiar enough with scap to vote" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/622761 (https://phabricator.wikimedia.org/T261167) (owner: 10Effie Mouzeli) [11:16:37] (03CR) 10jerkins-bot: [V: 04-1] api-gateway: Make JWT issuer configurable. [deployment-charts] - 10https://gerrit.wikimedia.org/r/622794 (https://phabricator.wikimedia.org/T235277) (owner: 10Hnowlan) [11:19:09] (03CR) 10Effie Mouzeli: [C: 03+2] mediawiki::php::restarts: Allow disabling of php-fpm restarts [puppet] - 10https://gerrit.wikimedia.org/r/622761 (https://phabricator.wikimedia.org/T261167) (owner: 10Effie Mouzeli) [11:20:59] (03PS3) 10Effie Mouzeli: hiera: disable php-fpm restarts on mwdebug [puppet] - 10https://gerrit.wikimedia.org/r/622765 (https://phabricator.wikimedia.org/T253673) [11:21:38] (03CR) 10Effie Mouzeli: [C: 03+2] hiera: disable php-fpm restarts on mwdebug [puppet] - 10https://gerrit.wikimedia.org/r/622765 (https://phabricator.wikimedia.org/T253673) (owner: 10Effie Mouzeli) [11:22:13] !log marostegui@cumin1001 dbctl commit (dc=all): 'Adjust db2126 weight T243373', diff saved to https://phabricator.wikimedia.org/P12394 and previous config saved to /var/cache/conftool/dbconfig/20200827-112213-marostegui.json [11:22:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:22:17] thanks effie :) [11:22:17] T243373: Enable DB replication codfw -> eqiad before the switchover - https://phabricator.wikimedia.org/T243373 [11:22:33] (03PS2) 10Hnowlan: api-gateway: Make JWT issuer configurable. [deployment-charts] - 10https://gerrit.wikimedia.org/r/622794 (https://phabricator.wikimedia.org/T235277) [11:26:04] (03CR) 10Hnowlan: [C: 03+2] api-gateway: Make JWT issuer configurable. 
[deployment-charts] - 10https://gerrit.wikimedia.org/r/622794 (https://phabricator.wikimedia.org/T235277) (owner: 10Hnowlan) [11:28:28] (03CR) 10jerkins-bot: [V: 04-1] api-gateway: Make JWT issuer configurable. [deployment-charts] - 10https://gerrit.wikimedia.org/r/622794 (https://phabricator.wikimedia.org/T235277) (owner: 10Hnowlan) [11:30:09] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_citoid_cluster_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [11:32:11] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [11:36:04] (03CR) 10Hnowlan: api-gateway: Make JWT issuer configurable. [deployment-charts] - 10https://gerrit.wikimedia.org/r/622794 (https://phabricator.wikimedia.org/T235277) (owner: 10Hnowlan) [11:36:34] (03PS3) 10Hnowlan: api-gateway: Make JWT issuer configurable. [deployment-charts] - 10https://gerrit.wikimedia.org/r/622794 (https://phabricator.wikimedia.org/T235277) [11:39:01] (03CR) 10jerkins-bot: [V: 04-1] api-gateway: Make JWT issuer configurable. 
[deployment-charts] - 10https://gerrit.wikimedia.org/r/622794 (https://phabricator.wikimedia.org/T235277) (owner: 10Hnowlan) [11:45:10] !log marostegui@cumin1001 dbctl commit (dc=all): 'Adjust s1 codfw weights T243373', diff saved to https://phabricator.wikimedia.org/P12395 and previous config saved to /var/cache/conftool/dbconfig/20200827-114509-marostegui.json [11:45:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:45:15] T243373: Enable DB replication codfw -> eqiad before the switchover - https://phabricator.wikimedia.org/T243373 [11:49:46] !log uploaded python3.4 3.4.2-1+deb8u7+wmf1 for jessie-wikimedia T259102 [11:49:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:51:11] !log marostegui@cumin1001 dbctl commit (dc=all): 'Adjust s4 codfw weights T243373', diff saved to https://phabricator.wikimedia.org/P12396 and previous config saved to /var/cache/conftool/dbconfig/20200827-115110-marostegui.json [11:51:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:51:14] T243373: Enable DB replication codfw -> eqiad before the switchover - https://phabricator.wikimedia.org/T243373 [11:53:38] (03PS4) 10Hnowlan: api-gateway: Make JWT issuer configurable. [deployment-charts] - 10https://gerrit.wikimedia.org/r/622794 (https://phabricator.wikimedia.org/T235277) [11:56:01] !log Lift range blocks exceeding wgBlockCIDRLimit via custom script from F32197596 (ruwiki, ruwikiquote; T243980) [11:56:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:56:04] T243980: Find rangeblocks exceeding $wgBlockCIDRLimit to review/lift them - https://phabricator.wikimedia.org/T243980 [11:58:48] (03PS1) 10Hnowlan: api-gateway: Make JWT issuer configurable. 
[deployment-charts] - 10https://gerrit.wikimedia.org/r/622799 (https://phabricator.wikimedia.org/T235277) [11:59:35] !log marostegui@cumin1001 dbctl commit (dc=all): 'Adjust s5 eqiad weights T243373', diff saved to https://phabricator.wikimedia.org/P12397 and previous config saved to /var/cache/conftool/dbconfig/20200827-115934-marostegui.json [11:59:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:59:39] T243373: Enable DB replication codfw -> eqiad before the switchover - https://phabricator.wikimedia.org/T243373 [12:02:11] !log marostegui@cumin1001 dbctl commit (dc=all): 'Adjust s5 codfw weights T243373', diff saved to https://phabricator.wikimedia.org/P12398 and previous config saved to /var/cache/conftool/dbconfig/20200827-120211-marostegui.json [12:02:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:08:16] !log marostegui@cumin1001 dbctl commit (dc=all): 'Adjust s6 weights T243373', diff saved to https://phabricator.wikimedia.org/P12399 and previous config saved to /var/cache/conftool/dbconfig/20200827-120816-marostegui.json [12:08:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:08:20] T243373: Enable DB replication codfw -> eqiad before the switchover - https://phabricator.wikimedia.org/T243373 [12:08:37] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=pdu_sentry4 site=eqsin https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [12:10:19] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [12:12:36] 10Operations, 10Release-Engineering-Team, 10Scap, 10Patch-For-Review, 10Security: `scap sync-file` cannot restart php on mwdebug1001, sudo wants password - https://phabricator.wikimedia.org/T261167 (10jijiki) @Urbanecm I sent out an email, but sadly it went to the wrong list and I didn't notice (I was th... [12:12:56] 10Operations, 10Release-Engineering-Team, 10Scap, 10Patch-For-Review, 10Security: `scap sync-file` cannot restart php on mwdebug1001, sudo wants password - https://phabricator.wikimedia.org/T261167 (10jijiki) 05Open→03Resolved [12:13:51] !log restart db1095 [12:13:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:14:54] !log enabling replication from db2129 to db1093 (s6) T243373 [12:14:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:14:57] T243373: Enable DB replication codfw -> eqiad before the switchover - https://phabricator.wikimedia.org/T243373 [12:22:59] (03PS1) 10JMeybohm: Fix race on concurrent helm repo update calls [deployment-charts] - 10https://gerrit.wikimedia.org/r/622802 (https://phabricator.wikimedia.org/T261313) [12:24:10] !log Fix password format for in db2129 (s6 codfw master) T243373 [12:24:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:24:13] T243373: Enable DB replication codfw -> eqiad before the switchover - https://phabricator.wikimedia.org/T243373 [12:30:03] !log marostegui@cumin1001 dbctl commit (dc=all): 'Adjust s7 weights T243373', diff saved to https://phabricator.wikimedia.org/P12400 and previous config saved to /var/cache/conftool/dbconfig/20200827-123003-marostegui.json [12:30:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:30:07] T243373: Enable DB replication codfw -> eqiad before the switchover - 
https://phabricator.wikimedia.org/T243373 [12:30:28] !log marostegui@cumin1001 dbctl commit (dc=all): 'Adjust s7 weights T243373', diff saved to https://phabricator.wikimedia.org/P12401 and previous config saved to /var/cache/conftool/dbconfig/20200827-123028-marostegui.json [12:30:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:30:46] 10Operations, 10SRE-Access-Requests: Request for access to analytics-privatedata-users - https://phabricator.wikimedia.org/T260450 (10elukey) Please also add a kerberos identity! [12:31:21] PROBLEM - Postgres Replication Lag on maps2003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 23699976 and 0 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [12:33:17] RECOVERY - Postgres Replication Lag on maps2003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 192592 and 84 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring [12:35:06] !log restart db1139 [12:35:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:43:18] 10Operations, 10ops-eqiad, 10DBA: db1139 memory errors on boot 2020-08-27 - https://phabricator.wikimedia.org/T261405 (10jcrespo) [12:43:39] !log marostegui@cumin1001 dbctl commit (dc=all): 'Adjust s8 weights T243373', diff saved to https://phabricator.wikimedia.org/P12402 and previous config saved to /var/cache/conftool/dbconfig/20200827-124338-marostegui.json [12:43:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:43:43] T243373: Enable DB replication codfw -> eqiad before the switchover - https://phabricator.wikimedia.org/T243373 [12:48:03] PROBLEM - Rate of JVM GC Old generation-s runs - logstash1010-production-logstash-eqiad on logstash1010 is CRITICAL: 102.7 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs 
https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-logstash-eqiad&var-instance=logstash1010&panelId=37 [12:52:45] !log restart db1140 [12:52:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:57:09] (03CR) 10Gehel: [C: 03+1] "LGTM, but maybe volans wants to have a last look before we merge" [software/spicerack] - 10https://gerrit.wikimedia.org/r/619781 (https://phabricator.wikimedia.org/T261239) (owner: 10Ryan Kemper) [13:01:31] !log enabling replication from db2118 to db1086 (s7) T243373 [13:01:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:01:36] T243373: Enable DB replication codfw -> eqiad before the switchover - https://phabricator.wikimedia.org/T243373 [13:03:03] 10Operations, 10GrowthExperiments-NewcomerTasks, 10Product-Infrastructure-Team-Backlog, 10serviceops: Service operations setup for Add a Link project - https://phabricator.wikimedia.org/T258978 (10kostajh) [13:03:29] !deploy python3.4 security update to canaries on jessie [13:03:34] !log deploy python3.4 security update to canaries on jessie [13:03:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:03:52] (03CR) 10Gehel: [C: 03+2] [wdqs] cleanup the munge path when doing a data-reload [cookbooks] - 10https://gerrit.wikimedia.org/r/622550 (owner: 10DCausse) [13:04:02] 10Operations, 10GrowthExperiments-NewcomerTasks, 10Product-Infrastructure-Team-Backlog, 10serviceops: Service operations setup for Add a Link project - https://phabricator.wikimedia.org/T258978 (10kostajh) >>! In T258978#6408429, @Joe wrote: > I have a few questions for you, before giving a refined recomme... 
[13:06:52] (03PS1) 10JMeybohm: Convert mathoid to the new layout using the convert script [deployment-charts] - 10https://gerrit.wikimedia.org/r/622806 (https://phabricator.wikimedia.org/T258572) [13:07:02] (03CR) 10Giuseppe Lavagetto: [C: 03+1] Fix race on concurrent helm repo update calls [deployment-charts] - 10https://gerrit.wikimedia.org/r/622802 (https://phabricator.wikimedia.org/T261313) (owner: 10JMeybohm) [13:07:08] 10Operations, 10ops-eqiad, 10DBA: db1139 memory errors on boot 2020-08-27 - https://phabricator.wikimedia.org/T261405 (10Marostegui) p:05Triage→03Medium [13:07:24] !log deploy python3.4 security update to kraz [13:07:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:07:33] (03CR) 10JMeybohm: [C: 03+1] "Great!" [deployment-charts] - 10https://gerrit.wikimedia.org/r/622583 (https://phabricator.wikimedia.org/T258572) (owner: 10Giuseppe Lavagetto) [13:10:07] !log restart db2097 [13:10:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:11:42] (03CR) 10JMeybohm: [C: 03+2] Fix race on concurrent helm repo update calls [deployment-charts] - 10https://gerrit.wikimedia.org/r/622802 (https://phabricator.wikimedia.org/T261313) (owner: 10JMeybohm) [13:13:24] (03Merged) 10jenkins-bot: Fix race on concurrent helm repo update calls [deployment-charts] - 10https://gerrit.wikimedia.org/r/622802 (https://phabricator.wikimedia.org/T261313) (owner: 10JMeybohm) [13:14:02] !log enabling replication from db2096 to db1103 (x1) T243373 [13:14:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:14:06] T243373: Enable DB replication codfw -> eqiad before the switchover - https://phabricator.wikimedia.org/T243373 [13:18:44] !log enabling replication from db2107 to db1122 (s2) T243373 [13:18:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:18:57] (03PS8) 10Volans: sre.discovery: Refactor [cookbooks] - 10https://gerrit.wikimedia.org/r/621721 
(https://phabricator.wikimedia.org/T260663) (owner: 10JMeybohm) [13:29:31] !log restart jvm daemons on analytics1042, aqs1004, kafka-jumbo1001 to pick up new openjdk upgrades (canaries) [13:29:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:36:27] 10Operations, 10ops-eqiad, 10DC-Ops, 10Patch-For-Review: (Due By: 2020-07-02) rack/setup/install 3 lightweight hadoop nodes - https://phabricator.wikimedia.org/T255518 (10Cmjohnson) 05Open→03Resolved Thanks @elukey [13:37:11] 10Operations, 10ops-eqiad, 10DC-Ops, 10Patch-For-Review: (Due By: 2020-07-17) rack/setup/install - https://phabricator.wikimedia.org/T255520 (10Cmjohnson) 05Open→03Resolved [13:38:17] (03PS1) 10Ayounsi: Add cloudsw switches to rancid [puppet] - 10https://gerrit.wikimedia.org/r/622809 [13:39:27] (03PS5) 10Hnowlan: api-gateway: Make JWT issuer configurable. [deployment-charts] - 10https://gerrit.wikimedia.org/r/622794 (https://phabricator.wikimedia.org/T235277) [13:39:29] (03CR) 10Ayounsi: [C: 03+2] Add cloudsw switches to rancid [puppet] - 10https://gerrit.wikimedia.org/r/622809 (owner: 10Ayounsi) [13:41:01] PROBLEM - Apache HTTP on mwdebug1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [13:41:42] (03CR) 10Hnowlan: [C: 03+2] api-gateway: Make JWT issuer configurable. [deployment-charts] - 10https://gerrit.wikimedia.org/r/622794 (https://phabricator.wikimedia.org/T235277) (owner: 10Hnowlan) [13:43:15] (03Merged) 10jenkins-bot: api-gateway: Make JWT issuer configurable. [deployment-charts] - 10https://gerrit.wikimedia.org/r/622794 (https://phabricator.wikimedia.org/T235277) (owner: 10Hnowlan) [13:44:53] RECOVERY - Apache HTTP on mwdebug1001 is OK: HTTP OK: HTTP/1.1 302 Found - 635 bytes in 9.151 second response time https://wikitech.wikimedia.org/wiki/Application_servers [13:45:14] (03CR) 10Ppchelko: [C: 03+1] api-gateway: Make JWT issuer configurable. 
[deployment-charts] - 10https://gerrit.wikimedia.org/r/622794 (https://phabricator.wikimedia.org/T235277) (owner: 10Hnowlan) [13:50:31] !log hnowlan@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' . [13:50:32] !log disabling GTID on db2107 (s2) T243373 [13:50:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:50:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:50:38] T243373: Enable DB replication codfw -> eqiad before the switchover - https://phabricator.wikimedia.org/T243373 [13:50:49] 10Operations, 10Analytics-Clusters, 10Analytics-Radar, 10observability, 10Patch-For-Review: Move kafkamon hosts to Debian Buster - https://phabricator.wikimedia.org/T252773 (10elukey) @herron Hi! What is the status of the task? Anything that I can help with? [13:51:24] !log disabling GTID on db2105 (s3) T243373 [13:51:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:51:52] !log hnowlan@deploy1001 helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' . [13:51:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:52:05] !log disabling GTID on db2090 (s4) T243373 [13:52:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:52:46] !log disabling GTID on db2123 (s5) T243373 [13:52:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:52:53] (03PS1) 10Elukey: install_server: set kafka-jumbo nodes' PXE settings to Buster [puppet] - 10https://gerrit.wikimedia.org/r/622812 (https://phabricator.wikimedia.org/T255123) [13:53:53] !log disabling GTID on db2129 (s6), db2118 (s7), db2079 (s8) T243373 [13:53:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:54:09] !log hnowlan@deploy1001 helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' . 
[13:54:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:56:09] (03CR) 10Elukey: [C: 03+2] install_server: set kafka-jumbo nodes' PXE settings to Buster [puppet] - 10https://gerrit.wikimedia.org/r/622812 (https://phabricator.wikimedia.org/T255123) (owner: 10Elukey) [13:56:31] !log disabling GTID on db2096 (x1), es2021 (es4), es2023 (es5) T243373 [13:56:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:56:35] T243373: Enable DB replication codfw -> eqiad before the switchover - https://phabricator.wikimedia.org/T243373 [13:58:39] !log disabling GTID on pc2007 (pc1), pc2008 (pc2), pc2009 (pc3) T243373 [13:58:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:00:33] PROBLEM - DPKG on dbprov2001 is CRITICAL: DPKG CRITICAL dpkg reports broken packages https://wikitech.wikimedia.org/wiki/Monitoring/dpkg [14:01:05] will check later [14:01:25] (03Abandoned) 10Vgutierrez: ATS: Honour disable_dns_resolution [puppet] - 10https://gerrit.wikimedia.org/r/622748 (owner: 10Vgutierrez) [14:05:46] (03PS6) 10Kormat: mariadb: Use hiera for shard+role for all profiles. 
[puppet] - 10https://gerrit.wikimedia.org/r/622743 (https://phabricator.wikimedia.org/T256972) [14:08:16] (03PS1) 10Cmjohnson: Add production dns for an-worker1104-1117 both ipv4/ipv6 [dns] - 10https://gerrit.wikimedia.org/r/622821 (https://phabricator.wikimedia.org/T259071) [14:08:37] (03CR) 10jerkins-bot: [V: 04-1] Add production dns for an-worker1104-1117 both ipv4/ipv6 [dns] - 10https://gerrit.wikimedia.org/r/622821 (https://phabricator.wikimedia.org/T259071) (owner: 10Cmjohnson) [14:09:00] (03CR) 10Giuseppe Lavagetto: [C: 04-1] "The patch is in good shape, but there are a couple things that need to be fixed in service-route.py" (035 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/621721 (https://phabricator.wikimedia.org/T260663) (owner: 10JMeybohm) [14:14:39] 10Operations, 10ops-codfw, 10decommission-hardware, 10serviceops: decommission mc2028.codfw.wmnet - https://phabricator.wikimedia.org/T261168 (10Papaul) [14:14:50] 10Operations, 10ops-codfw, 10Patch-For-Review: mc2028 regular and mgmt interface down - https://phabricator.wikimedia.org/T260224 (10Papaul) [14:14:56] 10Operations, 10ops-codfw, 10decommission-hardware, 10serviceops: decommission mc2028.codfw.wmnet - https://phabricator.wikimedia.org/T261168 (10Papaul) 05Open→03Resolved complete [14:15:10] (03CR) 10Volans: [C: 03+1] "Code LGTM, I didn't test it but poked around dnspython APIs and indeed it seems not possible to use a simpler abstraction in order to set " (032 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/621721 (https://phabricator.wikimedia.org/T260663) (owner: 10JMeybohm) [14:15:28] (03CR) 10RLazarus: [C: 03+1] decom mw2135 through mw2214 [puppet] - 10https://gerrit.wikimedia.org/r/621783 (https://phabricator.wikimedia.org/T260654) (owner: 10Dzahn) [14:16:14] (03CR) 10RLazarus: [C: 03+1] "Nit: Update the commit message to make clear this also removes mw2187-99?" 
[puppet] - 10https://gerrit.wikimedia.org/r/621783 (https://phabricator.wikimedia.org/T260654) (owner: 10Dzahn) [14:18:42] (03CR) 10Ottomata: [C: 03+1] "TY!" [homer/public] - 10https://gerrit.wikimedia.org/r/622705 (https://phabricator.wikimedia.org/T261356) (owner: 10Elukey) [14:20:39] 10Operations, 10Analytics, 10Analytics-Kanban, 10netops: Add more dimensions in the netflow/pmacct/Druid pipeline - https://phabricator.wikimedia.org/T254332 (10fdans) @JAllemandou and I just had a chat about these changes. Before proceeding with any of the ways Joseph described above, @faidon: how importa... [14:21:03] 10Operations, 10Wikimedia-Mailing-lists: Disable google code in mailinglists - https://phabricator.wikimedia.org/T261084 (10jijiki) So the verdict is to delete those lists? @Dzahn will this remove the archives as well? [14:22:59] 10Operations, 10Analytics, 10Analytics-Kanban, 10netops: Add more dimensions in the netflow/pmacct/Druid pipeline - https://phabricator.wikimedia.org/T254332 (10CDanis) It's critical that this data remain real-time, even if some of the fields aren't available in the real-time data. [14:31:11] (03CR) 10Kormat: "Fresh PCC runs:" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/622743 (https://phabricator.wikimedia.org/T256972) (owner: 10Kormat) [14:31:36] !log replacing msw-c5,c6,c7 and fmsw-c8 [14:31:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:32:01] (03PS1) 10Jbond: cfssl: update ca_constraint parameter [puppet] - 10https://gerrit.wikimedia.org/r/622822 [14:32:14] (03CR) 10Marostegui: [C: 03+1] mariadb: Use hiera for shard+role for all profiles. 
(031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/622743 (https://phabricator.wikimedia.org/T256972) (owner: 10Kormat) [14:32:37] (03CR) 10Jbond: [C: 03+2] cfssl: update ca_constraint parameter [puppet] - 10https://gerrit.wikimedia.org/r/622822 (owner: 10Jbond) [14:32:45] (03PS2) 10Cmjohnson: Add production dns for an-worker1104-1117 both ipv4/ipv6 [dns] - 10https://gerrit.wikimedia.org/r/622821 (https://phabricator.wikimedia.org/T259071) [14:32:53] PROBLEM - Apache HTTP on mwdebug1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [14:32:53] PROBLEM - PHP7 rendering on mwdebug1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [14:32:57] (03PS2) 10RLazarus: sre.switchdc.mediawiki: Add -ro targets to the TTL steps also. [cookbooks] - 10https://gerrit.wikimedia.org/r/621304 [14:33:08] (03CR) 10Kormat: [C: 03+2] mariadb: Use hiera for shard+role for all profiles. [puppet] - 10https://gerrit.wikimedia.org/r/622743 (https://phabricator.wikimedia.org/T256972) (owner: 10Kormat) [14:33:59] 10Operations: incident 20170323-wikibase did not trigger Icinga paging - https://phabricator.wikimedia.org/T161528 (10lmata) Untagging the Observability project for now as there doesn't seem to be an action item for the team. Please add back if there is anything we missed. [14:34:28] andrewbogott: ok to merge https://gerrit.wikimedia.org/r/c/operations/dns/+/622266 ? [14:34:46] (03CR) 10RLazarus: [C: 03+2] sre.switchdc.mediawiki: Add -ro targets to the TTL steps also. 
[cookbooks] - 10https://gerrit.wikimedia.org/r/621304 (owner: 10RLazarus) [14:34:51] RECOVERY - Apache HTTP on mwdebug1001 is OK: HTTP OK: HTTP/1.1 302 Found - 635 bytes in 9.734 second response time https://wikitech.wikimedia.org/wiki/Application_servers [14:35:20] (03CR) 10Andrew Bogott: [C: 03+1] wmnet: Decrease m5-master TTL to 1M [dns] - 10https://gerrit.wikimedia.org/r/622266 (https://phabricator.wikimedia.org/T260324) (owner: 10Marostegui) [14:35:32] thanks! [14:35:32] marostegui: yep! [14:35:39] (03Merged) 10jenkins-bot: sre.switchdc.mediawiki: Add -ro targets to the TTL steps also. [cookbooks] - 10https://gerrit.wikimedia.org/r/621304 (owner: 10RLazarus) [14:35:51] (03PS2) 10Marostegui: wmnet: Decrease m5-master TTL to 1M [dns] - 10https://gerrit.wikimedia.org/r/622266 (https://phabricator.wikimedia.org/T260324) [14:35:55] (03CR) 10Cmjohnson: [C: 03+2] Add production dns for an-worker1104-1117 both ipv4/ipv6 [dns] - 10https://gerrit.wikimedia.org/r/622821 (https://phabricator.wikimedia.org/T259071) (owner: 10Cmjohnson) [14:36:49] RECOVERY - PHP7 rendering on mwdebug1001 is OK: HTTP OK: HTTP/1.1 302 Found - 648 bytes in 8.552 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [14:37:01] PROBLEM - Host restbase2020.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [14:37:03] PROBLEM - Host mc2031.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [14:37:03] PROBLEM - Host mc2030.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [14:37:08] !log imported openjdk 8u265-b01-1~deb10u1 to buster-wikimedia (forward port of latest Java 8 security update) [14:37:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:37:31] RECOVERY - DPKG on dbprov2001 is OK: All packages OK https://wikitech.wikimedia.org/wiki/Monitoring/dpkg [14:37:51] PROBLEM - Host conf2002.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [14:38:03] PROBLEM - Host db2114.mgmt is DOWN: PING CRITICAL - Packet loss = 100% 
[14:38:09] uh? [14:38:15] PROBLEM - Host db2080.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [14:38:19] PROBLEM - Host db2102.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [14:38:24] papaul or XioNoX is that some maintenance going on? ^ [14:38:29] PROBLEM - Host ps1-c5-codfw is DOWN: PING CRITICAL - Packet loss = 100% [14:38:34] (03CR) 10Marostegui: [C: 03+2] wmnet: Decrease m5-master TTL to 1M [dns] - 10https://gerrit.wikimedia.org/r/622266 (https://phabricator.wikimedia.org/T260324) (owner: 10Marostegui) [14:38:47] marostegui: yes i just log it like 8 minutes ago [14:38:49] (03PS3) 10Marostegui: wmnet: Decrease m5-master TTL to 1M [dns] - 10https://gerrit.wikimedia.org/r/622266 (https://phabricator.wikimedia.org/T260324) [14:38:58] papaul: ah sorry, missed it [14:39:05] PROBLEM - Host ganeti2011.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [14:39:09] (03PS1) 10JMeybohm: helmfile_convert_diff.sh copy old helmfiles from parent on review [deployment-charts] - 10https://gerrit.wikimedia.org/r/622823 (https://phabricator.wikimedia.org/T258572) [14:39:12] marostegui: np [14:39:15] PROBLEM - Host restbase2016.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [14:39:15] PROBLEM - Host scb2001.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [14:39:18] (03CR) 10Elukey: [C: 03+2] Add terms for eventgate endpoints to the analyitcs-in4 filter [homer/public] - 10https://gerrit.wikimedia.org/r/622705 (https://phabricator.wikimedia.org/T261356) (owner: 10Elukey) [14:39:29] PROBLEM - Host logstash2022.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [14:39:47] PROBLEM - Host ores2006.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [14:39:47] PROBLEM - Host parse2012.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [14:40:04] (03CR) 10Giuseppe Lavagetto: [C: 03+2] Add an helper script for the conversion to the new layout. 
[deployment-charts] - 10https://gerrit.wikimedia.org/r/622583 (https://phabricator.wikimedia.org/T258572) (owner: 10Giuseppe Lavagetto) [14:40:25] PROBLEM - Host parse2011.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [14:40:25] PROBLEM - Host parse2013.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [14:40:25] PROBLEM - Host phab2001.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [14:40:33] (03CR) 10Giuseppe Lavagetto: [C: 03+2] Refresh the documentation of the helmfile.d/services [deployment-charts] - 10https://gerrit.wikimedia.org/r/622582 (https://phabricator.wikimedia.org/T258572) (owner: 10Giuseppe Lavagetto) [14:40:47] PROBLEM - Host scb2006.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [14:40:48] (03PS4) 10Giuseppe Lavagetto: Add an helper script for the conversion to the new layout. [deployment-charts] - 10https://gerrit.wikimedia.org/r/622583 (https://phabricator.wikimedia.org/T258572) [14:40:53] PROBLEM - Host elastic2032.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [14:40:53] PROBLEM - Host elastic2033.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [14:41:01] PROBLEM - Host db2090.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [14:41:01] PROBLEM - Host db2126.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [14:41:09] PROBLEM - Host deploy2001.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [14:41:23] PROBLEM - Host logstash2002.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [14:41:31] PROBLEM - Host ganeti2012.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [14:41:41] RECOVERY - Host ps1-c5-codfw is UP: PING OK - Packet loss = 0%, RTA = 34.12 ms [14:42:01] PROBLEM - Host wdqs2001.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [14:42:06] (03CR) 10Giuseppe Lavagetto: [C: 03+1] helmfile_convert_diff.sh copy old helmfiles from parent on review [deployment-charts] - 10https://gerrit.wikimedia.org/r/622823 (https://phabricator.wikimedia.org/T258572) (owner: 10JMeybohm) [14:42:21] PROBLEM - Host mwlog2001.mgmt is DOWN: PING CRITICAL - 
Packet loss = 100% [14:42:38] (03PS2) 10JMeybohm: helmfile_convert_diff.sh copy old helmfiles from parent on review [deployment-charts] - 10https://gerrit.wikimedia.org/r/622823 (https://phabricator.wikimedia.org/T258572) [14:42:41] PROBLEM - Host rdb2005.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [14:42:45] (03CR) 10Volans: "Thanks for all the fixes, code LGTM, just couple of nits inline." (032 comments) [software/spicerack] - 10https://gerrit.wikimedia.org/r/619781 (https://phabricator.wikimedia.org/T261239) (owner: 10Ryan Kemper) [14:42:51] PROBLEM - Host maps2003.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [14:42:58] !log add eventgate-related terms to analytics-in4 filter on cr1/cr2-eqiad (ref https://gerrit.wikimedia.org/r/c/operations/homer/public/+/622705) [14:43:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:43:03] RECOVERY - Host restbase2020.mgmt is UP: PING OK - Packet loss = 0%, RTA = 33.97 ms [14:43:09] RECOVERY - Host mc2031.mgmt is UP: PING WARNING - Packet loss = 66%, RTA = 34.87 ms [14:43:20] (03CR) 10JMeybohm: [C: 03+2] helmfile_convert_diff.sh copy old helmfiles from parent on review [deployment-charts] - 10https://gerrit.wikimedia.org/r/622823 (https://phabricator.wikimedia.org/T258572) (owner: 10JMeybohm) [14:44:03] RECOVERY - Host db2114.mgmt is UP: PING OK - Packet loss = 0%, RTA = 34.00 ms [14:44:09] !log restarting tomcat on idp-test* hosts to pick up Java update [14:44:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:44:15] RECOVERY - Host db2080.mgmt is UP: PING OK - Packet loss = 0%, RTA = 33.71 ms [14:44:19] RECOVERY - Host db2102.mgmt is UP: PING OK - Packet loss = 0%, RTA = 33.73 ms [14:44:36] (03Merged) 10jenkins-bot: helmfile_convert_diff.sh copy old helmfiles from parent on review [deployment-charts] - 10https://gerrit.wikimedia.org/r/622823 (https://phabricator.wikimedia.org/T258572) (owner: 10JMeybohm) [14:44:53] RECOVERY - Host scb2006.mgmt is UP: 
PING OK - Packet loss = 0%, RTA = 33.90 ms [14:45:15] RECOVERY - Host restbase2016.mgmt is UP: PING OK - Packet loss = 0%, RTA = 34.05 ms [14:45:27] RECOVERY - Host logstash2022.mgmt is UP: PING OK - Packet loss = 0%, RTA = 34.46 ms [14:45:45] RECOVERY - Host ores2006.mgmt is UP: PING OK - Packet loss = 0%, RTA = 33.72 ms [14:45:47] RECOVERY - Host parse2012.mgmt is UP: PING OK - Packet loss = 0%, RTA = 33.98 ms [14:46:25] RECOVERY - Host parse2013.mgmt is UP: PING OK - Packet loss = 0%, RTA = 33.95 ms [14:47:28] RECOVERY - Host wdqs2001.mgmt is UP: PING OK - Packet loss = 0%, RTA = 33.77 ms [14:47:32] RECOVERY - Host db2090.mgmt is UP: PING OK - Packet loss = 0%, RTA = 33.69 ms [14:47:32] RECOVERY - Host db2126.mgmt is UP: PING OK - Packet loss = 0%, RTA = 33.97 ms [14:47:32] RECOVERY - Host deploy2001.mgmt is UP: PING OK - Packet loss = 0%, RTA = 33.73 ms [14:47:32] RECOVERY - Host elastic2032.mgmt is UP: PING OK - Packet loss = 0%, RTA = 33.57 ms [14:47:34] RECOVERY - Host elastic2033.mgmt is UP: PING OK - Packet loss = 0%, RTA = 33.71 ms [14:47:34] RECOVERY - Host ganeti2012.mgmt is UP: PING OK - Packet loss = 0%, RTA = 33.99 ms [14:47:34] RECOVERY - Host logstash2002.mgmt is UP: PING OK - Packet loss = 0%, RTA = 33.99 ms [14:47:48] RECOVERY - Host parse2011.mgmt is UP: PING OK - Packet loss = 0%, RTA = 34.86 ms [14:47:50] RECOVERY - Host mwlog2001.mgmt is UP: PING OK - Packet loss = 0%, RTA = 34.38 ms [14:48:02] RECOVERY - Host rdb2005.mgmt is UP: PING OK - Packet loss = 0%, RTA = 34.44 ms [14:48:04] RECOVERY - Host maps2003.mgmt is UP: PING OK - Packet loss = 0%, RTA = 33.70 ms [14:48:07] !log installing Java security updates on aqs, hadoop and kafka-jumbo [14:48:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:48:16] RECOVERY - Host mc2030.mgmt is UP: PING OK - Packet loss = 0%, RTA = 33.63 ms [14:49:04] RECOVERY - Host conf2002.mgmt is UP: PING OK - Packet loss = 0%, RTA = 33.77 ms [14:50:16] RECOVERY - Host ganeti2011.mgmt 
is UP: PING OK - Packet loss = 0%, RTA = 0.06 ms [14:50:26] RECOVERY - Host scb2001.mgmt is UP: PING OK - Packet loss = 0%, RTA = 33.66 ms [14:50:58] (03PS4) 10JMeybohm: Convert termbox to the new layout using the convert script [deployment-charts] - 10https://gerrit.wikimedia.org/r/622584 (https://phabricator.wikimedia.org/T258572) (owner: 10Giuseppe Lavagetto) [14:53:59] 10Operations, 10SRE-Access-Requests: Request for access to analytics-privatedata-users - https://phabricator.wikimedia.org/T260450 (10jijiki) [14:54:06] (03CR) 10JMeybohm: [C: 04-1] "There is a diff for staging (which is probably okay as it's just `tls.telemetry.enabled: true` I think), but nevertheless:" [deployment-charts] - 10https://gerrit.wikimedia.org/r/622584 (https://phabricator.wikimedia.org/T258572) (owner: 10Giuseppe Lavagetto) [14:54:30] (03CR) 10JMeybohm: [C: 03+2] Convert citoid to new layout using the conversion script [deployment-charts] - 10https://gerrit.wikimedia.org/r/622585 (https://phabricator.wikimedia.org/T258572) (owner: 10Giuseppe Lavagetto) [14:54:42] (03PS4) 10JMeybohm: Convert citoid to new layout using the conversion script [deployment-charts] - 10https://gerrit.wikimedia.org/r/622585 (https://phabricator.wikimedia.org/T258572) (owner: 10Giuseppe Lavagetto) [14:55:16] (03PS3) 10Filippo Giunchedi: prometheus: minimal default alerts for Prometheus [puppet] - 10https://gerrit.wikimedia.org/r/622557 (https://phabricator.wikimedia.org/T258948) [14:55:18] (03PS2) 10Filippo Giunchedi: prometheus: move beta to its own profile [puppet] - 10https://gerrit.wikimedia.org/r/622561 (https://phabricator.wikimedia.org/T258948) [14:55:21] (03PS4) 10Filippo Giunchedi: prometheus: add 'alertmanagers' setting to all instances [puppet] - 10https://gerrit.wikimedia.org/r/622558 (https://phabricator.wikimedia.org/T258948) [14:55:22] (03PS3) 10Filippo Giunchedi: icinga: redirect to https if not already proxied [puppet] - 10https://gerrit.wikimedia.org/r/622566 
(https://phabricator.wikimedia.org/T258948) [14:55:24] (03PS1) 10Filippo Giunchedi: alertmanager: use port 9094 for cluster/HA [puppet] - 10https://gerrit.wikimedia.org/r/622825 (https://phabricator.wikimedia.org/T258948) [14:55:51] (03CR) 10JMeybohm: [C: 03+1] Convert citoid to new layout using the conversion script [deployment-charts] - 10https://gerrit.wikimedia.org/r/622585 (https://phabricator.wikimedia.org/T258572) (owner: 10Giuseppe Lavagetto) [14:55:58] 10Operations, 10SRE-Access-Requests: Request for access to analytics-privatedata-users - https://phabricator.wikimedia.org/T260450 (10jijiki) @Cparle I understand you have written the requested information already, I would appreciate it if you'd do once more on the task description, so to keep things as tidy a... [14:57:37] (03CR) 10Filippo Giunchedi: [C: 03+2] alertmanager: use port 9094 for cluster/HA [puppet] - 10https://gerrit.wikimedia.org/r/622825 (https://phabricator.wikimedia.org/T258948) (owner: 10Filippo Giunchedi) [14:58:07] kormat: merging your change too [14:58:32] RECOVERY - Host phab2001.mgmt is UP: PING OK - Packet loss = 0%, RTA = 34.57 ms [14:58:43] godog: oh, thanks! 
[14:59:09] 10Operations, 10SRE-Access-Requests: Request for access to analytics-privatedata-users - https://phabricator.wikimedia.org/T260450 (10Cparle) [14:59:28] 10Operations, 10SRE-Access-Requests: Request for access to analytics-privatedata-users - https://phabricator.wikimedia.org/T260450 (10Cparle) done [14:59:43] np [15:04:40] PROBLEM - Rate of JVM GC Old generation-s runs - logstash1010-production-logstash-eqiad on logstash1010 is CRITICAL: 103.7 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-logstash-eqiad&var-instance=logstash1010&panelId=37 [15:08:20] 10Operations, 10Analytics-Clusters, 10Analytics-Radar, 10observability, 10Patch-For-Review: Move kafkamon hosts to Debian Buster - https://phabricator.wikimedia.org/T252773 (10herron) [15:09:04] (03CR) 10Ottomata: [C: 03+1] Remove schema[12]00[12] records (VMs decommissioned) [dns] - 10https://gerrit.wikimedia.org/r/622755 (https://phabricator.wikimedia.org/T255026) (owner: 10Elukey) [15:09:29] (03PS1) 10Kormat: install_server: Add reuse-parts-test.cfg [puppet] - 10https://gerrit.wikimedia.org/r/622826 [15:10:39] 10Operations, 10Analytics-Clusters, 10Analytics-Radar, 10observability, 10Patch-For-Review: Move kafkamon hosts to Debian Buster - https://phabricator.wikimedia.org/T252773 (10herron) Hey @elukey, prep work is done for the new hosts. Will be performing cut-over in the near future, will keep you on the cc. 
[15:11:01] (03PS3) 10JMeybohm: helmfile: refactor eventgate-analytics [deployment-charts] - 10https://gerrit.wikimedia.org/r/621286 (https://phabricator.wikimedia.org/T258572) [15:15:06] (03PS2) 10Kormat: install_server: Add reuse-parts-test.cfg [puppet] - 10https://gerrit.wikimedia.org/r/622826 [15:18:07] (03PS1) 10JMeybohm: Align all exsiting new-style helmfiles to example [deployment-charts] - 10https://gerrit.wikimedia.org/r/622827 (https://phabricator.wikimedia.org/T258572) [15:19:24] (03CR) 10JMeybohm: [C: 04-1] "helmfile.yaml needs to be aligned with what we have in _example_ now" [deployment-charts] - 10https://gerrit.wikimedia.org/r/619437 (https://phabricator.wikimedia.org/T258572) (owner: 10Giuseppe Lavagetto) [15:23:10] 10Operations, 10ops-codfw, 10netops: (Need by: ) codfw:rack/setup/new management switches - https://phabricator.wikimedia.org/T253154 (10Papaul) [15:24:25] (03PS1) 10Cmjohnson: Adding production dns for an-worker1096-1101 ipv4/ipv6 [dns] - 10https://gerrit.wikimedia.org/r/622828 (https://phabricator.wikimedia.org/T254892) [15:24:45] (03CR) 10jerkins-bot: [V: 04-1] Adding production dns for an-worker1096-1101 ipv4/ipv6 [dns] - 10https://gerrit.wikimedia.org/r/622828 (https://phabricator.wikimedia.org/T254892) (owner: 10Cmjohnson) [15:27:13] (03PS2) 10Cmjohnson: Adding production dns for an-worker1096-1101 ipv4/ipv6 [dns] - 10https://gerrit.wikimedia.org/r/622828 (https://phabricator.wikimedia.org/T254892) [15:27:35] (03CR) 10jerkins-bot: [V: 04-1] Adding production dns for an-worker1096-1101 ipv4/ipv6 [dns] - 10https://gerrit.wikimedia.org/r/622828 (https://phabricator.wikimedia.org/T254892) (owner: 10Cmjohnson) [15:28:21] (03PS1) 10Jbond: pki: add ability to generate intermidiates [puppet] - 10https://gerrit.wikimedia.org/r/622830 (https://phabricator.wikimedia.org/T259117) [15:29:43] (03PS3) 10Cmjohnson: Adding production dns for an-worker1096-1101 ipv4/ipv6 [dns] - 10https://gerrit.wikimedia.org/r/622828 
(https://phabricator.wikimedia.org/T254892) [15:31:05] (03PS2) 10Jbond: pki: add ability to generate intermidiates [puppet] - 10https://gerrit.wikimedia.org/r/622830 (https://phabricator.wikimedia.org/T259117) [15:32:41] (03PS3) 10Jbond: pki: add ability to generate intermidiates [puppet] - 10https://gerrit.wikimedia.org/r/622830 (https://phabricator.wikimedia.org/T259117) [15:34:59] (03PS1) 10Bstorm: cloud-vps: alerts shouldn't go to noc@ for Toolforge admin stuff [puppet] - 10https://gerrit.wikimedia.org/r/622831 [15:36:24] (03CR) 10Jbond: [C: 03+2] pki: add ability to generate intermidiates [puppet] - 10https://gerrit.wikimedia.org/r/622830 (https://phabricator.wikimedia.org/T259117) (owner: 10Jbond) [15:36:37] (03PS1) 10Effie Mouzeli: admin: add ryanbrounley to ldap_only_users [puppet] - 10https://gerrit.wikimedia.org/r/622832 (https://phabricator.wikimedia.org/T261324) [15:37:53] (03PS1) 10Effie Mouzeli: admin: add criley to ldap_only_users [puppet] - 10https://gerrit.wikimedia.org/r/622833 (https://phabricator.wikimedia.org/T261160) [15:38:22] !log dzahn@cumin1001 conftool action : set/weight=30; selector: name=mw1267.eqiad.wmnet [15:38:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:38:32] !log dzahn@cumin1001 conftool action : set/weight=30; selector: name=mw1269.eqiad.wmnet [15:38:32] (03CR) 10Bstorm: [C: 03+2] cloud-vps: alerts shouldn't go to noc@ for Toolforge admin stuff [puppet] - 10https://gerrit.wikimedia.org/r/622831 (owner: 10Bstorm) [15:38:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:39:09] (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] cloud-vps: alerts shouldn't go to noc@ for Toolforge admin stuff [puppet] - 10https://gerrit.wikimedia.org/r/622831 (owner: 10Bstorm) [15:39:22] (03PS1) 10Effie Mouzeli: admin: add seve-kim to ldap_only_users [puppet] - 10https://gerrit.wikimedia.org/r/622834 (https://phabricator.wikimedia.org/T261208) [15:39:29] !log dzahn@cumin1001 conftool 
action : set/weight=30; selector: name=mw1297.eqiad.wmnet [15:39:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:39:36] (03CR) 10Elukey: [C: 03+2] Remove schema[12]00[12] records (VMs decommissioned) [dns] - 10https://gerrit.wikimedia.org/r/622755 (https://phabricator.wikimedia.org/T255026) (owner: 10Elukey) [15:39:41] (03PS2) 10Elukey: Remove schema[12]00[12] records (VMs decommissioned) [dns] - 10https://gerrit.wikimedia.org/r/622755 (https://phabricator.wikimedia.org/T255026) [15:39:44] (03PS1) 10Jbond: pki: use safe_title [puppet] - 10https://gerrit.wikimedia.org/r/622835 [15:41:08] (03CR) 10Jbond: [C: 03+2] pki: use safe_title [puppet] - 10https://gerrit.wikimedia.org/r/622835 (owner: 10Jbond) [15:41:46] !log dzahn@cumin1001 conftool action : set/weight=30; selector: name=mw127[6-9].eqiad.wmnet [15:41:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:43:01] !log dzahn@cumin1001 conftool action : set/weight=1; selector: name=mw1276.eqiad.wmnet,service=canary [15:43:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:43:27] (03CR) 10Effie Mouzeli: [C: 03+2] admin: add ryanbrounley to ldap_only_users [puppet] - 10https://gerrit.wikimedia.org/r/622832 (https://phabricator.wikimedia.org/T261324) (owner: 10Effie Mouzeli) [15:43:36] !log dzahn@cumin1001 conftool action : set/weight=1; selector: name=mw127[7-9].eqiad.wmnet,service=canary [15:43:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:46:19] 10Operations, 10Analytics-Clusters, 10Analytics-Radar, 10observability, 10Patch-For-Review: Move kafkamon hosts to Debian Buster - https://phabricator.wikimedia.org/T252773 (10herron) [15:46:32] (03PS1) 10Herron: prometheus: begin scraping buster kafkamon hosts [puppet] - 10https://gerrit.wikimedia.org/r/622836 (https://phabricator.wikimedia.org/T252773) [15:49:48] (03PS1) 10Effie Mouzeli: Revert "admin: add ryanbrounley to ldap_only_users" 
[puppet] - 10https://gerrit.wikimedia.org/r/622771 [15:51:06] (03CR) 10Effie Mouzeli: [C: 03+2] Revert "admin: add ryanbrounley to ldap_only_users" [puppet] - 10https://gerrit.wikimedia.org/r/622771 (owner: 10Effie Mouzeli) [15:51:48] !log dzahn@cumin1001 conftool action : set/weight=30; selector: name=mw128[0-9].eqiad.wmnet [15:51:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:52:51] !log dzahn@cumin1001 conftool action : set/weight=30; selector: name=mw1290.eqiad.wmnet [15:52:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:53:21] 10Operations, 10Product-Infrastructure-Team-Backlog, 10Traffic: Limit maps serving to Wikimedia hosted sites only - https://phabricator.wikimedia.org/T261424 (10JMinor) [15:53:58] (03Abandoned) 10Effie Mouzeli: admin: add criley to ldap_only_users [puppet] - 10https://gerrit.wikimedia.org/r/622833 (https://phabricator.wikimedia.org/T261160) (owner: 10Effie Mouzeli) [15:54:02] (03PS1) 10Jbond: pki: remove unused profiles for now [puppet] - 10https://gerrit.wikimedia.org/r/622837 [15:54:16] (03Abandoned) 10Effie Mouzeli: admin: add seve-kim to ldap_only_users [puppet] - 10https://gerrit.wikimedia.org/r/622834 (https://phabricator.wikimedia.org/T261208) (owner: 10Effie Mouzeli) [15:54:46] (03CR) 10Jbond: [C: 03+2] pki: remove unused profiles for now [puppet] - 10https://gerrit.wikimedia.org/r/622837 (owner: 10Jbond) [15:55:57] (03CR) 10Bearloga: "Cool! LGTM? 
😄" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622604 (https://phabricator.wikimedia.org/T259714) (owner: 10Bearloga) [15:57:49] (03PS1) 10Effie Mouzeli: admin: add ryanbrounley to ldap_only_users [puppet] - 10https://gerrit.wikimedia.org/r/622839 (https://phabricator.wikimedia.org/T261324) [15:58:48] (03PS4) 10Cmjohnson: Adding production dns for an-worker1096-1101 ipv4/ipv6 [dns] - 10https://gerrit.wikimedia.org/r/622828 (https://phabricator.wikimedia.org/T254892) [15:58:59] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_cxserver_cluster_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [15:59:14] (03PS1) 10Effie Mouzeli: admin: add criley to ldap_only_users [puppet] - 10https://gerrit.wikimedia.org/r/622840 (https://phabricator.wikimedia.org/T261160) [15:59:33] <_joe_> mutante: I didn't suggest to move all the servers in api in eqiad to weight 30 [15:59:47] <_joe_> I suggested to leave the ones in the 12* range to 25 [16:00:02] (03PS1) 10Effie Mouzeli: admin: add seve-kim to ldap_only_users [puppet] - 10https://gerrit.wikimedia.org/r/622841 (https://phabricator.wikimedia.org/T261208) [16:00:04] godog and _joe_: (Dis)respected human, time to deploy Puppet request window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200827T1600). Please do the needful. [16:00:08] <_joe_> anyways, meeting [16:00:55] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [16:01:26] (03CR) 10Effie Mouzeli: [C: 03+2] admin: add ryanbrounley to ldap_only_users [puppet] - 10https://gerrit.wikimedia.org/r/622839 (https://phabricator.wikimedia.org/T261324) (owner: 10Effie Mouzeli) [16:01:41] (03PS1) 10Jbond: pki: fix heredoc command [puppet] - 10https://gerrit.wikimedia.org/r/622842 [16:02:21] _joe_: the ones that had 25 in eqiad are the same hardware type we said we want 30 in codfw. they are not the same that we said we want 25 in codfw [16:02:32] (03CR) 10Effie Mouzeli: [C: 03+2] admin: add criley to ldap_only_users [puppet] - 10https://gerrit.wikimedia.org/r/622840 (https://phabricator.wikimedia.org/T261160) (owner: 10Effie Mouzeli) [16:02:41] 10Operations, 10Product-Infrastructure-Team-Backlog, 10Traffic: Limit maps serving to Wikimedia hosted sites only - https://phabricator.wikimedia.org/T261424 (10JMinor) [16:02:51] (03CR) 10Jbond: [C: 03+2] pki: fix heredoc command [puppet] - 10https://gerrit.wikimedia.org/r/622842 (owner: 10Jbond) [16:04:20] (03CR) 10Effie Mouzeli: [C: 03+2] admin: add seve-kim to ldap_only_users [puppet] - 10https://gerrit.wikimedia.org/r/622841 (https://phabricator.wikimedia.org/T261208) (owner: 10Effie Mouzeli) [16:04:53] ah, i see. 
yea, there are 2 very similar types [16:05:29] !log dzahn@cumin1001 conftool action : set/weight=25; selector: name=mw128[0-9].eqiad.wmnet [16:05:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:06:26] !log dzahn@cumin1001 conftool action : set/weight=25; selector: name=mw1290.eqiad.wmnet [16:06:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:08:13] (03PS1) 10Jbond: pki: correct creates path [puppet] - 10https://gerrit.wikimedia.org/r/622843 [16:08:22] (03PS1) 10Bstorm: remove python2-only function from puppet_alert to move to py3 [puppet] - 10https://gerrit.wikimedia.org/r/622844 (https://phabricator.wikimedia.org/T218426) [16:08:33] (03PS2) 10Elukey: Remove term es from analytics-in4 [homer/public] - 10https://gerrit.wikimedia.org/r/622744 [16:08:44] !log dzahn@cumin1001 conftool action : set/weight=25; selector: name=mw127[6-9].eqiad.wmnet [16:08:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:08:54] (03CR) 10Jbond: [C: 03+2] pki: correct creates path [puppet] - 10https://gerrit.wikimedia.org/r/622843 (owner: 10Jbond) [16:09:07] !log dzahn@cumin1001 conftool action : set/weight=1; selector: name=mw127[6-9].eqiad.wmnet,service=canary [16:09:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:10:17] (03CR) 10Elukey: [C: 03+2] Remove term es from analytics-in4 [homer/public] - 10https://gerrit.wikimedia.org/r/622744 (owner: 10Elukey) [16:10:29] (03PS2) 10Elukey: Remove term labstore1003 from analytics-in4 [homer/public] - 10https://gerrit.wikimedia.org/r/622746 [16:11:39] (03CR) 10Elukey: [C: 03+2] Remove term labstore1003 from analytics-in4 [homer/public] - 10https://gerrit.wikimedia.org/r/622746 (owner: 10Elukey) [16:12:28] (03PS1) 10Bstorm: move puppet_alert script to python3 [puppet] - 10https://gerrit.wikimedia.org/r/622846 [16:12:58] !log remove some old/stale terms from analytics-in4 on cr1/cr2-eqiad (ref: 
https://gerrit.wikimedia.org/r/c/operations/homer/public/+/622746, https://gerrit.wikimedia.org/r/c/operations/homer/public/+/622744) [16:13:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:13:34] (03PS2) 10Bstorm: move puppet_alert script to python3 [puppet] - 10https://gerrit.wikimedia.org/r/622846 (https://phabricator.wikimedia.org/T218426) [16:14:01] !log dzahn@cumin1001 conftool action : set/weight=25; selector: name=mw126[1-9].eqiad.wmnet [16:14:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:17:01] (03CR) 10Cmjohnson: [C: 03+2] Adding production dns for an-worker1096-1101 ipv4/ipv6 [dns] - 10https://gerrit.wikimedia.org/r/622828 (https://phabricator.wikimedia.org/T254892) (owner: 10Cmjohnson) [16:18:55] (03CR) 10Bstorm: [C: 04-1] "Seems we are missing some openstack python3 libraries on at least some projects. I checked on a canary as a likely "vanilla" VM:" [puppet] - 10https://gerrit.wikimedia.org/r/622846 (https://phabricator.wikimedia.org/T218426) (owner: 10Bstorm) [16:19:49] 10Operations, 10LDAP-Access-Requests, 10Patch-For-Review: LDAP access to the wmf group for Seve Kim - https://phabricator.wikimedia.org/T261208 (10jijiki) 05Open→03Resolved Done:) [16:19:56] !log dzahn@cumin1001 conftool action : set/weight=1; selector: name=mw126[1-5].eqiad.wmnet,service=canary [16:19:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:19:59] 10Operations, 10LDAP-Access-Requests: LDAP access to wmf for Ryan Brounley - https://phabricator.wikimedia.org/T261324 (10jijiki) 05Open→03Resolved a:03jijiki Done:) [16:20:07] 10Operations, 10LDAP-Access-Requests, 10Patch-For-Review: Product Analytics/Superset Access: LDAP access to the wmf group for Chelsea Riley - https://phabricator.wikimedia.org/T261160 (10jijiki) 05Open→03Resolved Done:) [16:21:27] !log dzahn@cumin1001 conftool action : set/weight=25; selector: name=mw127[0-5].eqiad.wmnet [16:21:29] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:22:43] <_joe_> mutante: not really [16:23:17] !log hnowlan@deploy1001 helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' . [16:23:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:23:20] <_joe_> mw1261-1299 all have 40 cores [16:23:29] (03CR) 10Bstorm: [C: 04-1] "I think this is stalled behind https://phabricator.wikimedia.org/T218423" [puppet] - 10https://gerrit.wikimedia.org/r/622846 (https://phabricator.wikimedia.org/T218426) (owner: 10Bstorm) [16:24:06] 10Operations, 10Data-Services, 10SRE-Access-Requests, 10Patch-For-Review, 10cloud-services-team (Kanban): Enable access for wmcs-admins to run wmcs-prefixed cookbooks on cumin hosts - https://phabricator.wikimedia.org/T261145 (10jijiki) @Volans @MoritzMuehlenhoff @jbond Please advise if we should move th... [16:25:06] (03CR) 10Herron: "PCC https://puppet-compiler.wmflabs.org/compiler1001/24755/" [puppet] - 10https://gerrit.wikimedia.org/r/622836 (https://phabricator.wikimedia.org/T252773) (owner: 10Herron) [16:27:02] _joe_: got it. don't worry while in meeting, already reverted those back to 25 that should be and keeping track [16:27:09] fixing [16:28:38] 10Operations, 10Product-Infrastructure-Team-Backlog, 10Traffic: Limit maps serving to Wikimedia hosted sites only - https://phabricator.wikimedia.org/T261424 (10Elitre) [16:35:30] !log dzahn@cumin1001 conftool action : set/weight=25; selector: name=mw1297.eqiad.wmnet [16:35:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:37:14] done with eqiad. 25 up to 1299, others 30. 
moving on to codfw [16:38:28] starting to depool the old codfw servers for decom [16:41:17] !log dzahn@cumin1001 conftool action : set/pooled=no; selector: name=mw218[7-9].codfw.wmnet [16:41:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:44:42] 10Operations, 10ops-eqiad, 10DBA: db1139 memory errors on boot 2020-08-27 - https://phabricator.wikimedia.org/T261405 (10wiki_willy) a:03Jclark-ctr [16:45:32] 10Operations, 10ops-eqiad, 10DBA: db1139 memory errors on boot 2020-08-27 - https://phabricator.wikimedia.org/T261405 (10wiki_willy) Hi @Jclark-ctr - can you take a look at this one? It was just purchased last year, so it's still under warranty. Thanks, Willy [16:46:41] (03PS2) 10Dzahn: decom mw2187-mw2199, mw2135-mw2147, mw2200-mw2214 [puppet] - 10https://gerrit.wikimedia.org/r/621783 (https://phabricator.wikimedia.org/T260654) [16:47:03] !log dzahn@cumin1001 conftool action : set/pooled=no; selector: name=mw219[0-9].codfw.wmnet [16:47:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:48:09] !log depooling mw2187 - mw2199 - old codfw appservers of type A to be decom'ed, previously weight 10 (T260654) [16:48:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:48:12] T260654: Decommission mw[2135-2214].codfw.wmnet - https://phabricator.wikimedia.org/T260654 [16:49:52] !log re-weighted appservers and api appservers in eqiad - hardware type G = weight 25, all other types = weight 30 (T261159) [16:49:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:49:55] T261159: assess and re-evaluate 'weight' settings of appservers in codfw - https://phabricator.wikimedia.org/T261159 [16:54:40] !log dzahn@cumin1001 conftool action : set/weight=25; selector: name=mw222[4-9].codfw.wmnet [16:54:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:59:07] 10Operations, 10LDAP-Access-Requests: LDAP access to wmf for Ryan Brounley - 
https://phabricator.wikimedia.org/T261324 (10RBrounley_WMF) Yay, thank you! [16:59:27] !log dzahn@cumin1001 conftool action : set/weight=25; selector: name=mw2230.codfw.wmnet [16:59:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:00:01] jouncebot: refresh [17:00:02] I refreshed my knowledge about deployments. [17:00:16] Amir1: hey! [17:00:18] ready? [17:01:28] !log dzahn@cumin1001 conftool action : set/weight=30; selector: name=mw2231.codfw.wmnet [17:01:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:02:54] (03PS2) 10Herron: prometheus: switch over to buster kafkamon hosts [puppet] - 10https://gerrit.wikimedia.org/r/622836 (https://phabricator.wikimedia.org/T252773) [17:04:07] !log dzahn@cumin1001 conftool action : set/weight=25; selector: name=mw223[2-9].codfw.wmnet [17:04:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:09:43] (03PS1) 10Urbanecm: Initial configuration for apiportalwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622853 (https://phabricator.wikimedia.org/T246945) [17:10:04] 10Operations, 10ops-codfw, 10DC-Ops, 10SRE-swift-storage: (Need By: ASAP) rack/setup/install ms-be2057.codfw.wmnet (Test Server - Keep Boxes) - https://phabricator.wikimedia.org/T260188 (10Papaul) [17:10:35] (03CR) 10jerkins-bot: [V: 04-1] Initial configuration for apiportalwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622853 (https://phabricator.wikimedia.org/T246945) (owner: 10Urbanecm) [17:11:29] !log dzahn@cumin1001 conftool action : set/weight=25; selector: name=mw224[0-2].codfw.wmnet [17:11:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:12:22] !log pt1979@cumin2001 START - Cookbook sre.dns.netbox [17:12:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:13:22] !log dzahn@cumin1001 conftool action : set/weight=30; selector: name=mw225[4-8].codfw.wmnet [17:13:24] Logged the message 
at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:15:11] RECOVERY - Rate of JVM GC Old generation-s runs - logstash1010-production-logstash-eqiad on logstash1010 is OK: (C)100 gt (W)80 gt 77.29 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-logstash-eqiad&var-instance=logstash1010&panelId=37 [17:15:26] (03PS2) 10Urbanecm: Initial configuration for apiportalwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622853 (https://phabricator.wikimedia.org/T246945) [17:17:36] !log dzahn@cumin1001 conftool action : set/weight=30; selector: name=mw226[8-9].codfw.wmnet [17:17:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:18:09] mutante: just to be sure, I hope your conftool actions don't mean "don't touch prod"? :-) [17:18:17] 10Operations, 10ops-codfw, 10DC-Ops, 10SRE-swift-storage: (Need By: ASAP) rack/setup/install ms-be2057.codfw.wmnet (Test Server - Keep Boxes) - https://phabricator.wikimedia.org/T260188 (10Papaul) [17:18:24] !log pt1979@cumin2001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [17:18:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:18:29] (03PS2) 10Urbanecm: Initial configuration for jawikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622689 (https://phabricator.wikimedia.org/T260320) (owner: 10Ladsgroup) [17:20:43] Urbanecm: no, it does not. it is now codfw-only [17:20:49] okay, cool! 
[17:23:09] nice to see the ja.wikivoyage config :) [17:23:10] (03PS3) 10Urbanecm: Initial configuration for jawikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622689 (https://phabricator.wikimedia.org/T260320) (owner: 10Ladsgroup) [17:23:36] (03CR) 10Urbanecm: Initial configuration for jawikivoyage (032 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622689 (https://phabricator.wikimedia.org/T260320) (owner: 10Ladsgroup) [17:23:44] you might have seen the restbase part was already done [17:23:57] yup [17:24:06] (thanks to the new live gerrit patches query) [17:24:10] I don't know who made it, but it's really cool [17:24:20] i read add_a_wiki yesterday and saw some improvements compared to the past [17:24:23] fewer steps [17:24:58] Urbanecm: oh, you mean the "auto-check" part on the ticket? that was surprising .. in a great way [17:25:00] hopefully Amir's Fully Automated Resource Tackle will even lower that [17:25:23] no, I mean this https://usercontent.irccloud-cdn.com/file/6hb8SBen/image.png [17:25:50] (the auto-checklist is the fully automated resource tackle, see https://github.com/Ladsgroup/Phabricator-maintenance-bot/blob/master/new_wikis_handler.py) [17:26:57] oh yea, that is of course really cool as well, it just wasn't new to me as opposed to the other thing. thank you [17:27:17] I'm now waiting for Ami.r1, I'm not sure whether we should do https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/622689/3/dblists/cirrussearch-big-indices.dblist [17:27:36] (the yaml file apparently comes from cswiki, but the wiki isn't going to be big in my opinion) [17:28:14] keep in mind to not use "aawiki" as dummy wiki anymore (just learned that yesterday).. 
because DBA wants to move more wikis to new shard [17:28:45] yup :) [17:28:55] muswiki is the new bosschief dummy wiki [17:29:08] lol, ack [17:29:47] !log cdanis@cumin1001 START - Cookbook sre.network.cf [17:29:48] !log cdanis@cumin1001 END (PASS) - Cookbook sre.network.cf (exit_code=0) [17:29:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:29:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:30:50] !log dzahn@cumin1001 conftool action : set/weight=30; selector: name=mw227[0-7].codfw.wmnet [17:30:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:31:03] !log dzahn@cumin1001 conftool action : set/weight=1; selector: name=mw227[0-7].codfw.wmnet,service=canary [17:31:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:35:10] 10Operations, 10Product-Infrastructure-Team-Backlog, 10Traffic: Limit maps serving to Wikimedia hosted sites only - https://phabricator.wikimedia.org/T261424 (10AntiCompositeNumber) https://lists.wikimedia.org/pipermail/maps-l/2020-August/001729.html [17:35:28] 10Operations, 10Product-Infrastructure-Team-Backlog, 10Traffic: Limit maps serving to Wikimedia hosted sites only - https://phabricator.wikimedia.org/T261424 (10Elitre) [17:39:14] 10Operations, 10Product-Infrastructure-Team-Backlog, 10Traffic: Limit maps serving to Wikimedia hosted sites only - https://phabricator.wikimedia.org/T261424 (10AntiCompositeNumber) Note on https://wiki.openstreetmap.org/wiki/Tile_servers#Base_maps is updated with the deprecation notice. I'll remove it from... 
[17:39:32] 10Operations, 10Product-Infrastructure-Team-Backlog, 10Traffic: Limit maps serving to Wikimedia hosted sites only - https://phabricator.wikimedia.org/T261424 (10AntiCompositeNumber) [17:39:55] !log dzahn@cumin1001 conftool action : set/weight=30; selector: name=mw23[0-7][0-9].codfw.wmnet [17:39:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:42:53] 10Operations, 10Product-Infrastructure-Team-Backlog, 10Traffic: Limit maps serving to Wikimedia hosted sites only - https://phabricator.wikimedia.org/T261424 (10Elitre) >>! In T261424#6416783, @AntiCompositeNumber wrote: > Note on https://wiki.openstreetmap.org/wiki/Tile_servers#Base_maps is updated with the... [17:45:28] !log dzahn@cumin1001 conftool action : set/pooled=no; selector: name=mw213[5-9].codfw.wmnet [17:45:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:46:32] !log dzahn@cumin1001 conftool action : set/pooled=no; selector: name=mw214[0-7].codfw.wmnet [17:46:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:47:23] (03PS4) 10Urbanecm: Initial configuration for jawikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622689 (https://phabricator.wikimedia.org/T260320) (owner: 10Ladsgroup) [17:47:41] !log dzahn@cumin1001 conftool action : set/weight=10; selector: name=mw213[5-9].codfw.wmnet [17:47:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:48:00] (03CR) 10Urbanecm: [C: 03+2] Initial configuration for jawikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622689 (https://phabricator.wikimedia.org/T260320) (owner: 10Ladsgroup) [17:48:15] !log dzahn@cumin1001 conftool action : set/weight=10; selector: name=mw214[0-7].codfw.wmnet [17:48:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:48:48] (03Merged) 10jenkins-bot: Initial configuration for jawikivoyage [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/622689 (https://phabricator.wikimedia.org/T260320) (owner: 10Ladsgroup) [17:49:23] (03CR) 10Herron: icinga: support contactgroups stubs (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/622588 (owner: 10Filippo Giunchedi) [17:50:52] !log dzahn@cumin1001 conftool action : set/weight=10; selector: name=mw2200.codfw.wmnet [17:50:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:51:05] !log dzahn@cumin1001 conftool action : set/pooled=no; selector: name=mw2200.codfw.wmnet [17:51:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:52:03] (03PS1) 10BryanDavis: wmcs: expose metricsinfra alert manager [puppet] - 10https://gerrit.wikimedia.org/r/622858 [17:52:14] !log dzahn@cumin1001 conftool action : set/weight=10; selector: name=mw220[1-9].codfw.wmnet [17:52:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:52:25] jouncebot: next [17:52:26] In 0 hour(s) and 7 minute(s): Morning backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200827T1800) [17:52:28] !log dzahn@cumin1001 conftool action : set/pooled=no; selector: name=mw220[1-9].codfw.wmnet [17:52:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:53:02] (03CR) 10jerkins-bot: [V: 04-1] wmcs: expose metricsinfra alert manager [puppet] - 10https://gerrit.wikimedia.org/r/622858 (owner: 10BryanDavis) [17:53:14] volans: I'm currently creating a new wiki, fyi [17:53:41] Urbanecm: ack, thanks for the headsup [17:53:53] I was indeed checking what's in-flight [17:54:06] !log urbanecm@deploy1001 Synchronized wmf-config/db-eqiad.php: Creating jawikivoyage (T260320) (duration: 01m 07s) [17:54:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:54:09] (03PS2) 10BryanDavis: wmcs: expose metricsinfra alert manager [puppet] - 10https://gerrit.wikimedia.org/r/622858 [17:54:09] T260320: Create Wikivoyage 
Japanese - https://phabricator.wikimedia.org/T260320 [17:54:11] !log dzahn@cumin1001 conftool action : set/weight=10; selector: name=mw221[0-4].codfw.wmnet [17:54:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:54:25] !log dzahn@cumin1001 conftool action : set/pooled=no; selector: name=mw221[0-4].codfw.wmnet [17:54:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:55:12] (03CR) 10jerkins-bot: [V: 04-1] wmcs: expose metricsinfra alert manager [puppet] - 10https://gerrit.wikimedia.org/r/622858 (owner: 10BryanDavis) [17:55:13] !log urbanecm@deploy1001 Synchronized wmf-config/db-codfw.php: Creating jawikivoyage (T260320) (duration: 01m 03s) [17:55:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:55:55] !log dzahn@cumin1001 conftool action : set/weight=25; selector: name=mw221[5-9].codfw.wmnet [17:55:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:56:10] !log dzahn@cumin1001 conftool action : set/weight=1; selector: name=mw221[5-9].codfw.wmnet,service=canary [17:56:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:56:15] (03PS3) 10BryanDavis: wmcs: expose metricsinfra alert manager [puppet] - 10https://gerrit.wikimedia.org/r/622858 [17:56:49] !log urbanecm@deploy1001 Synchronized dblists: Creating jawikivoyage (T260320) (duration: 00m 58s) [17:56:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:57:21] !log dzahn@cumin1001 conftool action : set/weight=25; selector: name=mw222[0-3].codfw.wmnet [17:57:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:58:10] !log urbanecm@deploy1001 rebuilt and synchronized wikiversions files: Creating jawikivoyage (T260320) [17:58:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:59:23] !log urbanecm@deploy1001 Synchronized static/images/project-logos/: Creating jawikivoyage 
(T260320) (duration: 01m 03s) [17:59:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:59:26] T260320: Create Wikivoyage Japanese - https://phabricator.wikimedia.org/T260320 [17:59:43] !log dzahn@cumin1001 conftool action : set/weight=25; selector: name=mw224[4-5].codfw.wmnet [17:59:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:59:59] !log dzahn@cumin1001 conftool action : set/weight=1; selector: name=mw224[4-5].codfw.wmnet,service=canary [18:00:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:00:04] RoanKattouw, Niharika, and Urbanecm: #bothumor I � Unicode. All rise for Morning backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200827T1800). [18:00:04] RoanKattouw: A patch you scheduled for Morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [18:00:11] RoanKattouw: please do not deploy now [18:00:17] (finishing a new wiki) [18:00:21] I will do the deployment, since I'm the only customer [18:00:25] OK [18:00:31] I was going to take a 5-minute break anyway [18:00:35] I'll just wait for you to ping me [18:00:36] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Creating jawikivoyage (T260320) (duration: 01m 02s) [18:00:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:00:41] cool, that should do RoanKattouw :) [18:00:53] Oh you're done? 
[18:00:54] (03PS1) 10Urbanecm: Update interwiki cache [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622859 [18:00:56] (03CR) 10Urbanecm: [C: 03+2] Update interwiki cache [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622859 (owner: 10Urbanecm) [18:00:59] RoanKattouw: I'm almost done [18:01:06] ^this is the only step, basically^ [18:01:10] (iw-cache) [18:01:36] (03Merged) 10jenkins-bot: Update interwiki cache [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622859 (owner: 10Urbanecm) [18:02:11] !log dzahn@cumin1001 conftool action : set/weight=30; selector: name=mw225[0-9].codfw.wmnet,cluster=api_appserver [18:02:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:02:47] !log urbanecm@deploy1001 Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 01m 59s) [18:02:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:03:19] RoanKattouw: done for now :) [18:03:30] !log Creating jawikivoyage is done (T260320) [18:03:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:05:15] !log dzahn@cumin1001 conftool action : set/weight=30; selector: name=mw22[6-8][0-9].codfw.wmnet,cluster=api_appserver [18:05:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:06:07] !log dzahn@cumin1001 conftool action : set/weight=30; selector: name=mw2290.codfw.wmnet,cluster=api_appserver [18:06:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:07:48] !log dzahn@cumin1001 conftool action : set/weight=30; selector: name=mw229[1-9].codfw.wmnet,cluster=api_appserver [18:07:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:11:57] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_cxserver_cluster_codfw site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable 
https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [18:12:29] 10Operations, 10Data-Services, 10SRE-Access-Requests, 10Patch-For-Review, 10cloud-services-team (Kanban): Enable access for wmcs-admins to run wmcs-prefixed cookbooks on cumin hosts - https://phabricator.wikimedia.org/T261145 (10Volans) I'd say to discuss it in the meeting, but it was already announced b... [18:13:53] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [18:14:53] (03PS3) 10Catrope: Enable GrowthExperiments on ruwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/621108 (https://phabricator.wikimedia.org/T257490) [18:14:58] !log dzahn@cumin1001 conftool action : set/weight=10; selector: dc=eqiad,cluster=jobrunner,name=mw1318.eqiad.wmnet [18:14:59] (03CR) 10Catrope: [C: 03+2] Enable GrowthExperiments on ruwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/621108 (https://phabricator.wikimedia.org/T257490) (owner: 10Catrope) [18:15:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:16:42] !log cdanis@cumin1001 START - Cookbook sre.network.cf [18:16:42] !log cdanis@cumin1001 END (PASS) - Cookbook sre.network.cf (exit_code=0) [18:16:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:16:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:16:55] (03Merged) 10jenkins-bot: Enable GrowthExperiments on ruwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/621108 (https://phabricator.wikimedia.org/T257490) (owner: 10Catrope) [18:17:44] !log dzahn@cumin1001 conftool action : set/weight=1; selector: dc=codfw,cluster=jobrunner,name=mw2249.codfw.wmnet,service=canary [18:17:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:17:53] !log dzahn@cumin1001 conftool action 
: set/weight=1; selector: dc=codfw,cluster=jobrunner,name=mw2250.codfw.wmnet,service=canary [18:17:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:23:11] (03PS1) 10Catrope: Actually enable homepage on ruwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622861 (https://phabricator.wikimedia.org/T257490) [18:23:35] (03CR) 10Catrope: [C: 03+2] Actually enable homepage on ruwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622861 (https://phabricator.wikimedia.org/T257490) (owner: 10Catrope) [18:24:23] (03Merged) 10jenkins-bot: Actually enable homepage on ruwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622861 (https://phabricator.wikimedia.org/T257490) (owner: 10Catrope) [18:27:52] 10Operations, 10SRE-Access-Requests: Requesting access to production shell for Razzi Abuissa - https://phabricator.wikimedia.org/T261443 (10razzi) [18:30:24] 10Operations, 10LDAP-Access-Requests, 10SRE-Access-Requests: Requesting access to production shell and wmf ldap access for Razzi Abuissa - https://phabricator.wikimedia.org/T261443 (10razzi) [18:31:31] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_cxserver_cluster_codfw site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [18:31:46] !log catrope@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Enable GrowthExperiments on ruwiki (T257490) (duration: 01m 03s) [18:31:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:31:50] T257490: Deploy Growth experiments at Russian Wikipedia - https://phabricator.wikimedia.org/T257490 [18:33:29] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [18:34:02] 10Operations, 10serviceops: assess and re-evaluate 'weight' settings of appservers in codfw - https://phabricator.wikimedia.org/T261159 (10Dzahn) 05Open→03Resolved a:03Dzahn This is done! The Google doc shows the exact changes made. In general: - oldest hardware is pooled=no - older hardware has weigh... [18:34:04] 10Operations: FY2020-2021 Q1 eqiad -> codfw switchover - https://phabricator.wikimedia.org/T243316 (10Dzahn) [18:37:24] (03PS2) 10Catrope: GrowthExperiments: Assign all homepage users to variant A [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622698 [18:37:37] (03CR) 10Catrope: [C: 03+2] GrowthExperiments: Assign all homepage users to variant A [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622698 (owner: 10Catrope) [18:37:58] (03PS27) 10Ryan Kemper: elasticsearch: verify all write queues are empty [software/spicerack] - 10https://gerrit.wikimedia.org/r/619781 (https://phabricator.wikimedia.org/T261239) [18:38:28] (03Merged) 10jenkins-bot: GrowthExperiments: Assign all homepage users to variant A [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622698 (owner: 10Catrope) [18:39:10] (03PS1) 10Ppchelko: Increase timeouts for connection to eventgate to match envoy config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622863 (https://phabricator.wikimedia.org/T249745) [18:39:20] Shall we start with apiportalwiki? 
[18:39:23] Urbanecm: ^ [18:39:40] (03CR) 10Ppchelko: "Matching https://github.com/wikimedia/puppet/blob/production/hieradata/common/profile/services_proxy/envoy.yaml#L23" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622863 (https://phabricator.wikimedia.org/T249745) (owner: 10Ppchelko) [18:39:49] Amir1: RoanKattouw is apparently still deploying :-) [18:40:06] Yes, almost done sorry [18:40:09] (anyway, feel free to review my patch, https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/622853) [18:40:09] let me know once it's done, sorry, I just got here [18:40:22] 10Operations, 10serviceops, 10Patch-For-Review: Decommission mw2187-mw2199, mw2135-mw2147, mw2200-mw2214 (all PowerEdge R420) - https://phabricator.wikimedia.org/T260654 (10Dzahn) [18:40:49] (03CR) 10jerkins-bot: [V: 04-1] elasticsearch: verify all write queues are empty [software/spicerack] - 10https://gerrit.wikimedia.org/r/619781 (https://phabricator.wikimedia.org/T261239) (owner: 10Ryan Kemper) [18:41:43] 10Operations, 10serviceops, 10Patch-For-Review: Decommission mw2187-mw2199, mw2135-mw2147, mw2200-mw2214 (all PowerEdge R420) - https://phabricator.wikimedia.org/T260654 (10Dzahn) [18:42:21] (03PS28) 10Ryan Kemper: elasticsearch: verify all write queues are empty [software/spicerack] - 10https://gerrit.wikimedia.org/r/619781 (https://phabricator.wikimedia.org/T261239) [18:43:09] !log catrope@deploy1001 Synchronized wmf-config/InitialiseSettings.php: GrowthExperiments: Assign all homepage users to variant A (duration: 01m 03s) [18:43:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:43:20] Amir1, Urbanecm: I'm done, go ahead [18:43:30] thanks! 
[18:43:32] !log mforns@deploy1001 Started deploy [analytics/refinery@e85191b]: Regular analytics weekly train [analytics/refinery@e85191bb80c13781b41031340ea318b5f854b6a9] [18:43:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:43:50] 10Operations, 10serviceops, 10Patch-For-Review: Decommission mw2187-mw2199, mw2135-mw2147, mw2200-mw2214 (all PowerEdge R420) - https://phabricator.wikimedia.org/T260654 (10Dzahn) [18:45:43] Amir1: waiting on your review :) [18:45:51] https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/622853 [18:46:25] on it now [18:46:38] thanks [18:46:48] It misses something I have open here [18:46:51] I make a follow up [18:46:53] merge it [18:47:22] (03PS3) 10Urbanecm: Initial configuration for apiportalwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622853 (https://phabricator.wikimedia.org/T246945) [18:47:28] (03CR) 10Urbanecm: [C: 03+2] Initial configuration for apiportalwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622853 (https://phabricator.wikimedia.org/T246945) (owner: 10Urbanecm) [18:47:32] Amir1: okay, merging [18:48:33] (03Merged) 10jenkins-bot: Initial configuration for apiportalwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622853 (https://phabricator.wikimedia.org/T246945) (owner: 10Urbanecm) [18:49:49] Amir1: pulled onto deploy1001, is `mwscript extensions/WikimediaMaintenance/addWiki.php --wiki=muswiki en wikimedia apiportalwiki api.wikimedia.org` good to you? [18:50:02] yup but don't run it yet [18:50:08] okay [18:50:10] If I can get the patch uploaded it [18:50:10] what's wrong? [18:50:25] I don't know, my git is stuck [18:50:46] (03PS1) 10Ladsgroup: Add apiportal to case of special mappings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622867 (https://phabricator.wikimedia.org/T246945) [18:50:53] Amir1: full git? or git-review? 
[18:50:55] whatever, worked ^ [18:51:02] (for git-review, git fetch helps me most of the cases) [18:51:07] no, I think git review was sorta stuck [18:51:26] @seen Hauskatze [18:51:26] mutante: Last time I saw hauskatze they were quitting the network with reason: Quit: Leaving N/A at 8/27/2020 11:55:35 AM (6h55m51s ago) [18:51:48] Amir1: yeah, happens to me too, git fetch usually fixes that :D [18:51:54] (at least on my side) [18:52:13] Urbanecm: also, I think it should have its own group [18:52:19] like "apiportal" [18:52:22] (03CR) 10Urbanecm: [C: 03+2] Add apiportal to case of special mappings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622867 (https://phabricator.wikimedia.org/T246945) (owner: 10Ladsgroup) [18:52:25] not wikimedia [18:52:27] Amir1: good point [18:52:33] anyway, merged your followup, lgtm [18:53:06] (03Merged) 10jenkins-bot: Add apiportal to case of special mappings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622867 (https://phabricator.wikimedia.org/T246945) (owner: 10Ladsgroup) [18:53:07] so, `mwscript extensions/WikimediaMaintenance/addWiki.php --wiki=muswiki en apiportal apiportalwiki api.wikimedia.org`? 
[18:53:21] cool yup, just do a scap pull [18:53:27] (pull the mapping) [18:53:29] sure [18:53:33] !log mforns@deploy1001 Finished deploy [analytics/refinery@e85191b]: Regular analytics weekly train [analytics/refinery@e85191bb80c13781b41031340ea318b5f854b6a9] (duration: 10m 01s) [18:53:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:54:14] running then [18:54:50] !log mforns@deploy1001 Started deploy [analytics/refinery@e85191b] (thin): Regular analytics weekly train THIN [analytics/refinery@e85191bb80c13781b41031340ea318b5f854b6a9] [18:54:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:54:58] !log mforns@deploy1001 Finished deploy [analytics/refinery@e85191b] (thin): Regular analytics weekly train THIN [analytics/refinery@e85191bb80c13781b41031340ea318b5f854b6a9] (duration: 00m 08s) [18:55:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:55:15] I think we have an issue Amir1 [18:55:20] oh what [18:55:31] api.wikimedia.org redirects to governancewiki [18:56:00] even with mwdebug1002 [18:56:16] hnowlan: do you know why? ^ [18:56:27] he's probably out for the day [18:56:33] let me take a look [18:56:54] (I've added the wiki to wikiversion.php manually, so it is supposed to work) [18:57:52] do we have the databases? 
[18:57:54] 12 target: http://api.wikimedia.org [18:57:54] 13 replacement: https://api-gateway.discovery.wmnet:8087 [18:58:07] this is traffic server config [18:58:19] https://gerrit.wikimedia.org/g/operations/puppet/+/011b7ff9ca3ef44b6a84fde02cf4f808c267c574/hieradata/common/profile/trafficserver/backend.yaml#12 [18:58:22] yup, was going to point to that [18:58:40] Hugh said that's fine [18:58:40] Amir1: yes, `wikiadmin@10.64.0.94(apiportalwiki)>` [18:59:01] it just passes to envoy and then hits the appservers [18:59:12] 10.64.0.94 resolves to db1082, and that's s5 (correct) [18:59:15] or apparently not :P [18:59:38] yup, we should check the envoy config for this [18:59:53] hey, sorry, was afk [19:00:04] marxarelli and longma: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) Mediawiki train - American Version deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200827T1900). [19:00:36] Amir1: previously hitting api.wikimedia.org redirected to foundation also, so it's not quite expected but that behaviour is not new at least [19:00:44] longma: marxarelli: could you please hold for a while? [19:00:49] but let me look at the logs for [19:00:51] np [19:01:25] Urbanecm: in which node you picked it? [19:01:28] mwdebug1002? 
[19:01:30] yes [19:01:43] PROBLEM - Ensure local MW versions match expected deployment on mwmaint1002 is CRITICAL: CRITICAL: 1 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [19:01:47] (mwdebug1001 is taken by eff.ie) [19:01:57] ^^ icinga-wm's alert is caused by me^^ [19:03:24] I have a feeling that it redirects to foundation.wikimedia.org when it can't recognize it [19:03:57] so it's not a big deal on its own [19:04:21] well I can continue, but in theory, the entry being both in wikiversion.json and wikiversion.php should be enough [19:04:25] aha, it might be that the mwdebug thingy doesn't work on apiportal [19:04:36] because it goes pass through varnish [19:04:38] also a possibility [19:04:49] Amir1: so, you think I should sync that and see? [19:04:53] so if it's deployed everywhere it should work [19:04:58] just wait a sec [19:05:00] sure [19:05:15] (03PS29) 10Ryan Kemper: elasticsearch: verify all write queues are empty [software/spicerack] - 10https://gerrit.wikimedia.org/r/619781 (https://phabricator.wikimedia.org/T261239) [19:05:30] (03PS3) 10Ottomata: wgEventStreams: Streams for testing MEP-based analytics instruments [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622604 (https://phabricator.wikimedia.org/T259714) (owner: 10Bearloga) [19:06:01] it's actually rather easy to test [19:06:05] Urbanecm: marxarelli if you are holding the train atm, mind if I deploy a config change? [19:06:10] let's hit it locally with curl [19:06:15] https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/622604 [19:06:18] ottomata: I'm actually holding it because I'm creating a wiki [19:06:32] so no deployments please (unless it's urgent), thanks! 
[19:06:37] not urgent, can wait [19:06:39] thanks [19:07:23] it looks like requesting any unknown domain defaults to returning the foundation wiki [19:07:39] from the appservers envoy is hitting, that is [19:07:45] yup [19:07:47] that's why [19:08:00] but with the mapping, it should work [19:08:04] okay, so...continue then? [19:08:17] https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/622867/1/multiversion/MWMultiVersion.php [19:08:27] (03CR) 10Gehel: [C: 03+2] elasticsearch: verify all write queues are empty [software/spicerack] - 10https://gerrit.wikimedia.org/r/619781 (https://phabricator.wikimedia.org/T261239) (owner: 10Ryan Kemper) [19:08:37] Amir1: ha! [19:08:39] apiportal [19:08:40] not apiportalwiki [19:08:59] well, it adds the "wiki" later [19:09:01] right? [19:09:17] none of the other wikis have it [19:09:27] okay, overlook [19:09:43] let me investigate for a bit [19:09:48] sure [19:09:59] let's deploy it for now, it can't break so far [19:10:15] okay, so syncing [19:11:08] so when I do ladsgroup@mwdebug1002:~$ curl http://0.0.0.0:80/ -H 'Host: api.wikimedia.org' I get domain not configured error but I get it for fa.wikipedia.org too [19:11:29] !log urbanecm@deploy1001 Synchronized wmf-config/db-eqiad.php: Creating apiportalwiki (T246945) (duration: 01m 03s) [19:11:33] PROBLEM - Ensure local MW versions match expected deployment on mwdebug1002 is CRITICAL: CRITICAL: 1 mismatched wikiversions https://wikitech.wikimedia.org/wiki/Application_servers [19:11:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:11:35] T246945: New Public Wiki for the API Portal - https://phabricator.wikimedia.org/T246945 [19:11:37] I think it's just the envoy thingy or varnish doesn't respect mwdebug header [19:11:40] Amir1: try curl -I -H 'Host: test.wikipedia.org' "http://$(hostname -i)/w/load.php" [19:11:48] yeah, that's plausible Amir1 [19:12:58] !log urbanecm@deploy1001 Synchronized wmf-config/db-codfw.php: Creating apiportalwiki 
(T246945) (duration: 01m 03s) [19:13:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:14:09] I think Envoy shouldn't interfere with headers passed to the origin much [19:14:18] !log urbanecm@deploy1001 Synchronized multiversion/MWMultiVersion.php: Creating apiportalwiki (T246945) (duration: 01m 03s) [19:14:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:15:07] but it has the varnish to pass everything, if it passes that, it's likely not get respected on the lvs level [19:15:15] I don't think lvs handles mwdebug [19:15:41] !log urbanecm@deploy1001 Synchronized dblists: Creating apiportalwiki (T246945) (duration: 01m 03s) [19:15:41] varnish supposed to handle that part [19:15:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:15:51] Amir1: or ATS, in those days [19:16:08] syncing wikiversions now, mapping+db already live [19:16:10] so it should work in theory [19:16:19] I don't know if that part is replaced yet but will definitely replace it [19:16:57] !log urbanecm@deploy1001 rebuilt and synchronized wikiversions files: Creating apiportalwiki (T246945) [19:16:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:17:00] T246945: New Public Wiki for the API Portal - https://phabricator.wikimedia.org/T246945 [19:17:17] not yet fixed it seems [19:17:23] I check what's happening [19:17:29] Amir1: works for me locally with curl [19:17:31] it does work [19:17:31] RECOVERY - Ensure local MW versions match expected deployment on mwdebug1002 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [19:17:33] just browser cache [19:17:37] aaah [19:17:39] cool then [19:18:07] location: https://api.wikimedia.org/wiki/ [19:18:07] location: https://foundation.wikimedia.org/wiki/ [19:18:07] location: https://foundation.wikimedia.org/wiki/Home [19:18:35] ah yeah [19:18:51] $ curl -I https://api.wikimedia.org goes to 
https://api.wikimedia.org/wiki/, that goes to foundationwiki [19:18:54] with server: envoy [19:18:55] Amir1: ^^ [19:19:06] same here [19:19:08] I check [19:19:13] !log urbanecm@deploy1001 Synchronized static/images/project-logos/: Creating apiportalwiki (T246945) (duration: 01m 03s) [19:19:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:19:17] thanks, I'm finishing the syncs [19:19:31] RECOVERY - Ensure local MW versions match expected deployment on mwmaint1002 is OK: OKAY: wikiversions in sync https://wikitech.wikimedia.org/wiki/Application_servers [19:20:29] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Creating apiportalwiki (T246945) (duration: 01m 03s) [19:20:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:20:38] (03PS1) 10Urbanecm: Update interwiki cache [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622869 [19:20:40] (03CR) 10Urbanecm: [C: 03+2] Update interwiki cache [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622869 (owner: 10Urbanecm) [19:20:43] anyway, wiki should work, in theory [19:21:28] I think I'm getting close to finding out what's wrong [19:21:32] if you curl an app server and set the host header, you can confirm it for that part [19:21:34] so don't worry [19:21:34] (03Merged) 10jenkins-bot: Update interwiki cache [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622869 (owner: 10Urbanecm) [19:22:42] !log urbanecm@deploy1001 Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 11s) [19:22:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:22:48] Amir1: I should be done, in theory [19:22:55] oh hm. The apache config change to add api to `other_wikis` hasn't been merged or deployed. That would do it I guess? [19:23:10] Amir1: you fine to release the dragontrain? 
[19:23:20] Definitely [19:23:25] we can wait for a bit [19:23:48] hnowlan: let me take a look, I think that's not needed [19:23:53] if this is stable for now I can pick up investigating this tomorrow morning given that envoy is an unknown factor in all of this [19:23:55] but I check [19:23:59] (03PS5) 10Ppchelko: api-gateway: Restrict unauthenticated write HTTP methods, permit read HTTP methods [deployment-charts] - 10https://gerrit.wikimedia.org/r/613650 (https://phabricator.wikimedia.org/T256769) (owner: 10Hnowlan) [19:24:13] hnowlan: I don't think it's envoy, so don't worry [19:26:20] 10Operations, 10Traffic, 10fundraising-tech-ops: SSL cert for links.email.wikimedia.org - https://phabricator.wikimedia.org/T188561 (10CCogdill_WMF) Update! The link tracking domain removed support for TLS 1.0 and 1.1, so now score higher: https://www.immuniweb.com/ssl/?id=de0FYntW Can we revisit this? [19:26:23] Amir1: lmk if I can help somehow [19:26:30] Sure [19:26:45] hnowlan: I think the apache config is needed, in otherwikis [19:26:59] similar to iegcom [19:27:42] lol [19:27:52] :D [19:28:14] I was thinking we could add it to $wgLocalVirtualHosts [19:28:28] :-) [19:28:45] Amir1: so...we can handover, I think? [19:28:46] but let's not do that :D [19:28:50] definitely [19:29:01] hnowlan: for when you have time, can you make the puppet patch? [19:29:16] longma: marxarelli: We're done, I'm sorry for the delay [19:29:23] deployment is yours [19:30:08] no problem. thanks [19:30:31] Amir1: yep, can do. 
I don't think I'll be able to deploy it tonight though - haven't done an apache config deploy before and want to be careful [19:33:59] maybe mutante can help but I don't know if he's busy today [19:37:20] (03PS1) 10Dduvall: all wikis to 1.36.0-wmf.6 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622873 [19:37:22] (03CR) 10Dduvall: [C: 03+2] all wikis to 1.36.0-wmf.6 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622873 (owner: 10Dduvall) [19:38:04] (03Merged) 10jenkins-bot: all wikis to 1.36.0-wmf.6 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622873 (owner: 10Dduvall) [19:38:45] PROBLEM - BGP status on cr4-ulsfo is CRITICAL: BGP CRITICAL - AS6939/IPv4: Active - HE, AS6939/IPv6: Active - HE https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status [19:41:06] !log dduvall@deploy1001 rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.6 [19:41:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:41:53] (03PS4) 10Herron: kibana: move kibana.yml settings to parameters [puppet] - 10https://gerrit.wikimedia.org/r/622651 [19:44:36] 10Operations, 10ops-eqiad, 10decommission-hardware: Decommission old-msw1 - https://phabricator.wikimedia.org/T261449 (10Cmjohnson) [19:45:32] 10Operations, 10ops-eqiad, 10netops: eqiad row D switch fabric recabling - https://phabricator.wikimedia.org/T256112 (10wiki_willy) a:05ayounsi→03Cmjohnson [19:45:43] PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is CRITICAL: CRITICAL - failed 119 probes of 559 (alerts on 65) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [19:46:14] jouncebot: next [19:46:14] In 3 hour(s) and 13 minute(s): Evening backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200827T2300) [19:46:18] jouncebot: now [19:46:19] For the next 1 hour(s) and 13 minute(s): Mediawiki train - 
American Version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200827T1900) [19:46:39] 10Operations, 10ops-eqiad, 10DC-Ops, 10netops, 10cloud-services-team (Hardware): (Need By: 2020-06-12) rack/setup/install WMCS 10G switches - https://phabricator.wikimedia.org/T251632 (10wiki_willy) a:05Cmjohnson→03Jclark-ctr [19:48:48] 10Operations, 10ops-eqiad: (Need by: 2020-06-30) replace scs-a8-eqiad - https://phabricator.wikimedia.org/T228919 (10Cmjohnson) This is still open and I am aware of it, the new console switch is here, it's a time-consuming process because each console cable needs to be snipped and re-done to match a standard... [19:49:31] uhhhh [19:49:34] 10Operations, 10LDAP-Access-Requests, 10SRE-Access-Requests: Requesting access to production shell and wmf ldap access for Razzi Abuissa - https://phabricator.wikimedia.org/T261443 (10jijiki) p:05Triage→03Medium a:03razzi [19:49:38] I think something is wrong in ulsfo [19:50:02] (03PS1) 10Bstorm: cloud-vps: Add python3 client packages in cloud [puppet] - 10https://gerrit.wikimedia.org/r/622874 (https://phabricator.wikimedia.org/T218423) [19:51:15] 10Operations, 10Product-Infrastructure-Team-Backlog, 10Traffic: Limit maps serving to Wikimedia hosted sites only - https://phabricator.wikimedia.org/T261424 (10jijiki) 05Open→03Stalled p:05Triage→03High [19:51:18] (03PS1) 10CDanis: depool ulsfo [dns] - 10https://gerrit.wikimedia.org/r/622875 [19:51:41] RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is OK: OK - failed 54 probes of 559 (alerts on 65) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [19:52:22] 10Operations, 10Product-Infrastructure-Team-Backlog, 10Traffic: Limit maps serving to Wikimedia hosted sites only - https://phabricator.wikimedia.org/T261424 (10jijiki) @JMinor Please change the task's status to 'Open' when everything is ready to be merged. 
Thank you! [19:52:53] (03CR) 10Bstorm: [C: 03+2] shared-storage: add specific NFS volume monitoring for cleanups [puppet] - 10https://gerrit.wikimedia.org/r/622655 (https://phabricator.wikimedia.org/T261335) (owner: 10Bstorm) [19:53:09] 10Operations, 10DNS, 10Traffic: 'skip_first' feature flag for gdnsd GeoIP plugin - https://phabricator.wikimedia.org/T261340 (10jijiki) p:05Triage→03Medium [19:57:18] !log dzahn@cumin1001 conftool action : set/pooled=inactive; selector: dc=codfw,cluster=appserver,name=mw21[8-9][0-9]*.codfw.wmnet [19:57:20] !log 1.36.0-wmf.6 promoted to all wikis (T257974). new errors appear to be related to T261345 but are known since 1.36.0-wmf.5 [19:57:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:57:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:57:25] T257974: 1.36.0-wmf.6 deployment blockers - https://phabricator.wikimedia.org/T257974 [19:57:25] T261345: Not an available content version. - https://phabricator.wikimedia.org/T261345 [19:58:18] !log dzahn@cumin1001 START - Cookbook sre.hosts.downtime [19:58:19] !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [19:58:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:58:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:58:27] !log dzahn@cumin1001 START - Cookbook sre.hosts.downtime [19:58:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:58:31] !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [19:58:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:02:59] (03PS1) 10Bstorm: Revert "shared-storage: add specific NFS volume monitoring for cleanups" [puppet] - 10https://gerrit.wikimedia.org/r/622773 [20:03:14] (03CR) 10Bstorm: [V: 03+2 C: 03+2] Revert "shared-storage: add specific NFS volume monitoring for cleanups" [puppet] - 
10https://gerrit.wikimedia.org/r/622773 (owner: 10Bstorm) [20:08:41] (03PS5) 10Herron: kibana: move kibana.yml settings to parameters [puppet] - 10https://gerrit.wikimedia.org/r/622651 [20:09:58] PROBLEM - NFS Share Volume Space /srv/misc on labstore1004 is CRITICAL: NRPE: Command check_NFS not defined https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Shared_storage%23NFS_volume_cleanup https://grafana.wikimedia.org/d/50z0i4XWz/tools-overall-nfs-storage-utilization?orgId=1 [20:11:54] (03PS6) 10Herron: kibana: move kibana.yml settings to parameters [puppet] - 10https://gerrit.wikimedia.org/r/622651 [20:12:38] 10Operations, 10Mail: Create Group Aliases for itservices@ - https://phabricator.wikimedia.org/T259727 (10jijiki) @Herron do you happen to have any insight here? I am not sure I know where to look. Thank you! [20:14:22] marxarelli: Urbanecm ok if i deploy config now? [20:14:57] train seems good, so fine by me [20:20:44] thx [20:21:01] (03PS4) 10Ottomata: wgEventStreams: Streams for testing MEP-based analytics instruments [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622604 (https://phabricator.wikimedia.org/T259714) (owner: 10Bearloga) [20:21:19] 10Operations, 10Data-Services, 10SRE-Access-Requests, 10Patch-For-Review, 10cloud-services-team (Kanban): Enable access for wmcs-admins to run wmcs-prefixed cookbooks on cumin hosts - https://phabricator.wikimedia.org/T261145 (10bd808) [20:22:18] 10Operations, 10ops-eqiad, 10DC-Ops: Tue, Sept 8 PDU Upgrade - Racks D3 and D4 - https://phabricator.wikimedia.org/T261452 (10wiki_willy) [20:22:38] 10Operations, 10ops-eqiad, 10DC-Ops: Tue, Sept 8 PDU Upgrade - Racks D3 and D4 - https://phabricator.wikimedia.org/T261452 (10wiki_willy) [20:23:16] 10Operations, 10Mail: Forwarding or alias for fundraising@ - https://phabricator.wikimedia.org/T252932 (10Dzahn) @JGulingan ping [20:24:13] 10Operations, 10Mail: Create Group Aliases for itservices@ - https://phabricator.wikimedia.org/T259727 (10Dzahn) Hi @HMarcus 
in the past several group aliases have moved from us over to your team. For example fundraising group aliases in T128647 travel group aliases in T127549 and other subtasks of the... [20:24:15] 10Operations, 10ops-eqiad, 10DC-Ops: Tue, Sept 8 PDU Upgrade - Racks D3 and D4 - https://phabricator.wikimedia.org/T261452 (10wiki_willy) List of hosts (and racks) from this maintenance window: an-test-coord1001 D3 aqs1009 D3 asw2-d3-eqiad D3 auth1002 D3 cablemgmt-wmf5286 D3 db1106 D3 db1140 D3 dbproxy1017... [20:24:55] 10Operations, 10ops-eqiad, 10DC-Ops: Tue, Sept 8 PDU Upgrade 12pm-4pm UTC- Racks D3 and D4 - https://phabricator.wikimedia.org/T261452 (10wiki_willy) [20:30:22] (03PS7) 10Herron: kibana: move kibana.yml settings to parameters [puppet] - 10https://gerrit.wikimedia.org/r/622651 [20:31:00] 10Operations, 10ops-eqiad, 10DC-Ops: Thur, Sept 10 PDU Upgrade 12pm-4pm UTC- Racks D7 and D8 - https://phabricator.wikimedia.org/T261454 (10wiki_willy) [20:31:32] 10Operations, 10Data-Services, 10SRE-Access-Requests, 10Patch-For-Review, 10cloud-services-team (Kanban): Enable access for wmcs-admins to run wmcs-prefixed cookbooks on cumin hosts - https://phabricator.wikimedia.org/T261145 (10bd808) >>! In T261145#6416898, @Volans wrote: > As I commented in the CR my... [20:31:56] 10Operations, 10ops-eqiad, 10DC-Ops: Thur, Sept 10 PDU Upgrade 12pm-4pm UTC- Racks D7 and D8 - https://phabricator.wikimedia.org/T261454 (10wiki_willy) List of hostnames in racks D7 and D8 below: analytics1077 D7 an-presto1003 D7 an-worker1094 D7 an-worker1095 D7 an-worker1101 D7 an-worker1115 D7 an-worker1... [20:32:23] 10Operations, 10ops-eqiad, 10DC-Ops: Wed, Sept 9 PDU Upgrade 12pm-4pm UTC- Racks D5 and D6 - https://phabricator.wikimedia.org/T261453 (10wiki_willy) [20:32:44] (03CR) 10BryanDavis: "Configuration live tested on prometheus01.metricsinfra.eqiad.wmflabs." 
[puppet] - 10https://gerrit.wikimedia.org/r/622858 (owner: 10BryanDavis) [20:33:55] 10Operations, 10ops-eqiad, 10DC-Ops: Tue, Sept 14 PDU Upgrade 12pm-4pm UTC- Racks C2 and C3 - https://phabricator.wikimedia.org/T261455 (10wiki_willy) [20:34:40] 10Operations, 10ops-eqiad, 10DC-Ops: Tue, Sept 14 PDU Upgrade 12pm-4pm UTC- Racks C2 and C3 - https://phabricator.wikimedia.org/T261455 (10wiki_willy) List of hostnames in C2 and C3 below: analytics1028 C2 analytics1029 C2 analytics1030 C2 analytics1031 C2 analytics1064 C2 analytics1065 C2 analytics1066 C2... [20:35:27] 10Operations, 10ops-eqiad, 10DC-Ops: Mon, Sept 14 PDU Upgrade 12pm-4pm UTC- Racks C2 and C3 - https://phabricator.wikimedia.org/T261455 (10wiki_willy) [20:37:16] 10Operations, 10ops-eqiad, 10DC-Ops: Tue, Sept 15 PDU Upgrade 12pm-4pm UTC- Racks C4 and C5 - https://phabricator.wikimedia.org/T261456 (10wiki_willy) [20:38:00] 10Operations, 10ops-eqiad, 10DC-Ops: Tue, Sept 15 PDU Upgrade 12pm-4pm UTC- Racks C4 and C5 - https://phabricator.wikimedia.org/T261456 (10wiki_willy) List of hostnames in C4 and C5: an-worker1089 C4 an-worker1090 C4 an-worker1100 C4 an-worker1105 C4 an-worker1106 C4 an-worker1107 C4 an-worker1108 C4 asw2-c... [20:39:34] 10Operations, 10ops-eqiad, 10DC-Ops: Wed, Sept 16 PDU Upgrade 12pm-4pm UTC- Racks C6 and C7 - https://phabricator.wikimedia.org/T261457 (10wiki_willy) [20:39:56] 10Operations, 10ops-eqiad, 10DC-Ops: Wed, Sept 16 PDU Upgrade 12pm-4pm UTC- Racks C6 and C7 - https://phabricator.wikimedia.org/T261457 (10wiki_willy) List of hostnames in racks C6 and C7 below: alert1001 C6 asw2-c6-eqiad C6 asw-c6-eqiad C6 bast1002 C6 cablemgmt-wmf5283 C6 db1121 C6 db1134 C6 db1147 C6 gane... 
[20:41:36] (03CR) 10Ottomata: [C: 03+2] wgEventStreams: Streams for testing MEP-based analytics instruments [mediawiki-config] - 10https://gerrit.wikimedia.org/r/622604 (https://phabricator.wikimedia.org/T259714) (owner: 10Bearloga) [20:41:38] 10Operations, 10ops-eqiad, 10DC-Ops: Thur, Sept 17 PDU Upgrade 12pm-4pm UTC- Rack C1 (Fundraising) - https://phabricator.wikimedia.org/T261458 (10wiki_willy) [20:42:10] 10Operations, 10ops-eqiad, 10DC-Ops: Thur, Sept 17 PDU Upgrade 12pm-4pm UTC- Rack C1 (Fundraising) - https://phabricator.wikimedia.org/T261458 (10wiki_willy) List of hostnames in Fundraising rack C1 below: asw-c1-eqiad civi1001 fasw-c1a-eqiad fasw-c1b-eqiad fmsw-c1-eqiad fran1001 frauth1001 frav1002 frban10... [20:43:15] !log otto@deploy1001 Synchronized wmf-config/InitialiseSettings.php: wgEventStreams: Streams for testing MEP-based analytics instruments - T259714 (duration: 00m 55s) [20:43:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:43:21] T259714: Test schema and stream for analytics events - https://phabricator.wikimedia.org/T259714 [20:44:03] 10Operations, 10ops-eqiad, 10DC-Ops: Mon, Sept 21 PDU Upgrade 12pm-4pm UTC- Racks D1 and D2 - https://phabricator.wikimedia.org/T261459 (10wiki_willy) [20:44:59] 10Operations, 10ops-eqiad, 10DC-Ops: Mon, Sept 21 PDU Upgrade 12pm-4pm UTC- Racks D1 and D2 - https://phabricator.wikimedia.org/T261459 (10wiki_willy) List of hostnames in racks D1 and D2: asw2-d1-eqiad D1 cablemgmt-wmf5284 D1 centrallog1001 D1 db1125 D1 db1136 D1 db1148 D1 dbproxy1016 D1 dns1002 D1 dumpsda... 
[20:45:03] PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 132, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [20:45:05] PROBLEM - Router interfaces on cr4-ulsfo is CRITICAL: CRITICAL: host 198.35.26.193, interfaces up: 75, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [20:46:04] 10Operations, 10ops-eqiad, 10DC-Ops: Tue, Sept 8 PDU Upgrade 12pm-4pm UTC- Racks D3 and D4 - https://phabricator.wikimedia.org/T261452 (10wiki_willy) [20:46:06] 10Operations, 10ops-eqiad, 10DC-Ops: (Need By:TBD) rack/setup/install rows C and D new PDUs - https://phabricator.wikimedia.org/T253694 (10wiki_willy) [20:46:42] 10Operations, 10ops-eqiad, 10DC-Ops: Wed, Sept 9 PDU Upgrade 12pm-4pm UTC- Racks D5 and D6 - https://phabricator.wikimedia.org/T261453 (10wiki_willy) [20:46:44] 10Operations, 10ops-eqiad, 10DC-Ops: (Need By:TBD) rack/setup/install rows C and D new PDUs - https://phabricator.wikimedia.org/T253694 (10wiki_willy) [20:47:04] 10Operations, 10ops-eqiad, 10DC-Ops: Thur, Sept 10 PDU Upgrade 12pm-4pm UTC- Racks D7 and D8 - https://phabricator.wikimedia.org/T261454 (10wiki_willy) [20:47:07] 10Operations, 10ops-eqiad, 10DC-Ops: (Need By:TBD) rack/setup/install rows C and D new PDUs - https://phabricator.wikimedia.org/T253694 (10wiki_willy) [20:47:23] 10Operations, 10ops-eqiad, 10DC-Ops: (Need By:TBD) rack/setup/install rows C and D new PDUs - https://phabricator.wikimedia.org/T253694 (10wiki_willy) [20:47:25] 10Operations, 10ops-eqiad, 10DC-Ops: Mon, Sept 14 PDU Upgrade 12pm-4pm UTC- Racks C2 and C3 - https://phabricator.wikimedia.org/T261455 (10wiki_willy) [20:47:39] !log dzahn@cumin1001 conftool action : set/pooled=inactive; selector: dc=codfw,cluster=api_appserver,name=mw213[0-9].codfw.wmnet [20:47:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log 
[20:47:41] 10Operations, 10ops-eqiad, 10DC-Ops: Tue, Sept 15 PDU Upgrade 12pm-4pm UTC- Racks C4 and C5 - https://phabricator.wikimedia.org/T261456 (10wiki_willy) [20:47:43] 10Operations, 10ops-eqiad, 10DC-Ops: (Need By:TBD) rack/setup/install rows C and D new PDUs - https://phabricator.wikimedia.org/T253694 (10wiki_willy) [20:48:00] 10Operations, 10ops-eqiad, 10DC-Ops: Wed, Sept 16 PDU Upgrade 12pm-4pm UTC- Racks C6 and C7 - https://phabricator.wikimedia.org/T261457 (10wiki_willy) [20:48:02] 10Operations, 10ops-eqiad, 10DC-Ops: (Need By:TBD) rack/setup/install rows C and D new PDUs - https://phabricator.wikimedia.org/T253694 (10wiki_willy) [20:48:08] !log dzahn@cumin1001 START - Cookbook sre.hosts.downtime [20:48:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:48:10] !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [20:48:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:48:26] 10Operations, 10ops-eqiad, 10DC-Ops: Thur, Sept 17 PDU Upgrade 12pm-4pm UTC- Rack C1 (Fundraising) - https://phabricator.wikimedia.org/T261458 (10wiki_willy) [20:48:28] 10Operations, 10ops-eqiad, 10DC-Ops: (Need By:TBD) rack/setup/install rows C and D new PDUs - https://phabricator.wikimedia.org/T253694 (10wiki_willy) [20:48:45] !log dzahn@cumin1001 conftool action : set/pooled=inactive; selector: dc=codfw,cluster=api_appserver,name=mw214[0-7].codfw.wmnet [20:48:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:48:51] 10Operations, 10ops-eqiad, 10DC-Ops: Mon, Sept 21 PDU Upgrade 12pm-4pm UTC- Racks D1 and D2 - https://phabricator.wikimedia.org/T261459 (10wiki_willy) [20:48:53] 10Operations, 10ops-eqiad, 10DC-Ops: (Need By:TBD) rack/setup/install rows C and D new PDUs - https://phabricator.wikimedia.org/T253694 (10wiki_willy) [20:48:56] !log dzahn@cumin1001 START - Cookbook sre.hosts.downtime [20:48:57] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:48:58] (PS1) Ottomata: wgEventStreams - Set canary_events_enabled: true for eventgate test streams [mediawiki-config] - https://gerrit.wikimedia.org/r/622876 (https://phabricator.wikimedia.org/T251609)
[20:48:59] !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[20:49:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:49:41] !log dzahn@cumin1001 conftool action : set/pooled=inactive; selector: dc=codfw,cluster=api_appserver,name=mw220[0-9].codfw.wmnet
[20:49:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:49:54] !log dzahn@cumin1001 START - Cookbook sre.hosts.downtime
[20:49:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:49:58] !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[20:50:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:50:13] !log dzahn@cumin1001 conftool action : set/pooled=inactive; selector: dc=codfw,cluster=api_appserver,name=mw221[0-4].codfw.wmnet
[20:50:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:50:22] !log dzahn@cumin1001 START - Cookbook sre.hosts.downtime
[20:50:24] !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[20:50:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:50:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:59:39] Operations, Mail: Create Group Aliases for itservices@ - https://phabricator.wikimedia.org/T259727 (HMarcus) Open→Resolved a:HMarcus Hi @Dzahn , Thanks for your follow-up. This can actually be closed as we came across an internal solve, however I would like to point out those previous ticket...
[21:01:40] (PS8) Herron: kibana: move kibana.yml settings to parameters [puppet] - https://gerrit.wikimedia.org/r/622651
[21:02:08] (PS1) Bstorm: shared-storage: add specific NFS volume monitoring for cleanups [puppet] - https://gerrit.wikimedia.org/r/622877 (https://phabricator.wikimedia.org/T261335)
[21:03:17] (CR) Dzahn: [C: +2] decom mw2187-mw2199, mw2135-mw2147, mw2200-mw2214 [puppet] - https://gerrit.wikimedia.org/r/621783 (https://phabricator.wikimedia.org/T260654) (owner: Dzahn)
[21:03:24] (PS3) Dzahn: decom mw2187-mw2199, mw2135-mw2147, mw2200-mw2214 [puppet] - https://gerrit.wikimedia.org/r/621783 (https://phabricator.wikimedia.org/T260654)
[21:04:41] (CR) Dzahn: "confirmed they were already gone from https://config-master.wikimedia.org/pybal/codfw/" [puppet] - https://gerrit.wikimedia.org/r/621783 (https://phabricator.wikimedia.org/T260654) (owner: Dzahn)
[21:05:31] (CR) Dzahn: "-40 servers also matches the doc" [puppet] - https://gerrit.wikimedia.org/r/621783 (https://phabricator.wikimedia.org/T260654) (owner: Dzahn)
[21:06:14] volans: i am about to use decom cookbook
[21:06:27] (PS2) Bstorm: shared-storage: add specific NFS volume monitoring for cleanups [puppet] - https://gerrit.wikimedia.org/r/622877 (https://phabricator.wikimedia.org/T261335)
[21:07:00] !log dzahn@cumin1001 START - Cookbook sre.hosts.decommission
[21:07:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:08:05] mutante: ol do I have time to merge a patch?
[21:08:10] *ok
[21:08:22] volans: yes, you do. i did just a single host on purpose
[21:08:28] and now I am removing the others from DHCP first
[21:08:32] to get fewer warnings
[21:08:46] ack, thx, give me 3
[21:09:18] (PS1) Razzi: Add razzi to users [puppet] - https://gerrit.wikimedia.org/r/622878
[21:09:20] (CR) Welcome, new contributor!: "Thank you for making your first contribution to Wikimedia!
:) To learn how to get your code changes reviewed faster and more likely to get" [puppet] - https://gerrit.wikimedia.org/r/622878 (owner: Razzi)
[21:09:54] (PS2) Volans: sre.hosts.decommission: delete ifaces on Netbox [cookbooks] - https://gerrit.wikimedia.org/r/622676 (https://phabricator.wikimedia.org/T258729)
[21:10:05] (CR) Volans: [C: +2] sre.hosts.decommission: delete ifaces on Netbox [cookbooks] - https://gerrit.wikimedia.org/r/622676 (https://phabricator.wikimedia.org/T258729) (owner: Volans)
[21:10:20] !log dzahn@cumin1001 END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
[21:10:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:10:44] (PS1) Dzahn: DHCP: remove mw2187-mw2199, mw2135-mw2147, mw2200-mw2214 [puppet] - https://gerrit.wikimedia.org/r/622879 (https://phabricator.wikimedia.org/T260654)
[21:10:59] (Merged) jenkins-bot: sre.hosts.decommission: delete ifaces on Netbox [cookbooks] - https://gerrit.wikimedia.org/r/622676 (https://phabricator.wikimedia.org/T258729) (owner: Volans)
[21:12:03] (CR) Dzahn: [C: +2] DHCP: remove mw2187-mw2199, mw2135-mw2147, mw2200-mw2214 [puppet] - https://gerrit.wikimedia.org/r/622879 (https://phabricator.wikimedia.org/T260654) (owner: Dzahn)
[21:12:18] mutante: patch merged and cumin hosts updated. Feel free to run it when you need, I'll keep an eye for issues
[21:13:05] volans: ack, so first.. m2187 by itself
[21:13:11] !log dzahn@cumin1001 START - Cookbook sre.hosts.decommission
[21:13:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:13:48] !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
[21:13:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:13:52] finds nothing in the repos, no warnings .. because they are part of regexes in site.pp and gone from DHCP...
[21:13:54] Operations, serviceops, Patch-For-Review: Decommission mw2187-mw2199, mw2135-mw2147, mw2200-mw2214 (all PowerEdge R420) - https://phabricator.wikimedia.org/T260654 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by dzahn@cumin1001 for hosts: `mw2187.codfw.wmnet` - mw2187.codfw.wmnet (*...
[21:13:55] volans: it did "Set Netbox status to Decommissioning and deleted all non-mgmt interfaces and related IPs
[21:14:26] !log dzahn@cumin1001 START - Cookbook sre.hosts.decommission
[21:14:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:14:48] yep
[21:14:51] all good from the logs
[21:14:56] and.. detected one that is still in scap/dsh.yaml :)
[21:15:13] ehehe
[21:15:32] volans: is there an official word to "not" proceed, or just type 3 random strings :)
[21:15:49] !log dzahn@cumin1001 END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
[21:15:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:15:53] the latter
[21:16:00] ack.. so that code 99 was me
[21:16:01] I should modify the message to say that
[21:16:40] !log dzahn@cumin1001 START - Cookbook sre.hosts.decommission
[21:16:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:17:23] !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
[21:17:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:17:28] Operations, serviceops, Patch-For-Review: Decommission mw2187-mw2199, mw2135-mw2147, mw2200-mw2214 (all PowerEdge R420) - https://phabricator.wikimedia.org/T260654 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by dzahn@cumin1001 for hosts: `mw2189.codfw.wmnet` - mw2189.codfw.wmnet (*...
[21:17:33] !log dzahn@cumin1001 START - Cookbook sre.hosts.decommission
[21:17:33] !log dzahn@cumin1001 END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
[21:17:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:17:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:17:50] that was because i tried more than 5 without --force
[21:17:58] !log dzahn@cumin1001 START - Cookbook sre.hosts.decommission
[21:18:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:18:09] k
[21:18:34] (PS9) Herron: kibana: move kibana.yml settings to parameters [puppet] - https://gerrit.wikimedia.org/r/622651
[21:18:38] volans: mw2190 - mw2194 done if you want more to check in netbox
[21:18:59] eh..in progress
[21:19:42] all good, thx
[21:20:18] (PS10) Herron: kibana: move kibana.yml settings to parameters [puppet] - https://gerrit.wikimedia.org/r/622651
[21:20:32] !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
[21:20:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:20:38] Operations, serviceops, Patch-For-Review: Decommission mw2187-mw2199, mw2135-mw2147, mw2200-mw2214 (all PowerEdge R420) - https://phabricator.wikimedia.org/T260654 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by dzahn@cumin1001 for hosts: `mw[2190-2194].codfw.wmnet` - mw2190.codfw.w...
[21:20:40] !log dzahn@cumin1001 START - Cookbook sre.hosts.decommission
[21:20:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:21:50] the detecting of proxies really pays off :)
[21:22:21] i will cancel another one.
no issues with the cookbook
[21:22:23] !log dzahn@cumin1001 END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
[21:22:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:22:51] !log dzahn@cumin1001 START - Cookbook sre.hosts.decommission
[21:22:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:22:57] (PS1) Andrew Bogott: wmcs check_flavor_properties: adjust to reflect new quota plans [puppet] - https://gerrit.wikimedia.org/r/622880
[21:23:19] (CR) jerkins-bot: [V: -1] wmcs check_flavor_properties: adjust to reflect new quota plans [puppet] - https://gerrit.wikimedia.org/r/622880 (owner: Andrew Bogott)
[21:23:28] !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
[21:23:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:23:38] Operations, serviceops, Patch-For-Review: Decommission mw2187-mw2199, mw2135-mw2147, mw2200-mw2214 (all PowerEdge R420) - https://phabricator.wikimedia.org/T260654 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by dzahn@cumin1001 for hosts: `mw2195.codfw.wmnet` - mw2195.codfw.wmnet (*...
[21:23:42] !log dzahn@cumin1001 START - Cookbook sre.hosts.decommission
[21:23:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:25:11] (PS2) Andrew Bogott: wmcs check_flavor_properties: adjust to reflect new quota plans [puppet] - https://gerrit.wikimedia.org/r/622880
[21:25:19] !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
[21:25:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:25:37] !log dzahn@cumin1001 START - Cookbook sre.hosts.decommission
[21:25:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:25:48] Operations, serviceops, Patch-For-Review: Decommission mw2187-mw2199, mw2135-mw2147, mw2200-mw2214 (all PowerEdge R420) - https://phabricator.wikimedia.org/T260654 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by dzahn@cumin1001 for hosts: `mw[2197-2199].codfw.wmnet` - mw2197.codfw.w...
[21:26:46] mutante: thanks for testing it :)
[21:27:04] (CR) Andrew Bogott: [C: +2] wmcs check_flavor_properties: adjust to reflect new quota plans [puppet] - https://gerrit.wikimedia.org/r/622880 (owner: Andrew Bogott)
[21:27:09] volans: you're welcome. I see no issues.
[21:28:12] !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
[21:28:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:28:19] Operations, serviceops, Patch-For-Review: Decommission mw2187-mw2199, mw2135-mw2147, mw2200-mw2214 (all PowerEdge R420) - https://phabricator.wikimedia.org/T260654 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by dzahn@cumin1001 for hosts: `mw[2135-2139].codfw.wmnet` - mw2135.codfw.w...
[21:28:24] (CR) Bstorm: labstore: add data types and some other style fixes (1 comment) [puppet] - https://gerrit.wikimedia.org/r/622666 (owner: Dzahn)
[21:29:52] !log dzahn@cumin1001 START - Cookbook sre.hosts.decommission
[21:29:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:30:54] (CR) Volans: "post-merge comment" (2 comments) [software/spicerack] - https://gerrit.wikimedia.org/r/619781 (https://phabricator.wikimedia.org/T261239) (owner: Ryan Kemper)
[21:31:11] (CR) Dzahn: "yep, thank you. there is no rush to this." [puppet] - https://gerrit.wikimedia.org/r/622666 (owner: Dzahn)
[21:32:34] !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
[21:32:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:32:42] Operations, serviceops, Patch-For-Review: Decommission mw2187-mw2199, mw2135-mw2147, mw2200-mw2214 (all PowerEdge R420) - https://phabricator.wikimedia.org/T260654 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by dzahn@cumin1001 for hosts: `mw[2140-2144].codfw.wmnet` - mw2140.codfw.w...
[21:33:04] !log dzahn@cumin1001 START - Cookbook sre.hosts.decommission
[21:33:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:33:21] (CR) MarcoAurelio: Add razzi to users (1 comment) [puppet] - https://gerrit.wikimedia.org/r/622878 (owner: Razzi)
[21:34:12] (CR) Herron: "https://puppet-compiler.wmflabs.org/compiler1003/24765/" [puppet] - https://gerrit.wikimedia.org/r/622651 (owner: Herron)
[21:34:42] !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
[21:34:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:34:49] Operations, serviceops, Patch-For-Review: Decommission mw2187-mw2199, mw2135-mw2147, mw2200-mw2214 (all PowerEdge R420) - https://phabricator.wikimedia.org/T260654 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by dzahn@cumin1001 for hosts: `mw[2145-2147].codfw.wmnet` - mw2145.codfw.w...
[21:35:44] (PS1) Volans: CHANGELOG: add changelogs for release v0.0.40 [software/spicerack] - https://gerrit.wikimedia.org/r/622884
[21:37:57] (CR) jerkins-bot: [V: -1] CHANGELOG: add changelogs for release v0.0.40 [software/spicerack] - https://gerrit.wikimedia.org/r/622884 (owner: Volans)
[21:39:39] (CR) Dzahn: "welcome to Wikimedia and Gerrit!" [puppet] - https://gerrit.wikimedia.org/r/622878 (owner: Razzi)
[21:40:34] (CR) Dzahn: "took the liberty to adjust the topic to "access-requests".
if you click that you can see a bunch of other access requests grouped together" [puppet] - https://gerrit.wikimedia.org/r/622878 (owner: Razzi)
[21:41:04] !log dzahn@cumin1001 START - Cookbook sre.hosts.decommission
[21:41:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:43:10] (PS2) Volans: CHANGELOG: add changelogs for release v0.0.40 [software/spicerack] - https://gerrit.wikimedia.org/r/622884
[21:43:39] !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
[21:43:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:43:56] !log dzahn@cumin1001 START - Cookbook sre.hosts.decommission
[21:43:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:44:06] Operations, serviceops, Patch-For-Review: Decommission mw2187-mw2199, mw2135-mw2147, mw2200-mw2214 (all PowerEdge R420) - https://phabricator.wikimedia.org/T260654 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by dzahn@cumin1001 for hosts: `mw[2200-2204].codfw.wmnet` - mw2200.codfw.w...
[21:46:32] !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
[21:46:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:46:38] Operations, serviceops, Patch-For-Review: Decommission mw2187-mw2199, mw2135-mw2147, mw2200-mw2214 (all PowerEdge R420) - https://phabricator.wikimedia.org/T260654 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by dzahn@cumin1001 for hosts: `mw[2205-2209].codfw.wmnet` - mw2205.codfw.w...
[21:48:03] (CR) Volans: [C: +2] CHANGELOG: add changelogs for release v0.0.40 [software/spicerack] - https://gerrit.wikimedia.org/r/622884 (owner: Volans)
[21:49:18] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_cxserver_cluster_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[21:50:18] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[21:50:24] (Merged) jenkins-bot: CHANGELOG: add changelogs for release v0.0.40 [software/spicerack] - https://gerrit.wikimedia.org/r/622884 (owner: Volans)
[21:50:40] (CR) Ottomata: "LGTM! Just add" [puppet] - https://gerrit.wikimedia.org/r/622878 (owner: Razzi)
[21:53:21] (PS1) Volans: Upstream release v0.0.40 [software/spicerack] (debian) - https://gerrit.wikimedia.org/r/622890
[21:55:53] (CR) Volans: [C: +2] Upstream release v0.0.40 [software/spicerack] (debian) - https://gerrit.wikimedia.org/r/622890 (owner: Volans)
[21:57:27] !log dzahn@cumin1001 START - Cookbook sre.hosts.decommission
[21:57:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:58:12] (Merged) jenkins-bot: Upstream release v0.0.40 [software/spicerack] (debian) - https://gerrit.wikimedia.org/r/622890 (owner: Volans)
[21:59:41] !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
[21:59:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:59:48] Operations, serviceops, Patch-For-Review: Decommission mw2187-mw2199, mw2135-mw2147, mw2200-mw2214 (all PowerEdge R420) - https://phabricator.wikimedia.org/T260654 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by dzahn@cumin1001
for hosts: `mw[2210-2212,2214].codfw.wmnet` - mw2210.co...
[22:03:30] Operations, ops-eqiad, DC-Ops: (Need By:TBD) rack/setup/install rows C and D new PDUs - https://phabricator.wikimedia.org/T253694 (RobH)
[22:06:16] (PS1) Volans: debian: change default target OS to buster [software/spicerack] (debian) - https://gerrit.wikimedia.org/r/622893
[22:06:38] (CR) Volans: [C: +2] debian: change default target OS to buster [software/spicerack] (debian) - https://gerrit.wikimedia.org/r/622893 (owner: Volans)
[22:06:46] PROBLEM - mediawiki-installation DSH group on mw2196 is CRITICAL: Host mw2196 is not in mediawiki-installation dsh group https://wikitech.wikimedia.org/wiki/Monitoring/check_dsh_groups
[22:08:51] (Merged) jenkins-bot: debian: change default target OS to buster [software/spicerack] (debian) - https://gerrit.wikimedia.org/r/622893 (owner: Volans)
[22:09:03] Operations, ops-eqiad, DC-Ops: (Need By:TBD) rack/setup/install rows C and D new PDUs - https://phabricator.wikimedia.org/T253694 (RobH)
[22:14:48] Operations, ops-eqiad, DC-Ops: (Need By:TBD) rack/setup/install rows C and D new PDUs - https://phabricator.wikimedia.org/T253694 (RobH)
[22:15:36] (PS1) Dzahn: scap: remove proxy for codfw C4, mw2188 [puppet] - https://gerrit.wikimedia.org/r/622895 (https://phabricator.wikimedia.org/T260654)
[22:16:35] (CR) Dzahn: [C: +2] scap: remove proxy for codfw C4, mw2188 [puppet] - https://gerrit.wikimedia.org/r/622895 (https://phabricator.wikimedia.org/T260654) (owner: Dzahn)
[22:17:37] !log dzahn@cumin1001 START - Cookbook sre.hosts.decommission
[22:17:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:18:03] (PS1) Ppchelko: Install OAuthRateLimiter extension I: Add i18n [mediawiki-config] - https://gerrit.wikimedia.org/r/622896 (https://phabricator.wikimedia.org/T258423)
[22:18:05] (PS1) Ppchelko: Install OAuthRateLimiter extension II: Add flag to IS
[mediawiki-config] - https://gerrit.wikimedia.org/r/622897 (https://phabricator.wikimedia.org/T258423)
[22:18:07] (PS1) Ppchelko: Install OAuthRateLimiter III: Install where enabled [mediawiki-config] - https://gerrit.wikimedia.org/r/622898 (https://phabricator.wikimedia.org/T258423)
[22:18:09] (PS1) Ppchelko: Install OAuthRateLimiter extension IV: Enable on labs [mediawiki-config] - https://gerrit.wikimedia.org/r/622899
[22:18:16] !log uploaded spicerack_0.0.40-1_amd64.deb to apt.wikimedia.org buster-wikimedia
[22:18:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:18:23] !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
[22:18:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:18:30] Operations, serviceops, Patch-For-Review: Decommission mw2187-mw2199, mw2135-mw2147, mw2200-mw2214 (all PowerEdge R420) - https://phabricator.wikimedia.org/T260654 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by dzahn@cumin1001 for hosts: `mw2188.codfw.wmnet` - mw2188.codfw.wmnet (*...
[22:19:44] (CR) Ppchelko: [C: -2] "Depends on creating the DB table https://phabricator.wikimedia.org/T258711" [mediawiki-config] - https://gerrit.wikimedia.org/r/622899 (owner: Ppchelko)
[22:22:58] (PS1) Dzahn: mediawiki: replace mw2196 with mw2336 as mcrouter proxy [puppet] - https://gerrit.wikimedia.org/r/622900 (https://phabricator.wikimedia.org/T260654)
[22:26:00] elukey: I need to replace an mcrouter proxy with a different host. Was it really just editing mcrouter_wancache.yaml and pick a random appserver or was there more to first make it usable as a proxy?
[22:27:52] (PS2) Dzahn: mediawiki: replace mw2196 with mw2336 as mcrouter proxy [puppet] - https://gerrit.wikimedia.org/r/622900 (https://phabricator.wikimedia.org/T260654)
[22:28:05] (PS3) Dzahn: mediawiki: replace mw2196 with mw2336 as mcrouter proxy [puppet] - https://gerrit.wikimedia.org/r/622900 (https://phabricator.wikimedia.org/T260654)
[22:28:55] !log removing one file for legal compliance
[22:28:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:29:07] (PS2) Razzi: admin: Add razzi to users [puppet] - https://gerrit.wikimedia.org/r/622878 (https://phabricator.wikimedia.org/T252617)
[22:30:26] Operations, ops-codfw, DC-Ops, SRE-swift-storage: (Need By: ASAP) rack/setup/install ms-be2057.codfw.wmnet (Test Server - Keep Boxes) - https://phabricator.wikimedia.org/T260188 (Papaul)
[22:32:12] (CR) Dzahn: "https://puppet-compiler.wmflabs.org/compiler1002/24766/mw2336.codfw.wmnet/index.html" [puppet] - https://gerrit.wikimedia.org/r/622900 (https://phabricator.wikimedia.org/T260654) (owner: Dzahn)
[22:33:10] (PS3) Razzi: admin: Add razzi to users and add to analytics groups [puppet] - https://gerrit.wikimedia.org/r/622878 (https://phabricator.wikimedia.org/T252617)
[22:35:00] Operations, serviceops, Patch-For-Review: Decommission mw2187-mw2199, mw2135-mw2147, mw2200-mw2214 (all PowerEdge R420) - https://phabricator.wikimedia.org/T260654 (Dzahn) All are decom'ed and done except 1 host, mw2196, which is an mcrouter proxy.
[22:38:42] Operations, ops-codfw, DC-Ops, SRE-swift-storage: (Need By: ASAP) rack/setup/install ms-be2057.codfw.wmnet (Test Server - Keep Boxes) - https://phabricator.wikimedia.org/T260188 (Papaul) ` member ge-1/0/21 { ... } + member xe-4/0/10; [edit interfaces] + xe-4/0/10 { + description ms-be...
[22:39:36] Operations, ops-codfw, DC-Ops, SRE-swift-storage: (Need By: ASAP) rack/setup/install ms-be2057.codfw.wmnet (Test Server - Keep Boxes) - https://phabricator.wikimedia.org/T260188 (Papaul)
[22:42:07] (CR) Dzahn: "I think you got the wrong bug number and wanted "Bug: T261443" in the commit message. This is a good time show you can do this in the web " [puppet] - https://gerrit.wikimedia.org/r/622878 (https://phabricator.wikimedia.org/T252617) (owner: Razzi)
[22:42:52] (PS1) Dzahn: site: remove mw2187-mw2195, mw2197-mw2199, mw2135-mw2147, mw2200-mw2214 [puppet] - https://gerrit.wikimedia.org/r/622902 (https://phabricator.wikimedia.org/T260654)
[22:43:15] (CR) Razzi: "> Patch Set 3:" [puppet] - https://gerrit.wikimedia.org/r/622878 (https://phabricator.wikimedia.org/T252617) (owner: Razzi)
[22:44:05] (PS4) Razzi: admin: Add razzi to users and add to analytics groups [puppet] - https://gerrit.wikimedia.org/r/622878 (https://phabricator.wikimedia.org/T261443)
[22:44:23] (PS1) Papaul: DNS: Add production DNS for ms-be2057 [dns] - https://gerrit.wikimedia.org/r/622903
[22:45:45] (PS2) Ppchelko: Install OAuthRateLimiter III: Install where enabled [mediawiki-config] - https://gerrit.wikimedia.org/r/622898 (https://phabricator.wikimedia.org/T258423)
[22:45:47] (PS2) Ppchelko: Install OAuthRateLimiter extension IV: Enable on labs [mediawiki-config] - https://gerrit.wikimedia.org/r/622899
[22:47:30] (CR) Papaul: [C: +2] DNS: Add production DNS for ms-be2057 [dns] - https://gerrit.wikimedia.org/r/622903 (owner: Papaul)
[22:47:33] (CR) Dzahn: [C: +1] admin: Add razzi to users and add to analytics groups [puppet] - https://gerrit.wikimedia.org/r/622878 (https://phabricator.wikimedia.org/T261443) (owner: Razzi)
[22:49:11] Operations, ops-codfw, DC-Ops, SRE-swift-storage, Patch-For-Review: (Need By: ASAP) rack/setup/install ms-be2057.codfw.wmnet (Test Server - Keep Boxes) -
https://phabricator.wikimedia.org/T260188 (Papaul)
[22:55:58] (CR) Cwhite: [C: +1] "LGTM!" [puppet] - https://gerrit.wikimedia.org/r/622836 (https://phabricator.wikimedia.org/T252773) (owner: Herron)
[22:56:45] (PS1) Dave Pifke: arclamp: provide Swift credentials to cron jobs [puppet] - https://gerrit.wikimedia.org/r/622904 (https://phabricator.wikimedia.org/T244776)
[23:00:04] RoanKattouw, Niharika, and Urbanecm: #bothumor I � Unicode. All rise for Evening backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200827T2300).
[23:00:04] No GERRIT patches in the queue for this window AFAICS.
[23:01:13] Operations, ops-eqiad, DC-Ops: (Need By: TBD) install new controller into frdb1001 OR add to spares - https://phabricator.wikimedia.org/T261348 (wiki_willy) a:Jclark-ctr
[23:01:19] Operations, serviceops, Patch-For-Review: Decommission mw2135-mw2147, mw2187-mw2199, mw2200-mw2214 (all PowerEdge R420) - https://phabricator.wikimedia.org/T260654 (Dzahn)
[23:01:47] (CR) Dzahn: [C: +2] site: remove mw2187-mw2195, mw2197-mw2199, mw2135-mw2147, mw2200-mw2214 [puppet] - https://gerrit.wikimedia.org/r/622902 (https://phabricator.wikimedia.org/T260654) (owner: Dzahn)
[23:05:02] (PS2) Dave Pifke: arclamp: provide Swift credentials to cron jobs [puppet] - https://gerrit.wikimedia.org/r/622904 (https://phabricator.wikimedia.org/T244776)
[23:05:47] (PS1) Papaul: DHCP: Add MAC address for ms-be2057 [puppet] - https://gerrit.wikimedia.org/r/622905 (https://phabricator.wikimedia.org/T260188)
[23:07:27] (CR) Papaul: [C: +2] DHCP: Add MAC address for ms-be2057 [puppet] - https://gerrit.wikimedia.org/r/622905 (https://phabricator.wikimedia.org/T260188) (owner: Papaul)
[23:09:12] (PS3) Dave Pifke: arclamp: provide Swift credentials to cron jobs [puppet] - https://gerrit.wikimedia.org/r/622904 (https://phabricator.wikimedia.org/T244776)
[23:13:30] (PS4) Dave Pifke: arclamp:
provide Swift credentials to cron jobs [puppet] - https://gerrit.wikimedia.org/r/622904 (https://phabricator.wikimedia.org/T244776)
[23:15:17] (PS1) Dzahn: site: remove mw2189-mw2195,mw2197-mw2199 [puppet] - https://gerrit.wikimedia.org/r/622906 (https://phabricator.wikimedia.org/T260654)
[23:16:10] (CR) Dzahn: [C: +2] site: remove mw2189-mw2195,mw2197-mw2199 [puppet] - https://gerrit.wikimedia.org/r/622906 (https://phabricator.wikimedia.org/T260654) (owner: Dzahn)
[23:20:08] (PS1) Papaul: Add ms-be2057 to site.pp with role insetup [puppet] - https://gerrit.wikimedia.org/r/622908 (https://phabricator.wikimedia.org/T260188)
[23:20:59] (CR) Dzahn: [C: +1] Add ms-be2057 to site.pp with role insetup [puppet] - https://gerrit.wikimedia.org/r/622908 (https://phabricator.wikimedia.org/T260188) (owner: Papaul)
[23:21:23] (Abandoned) Papaul: Add ms-be2057 to site.pp with role insetup [puppet] - https://gerrit.wikimedia.org/r/622908 (https://phabricator.wikimedia.org/T260188) (owner: Papaul)
[23:25:19] (Abandoned) Dzahn: apt: remove jessie support [puppet] - https://gerrit.wikimedia.org/r/621365 (owner: Dzahn)
[23:31:47] Operations, ops-codfw, DC-Ops, SRE-swift-storage, Patch-For-Review: (Need By: ASAP) rack/setup/install ms-be2057.codfw.wmnet (Test Server - Keep Boxes) - https://phabricator.wikimedia.org/T260188 (ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for...
[23:34:30] (CR) Dzahn: [C: +1] "lgtm afaict, + godog . https://puppet-compiler.wmflabs.org/compiler1003/24771/webperf2002.codfw.wmnet/index.html" [puppet] - https://gerrit.wikimedia.org/r/622904 (https://phabricator.wikimedia.org/T244776) (owner: Dave Pifke)
[23:42:24] (CR) Dzahn: "Did something break that made you revert?"
[puppet] - https://gerrit.wikimedia.org/r/622773 (owner: Bstorm)
[23:44:48] (CR) Bstorm: "> Patch Set 1:" [puppet] - https://gerrit.wikimedia.org/r/622773 (owner: Bstorm)
[23:46:36] (CR) Dzahn: [C: +1] shared-storage: add specific NFS volume monitoring for cleanups [puppet] - https://gerrit.wikimedia.org/r/622877 (https://phabricator.wikimedia.org/T261335) (owner: Bstorm)
[23:46:56] (CR) Dzahn: "> Patch Set 1:" [puppet] - https://gerrit.wikimedia.org/r/622773 (owner: Bstorm)
[23:51:43] !log pt1979@cumin2001 START - Cookbook sre.hosts.downtime
[23:51:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:52:54] (PS5) Dave Pifke: arclamp: provide Swift credentials to cron jobs [puppet] - https://gerrit.wikimedia.org/r/622904 (https://phabricator.wikimedia.org/T244776)
[23:53:46] !log pt1979@cumin2001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
[23:53:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log