[00:00:15] RECOVERY - Check systemd state on netflow1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [00:00:21] RECOVERY - Check systemd state on netflow2001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [00:00:25] RECOVERY - Check systemd state on netflow3001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [00:00:29] RECOVERY - Check systemd state on netflow5001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [00:01:17] RECOVERY - Check systemd state on netflow4001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [00:45:07] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [04:15:35] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [04:19:17] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [05:13:34] (03PS1) 10Marostegui: db1119: Disable notifications [puppet] - 10https://gerrit.wikimedia.org/r/611957 (https://phabricator.wikimedia.org/T254462) [05:14:28] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1119 for innodb compression', diff saved to https://phabricator.wikimedia.org/P11864 and previous config saved to /var/cache/conftool/dbconfig/20200713-051428-marostegui.json [05:14:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:15:16] (03CR) 10Marostegui: [C: 03+2] db1119: Disable notifications [puppet] - 10https://gerrit.wikimedia.org/r/611957 (https://phabricator.wikimedia.org/T254462) (owner: 10Marostegui) [05:29:29] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1082', diff saved to https://phabricator.wikimedia.org/P11865 and previous config saved to /var/cache/conftool/dbconfig/20200713-052928-marostegui.json [05:29:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:30:19] !log Stop replication on db1082 for schema change and triggers removal T238966 [05:30:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:30:23] T238966: Apply updates for MCR, actor migration, and content migration, to production wikis. 
- https://phabricator.wikimedia.org/T238966 [05:33:41] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [05:34:43] !log Deploy schema change on s3 codfw master, lag will appear on codfw T253276 [05:34:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:34:48] T253276: Normalise MW Core database language fields length - https://phabricator.wikimedia.org/T253276 [05:39:15] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [05:43:11] PROBLEM - Check whether ferm is active by checking the default input chain on kubernetes2002 is CRITICAL: connect to address 10.192.16.42 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm [05:43:55] PROBLEM - Check systemd state on kubernetes2002 is CRITICAL: connect to address 10.192.16.42 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [05:43:57] PROBLEM - Check size of conntrack table on kubernetes2002 is CRITICAL: connect to address 10.192.16.42 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack [05:44:29] PROBLEM - puppet last run on kubernetes2002 is CRITICAL: connect to address 10.192.16.42 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [05:47:15] PROBLEM - Disk space on kubernetes2002 is CRITICAL: connect to address 10.192.16.42 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space 
https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=kubernetes2002&var-datasource=codfw+prometheus/ops [05:48:40] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1093 for upgrade', diff saved to https://phabricator.wikimedia.org/P11866 and previous config saved to /var/cache/conftool/dbconfig/20200713-054840-marostegui.json [05:48:41] PROBLEM - proton LVS codfw on proton.svc.codfw.wmnet is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Foo page from en.wp.org in letter format) is CRITICAL: Test Print the Foo page from en.wp.org in letter format returned the unexpected status 500 (expecting: 200) https://wikitech.wikimedia.org/wiki/Proton [05:48:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:50:57] PROBLEM - DPKG on kubernetes2002 is CRITICAL: connect to address 10.192.16.42 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/dpkg [05:51:41] PROBLEM - proton LVS codfw on proton.svc.codfw.wmnet is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Foo page from en.wp.org in letter format) is CRITICAL: Test Print the Foo page from en.wp.org in letter format returned the unexpected status 500 (expecting: 200): /{domain}/v1/pdf/{title}/{format}/{type} (Print the Bar page from en.wp.org in A4 format using optimized for reading on mobile devices) is CRITICAL: Te [05:51:41] page from en.wp.org in A4 format using optimized for reading on mobile devices returned the unexpected status 500 (expecting: 200) https://wikitech.wikimedia.org/wiki/Proton [05:51:55] PROBLEM - MD RAID on kubernetes2002 is CRITICAL: connect to address 10.192.16.42 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering [05:53:35] PROBLEM - configured eth on kubernetes2002 is CRITICAL: connect to address 10.192.16.42 port 5666: Connection refused 
https://wikitech.wikimedia.org/wiki/Monitoring/check_eth [05:53:49] RECOVERY - proton LVS codfw on proton.svc.codfw.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Proton [05:54:22] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1093', diff saved to https://phabricator.wikimedia.org/P11867 and previous config saved to /var/cache/conftool/dbconfig/20200713-055422-marostegui.json [05:54:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:04:10] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1093', diff saved to https://phabricator.wikimedia.org/P11868 and previous config saved to /var/cache/conftool/dbconfig/20200713-060410-marostegui.json [06:04:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:04:25] PROBLEM - Check the NTP synchronisation status of timesyncd on kubernetes2002 is CRITICAL: connect to address 10.192.16.42 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/NTP [06:07:47] PROBLEM - IPMI Sensor Status on kubernetes2002 is CRITICAL: connect to address 10.192.16.42 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures [06:12:11] (03PS1) 10Marostegui: production-m2.sql: Remove gerrit related grants [puppet] - 10https://gerrit.wikimedia.org/r/611963 (https://phabricator.wikimedia.org/T255715) [06:13:01] PROBLEM - dhclient process on kubernetes2002 is CRITICAL: connect to address 10.192.16.42 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_dhclient [06:15:47] RECOVERY - Check systemd state on kubernetes2002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [06:16:19] !log Reverse gerrit password on m2 master - T255715 [06:16:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:16:24] T255715: Make sure 
both `reviewdb-test` (used forgerrit upgrade testing) and `reviewdb` (formerly production) databases get torn down - https://phabricator.wikimedia.org/T255715 [06:16:47] RECOVERY - Check size of conntrack table on kubernetes2002 is OK: OK: nf_conntrack is 0 % full https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack [06:16:51] RECOVERY - puppet last run on kubernetes2002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [06:17:01] (03CR) 10Marostegui: "I have done a reverse password for gerrit user on the database, so effectively the grants have been changed, as the password should not lo" [puppet] - 10https://gerrit.wikimedia.org/r/611963 (https://phabricator.wikimedia.org/T255715) (owner: 10Marostegui) [06:20:09] PROBLEM - MariaDB Replica SQL: m2 on db1117 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1348, Errmsg: Error Column Password is not updatable on query. Default database: . [Query snipped] https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_slave [06:21:13] PROBLEM - MariaDB Replica SQL: m2 on db2133 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1348, Errmsg: Error Column Password is not updatable on query. Default database: . 
[Query snipped] https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_slave [06:21:19] ^ checking [06:21:25] must have been the above !log [06:21:51] RECOVERY - DPKG on kubernetes2002 is OK: All packages OK https://wikitech.wikimedia.org/wiki/Monitoring/dpkg [06:23:47] RECOVERY - MD RAID on kubernetes2002 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering [06:23:55] RECOVERY - MariaDB Replica SQL: m2 on db1117 is OK: OK slave_sql_state Slave_SQL_Running: Yes https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_slave [06:24:27] RECOVERY - configured eth on kubernetes2002 is OK: OK - interfaces up https://wikitech.wikimedia.org/wiki/Monitoring/check_eth [06:25:01] RECOVERY - MariaDB Replica SQL: m2 on db2133 is OK: OK slave_sql_state Slave_SQL_Running: Yes https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_slave [06:25:52] (03PS1) 10Marostegui: mariadb: Promote db1093 to s6 master [puppet] - 10https://gerrit.wikimedia.org/r/611964 (https://phabricator.wikimedia.org/T257253) [06:26:51] (03CR) 10Marostegui: [C: 04-2] "Wait for the failover date" [puppet] - 10https://gerrit.wikimedia.org/r/611964 (https://phabricator.wikimedia.org/T257253) (owner: 10Marostegui) [06:28:10] (03PS1) 10Marostegui: wmnet: Update s6-master alias [dns] - 10https://gerrit.wikimedia.org/r/611965 (https://phabricator.wikimedia.org/T257253) [06:28:35] RECOVERY - Disk space on kubernetes2002 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=kubernetes2002&var-datasource=codfw+prometheus/ops [06:28:39] (03CR) 10Marostegui: [C: 04-2] "Wait for the failover date" [dns] - 10https://gerrit.wikimedia.org/r/611965 (https://phabricator.wikimedia.org/T257253) (owner: 10Marostegui) [06:35:18] RECOVERY - Check the NTP synchronisation 
status of timesyncd on kubernetes2002 is OK: OK: synced at Mon 2020-07-13 06:35:17 UTC. https://wikitech.wikimedia.org/wiki/NTP [06:38:39] RECOVERY - IPMI Sensor Status on kubernetes2002 is OK: Sensor Type(s) Temperature, Power_Supply Status: OK https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Power_Supply_Failures [06:43:53] RECOVERY - dhclient process on kubernetes2002 is OK: PROCS OK: 0 processes with command name dhclient https://wikitech.wikimedia.org/wiki/Monitoring/check_dhclient [06:44:41] RECOVERY - Check whether ferm is active by checking the default input chain on kubernetes2002 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm [06:46:19] PROBLEM - Work requests waiting in Zuul Gearman server on contint2001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [150.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1 [06:49:43] (03PS1) 10Marostegui: mariadb: Remove puppet references for db1097 [puppet] - 10https://gerrit.wikimedia.org/r/612135 (https://phabricator.wikimedia.org/T257406) [06:50:38] !log marostegui@cumin1001 START - Cookbook sre.hosts.decommission [06:50:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:51:07] !log marostegui@cumin1001 END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) [06:51:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:52:19] !log marostegui@cumin1001 START - Cookbook sre.hosts.decommission [06:52:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:52:54] (03CR) 10Marostegui: [C: 03+2] mariadb: Remove puppet references for db1097 [puppet] - 10https://gerrit.wikimedia.org/r/612135 (https://phabricator.wikimedia.org/T257406) (owner: 10Marostegui) [06:52:59] !log marostegui@cumin1001 END (FAIL) - Cookbook 
sre.hosts.decommission (exit_code=1) [06:53:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:53:05] 10Operations, 10DBA, 10Patch-For-Review: decommission db1097.eqiad.wmnet - https://phabricator.wikimedia.org/T257406 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by marostegui@cumin1001 for hosts: `db1097.eqiad.wmnet` - db1097.eqiad.wmnet (**FAIL**) - Downtimed host on Icinga - Found... [06:53:38] !log marostegui@cumin1001 START - Cookbook sre.hosts.decommission [06:53:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:54:27] !log marostegui@cumin1001 END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) [06:54:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:54:32] 10Operations, 10DBA, 10Patch-For-Review: decommission db1097.eqiad.wmnet - https://phabricator.wikimedia.org/T257406 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by marostegui@cumin1001 for hosts: `db1097.eqiad.wmnet` - db1097.eqiad.wmnet (**FAIL**) - Downtimed host on Icinga - Found... 
[06:55:31] 10Operations, 10DBA, 10Patch-For-Review: decommission db1097.eqiad.wmnet - https://phabricator.wikimedia.org/T257406 (10Marostegui) I have powered off the host manually, the IPMI connection was failing [06:55:56] 10Operations, 10DBA, 10Patch-For-Review: decommission db1097.eqiad.wmnet - https://phabricator.wikimedia.org/T257406 (10Marostegui) [06:57:36] (03PS1) 10Marostegui: wmnet: Remove db1097 DNS [dns] - 10https://gerrit.wikimedia.org/r/612136 (https://phabricator.wikimedia.org/T257406) [07:04:55] RECOVERY - Work requests waiting in Zuul Gearman server on contint2001 is OK: OK: Less than 100.00% above the threshold [90.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1 [07:07:37] (03CR) 10Marostegui: [C: 03+2] wmnet: Remove db1097 DNS [dns] - 10https://gerrit.wikimedia.org/r/612136 (https://phabricator.wikimedia.org/T257406) (owner: 10Marostegui) [07:07:49] 10Operations, 10DBA, 10Patch-For-Review: decommission db1097.eqiad.wmnet - https://phabricator.wikimedia.org/T257406 (10Marostegui) [07:09:59] 10Operations, 10ops-eqiad, 10DC-Ops, 10decommission-hardware: decommission db1097.eqiad.wmnet - https://phabricator.wikimedia.org/T257406 (10Marostegui) a:05Marostegui→03Jclark-ctr @Jclark-ctr please note that this host has mainboard/memory issues, so let's label it as such. However, the disks and the... [07:22:40] 10Operations, 10Analytics, 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: Import AMD rocm packages in wikimedia-buster - https://phabricator.wikimedia.org/T224723 (10MoritzMuehlenhoff) ROCM is now also packaged in Debian: https://packages.qa.debian.org/r/rocr-runtime.html https://packages.qa.deb... 
[07:45:46] (03PS2) 10Jcrespo: admin: Add Mooeypoo (wikigit) to the analytics-privatedata-users group [puppet] - 10https://gerrit.wikimedia.org/r/611228 (https://phabricator.wikimedia.org/T257502) [07:47:19] (03CR) 10Jcrespo: [C: 03+2] admin: Add Mooeypoo (wikigit) to the analytics-privatedata-users group [puppet] - 10https://gerrit.wikimedia.org/r/611228 (https://phabricator.wikimedia.org/T257502) (owner: 10Jcrespo) [07:55:29] (03PS5) 10Jcrespo: bacula: Merge prometheus exporter and icinga check into a single file [puppet] - 10https://gerrit.wikimedia.org/r/611390 (https://phabricator.wikimedia.org/T234900) [07:55:31] (03PS1) 10Jcrespo: analytics-access: Add kerberos principal to user wikigit [puppet] - 10https://gerrit.wikimedia.org/r/612140 (https://phabricator.wikimedia.org/T257502) [07:55:51] (03PS2) 10Jcrespo: analytics-access: Add kerberos principal to user wikigit [puppet] - 10https://gerrit.wikimedia.org/r/612140 (https://phabricator.wikimedia.org/T257502) [07:57:02] (03CR) 10Jcrespo: [C: 03+2] analytics-access: Add kerberos principal to user wikigit [puppet] - 10https://gerrit.wikimedia.org/r/612140 (https://phabricator.wikimedia.org/T257502) (owner: 10Jcrespo) [08:03:38] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to analytics-privatedata-users for mooeypoo - https://phabricator.wikimedia.org/T257502 (10jcrespo) 05Open→03Resolved Access to private data, including kerberos authentication has been granted. You should have received an email on... [08:04:56] PROBLEM - Check systemd state on sodium is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [08:11:46] (03CR) 10Volans: "Post-merge -1, see inline" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/609178 (https://phabricator.wikimedia.org/T243057) (owner: 10Ema) [08:12:33] (03CR) 10Kormat: [C: 03+1] mariadb: Promote db1093 to s6 master [puppet] - 10https://gerrit.wikimedia.org/r/611964 (https://phabricator.wikimedia.org/T257253) (owner: 10Marostegui) [08:13:03] (03CR) 10Kormat: [C: 03+1] wmnet: Update s6-master alias [dns] - 10https://gerrit.wikimedia.org/r/611965 (https://phabricator.wikimedia.org/T257253) (owner: 10Marostegui) [08:19:28] (03PS1) 10Muehlenhoff: Switch yarn.wikimedia.org to CAS [puppet] - 10https://gerrit.wikimedia.org/r/612143 (https://phabricator.wikimedia.org/T159584) [08:20:44] (03PS1) 10Ema: ATS: log cacheable cookies from etherpad too [puppet] - 10https://gerrit.wikimedia.org/r/612144 (https://phabricator.wikimedia.org/T256395) [08:20:54] !log reimaging es1022 T257284 [08:20:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:21:00] T257284: Upgrade es4 to debian buster + mariadb 10.4 - https://phabricator.wikimedia.org/T257284 [08:21:06] (03PS6) 10Jcrespo: bacula: Merge prometheus exporter and icinga check into a single file [puppet] - 10https://gerrit.wikimedia.org/r/611390 (https://phabricator.wikimedia.org/T234900) [08:24:14] (03PS1) 10Kormat: es1022: Disable notifications for reimaging [puppet] - 10https://gerrit.wikimedia.org/r/612147 (https://phabricator.wikimedia.org/T257284) [08:26:37] (03CR) 10Marostegui: [C: 03+1] es1022: Disable notifications for reimaging [puppet] - 10https://gerrit.wikimedia.org/r/612147 (https://phabricator.wikimedia.org/T257284) (owner: 10Kormat) [08:34:14] !log kormat@cumin1001 dbctl commit (dc=all): 'Add weight to es1020, reduce weight on es1022 T257284', diff saved to https://phabricator.wikimedia.org/P11869 and previous config saved to /var/cache/conftool/dbconfig/20200713-083414-kormat.json 
[08:34:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:34:19] T257284: Upgrade es4 to debian buster + mariadb 10.4 - https://phabricator.wikimedia.org/T257284 [08:35:24] (03CR) 10ZPapierski: "Won't this fail without a value set somewhere (like in common.yaml)? Am I missing it?" [puppet] - 10https://gerrit.wikimedia.org/r/611373 (owner: 10DCausse) [08:38:38] (03PS1) 10Kormat: install_server: Switch es2022 to buster [puppet] - 10https://gerrit.wikimedia.org/r/612150 (https://phabricator.wikimedia.org/T257284) [08:39:01] (03CR) 10DCausse: "> Patch Set 1:" [puppet] - 10https://gerrit.wikimedia.org/r/611373 (owner: 10DCausse) [08:39:03] !log marostegui@cumin1001 dbctl commit (dc=all): 'Fully repool db1093', diff saved to https://phabricator.wikimedia.org/P11870 and previous config saved to /var/cache/conftool/dbconfig/20200713-083902-marostegui.json [08:39:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:39:14] (03CR) 10Marostegui: [C: 03+1] install_server: Switch es2022 to buster [puppet] - 10https://gerrit.wikimedia.org/r/612150 (https://phabricator.wikimedia.org/T257284) (owner: 10Kormat) [08:39:55] (03CR) 10ZPapierski: "> Patch Set 1:" [puppet] - 10https://gerrit.wikimedia.org/r/611373 (owner: 10DCausse) [08:41:40] (03CR) 10ZPapierski: [wdqs] overrides default blazegraph ns (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/611373 (owner: 10DCausse) [08:44:50] !log kormat@cumin1001 dbctl commit (dc=all): 'Depool es1022 for reimaging T257284', diff saved to https://phabricator.wikimedia.org/P11871 and previous config saved to /var/cache/conftool/dbconfig/20200713-084449-kormat.json [08:44:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:44:55] T257284: Upgrade es4 to debian buster + mariadb 10.4 - https://phabricator.wikimedia.org/T257284 [08:45:10] (03CR) 10Kormat: [C: 03+2] es1022: Disable notifications for reimaging [puppet] - 
10https://gerrit.wikimedia.org/r/612147 (https://phabricator.wikimedia.org/T257284) (owner: 10Kormat) [08:45:44] (03CR) 10Kormat: [C: 03+2] install_server: Switch es2022 to buster [puppet] - 10https://gerrit.wikimedia.org/r/612150 (https://phabricator.wikimedia.org/T257284) (owner: 10Kormat) [08:47:31] (03PS4) 10Jbond: mariadb::ferm_misc add netmon1002/2001 access [puppet] - 10https://gerrit.wikimedia.org/r/611243 [08:48:09] (03CR) 10ZPapierski: [C: 03+1] [wcqs] update logo URL [puppet] - 10https://gerrit.wikimedia.org/r/611196 (https://phabricator.wikimedia.org/T251514) (owner: 10DCausse) [08:48:43] (03CR) 10Jbond: [C: 03+1] Remove cas-icinga server alias [puppet] - 10https://gerrit.wikimedia.org/r/611344 (owner: 10Muehlenhoff) [08:49:19] (03CR) 10Jbond: [C: 03+1] Remove cas-icinga from ACME config [puppet] - 10https://gerrit.wikimedia.org/r/611345 (owner: 10Muehlenhoff) [08:50:00] (03PS1) 10Ema: ATS: set Rsyslog::Conf priority to 20 [puppet] - 10https://gerrit.wikimedia.org/r/612151 (https://phabricator.wikimedia.org/T256395) [08:51:44] I'm going to do a little config deployment. [08:52:31] (03PS4) 10Jforrester: VisualEditor: Explicitly set visualeditor-enable to 0 when non-default [mediawiki-config] - 10https://gerrit.wikimedia.org/r/610156 (https://phabricator.wikimedia.org/T248343) (owner: 10C. Scott Ananian) [08:52:33] (03CR) 10Jforrester: [C: 03+2] VisualEditor: Explicitly set visualeditor-enable to 0 when non-default [mediawiki-config] - 10https://gerrit.wikimedia.org/r/610156 (https://phabricator.wikimedia.org/T248343) (owner: 10C. Scott Ananian) [08:53:09] (03Merged) 10jenkins-bot: VisualEditor: Explicitly set visualeditor-enable to 0 when non-default [mediawiki-config] - 10https://gerrit.wikimedia.org/r/610156 (https://phabricator.wikimedia.org/T248343) (owner: 10C. 
Scott Ananian) [08:53:12] (03CR) 10Ema: [C: 03+2] ATS: log cacheable cookies from etherpad too [puppet] - 10https://gerrit.wikimedia.org/r/612144 (https://phabricator.wikimedia.org/T256395) (owner: 10Ema) [08:53:23] (03CR) 10Ema: [C: 03+2] ATS: set Rsyslog::Conf priority to 20 [puppet] - 10https://gerrit.wikimedia.org/r/612151 (https://phabricator.wikimedia.org/T256395) (owner: 10Ema) [08:53:53] (03CR) 10Jbond: [C: 03+2] mariadb::ferm_misc add netmon1002/2001 access [puppet] - 10https://gerrit.wikimedia.org/r/611243 (owner: 10Jbond) [08:57:16] !log jforrester@deploy1001 Synchronized wmf-config/CommonSettings.php: T248343 Explicitly set visualeditor-enable to 0 when non-default (duration: 00m 57s) [08:57:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:57:21] T248343: Uncached VisualEditor w/ Parsoid/PHP (no JS, no RESTBase) for MW 1.35 LTS - https://phabricator.wikimedia.org/T248343 [08:57:46] (03PS4) 10Jforrester: Revert "dblists: Remove "do not modify" note from all.dblist" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/594214 [08:58:03] (03CR) 10Jforrester: [C: 03+2] buildDBLists: Remove circular dependency on all.dblist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/594216 (https://phabricator.wikimedia.org/T251715) (owner: 10Jforrester) [08:58:11] (03PS5) 10Jforrester: buildDBLists: Remove circular dependency on all.dblist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/594216 (https://phabricator.wikimedia.org/T251715) [08:58:16] (03CR) 10Jforrester: [C: 03+2] Revert "dblists: Remove "do not modify" note from all.dblist" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/594214 (owner: 10Jforrester) [08:58:47] (03PS1) 10Gehel: wdqs: fix missing user agent for test servers [puppet] - 10https://gerrit.wikimedia.org/r/612157 [08:58:50] !log cp: rolling ats-backend-restart to apply SyslogIdentifier changes -> https://gerrit.wikimedia.org/r/c/operations/puppet/+/611311 [08:58:53] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:59:19] (03CR) 10DCausse: [C: 03+1] wdqs: fix missing user agent for test servers [puppet] - 10https://gerrit.wikimedia.org/r/612157 (owner: 10Gehel) [08:59:46] (03CR) 10Gehel: [C: 03+2] wdqs: fix missing user agent for test servers [puppet] - 10https://gerrit.wikimedia.org/r/612157 (owner: 10Gehel) [09:00:18] (03Merged) 10jenkins-bot: Revert "dblists: Remove "do not modify" note from all.dblist" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/594214 (owner: 10Jforrester) [09:00:22] (03Merged) 10jenkins-bot: buildDBLists: Remove circular dependency on all.dblist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/594216 (https://phabricator.wikimedia.org/T251715) (owner: 10Jforrester) [09:01:57] (03PS2) 10Muehlenhoff: Switch yarn.wikimedia.org to CAS [puppet] - 10https://gerrit.wikimedia.org/r/612143 (https://phabricator.wikimedia.org/T159584) [09:09:32] apergos: A small number of prod errors with "BaseDump:57 PHP Warning: XMLReader::open(): Unable to find the wrapper "mediawiki.compress.nzbip2"" and similar; known issue? [09:09:48] known me testing and already fixed [09:09:55] there should be like 5 of them max [09:09:58] Excellent, will pretend I didn't see them. [09:10:13] they will be from snapshot1010 in case that matters (testbed host) [09:10:32] The last one was a minute ago (09:09). [09:10:50] really? [09:11:18] Oh, no, the nzbip2 ones stopped at 08:33. [09:11:28] ok good, that makes much more sense [09:11:30] The current ones are "stat failed for …" [09:11:35] yeah ignore those [09:11:40] Wilco. :-) [09:12:00] that's me proving that this command hangs even if the first prefetch file I specify in the list doesn't exist [09:12:06] as long as the list is 2 or more files :-/ [09:12:07] Fun. 
[09:12:13] yeah I'm loving it a whole lot [09:12:18] thanks for checking in, in any case [09:12:35] (03PS1) 10Kosta Harlan: beta: Update domain for statsd [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612160 (https://phabricator.wikimedia.org/T241462) [09:14:12] kostajh: Want that slung out now? [09:14:35] James_F: oh yeah, sure [09:14:43] hopefully that's enough to fix the problem? [09:14:48] (03CR) 10Jforrester: [C: 03+2] beta: Update domain for statsd [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612160 (https://phabricator.wikimedia.org/T241462) (owner: 10Kosta Harlan) [09:14:52] 🤷🏽‍♂️ [09:14:55] Let's find out. [09:15:54] (03Merged) 10jenkins-bot: beta: Update domain for statsd [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612160 (https://phabricator.wikimedia.org/T241462) (owner: 10Kosta Harlan) [09:17:04] 10Operations, 10Pywikibot, 10cloud-services-team (Kanban): http://pywikibot.org/ is displaying Wikimedia error page - https://phabricator.wikimedia.org/T257536 (10Legoktm) After discussion with SRE, we need to drop the CNAME record and set the NS records at the registrar level. I've emailed @Siebrand asking... [09:17:05] James_F: is it live? maintenance.php still shows the old value [09:17:07] (03CR) 10Volans: [C: 03+2] "Thanks for the upstream bug and temporary fix! LGTM" [software/spicerack] - 10https://gerrit.wikimedia.org/r/610301 (owner: 10Kormat) [09:17:29] kostajh: It'll take about 5 mins to sync out. [09:17:30] (03PS1) 10Privacybatm: transferpy: Create required directories at the time of deb installation [software/transferpy] - 10https://gerrit.wikimedia.org/r/612162 (https://phabricator.wikimedia.org/T257599) [09:17:37] ah ok [09:18:07] I've pulled it in prod, but not Beta Cluster; that's automatic unless it's ultra-broken. 
[09:19:21] 10Operations, 10Cloud-VPS (Project-requests), 10cloud-services-team (Kanban): Request creation of 'sre-sandbox' VPS project - https://phabricator.wikimedia.org/T247517 (10jbond) thanks created https://phabricator.wikimedia.org/T257796 [09:19:31] 10Operations, 10Pywikibot, 10Traffic, 10HTTPS: Configure HTTPS for pywikibot.org - https://phabricator.wikimedia.org/T257537 (10Legoktm) Once T257536#6300167 is done, this should be possible. [09:19:40] 10Operations, 10Pywikibot, 10Traffic, 10HTTPS: Configure HTTPS for pywikibot.org - https://phabricator.wikimedia.org/T257537 (10Legoktm) [09:19:48] 10Operations, 10Pywikibot, 10cloud-services-team (Kanban): http://pywikibot.org/ is displaying Wikimedia error page - https://phabricator.wikimedia.org/T257536 (10Legoktm) [09:19:51] (03Merged) 10jenkins-bot: Pin to a working version of prospector [software/spicerack] - 10https://gerrit.wikimedia.org/r/610301 (owner: 10Kormat) [09:23:03] 10Operations, 10SRE-tools: Improve sre.hosts.decommission (additionally find host yaml files) - https://phabricator.wikimedia.org/T257297 (10Volans) @elukey indeed that's confusing, we can surely improve it. I think we could first improve the message for what we're searching for, so that it's easier to spot fa... [09:25:35] XioNoX: o/ do you know of any reason why a server trying to boot off pxe would fail to reach a dhcp server currently? [09:27:14] (03PS57) 10Jbond: hiera5: upgrade to hiera5 [puppet] - 10https://gerrit.wikimedia.org/r/566559 (https://phabricator.wikimedia.org/T254248) [09:27:54] (03PS58) 10Jbond: hiera5: upgrade to hiera5 [puppet] - 10https://gerrit.wikimedia.org/r/566559 (https://phabricator.wikimedia.org/T254248) [09:28:51] (03PS1) 10Muehlenhoff: Allow to pass additional settings to the vhost config which are IDP-specific [puppet] - 10https://gerrit.wikimedia.org/r/612166 [09:29:21] on the third attempt it finally reached a dhcp server [09:29:30] James_F: ah right. 
Still showing the old value but I'll check again in an hour or so [09:29:57] (03CR) 10Urbanecm: "I'm 99% sure the build script used to work that way, but given most initial configs are uploaded days to weeks before actual wiki creation" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/594216 (https://phabricator.wikimedia.org/T251715) (owner: 10Jforrester) [09:33:06] (03CR) 10Jbond: "lg minor nit" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/612166 (owner: 10Muehlenhoff) [09:34:01] (03PS1) 10QChris: Revert "gerrit: Add Code Review logo as favicon" [puppet] - 10https://gerrit.wikimedia.org/r/611914 [09:35:36] (03CR) 10QChris: Revert "gerrit: Add Code Review logo as favicon" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/611914 (owner: 10QChris) [09:36:35] (03CR) 10Jbond: role: port netmon to Buster (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/611317 (https://phabricator.wikimedia.org/T247967) (owner: 10Filippo Giunchedi) [09:36:41] (03PS2) 10QChris: Revert "gerrit: Add Code Review logo as favicon" [puppet] - 10https://gerrit.wikimedia.org/r/611914 (https://phabricator.wikimedia.org/T257218) [09:37:06] (03PS1) 10Jcrespo: bacula: Add ignorelist for long-term broken backups [puppet] - 10https://gerrit.wikimedia.org/r/612167 (https://phabricator.wikimedia.org/T234900) [09:38:40] (03CR) 10jerkins-bot: [V: 04-1] bacula: Add ignorelist for long-term broken backups [puppet] - 10https://gerrit.wikimedia.org/r/612167 (https://phabricator.wikimedia.org/T234900) (owner: 10Jcrespo) [09:38:42] (03PS6) 10Vgutierrez: ATS: Support listening on multiple ports [puppet] - 10https://gerrit.wikimedia.org/r/610821 (https://phabricator.wikimedia.org/T254235) [09:39:38] !log kormat@cumin1001 START - Cookbook sre.hosts.downtime [09:39:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:39:50] (03PS2) 10Jcrespo: bacula: Add ignorelist for long-term broken backups [puppet] - 10https://gerrit.wikimedia.org/r/612167 
(https://phabricator.wikimedia.org/T234900) [09:41:18] (03CR) 10jerkins-bot: [V: 04-1] bacula: Add ignorelist for long-term broken backups [puppet] - 10https://gerrit.wikimedia.org/r/612167 (https://phabricator.wikimedia.org/T234900) (owner: 10Jcrespo) [09:42:08] !log kormat@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [09:42:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:42:19] (03CR) 10ZPapierski: [C: 04-1] "/sparql endpoint isn't properly secured" [puppet] - 10https://gerrit.wikimedia.org/r/609909 (https://phabricator.wikimedia.org/T251498) (owner: 10ZPapierski) [09:43:56] (03PS3) 10Jcrespo: bacula: Add ignorelist for long-term broken backups [puppet] - 10https://gerrit.wikimedia.org/r/612167 (https://phabricator.wikimedia.org/T234900) [09:49:01] (03CR) 10Giuseppe Lavagetto: [C: 03+1] "Good job. My comments are all nits and can be fixed with a later patch - nothing has to do with the code." (034 comments) [puppet] - 10https://gerrit.wikimedia.org/r/566559 (https://phabricator.wikimedia.org/T254248) (owner: 10Jbond) [09:50:42] (03PS2) 10DCausse: [wdqs] overrides default blazegraph ns [puppet] - 10https://gerrit.wikimedia.org/r/611373 [09:51:13] (03PS7) 10Vgutierrez: ATS: Support listening on multiple ports [puppet] - 10https://gerrit.wikimedia.org/r/610821 (https://phabricator.wikimedia.org/T254235) [09:51:43] (03CR) 10DCausse: [wdqs] overrides default blazegraph ns (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/611373 (owner: 10DCausse) [09:52:14] (03CR) 10jerkins-bot: [V: 04-1] [wdqs] overrides default blazegraph ns [puppet] - 10https://gerrit.wikimedia.org/r/611373 (owner: 10DCausse) [09:54:27] (03CR) 10Vgutierrez: "pcc is almost a NOOP: https://puppet-compiler.wmflabs.org/compiler1003/23825/" [puppet] - 10https://gerrit.wikimedia.org/r/610821 (https://phabricator.wikimedia.org/T254235) (owner: 10Vgutierrez) [09:55:26] (03PS2) 10Muehlenhoff: Allow to pass additional settings to the vhost config 
which are IDP-specific [puppet] - 10https://gerrit.wikimedia.org/r/612166 [09:56:35] (03CR) 10Muehlenhoff: Allow to pass additional settings to the vhost config which are IDP-specific (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/612166 (owner: 10Muehlenhoff) [09:57:41] (03PS3) 10DCausse: [wdqs] overrides default blazegraph ns [puppet] - 10https://gerrit.wikimedia.org/r/611373 [09:58:13] (03CR) 10Giuseppe Lavagetto: "One general comment: I'd rather have ratelimit run as a separate service than as part of the same chart as envoy." [deployment-charts] - 10https://gerrit.wikimedia.org/r/609808 (https://phabricator.wikimedia.org/T254906) (owner: 10Hnowlan) [10:05:37] (03CR) 10Gehel: [C: 04-1] "See comments inline." (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/611373 (owner: 10DCausse) [10:06:00] (03CR) 10Jbond: [C: 03+1] Allow to pass additional settings to the vhost config which are IDP-specific [puppet] - 10https://gerrit.wikimedia.org/r/612166 (owner: 10Muehlenhoff) [10:06:31] (03PS2) 10Gehel: [wcqs] update logo URL [puppet] - 10https://gerrit.wikimedia.org/r/611196 (https://phabricator.wikimedia.org/T251514) (owner: 10DCausse) [10:07:04] (03CR) 10Gehel: [C: 03+2] [wcqs] update logo URL [puppet] - 10https://gerrit.wikimedia.org/r/611196 (https://phabricator.wikimedia.org/T251514) (owner: 10DCausse) [10:11:21] (03CR) 10Gehel: [C: 03+1] "LGTM" [labs/private] - 10https://gerrit.wikimedia.org/r/610705 (https://phabricator.wikimedia.org/T254646) (owner: 10Ryan Kemper) [10:18:41] (03PS8) 10Vgutierrez: ATS: Support listening on multiple ports [puppet] - 10https://gerrit.wikimedia.org/r/610821 (https://phabricator.wikimedia.org/T254235) [10:19:33] (03CR) 10Giuseppe Lavagetto: [C: 04-1] "I like the idea and the code for tests/integration with the rest of the software, but I don't like much relying on parsing output of the d" (031 comment) [docker-images/docker-report] - 10https://gerrit.wikimedia.org/r/611251 
(https://phabricator.wikimedia.org/T251918) (owner: 10JMeybohm) [10:19:56] (03CR) 10jerkins-bot: [V: 04-1] ATS: Support listening on multiple ports [puppet] - 10https://gerrit.wikimedia.org/r/610821 (https://phabricator.wikimedia.org/T254235) (owner: 10Vgutierrez) [10:21:18] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [10:21:26] James_F: I dunno, maybe something is stuck https://integration.wikimedia.org/ci/job/beta-mediawiki-config-update-eqiad . The downstream project can't find an executor [10:22:28] (03PS9) 10Vgutierrez: ATS: Support listening on multiple ports [puppet] - 10https://gerrit.wikimedia.org/r/610821 (https://phabricator.wikimedia.org/T254235) [10:24:32] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [10:29:34] 10Operations, 10SRE-tools: More structured cookbooks to reboot hosts - https://phabricator.wikimedia.org/T252807 (10Volans) >>! In T252807#6293205, @MoritzMuehlenhoff wrote: > For posterity, one traceback triggered by some terminal sequence magic in Puppet's "Ruby 2.1 is deprecated" warning: @MoritzMuehlenho... [10:29:58] kostajh: Ah, hmm, yeah, will kick it. [10:30:05] jan_drewniak: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) Wikimedia Portals Update deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200713T1030). 
[10:30:37] (03CR) 10Giuseppe Lavagetto: [C: 03+1] Drop support for python3.5 [docker-images/docker-report] - 10https://gerrit.wikimedia.org/r/610871 (owner: 10JMeybohm) [10:32:23] (03PS3) 10Muehlenhoff: Allow to pass additional settings to the vhost config which are IDP-specific [puppet] - 10https://gerrit.wikimedia.org/r/612166 [10:32:26] (03CR) 10JMeybohm: Check if images are debian based before generating report (031 comment) [docker-images/docker-report] - 10https://gerrit.wikimedia.org/r/611251 (https://phabricator.wikimedia.org/T251918) (owner: 10JMeybohm) [10:35:33] kostajh: Hmm. It didn't seem to take… [10:35:35] (03CR) 10Giuseppe Lavagetto: [C: 03+1] Drop support for python3.5 (032 comments) [docker-images/docker-report] - 10https://gerrit.wikimedia.org/r/610871 (owner: 10JMeybohm) [10:38:09] (03PS1) 10Jdrewniak: Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612175 (https://phabricator.wikimedia.org/T128546) [10:38:40] James_F: I see the new for wgStatsdServer now [10:38:45] *new value [10:39:42] (03CR) 10Jdrewniak: [C: 03+2] Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612175 (https://phabricator.wikimedia.org/T128546) (owner: 10Jdrewniak) [10:39:44] PROBLEM - Work requests waiting in Zuul Gearman server on contint2001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [150.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1 [10:40:17] kostajh: Sorry, yeah, just took a little while. Fixed now?
[10:40:23] (03Merged) 10jenkins-bot: Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612175 (https://phabricator.wikimedia.org/T128546) (owner: 10Jdrewniak) [10:40:52] (03PS59) 10Jbond: hiera5: upgrade to hiera5 [puppet] - 10https://gerrit.wikimedia.org/r/566559 (https://phabricator.wikimedia.org/T254248) [10:40:55] James_F: still waiting to see the data in Grafana Labs but there might be other issues going on [10:41:01] Ack. [10:41:22] (03CR) 10Jbond: "updated thanks" (034 comments) [puppet] - 10https://gerrit.wikimedia.org/r/566559 (https://phabricator.wikimedia.org/T254248) (owner: 10Jbond) [10:42:30] (03CR) 10Hnowlan: "> Patch Set 14:" [deployment-charts] - 10https://gerrit.wikimedia.org/r/609808 (https://phabricator.wikimedia.org/T254906) (owner: 10Hnowlan) [10:43:08] !log jdrewniak@deploy1001 Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:612175| Bumping portals to master (612175)]] (duration: 00m 56s) [10:43:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:44:04] !log jdrewniak@deploy1001 Synchronized portals: Wikimedia Portals Update: [[gerrit:612175| Bumping portals to master (612175)]] (duration: 00m 56s) [10:44:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:45:19] (03PS4) 10Muehlenhoff: Allow to pass additional settings to the vhost config [puppet] - 10https://gerrit.wikimedia.org/r/612166 [10:48:15] (03CR) 10Muehlenhoff: [C: 03+2] Allow to pass additional settings to the vhost config [puppet] - 10https://gerrit.wikimedia.org/r/612166 (owner: 10Muehlenhoff) [10:48:49] (03CR) 10Marostegui: bacula: Add ignorelist for long-term broken backups (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/612167 (https://phabricator.wikimedia.org/T234900) (owner: 10Jcrespo) [10:50:31] (03CR) 10Muehlenhoff: [C: 03+2] Remove cas-icinga server alias [puppet] - 10https://gerrit.wikimedia.org/r/611344 (owner: 10Muehlenhoff) [10:58:22] 
PROBLEM - puppet last run on netmon2001 is CRITICAL: CRITICAL: Puppet last ran 2 days ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [10:58:41] ^ fixing [11:00:04] Amir1, Lucas_WMDE, awight, and Urbanecm: My dear minions, it's time we take the moon! Just kidding. Time for European mid-day backport window(Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200713T1100). [11:00:04] Majavah: A patch you scheduled for European mid-day backport window(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [11:00:12] * Majavah here [11:00:12] * Lucas_WMDE waves [11:00:21] I can deploy today! [11:00:23] ah, SandboxLink is back :) [11:00:25] ok! [11:00:53] (03PS2) 10Majavah: Enable SandboxLink extension in trwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/608799 (https://phabricator.wikimedia.org/T256782) [11:00:55] (03CR) 10Giuseppe Lavagetto: [C: 04-1] "Overall LGTM, see two small comments. I still didn't look at the tests, will do so in a bit." (032 comments) [docker-images/docker-report] - 10https://gerrit.wikimedia.org/r/610849 (https://phabricator.wikimedia.org/T257333) (owner: 10JMeybohm) [11:01:04] or perhaps not? Majavah: you have a self CR-1 on the patch [11:01:24] Urbanecm: just removed it, was waiting for the previous train [11:01:31] okay, cool!
[11:01:37] (03CR) 10Urbanecm: [C: 03+2] "B&C" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/608799 (https://phabricator.wikimedia.org/T256782) (owner: 10Majavah) [11:01:39] yeah, that was from a few backport windows ago [11:02:43] (03Merged) 10jenkins-bot: Enable SandboxLink extension in trwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/608799 (https://phabricator.wikimedia.org/T256782) (owner: 10Majavah) [11:03:42] Majavah: available at mwdebug1001 :) [11:03:50] thank you, testing [11:04:14] RECOVERY - puppet last run on netmon2001 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun [11:05:04] Urbanecm: it works [11:05:05] (03PS2) 10Muehlenhoff: Remove cas-icinga from ACME config [puppet] - 10https://gerrit.wikimedia.org/r/611345 [11:05:11] thank you, syncing [11:06:31] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: 896c042296b4e1f5d88f786981537655e5d9fea9: Enable SandboxLink extension in trwiki (T256782) (duration: 00m 56s) [11:06:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:06:37] T256782: Add "SandboxLink" to Turkish Wikipedia - https://phabricator.wikimedia.org/T256782 [11:07:12] Majavah: done! [11:07:21] thank you as always [11:08:41] happy to help! [11:08:45] !log EU B&C done [11:08:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:13:00] (03CR) 10QChris: [C: 04-1] "On hold for now, as Tgr wants to try improving the current logo." 
[puppet] - 10https://gerrit.wikimedia.org/r/611914 (https://phabricator.wikimedia.org/T257218) (owner: 10QChris) [11:16:23] 10Operations, 10SRE-Access-Requests, 10Trust-and-Safety: Requesting access to restricted production access and analytics-privatedata-users for Nahid Sultan - https://phabricator.wikimedia.org/T256971 (10JanWMF) [11:21:53] (03PS4) 10RhinosF1: Add NamespaceAliases for kowikiquote [mediawiki-config] - 10https://gerrit.wikimedia.org/r/604515 (https://phabricator.wikimedia.org/T255031) [11:23:22] Urbanecm: I'm mobile but if you can test you can sneak that in as it slipped my calendar [11:25:12] (03PS1) 10ArielGlenn: add alternate capitalization for ariel access to icinga [puppet] - 10https://gerrit.wikimedia.org/r/612181 [11:30:35] (03CR) 10Muehlenhoff: [C: 03+1] "The commit message should *grrr* at Icinga, not LDAP, but LGTM :-)" [puppet] - 10https://gerrit.wikimedia.org/r/612181 (owner: 10ArielGlenn) [11:31:03] (03CR) 10ArielGlenn: [C: 03+2] add alternate capitalization for ariel access to icinga [puppet] - 10https://gerrit.wikimedia.org/r/612181 (owner: 10ArielGlenn) [11:31:26] (03CR) 10Volans: [C: 03+1] "LGTM, one very minor nit, see below." [software/spicerack] - 10https://gerrit.wikimedia.org/r/610282 (https://phabricator.wikimedia.org/T255409) (owner: 10Kormat) [11:41:26] RECOVERY - Host etcd1003 is UP: PING OK - Packet loss = 0%, RTA = 0.28 ms [11:41:39] (03PS3) 10Muehlenhoff: Switch yarn.wikimedia.org to CAS [puppet] - 10https://gerrit.wikimedia.org/r/612143 (https://phabricator.wikimedia.org/T159584) [11:43:48] (03CR) 10MSantos: [C: 03+1] "Ok, cool. Please, ping me when it will happen. I would like to monitor its progress since osm2pgsql is not known for having very informati" [puppet] - 10https://gerrit.wikimedia.org/r/610706 (https://phabricator.wikimedia.org/T256877) (owner: 10Muehlenhoff) [11:44:24] !log repool ganeti1007 T244530. 
Start emptying ganeti1008 [11:44:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:44:30] T244530: upgrade memory in ganeti100[5-8].eqiad.wmnet - https://phabricator.wikimedia.org/T244530 [11:45:56] (03PS1) 10Ema: Revert "ATS: log cacheable cookies from etherpad too" [puppet] - 10https://gerrit.wikimedia.org/r/612266 [11:48:28] (03CR) 10Ema: [C: 03+2] Revert "ATS: log cacheable cookies from etherpad too" [puppet] - 10https://gerrit.wikimedia.org/r/612266 (owner: 10Ema) [11:49:43] !log Password reset for User:Alert5 (T257806) [11:49:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:50:48] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_cxserver_cluster_codfw site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [11:52:10] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [11:54:55] (03PS1) 10Ema: cumin: fix prometheus alias [puppet] - 10https://gerrit.wikimedia.org/r/612267 (https://phabricator.wikimedia.org/T243057) [11:57:46] (03PS1) 10Kormat: Revert "es1022: Disable notifications for reimaging" [puppet] - 10https://gerrit.wikimedia.org/r/612319 [11:58:00] (03PS4) 10Muehlenhoff: Switch yarn.wikimedia.org to CAS [puppet] - 10https://gerrit.wikimedia.org/r/612143 (https://phabricator.wikimedia.org/T159584) [11:58:40] (03CR) 10Kormat: [C: 03+2] Revert "es1022: Disable notifications for reimaging" [puppet] - 10https://gerrit.wikimedia.org/r/612319 (owner: 10Kormat) [11:59:54] (03CR) 10Muehlenhoff: [C: 03+1] "Looks good" [puppet] - 10https://gerrit.wikimedia.org/r/612267 (https://phabricator.wikimedia.org/T243057) (owner: 10Ema) [12:05:56] 10Operations, 10Dumps-Generation: Reboot snapshot hosts - https://phabricator.wikimedia.org/T255550 (10ArielGlenn) [12:08:19] !log kormat@cumin1001 dbctl commit (dc=all): 'Start repooling es1022 after reimaging T257284', diff saved to https://phabricator.wikimedia.org/P11873 and previous config saved to /var/cache/conftool/dbconfig/20200713-120818-kormat.json [12:08:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:08:25] T257284: Upgrade es4 to debian buster + mariadb 10.4 - https://phabricator.wikimedia.org/T257284 [12:08:50] (03PS10) 10Vgutierrez: ATS: Support listening on multiple ports [puppet] - 10https://gerrit.wikimedia.org/r/610821 (https://phabricator.wikimedia.org/T254235) [12:12:52] (03PS2) 10Marostegui: mariadb: Promote db1093 to s6 master [puppet] - 10https://gerrit.wikimedia.org/r/611964 (https://phabricator.wikimedia.org/T257253) [12:14:11] (03PS1) 10Ema: VTC: add 'public, s-maxage=0' test to 34-pass-set-cookie.vtc [puppet] - 10https://gerrit.wikimedia.org/r/612269 (https://phabricator.wikimedia.org/T256395) 
[12:16:19] (03CR) 10JMeybohm: Drop support for python3.5 (031 comment) [docker-images/docker-report] - 10https://gerrit.wikimedia.org/r/610871 (owner: 10JMeybohm) [12:17:56] (03CR) 10Vgutierrez: "PCC is happy again: https://puppet-compiler.wmflabs.org/compiler1001/23833/" [puppet] - 10https://gerrit.wikimedia.org/r/610821 (https://phabricator.wikimedia.org/T254235) (owner: 10Vgutierrez) [12:21:15] (03PS5) 10Muehlenhoff: Switch yarn.wikimedia.org to CAS [puppet] - 10https://gerrit.wikimedia.org/r/612143 (https://phabricator.wikimedia.org/T159584) [12:29:19] (03PS6) 10Muehlenhoff: Switch yarn.wikimedia.org to CAS [puppet] - 10https://gerrit.wikimedia.org/r/612143 (https://phabricator.wikimedia.org/T159584) [12:35:53] (03PS7) 10Muehlenhoff: Switch yarn.wikimedia.org to CAS [puppet] - 10https://gerrit.wikimedia.org/r/612143 (https://phabricator.wikimedia.org/T159584) [12:37:02] (03PS7) 10JMeybohm: Add basic chartmuseum library and helm-chartctl CLI [docker-images/docker-report] - 10https://gerrit.wikimedia.org/r/610849 (https://phabricator.wikimedia.org/T257333) [12:37:04] (03PS3) 10JMeybohm: New package version [docker-images/docker-report] - 10https://gerrit.wikimedia.org/r/610886 [12:38:01] (03CR) 10JMeybohm: Add basic chartmuseum library and helm-chartctl CLI (032 comments) [docker-images/docker-report] - 10https://gerrit.wikimedia.org/r/610849 (https://phabricator.wikimedia.org/T257333) (owner: 10JMeybohm) [12:38:33] (03CR) 10Volans: [C: 03+1] "ship it!" 
[puppet] - 10https://gerrit.wikimedia.org/r/612267 (https://phabricator.wikimedia.org/T243057) (owner: 10Ema) [12:39:54] (03CR) 10Muehlenhoff: "PCC: https://puppet-compiler.wmflabs.org/compiler1001/23837/" [puppet] - 10https://gerrit.wikimedia.org/r/612143 (https://phabricator.wikimedia.org/T159584) (owner: 10Muehlenhoff) [12:42:04] (03PS8) 10Muehlenhoff: Switch yarn.wikimedia.org to CAS [puppet] - 10https://gerrit.wikimedia.org/r/612143 (https://phabricator.wikimedia.org/T159584) [12:43:45] (03PS4) 10Gehel: Commons: Define entity sources configuration (take 2) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/609987 (https://phabricator.wikimedia.org/T256906) (owner: 10Addshore) [12:47:19] (03CR) 10Muehlenhoff: "Updated PCC: https://puppet-compiler.wmflabs.org/compiler1001/23838/" [puppet] - 10https://gerrit.wikimedia.org/r/612143 (https://phabricator.wikimedia.org/T159584) (owner: 10Muehlenhoff) [12:51:08] 10Operations, 10SDC General, 10Structured Data Engineering, 10Structured-Data-Backlog, and 3 others: Create CQS puppet configs by applying query_service module - https://phabricator.wikimedia.org/T237089 (10Gehel) 05Open→03Resolved [12:59:11] (03CR) 10Ema: [C: 03+2] VTC: add 'public, s-maxage=0' test to 34-pass-set-cookie.vtc [puppet] - 10https://gerrit.wikimedia.org/r/612269 (https://phabricator.wikimedia.org/T256395) (owner: 10Ema) [12:59:24] (03CR) 10Ema: [C: 03+2] cumin: fix prometheus alias [puppet] - 10https://gerrit.wikimedia.org/r/612267 (https://phabricator.wikimedia.org/T243057) (owner: 10Ema) [13:00:50] (03CR) 10Vgutierrez: [C: 03+1] Remove cas-icinga from ACME config [puppet] - 10https://gerrit.wikimedia.org/r/611345 (owner: 10Muehlenhoff) [13:05:32] !log kormat@cumin1001 dbctl commit (dc=all): 'Fully repool es1022, and set es1020 to zero weight T257284', diff saved to https://phabricator.wikimedia.org/P11878 and previous config saved to /var/cache/conftool/dbconfig/20200713-130532-kormat.json [13:05:35] (03PS1) 10Alexandros Kosiaris:
mobileapps: Add a temporary non-TLS release [deployment-charts] - 10https://gerrit.wikimedia.org/r/612273 (https://phabricator.wikimedia.org/T218733) [13:05:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:05:39] T257284: Upgrade es4 to debian buster + mariadb 10.4 - https://phabricator.wikimedia.org/T257284 [13:06:42] (03CR) 10jerkins-bot: [V: 04-1] mobileapps: Add a temporary non-TLS release [deployment-charts] - 10https://gerrit.wikimedia.org/r/612273 (https://phabricator.wikimedia.org/T218733) (owner: 10Alexandros Kosiaris) [13:06:48] (03PS1) 10Muehlenhoff: Remove lilypond for now [puppet] - 10https://gerrit.wikimedia.org/r/612274 (https://phabricator.wikimedia.org/T257066) [13:08:40] 10Operations, 10Epic: Migrate all of production metal and VMs to Buster or later - https://phabricator.wikimedia.org/T247045 (10MSantos) [13:15:03] (03CR) 10DCausse: [wdqs] overrides default blazegraph ns (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/611373 (owner: 10DCausse) [13:15:33] (03PS4) 10DCausse: [wdqs] overrides default blazegraph ns [puppet] - 10https://gerrit.wikimedia.org/r/611373 [13:18:58] 10Operations, 10DBA, 10Epic, 10User-Kormat: Use zarcillo as an authoritative inventory of db instances/roles - https://phabricator.wikimedia.org/T257814 (10Kormat) [13:26:46] 10Operations, 10DBA, 10Epic, 10User-Kormat: Set up replication for zarcillo - https://phabricator.wikimedia.org/T257816 (10Kormat) [13:26:46] 10Operations, 10Pywikibot, 10cloud-services-team (Kanban): http://pywikibot.org/ is displaying Wikimedia error page - https://phabricator.wikimedia.org/T257536 (10siebrand) @Legoktm : https://dnschecker.org/ns-lookup.php?query=pywikibot.org is already showing that this domain has its nameservers at Wikimedia...
[13:26:47] 10Operations, 10DBA, 10Epic, 10User-Kormat: Add haproxy config for zarcillo - https://phabricator.wikimedia.org/T257819 (10Kormat) [13:26:48] 10Operations, 10DBA, 10Epic, 10User-Kormat: Add monitoring to ensure consistency between puppet and zarcillo - https://phabricator.wikimedia.org/T257821 (10Kormat) [13:26:48] 10Operations, 10DBA, 10Epic, 10User-Kormat: Set up replication for zarcillo - https://phabricator.wikimedia.org/T257816 (10Marostegui) p:05Triage→03Medium zarcillo was moved to db2093 because db1115 was not stable for a few weeks, but since: T252331 T231165 and T231182 were solved, we've not had any iss... [13:26:48] 10Operations, 10DBA, 10Epic, 10User-Kormat: Add monitoring to ensure consistency between tendril and zarcillo - https://phabricator.wikimedia.org/T257822 (10Kormat) [13:26:49] 10Operations, 10DBA, 10Epic, 10User-Kormat: Use zarcillo as an authoritative inventory of db instances/roles - https://phabricator.wikimedia.org/T257814 (10Marostegui) p:05Triage→03Medium [13:26:49] (03CR) 10CDanis: [C: 03+1] Remove lilypond for now [puppet] - 10https://gerrit.wikimedia.org/r/612274 (https://phabricator.wikimedia.org/T257066) (owner: 10Muehlenhoff) [13:29:50] 10Operations, 10DBA, 10Epic, 10User-Kormat: Add haproxy config for zarcillo - https://phabricator.wikimedia.org/T257819 (10Marostegui) p:05Triage→03Medium There are two options here: 1- Move the database to either m1 or m2. That ensures replication, and a proxy in front of it. 2- Refactor haproxy puppe...
[13:33:08] 10Operations, 10RESTBase, 10serviceops, 10RESTBase-architecture, 10Service-Architecture: Use the service proxy in restbase - https://phabricator.wikimedia.org/T255133 (10Joe) 05Open→03Resolved p:05Triage→03High [13:33:11] 10Operations, 10MediaWiki-General, 10serviceops, 10Patch-For-Review, 10Service-Architecture: Create a service-to-service proxy for handling HTTP calls from services to other entities - https://phabricator.wikimedia.org/T244843 (10Joe) [13:33:18] 10Operations, 10DBA, 10Epic, 10User-Kormat: Add monitoring to ensure consistency between tendril and zarcillo - https://phabricator.wikimedia.org/T257822 (10Marostegui) p:05Triage→03Medium [13:33:25] 10Operations, 10DBA, 10Epic, 10User-Kormat: Add monitoring to ensure consistency between puppet and zarcillo - https://phabricator.wikimedia.org/T257821 (10Marostegui) p:05Triage→03Medium [13:35:36] !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db1082', diff saved to https://phabricator.wikimedia.org/P11879 and previous config saved to /var/cache/conftool/dbconfig/20200713-133535-marostegui.json [13:35:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:36:05] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1144:3315', diff saved to https://phabricator.wikimedia.org/P11880 and previous config saved to /var/cache/conftool/dbconfig/20200713-133604-marostegui.json [13:36:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:36:24] (03CR) 10Alexandros Kosiaris: "The code LGTM, but I am not entirely clear why it's under the docker-report repo."
(032 comments) [docker-images/docker-report] - 10https://gerrit.wikimedia.org/r/610849 (https://phabricator.wikimedia.org/T257333) (owner: 10JMeybohm) [13:36:27] 10Operations, 10DBA, 10Epic, 10User-Kormat: Add haproxy config for zarcillo - https://phabricator.wikimedia.org/T257819 (10Marostegui) 05Open→03Stalled [13:36:29] 10Operations, 10DBA, 10Epic, 10User-Kormat: Use zarcillo as an authoritative inventory of db instances/roles - https://phabricator.wikimedia.org/T257814 (10Marostegui) [13:37:43] (03CR) 10Muehlenhoff: "@msantos: Sure thing, just ping me on IRC when it works for you and we can go ahead and merge." [puppet] - 10https://gerrit.wikimedia.org/r/610706 (https://phabricator.wikimedia.org/T256877) (owner: 10Muehlenhoff) [13:38:36] (03PS5) 10DCausse: [wdqs] overrides default blazegraph ns [puppet] - 10https://gerrit.wikimedia.org/r/611373 [13:39:27] (03CR) 10JMeybohm: Add basic chartmuseum library and helm-chartctl CLI (032 comments) [docker-images/docker-report] - 10https://gerrit.wikimedia.org/r/610849 (https://phabricator.wikimedia.org/T257333) (owner: 10JMeybohm) [13:39:28] 10Operations, 10DNS, 10Security-Team, 10Traffic: Create dns for security.wikimedia.org - https://phabricator.wikimedia.org/T257831 (10Reedy) [13:39:49] (03CR) 10Ema: [C: 03+1] ATS: Support listening on multiple ports (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/610821 (https://phabricator.wikimedia.org/T254235) (owner: 10Vgutierrez) [13:40:31] (03PS1) 10Reedy: Add security.wikimedia.org pointing to dyna [dns] - 10https://gerrit.wikimedia.org/r/612278 (https://phabricator.wikimedia.org/T257831) [13:42:05] (03CR) 10Ottomata: "> Do you think that we could move the swift-related eventgate-analytics events to main? Does it make sense or maybe we should keep what w" [puppet] - 10https://gerrit.wikimedia.org/r/611168 (owner: 10Elukey) [13:45:12] (03CR) 10JMeybohm: "> The code LGTM, but I am not entirely clear why it's under the docker-report repo."
[docker-images/docker-report] - 10https://gerrit.wikimedia.org/r/610849 (https://phabricator.wikimedia.org/T257333) (owner: 10JMeybohm) [13:47:29] (03PS1) 10Reedy: Add security.wikimedia.org microsite [puppet] - 10https://gerrit.wikimedia.org/r/612279 [13:48:49] (03CR) 10jerkins-bot: [V: 04-1] Add security.wikimedia.org microsite [puppet] - 10https://gerrit.wikimedia.org/r/612279 (owner: 10Reedy) [13:50:04] (03PS2) 10Reedy: Add security.wikimedia.org microsite [puppet] - 10https://gerrit.wikimedia.org/r/612279 (https://phabricator.wikimedia.org/T257834) [13:51:43] 10Operations, 10SRE-tools: More structured cookbooks to reboot hosts - https://phabricator.wikimedia.org/T252807 (10MoritzMuehlenhoff) >>! In T252807#6300353, @Volans wrote: > Do we need to fix it or was a specific odd case with a very old version of ruby? I think we can safely ignore it, this is only an issu... [13:52:51] 10Operations, 10SRE-tools: More structured cookbooks to reboot hosts - https://phabricator.wikimedia.org/T252807 (10Volans) Ack, thanks! 
:) [13:56:02] (03PS3) 10Muehlenhoff: Remove cas-icinga from ACME config [puppet] - 10https://gerrit.wikimedia.org/r/611345 [13:59:09] (03PS3) 10Reedy: Add security.wikimedia.org microsite [puppet] - 10https://gerrit.wikimedia.org/r/612279 (https://phabricator.wikimedia.org/T257834) [14:01:04] RECOVERY - Check systemd state on deneb is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [14:01:51] (03CR) 10Reedy: [C: 04-1] "-1 for now, as the git repo doesn't exist, so definitely no point having this merged just yet" [puppet] - 10https://gerrit.wikimedia.org/r/612279 (https://phabricator.wikimedia.org/T257834) (owner: 10Reedy) [14:03:42] (03CR) 10Muehlenhoff: [C: 03+2] Remove cas-icinga from ACME config [puppet] - 10https://gerrit.wikimedia.org/r/611345 (owner: 10Muehlenhoff) [14:06:19] (03CR) 10Vgutierrez: [C: 03+1] Add security.wikimedia.org pointing to dyna [dns] - 10https://gerrit.wikimedia.org/r/612278 (https://phabricator.wikimedia.org/T257831) (owner: 10Reedy) [14:08:17] (03CR) 10Muehlenhoff: [C: 03+2] Remove lilypond for now [puppet] - 10https://gerrit.wikimedia.org/r/612274 (https://phabricator.wikimedia.org/T257066) (owner: 10Muehlenhoff) [14:17:01] !log removing lilypond from production T257066 [14:17:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:17:06] T257066: Extension:Score / Lilypond is disabled on all wikis - https://phabricator.wikimedia.org/T257066 [14:17:31] 10Operations, 10DNS, 10Security-Team, 10Traffic, 10Patch-For-Review: Create dns for security.wikimedia.org - https://phabricator.wikimedia.org/T257831 (10Reedy) a:03Reedy [14:18:33] 10Operations, 10Traffic: planet.wm.org missing from planet.discovery.wmnet Subject Alternative Name - https://phabricator.wikimedia.org/T257840 (10ema) [14:19:07] (03PS9) 10Alexandros Kosiaris: Add discovery records for chartmuseum [puppet] - 10https://gerrit.wikimedia.org/r/609403 [14:20:18] (03CR) 10jerkins-bot: 
[V: 04-1] Add discovery records for chartmuseum [puppet] - 10https://gerrit.wikimedia.org/r/609403 (owner: 10Alexandros Kosiaris) [14:20:57] (03PS1) 10Ema: ATS: send 'SSL connection failed' errors to logstash [puppet] - 10https://gerrit.wikimedia.org/r/612282 (https://phabricator.wikimedia.org/T257840) [14:30:18] (03PS10) 10Alexandros Kosiaris: Add discovery records for chartmuseum [puppet] - 10https://gerrit.wikimedia.org/r/609403 [14:34:24] (03CR) 10Muehlenhoff: [C: 03+2] No longer install osm2pgsql from stretch-backports [puppet] - 10https://gerrit.wikimedia.org/r/610706 (https://phabricator.wikimedia.org/T256877) (owner: 10Muehlenhoff) [14:37:40] (03PS1) 10Jforrester: TimedMediaHandler: Make videojs the only player on all group0 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612347 (https://phabricator.wikimedia.org/T248418) [14:37:42] (03PS1) 10Jforrester: TimedMediaHandler: Make videojs the only player on all group1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612348 (https://phabricator.wikimedia.org/T248418) [14:37:44] (03PS1) 10Jforrester: TimedMediaHandler: Make videojs the only player everywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612349 (https://phabricator.wikimedia.org/T248418) [14:37:46] (03PS1) 10Jforrester: TimedMediaHandler: Drop Beta Feature, no longer usable [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612350 (https://phabricator.wikimedia.org/T248418) [14:37:48] (03PS1) 10Jforrester: TimedMediaHandler: Don't read wmgTmhWebPlayer [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612351 (https://phabricator.wikimedia.org/T248418) [14:37:50] (03PS1) 10Jforrester: TimedMediaHandler: Drop pre-switch config, no longer read [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612352 (https://phabricator.wikimedia.org/T248418) [14:39:39] (03PS1) 10Privacybatm: transfer.py: Comment on setup_logger function [software/transferpy] - 10https://gerrit.wikimedia.org/r/612354 
(https://phabricator.wikimedia.org/T257600) [14:41:26] (03PS1) 10Esanders: Remove DiscussionToolsEnableVisual [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612355 [14:42:05] (03PS2) 10Muehlenhoff: lxc: Remove jessie compat code [puppet] - 10https://gerrit.wikimedia.org/r/610707 [14:42:30] 10Operations, 10Patch-For-Review: Handle sunset of stretch-backports - https://phabricator.wikimedia.org/T256877 (10MoritzMuehlenhoff) [14:44:03] (03PS2) 10Privacybatm: transfer.py: Comment on setup_logger function [software/transferpy] - 10https://gerrit.wikimedia.org/r/612354 (https://phabricator.wikimedia.org/T257600) [14:44:24] (03PS1) 10Andrew Bogott: eqiad1 neutron: move to galera-hosted database [puppet] - 10https://gerrit.wikimedia.org/r/612356 (https://phabricator.wikimedia.org/T256656) [14:44:37] (03PS1) 10Kormat: es1021: Mark as candidate master for es4 [puppet] - 10https://gerrit.wikimedia.org/r/612357 (https://phabricator.wikimedia.org/T257284) [14:45:46] jouncebot: next [14:45:46] In 2 hour(s) and 14 minute(s): Wikidata Query Service weekly deploy (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200713T1700) [14:46:17] (03CR) 10Jforrester: [C: 03+2] Remove DiscussionToolsEnableVisual [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612355 (owner: 10Esanders) [14:47:21] (03Merged) 10jenkins-bot: Remove DiscussionToolsEnableVisual [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612355 (owner: 10Esanders) [14:48:14] (03CR) 10Marostegui: "would you mind doing it for codfw too?" [puppet] - 10https://gerrit.wikimedia.org/r/612357 (https://phabricator.wikimedia.org/T257284) (owner: 10Kormat) [14:48:56] 10Operations, 10Pywikibot, 10cloud-services-team (Kanban): http://pywikibot.org/ is displaying Wikimedia error page - https://phabricator.wikimedia.org/T257536 (10siebrand) Right. I was a little rusty and the managment console and account names had changed since I last used it. But I regained access, and I h... 
[14:50:40] !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Stop setting DiscussionToolsEnableVisual, default value (duration: 00m 57s) [14:50:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:51:14] RECOVERY - Work requests waiting in Zuul Gearman server on contint2001 is OK: OK: Less than 100.00% above the threshold [90.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1 [14:51:31] (03PS5) 10Muehlenhoff: Remove IDP defintions for logstash vhosts [puppet] - 10https://gerrit.wikimedia.org/r/607509 (https://phabricator.wikimedia.org/T246998) [14:54:04] (03CR) 10Kormat: "> Patch Set 1:" [puppet] - 10https://gerrit.wikimedia.org/r/612357 (https://phabricator.wikimedia.org/T257284) (owner: 10Kormat) [14:57:23] (03CR) 10Marostegui: "Will you send the codfw yaml in this patch or in a separate one?" [puppet] - 10https://gerrit.wikimedia.org/r/612357 (https://phabricator.wikimedia.org/T257284) (owner: 10Kormat) [14:57:41] (03PS2) 10Kormat: es1021: Mark as candidate master for es4 [puppet] - 10https://gerrit.wikimedia.org/r/612357 (https://phabricator.wikimedia.org/T257284) [14:57:44] marostegui: goddamnit. 
😊 [14:57:51] hahaha [14:57:53] (i forgot to push) [14:58:00] (you are welcome) [14:58:28] (03CR) 10Marostegui: [C: 03+1] es1021: Mark as candidate master for es4 [puppet] - 10https://gerrit.wikimedia.org/r/612357 (https://phabricator.wikimedia.org/T257284) (owner: 10Kormat) [14:58:31] i should have just replied "yes" [14:58:51] (03CR) 10Kormat: [C: 03+2] es1021: Mark as candidate master for es4 [puppet] - 10https://gerrit.wikimedia.org/r/612357 (https://phabricator.wikimedia.org/T257284) (owner: 10Kormat) [14:59:40] (03PS11) 10Vgutierrez: ATS: Support listening on multiple ports [puppet] - 10https://gerrit.wikimedia.org/r/610821 (https://phabricator.wikimedia.org/T254235) [14:59:50] (03CR) 10Vgutierrez: ATS: Support listening on multiple ports (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/610821 (https://phabricator.wikimedia.org/T254235) (owner: 10Vgutierrez) [15:02:19] (03PS1) 10Marostegui: es1023,es2024: Make them candidate master [puppet] - 10https://gerrit.wikimedia.org/r/612360 [15:07:23] (03CR) 10Kormat: [C: 03+1] es1023,es2024: Make them candidate master [puppet] - 10https://gerrit.wikimedia.org/r/612360 (owner: 10Marostegui) [15:08:46] (03CR) 10Jbond: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/612143 (https://phabricator.wikimedia.org/T159584) (owner: 10Muehlenhoff) [15:11:55] (03CR) 10Marostegui: [C: 03+2] es1023,es2024: Make them candidate master [puppet] - 10https://gerrit.wikimedia.org/r/612360 (owner: 10Marostegui) [15:14:47] (03CR) 10Andrew Bogott: [C: 03+2] eqiad1 neutron: move to galera-hosted database [puppet] - 10https://gerrit.wikimedia.org/r/612356 (https://phabricator.wikimedia.org/T256656) (owner: 10Andrew Bogott) [15:17:38] (03PS1) 10Mholloway: Proton: Update to 2020-07-13-150542-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/612363 [15:25:56] (03CR) 10Mholloway: [C: 03+2] Proton: Update to 2020-07-13-150542-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/612363 (owner: 
10Mholloway) [15:27:23] (03Merged) 10jenkins-bot: Proton: Update to 2020-07-13-150542-production [deployment-charts] - 10https://gerrit.wikimedia.org/r/612363 (owner: 10Mholloway) [15:30:08] !log mholloway-shell@deploy1001 helmfile [STAGING] Ran 'sync' command on namespace 'proton' for release 'production' . [15:30:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:32:40] (03PS4) 10Jcrespo: bacula: Add ignorelist for long-term broken backups [puppet] - 10https://gerrit.wikimedia.org/r/612167 (https://phabricator.wikimedia.org/T234900) [15:33:40] (03PS1) 10Jbond: profile::idp (cloud): correct hiera key [puppet] - 10https://gerrit.wikimedia.org/r/612368 [15:34:11] (03CR) 10jerkins-bot: [V: 04-1] bacula: Add ignorelist for long-term broken backups [puppet] - 10https://gerrit.wikimedia.org/r/612167 (https://phabricator.wikimedia.org/T234900) (owner: 10Jcrespo) [15:35:00] !log mholloway-shell@deploy1001 helmfile [EQIAD] Ran 'sync' command on namespace 'proton' for release 'production' . [15:35:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:36:18] (03CR) 10Jbond: [C: 03+2] profile::idp (cloud): correct hiera key [puppet] - 10https://gerrit.wikimedia.org/r/612368 (owner: 10Jbond) [15:37:42] (03PS5) 10Jcrespo: bacula: Add ignorelist for long-term broken backups [puppet] - 10https://gerrit.wikimedia.org/r/612167 (https://phabricator.wikimedia.org/T234900) [15:39:55] !log mholloway-shell@deploy1001 helmfile [CODFW] Ran 'sync' command on namespace 'proton' for release 'production' . [15:39:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:41:27] (03CR) 10Jcrespo: "I've implemented the comment addition. 
I will think better the monitoring, as this may be something we want to implement for something mor" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/612167 (https://phabricator.wikimedia.org/T234900) (owner: 10Jcrespo) [15:42:05] (03CR) 10Lucas Werkmeister (WMDE): "no longer DNM, I think" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/610265 (https://phabricator.wikimedia.org/T257435) (owner: 10Lucas Werkmeister (WMDE)) [15:48:47] !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P11881 and previous config saved to /var/cache/conftool/dbconfig/20200713-154847-marostegui.json [15:48:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:51:30] (03CR) 10Ladsgroup: [C: 03+2] "noop for production" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/610265 (https://phabricator.wikimedia.org/T257435) (owner: 10Lucas Werkmeister (WMDE)) [15:52:41] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1130', diff saved to https://phabricator.wikimedia.org/P11882 and previous config saved to /var/cache/conftool/dbconfig/20200713-155240-marostegui.json [15:52:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:54:00] (03Merged) 10jenkins-bot: Load WikibaseClient using extension registration in beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/610265 (https://phabricator.wikimedia.org/T257435) (owner: 10Lucas Werkmeister (WMDE)) [15:54:28] looks fine in mwdebug1001, synicing [15:56:24] (03PS1) 10Jbond: hieradata - cloud: add missing heira defaults [puppet] - 10https://gerrit.wikimedia.org/r/612375 [15:56:55] !log ladsgroup@deploy1001 Synchronized wmf-config/Wikibase.php: [[gerrit:610265|Load WikibaseClient using extension registration in beta (T257435)]] (duration: 00m 55s) [15:56:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:56:59] T257435: Load Client on Beta using extension 
registration - https://phabricator.wikimedia.org/T257435 [15:57:00] (03CR) 10Jbond: [C: 03+2] hieradata - cloud: add missing heira defaults [puppet] - 10https://gerrit.wikimedia.org/r/612375 (owner: 10Jbond) [16:09:03] (03PS1) 10Ryan Kemper: Scale largest shards to be closer to 30GB [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612377 (https://phabricator.wikimedia.org/T256928) [16:10:16] (03PS1) 10Andrew Bogott: eqiad1 designate: move to galera db host [puppet] - 10https://gerrit.wikimedia.org/r/612378 (https://phabricator.wikimedia.org/T242455) [16:17:12] !log aborrero@cumin1001 START - Cookbook sre.hosts.downtime [16:17:14] !log aborrero@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [16:17:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:17:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:17:47] (03PS2) 10Andrew Bogott: eqiad1 designate: move to galera db host [puppet] - 10https://gerrit.wikimedia.org/r/612378 (https://phabricator.wikimedia.org/T242455) [16:19:04] (03PS2) 10Alexandros Kosiaris: mobileapps: Add a temporary non-TLS release [deployment-charts] - 10https://gerrit.wikimedia.org/r/612273 (https://phabricator.wikimedia.org/T218733) [16:21:12] (03CR) 10DCausse: [C: 03+1] Scale largest shards to be closer to 30GB [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612377 (https://phabricator.wikimedia.org/T256928) (owner: 10Ryan Kemper) [16:29:11] 10Operations, 10ORES, 10Scoring-platform-team, 10Release Pipeline (Blubber): Build blubber file for ORES - https://phabricator.wikimedia.org/T210268 (10Halfak) [16:31:34] (03CR) 10Andrew Bogott: [C: 03+2] eqiad1 designate: move to galera db host [puppet] - 10https://gerrit.wikimedia.org/r/612378 (https://phabricator.wikimedia.org/T242455) (owner: 10Andrew Bogott) [16:36:41] 10Operations, 10Patch-For-Review: Handle sunset of stretch-backports - https://phabricator.wikimedia.org/T256877 (10aborrero) >>! 
In T256877#6289542, @MoritzMuehlenhoff wrote: >>>! In T256877#6284764, @aborrero wrote: >> @MoritzMuehlenhoff I'm now thinking this is going to happen with every single debian relea... [16:50:28] PROBLEM - Check systemd state on deneb is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [16:51:34] (03PS1) 10Privacybatm: transfer.py: Add unit tests [software/transferpy] - 10https://gerrit.wikimedia.org/r/612384 (https://phabricator.wikimedia.org/T257600) [17:00:04] gehel and onimisionipe: #bothumor I οΏ½ Unicode. All rise for Wikidata Query Service weekly deploy deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200713T1700). [17:03:43] cdanis beat me by 10 minutes [17:05:17] (03CR) 10Dzahn: [C: 03+1] production-m2.sql: Remove gerrit related grants [puppet] - 10https://gerrit.wikimedia.org/r/611963 (https://phabricator.wikimedia.org/T255715) (owner: 10Marostegui) [17:05:52] (03CR) 10Dzahn: [C: 03+1] "yep, confirmed i deployed the change that removed the firewall holes" [puppet] - 10https://gerrit.wikimedia.org/r/611963 (https://phabricator.wikimedia.org/T255715) (owner: 10Marostegui) [17:06:10] (03PS2) 10Dzahn: Add ary language [dns] - 10https://gerrit.wikimedia.org/r/611426 (https://phabricator.wikimedia.org/T257674) (owner: 10Urbanecm) [17:07:01] (03CR) 10Dzahn: [C: 03+2] "This is Moroccan Arabic, approved by langcom." 
[dns] - 10https://gerrit.wikimedia.org/r/611426 (https://phabricator.wikimedia.org/T257674) (owner: 10Urbanecm) [17:07:34] (03CR) 10Dzahn: [C: 03+2] "https://meta.wikimedia.org/wiki/Requests_for_new_languages/Wikipedia_Moroccan" [dns] - 10https://gerrit.wikimedia.org/r/611426 (https://phabricator.wikimedia.org/T257674) (owner: 10Urbanecm) [17:15:29] (03PS5) 10Dzahn: httpd/simplelamp2: add parameter to not purge manual config [puppet] - 10https://gerrit.wikimedia.org/r/597052 (https://phabricator.wikimedia.org/T169368) [17:16:48] (03PS1) 10Arturo Borrero Gonzalez: openstack: neutron: add NRPE plugin to check nf_conntrack status [puppet] - 10https://gerrit.wikimedia.org/r/612390 (https://phabricator.wikimedia.org/T257552) [17:17:14] (03CR) 10jerkins-bot: [V: 04-1] openstack: neutron: add NRPE plugin to check nf_conntrack status [puppet] - 10https://gerrit.wikimedia.org/r/612390 (https://phabricator.wikimedia.org/T257552) (owner: 10Arturo Borrero Gonzalez) [17:18:35] (03PS2) 10Arturo Borrero Gonzalez: openstack: neutron: add NRPE plugin to check nf_conntrack status [puppet] - 10https://gerrit.wikimedia.org/r/612390 (https://phabricator.wikimedia.org/T257552) [17:19:10] 10Operations, 10observability, 10Patch-For-Review: Leverage Grafana annotations to show events in graphs - https://phabricator.wikimedia.org/T222826 (10colewhite) [17:24:09] 10Operations, 10ops-eqiad, 10DC-Ops: (Need By:TBD) rack/setup/install rows C and D new PDUs - https://phabricator.wikimedia.org/T253694 (10RobH) [17:25:26] (03PS2) 10Hnowlan: changeprop: remove changeprop from puppet [puppet] - 10https://gerrit.wikimedia.org/r/603534 (https://phabricator.wikimedia.org/T220399) [17:25:48] 10Operations, 10ops-eqiad, 10DC-Ops: (Need By:TBD) rack/setup/install rows C and D new PDUs - https://phabricator.wikimedia.org/T253694 (10RobH) a:05RobH→03None I failed to unassign this from myself, as it shouldn't be mine any longer. I also neglected to link it into its parent ordering task, which I'v... 
[17:32:16] (03PS18) 10Privacybatm: transferpy: Generate checksum parallel to the data transfer [software/transferpy] - 10https://gerrit.wikimedia.org/r/605851 (https://phabricator.wikimedia.org/T254979) [17:32:36] 10Operations, 10ops-eqiad: Interface errors on asw2-b-eqiad:ge-5/0/35 (kubernetes1010) - https://phabricator.wikimedia.org/T257542 (10Cmjohnson) what does this mean? can I get some clarification on the issue please. [17:33:58] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [17:39:30] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [17:41:02] 10Operations, 10ops-eqiad, 10DC-Ops: (Due By: 2020-07-02) rack/setup/install 3 lightweight hadoop nodes - https://phabricator.wikimedia.org/T255518 (10Cmjohnson) @Jclark-ctr I went to setup idrac's today and notice that you have an-worker's in the rack space that you said is for an-master or an-coord. Can y... [17:44:30] (03PS1) 10C. Scott Ananian: Don't use the 'zeroconf' configuration for VisualEditor [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612392 (https://phabricator.wikimedia.org/T248343) [17:45:54] 10Operations, 10Release-Engineering-Team, 10Scoring-platform-team (Current): Production shell access for Chris Albon - https://phabricator.wikimedia.org/T256412 (10Nuria) Hello, @calbon cannot ssh to stat1007.eqiad.wmnet , i think tehse changes have not gone in yet. Also user will be needing kerberos creden... 
[17:48:40] (03PS4) 10DCausse: Set proper language code for some wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/607235 (https://phabricator.wikimedia.org/T250810) [17:50:34] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [17:50:48] PROBLEM - Disk space on webperf1002 is CRITICAL: DISK CRITICAL - free space: /srv 11465 MB (3% inode=99%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=webperf1002&var-datasource=eqiad+prometheus/ops [17:52:26] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [17:53:41] 10Operations, 10ops-eqiad, 10DC-Ops: (Due By: 2020-07-17) rack/setup/install - https://phabricator.wikimedia.org/T255520 (10Jclark-ctr) @Cmjohnson >>! In T255520#6234640, @elukey wrote: > @RobH we ordered 6 nodes IIRC, 3 with more storage space and 3 called "lightweight", which ba... [17:58:36] (03CR) 10Ppchelko: [C: 03+1] changeprop: remove changeprop from puppet [puppet] - 10https://gerrit.wikimedia.org/r/603534 (https://phabricator.wikimedia.org/T220399) (owner: 10Hnowlan) [18:00:05] RoanKattouw, Niharika, and Urbanecm: My dear minions, it's time we take the moon! Just kidding. Time for Morning backport window(Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200713T1800). [18:00:05] Tks4Fish, ryankemper, and dcausse: A patch you scheduled for Morning backport window(Max 6 patches) is about to be deployed. 
Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [18:00:34] you had me at "free sticker" [18:01:24] o/ [18:02:51] I can SWAT [18:03:09] I can deploy, I mean [18:05:07] 🥓 [18:05:22] ryankemper: I'll deploy your patch first Tks4Fish does not appear to be around yet [18:05:39] I can 🥓 [18:05:48] Thanks [18:06:00] Also what does BACON stand for? I gather it's backport something-or-other? [18:07:10] (03CR) 10DCausse: [C: 03+2] "Deploying" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612377 (https://phabricator.wikimedia.org/T256928) (owner: 10Ryan Kemper) [18:07:46] >for BAckports and CONfigs [18:07:58] (03Merged) 10jenkins-bot: Scale largest shards to be closer to 30GB [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612377 (https://phabricator.wikimedia.org/T256928) (owner: 10Ryan Kemper) [18:08:28] ah, thanks Reedy [18:10:43] Gerrit uploads slow for anyone else? [18:10:53] been trying several times to upload a simple config patch, keeps timing out [18:11:40] ryankemper: your change is live on mwdebug1002, by using WMF-Debug extension you could test the effect of your change [18:11:49] https://wikitech.wikimedia.org/wiki/X-Wikimedia-Debug#Browser_extensions <- for the extension [18:14:31] but there's no URL to actually see the impact of your change so let's just deploy [18:15:59] dcausse: Thanks. Since these config changes just govern Elasticsearch index sharding, there's nothing for me to test at this point. 
I think we're clear to proceed [18:16:10] Yeah [18:18:39] !log dcausse@deploy1001 Synchronized wmf-config/InitialiseSettings.php: T256928: Scale largest shards to be closer to 30GB (duration: 00m 56s) [18:18:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:18:47] T256928: Tune Elasticsearch Shard counts - https://phabricator.wikimedia.org/T256928 [18:19:02] ryankemper: ^ should be live [18:19:03] (03PS1) 10Krinkle: Enable wgForceHTTPS and wgCookieSameSite='None' (Phase 3) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612395 (https://phabricator.wikimedia.org/T256095) [18:19:05] (03PS1) 10Krinkle: Enable wgForceHTTPS and wgCookieSameSite='None' (Phase 4) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612396 (https://phabricator.wikimedia.org/T256095) [18:19:24] thanks for rolling the deploy [18:20:24] (03CR) 10DCausse: [C: 03+2] "Deploying" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/607235 (https://phabricator.wikimedia.org/T250810) (owner: 10DCausse) [18:21:50] (03PS5) 10DCausse: Set proper language code for some wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/607235 (https://phabricator.wikimedia.org/T250810) [18:22:01] (03CR) 10DCausse: [C: 03+2] "Deploying" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/607235 (https://phabricator.wikimedia.org/T250810) (owner: 10DCausse) [18:22:51] (03Merged) 10jenkins-bot: Set proper language code for some wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/607235 (https://phabricator.wikimedia.org/T250810) (owner: 10DCausse) [18:26:39] !log dcausse@deploy1001 Synchronized wmf-config/InitialiseSettings.php: T250810: Set proper language code for some wikis (duration: 00m 56s) [18:26:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:26:45] T250810: CirrusSearch throws an error on several wikis when searching for "intitle:/regex/" - https://phabricator.wikimedia.org/T250810 [18:26:54] Tks4Fish are you around? 
[18:27:31] dont|panic: ^ [18:27:41] let's ping another nick [18:29:48] I could deploy https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/611937 but I'd need someone to test, but looking at the patch we need a sysop on elwiki [18:30:07] no it doesn't need a sysop ther [18:30:09] e [18:30:17] Majavah: can you test? [18:30:17] it can be verified via Special:ListGroupRights [18:30:20] sure [18:30:25] * Majavah grabs laptop [18:30:27] thanks! [18:30:55] (03PS2) 10DCausse: Add rollbacker to elwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/611937 (https://phabricator.wikimedia.org/T257745) (owner: 10Tks4Fish) [18:31:45] please advise when it's on mwdebug1001 [18:31:56] sure [18:32:28] PROBLEM - Disk space on webperf1002 is CRITICAL: DISK CRITICAL - free space: /srv 11349 MB (3% inode=99%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=webperf1002&var-datasource=eqiad+prometheus/ops [18:33:00] (03CR) 10DCausse: [C: 03+2] "Deploying" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/611937 (https://phabricator.wikimedia.org/T257745) (owner: 10Tks4Fish) [18:33:50] (03Merged) 10jenkins-bot: Add rollbacker to elwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/611937 (https://phabricator.wikimedia.org/T257745) (owner: 10Tks4Fish) [18:35:41] (03CR) 10Jforrester: [C: 03+1] Don't use the 'zeroconf' configuration for VisualEditor [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612392 (https://phabricator.wikimedia.org/T248343) (owner: 10C. 
Scott Ananian) [18:36:36] Majavah: it's live on mwdebug1001, please let me know if it works as expected [18:37:27] dcausse: appears to be working [18:37:37] Majavah: thanks, deploying [18:40:26] !log dcausse@deploy1001 Synchronized wmf-config/InitialiseSettings.php: T257745: Add rollbacker to elwiki (duration: 00m 56s) [18:40:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:40:33] T257745: Create 'rollbacker' user group for elwiki - https://phabricator.wikimedia.org/T257745 [18:41:22] Majavah: it's live, thanks for checking the patch! [18:41:40] happy to help [18:43:53] !log BACON done [18:43:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:47:51] I'll sling out a quick patch. [18:48:02] (03PS2) 10Jforrester: Don't use the 'zeroconf' configuration for VisualEditor [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612392 (https://phabricator.wikimedia.org/T248343) (owner: 10C. Scott Ananian) [18:50:55] (03CR) 10Jforrester: [C: 03+2] Don't use the 'zeroconf' configuration for VisualEditor [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612392 (https://phabricator.wikimedia.org/T248343) (owner: 10C. Scott Ananian) [18:52:00] (03Merged) 10jenkins-bot: Don't use the 'zeroconf' configuration for VisualEditor [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612392 (https://phabricator.wikimedia.org/T248343) (owner: 10C. 
Scott Ananian) [18:53:25] !log jforrester@deploy1001 Synchronized wmf-config/CommonSettings.php: T248343 Don't use the 'zeroconf' configuration for VisualEditor (duration: 00m 55s) [18:53:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:53:30] T248343: Uncached VisualEditor w/ Parsoid/PHP (no JS, no RESTBase) for MW 1.35 LTS - https://phabricator.wikimedia.org/T248343 [18:59:33] Crap, I forgot [18:59:48] I saved you dont|panic :P [18:59:51] Thanks Majavah and dcausse [18:59:54] 10Operations, 10Parsoid, 10VisualEditor, 10wikitech.wikimedia.org: VisualEditor was removed from Wikitech because Parsoid/PHP isn't yet compatible with how Wikitech is set up - https://phabricator.wikimedia.org/T241961 (10Jdforrester-WMF) [19:00:40] 10Operations, 10Parsoid, 10VisualEditor, 10wikitech.wikimedia.org: VisualEditor was removed from Wikitech because Parsoid/PHP isn't yet compatible with how Wikitech is set up - https://phabricator.wikimedia.org/T241961 (10Jdforrester-WMF) This is //probably// something that will Just Work™ with wmf.41 and... 
[19:01:11] (03PS1) 10CDanis: replace playbook links w/ shortened URLs [puppet] - 10https://gerrit.wikimedia.org/r/612399 [19:01:40] (03PS2) 10Andrew Bogott: Remove redundant profile::openstack::eqiad1::openstack_controllers settings [puppet] - 10https://gerrit.wikimedia.org/r/610914 [19:02:12] (03CR) 10Andrew Bogott: [C: 03+2] Remove redundant profile::openstack::eqiad1::openstack_controllers settings [puppet] - 10https://gerrit.wikimedia.org/r/610914 (owner: 10Andrew Bogott) [19:03:06] (03PS3) 10Andrew Bogott: Use designateclient in ensure functions [puppet] - 10https://gerrit.wikimedia.org/r/522196 (https://phabricator.wikimedia.org/T227785) [19:03:56] (03CR) 10Dzahn: [C: 04-1] "https://puppet-compiler.wmflabs.org/compiler1002/23843/releases1001.eqiad.wmnet/change.releases1001.eqiad.wmnet.err" [puppet] - 10https://gerrit.wikimedia.org/r/610404 (owner: 10Dzahn) [19:06:02] (03PS4) 10Dzahn: releases: move more common code out of the mediawiki class [puppet] - 10https://gerrit.wikimedia.org/r/610404 [19:07:07] (03CR) 10Andrew Bogott: [C: 03+2] Use designateclient in ensure functions [puppet] - 10https://gerrit.wikimedia.org/r/522196 (https://phabricator.wikimedia.org/T227785) (owner: 10Andrew Bogott) [19:14:16] (03CR) 10Dzahn: [C: 04-1] "https://puppet-compiler.wmflabs.org/compiler1003/23844/releases1001.eqiad.wmnet/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/610404 (owner: 10Dzahn) [19:15:47] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [19:18:37] (03PS5) 10Dzahn: releases: move more common code out of the mediawiki class [puppet] - 10https://gerrit.wikimedia.org/r/610404 [19:20:07] (03CR) 10Krinkle: [C: 03+2] Enable wgForceHTTPS and wgCookieSameSite='None' (Phase 3) 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/612395 (https://phabricator.wikimedia.org/T256095) (owner: 10Krinkle) [19:21:18] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [19:23:26] 10Operations, 10Parsoid, 10VisualEditor, 10wikitech.wikimedia.org: VisualEditor was removed from Wikitech because Parsoid/PHP isn't yet compatible with how Wikitech is set up - https://phabricator.wikimedia.org/T241961 (10cscott) [19:25:26] (03CR) 10Dzahn: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1003/23845/releases1001.eqiad.wmnet/index.html" [puppet] - 10https://gerrit.wikimedia.org/r/610404 (owner: 10Dzahn) [19:25:57] 10Operations, 10Parsoid, 10VisualEditor, 10wikitech.wikimedia.org: VisualEditor was removed from Wikitech because Parsoid/PHP isn't yet compatible with how Wikitech is set up - https://phabricator.wikimedia.org/T241961 (10cscott) @Jdforrester-WMF Let's manually install/load the Parsoid extension and explic... 
[19:28:44] (03PS2) 10Krinkle: Enable wgForceHTTPS and wgCookieSameSite='None' (Phase 3) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612395 (https://phabricator.wikimedia.org/T256095) [19:28:48] (03CR) 10Krinkle: [V: 03+2 C: 03+2] Enable wgForceHTTPS and wgCookieSameSite='None' (Phase 3) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612395 (https://phabricator.wikimedia.org/T256095) (owner: 10Krinkle) [19:33:12] !log krinkle@deploy1001 Synchronized wmf-config/InitialiseSettings.php: I1a12124f1811e9a (duration: 00m 57s) [19:33:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:33:36] (03PS1) 10Andrew Bogott: Use designateclient in ensure functions [puppet] - 10https://gerrit.wikimedia.org/r/612404 (https://phabricator.wikimedia.org/T227785) [19:34:33] (03CR) 10Andrew Bogott: [C: 03+2] Use designateclient in ensure functions [puppet] - 10https://gerrit.wikimedia.org/r/612404 (https://phabricator.wikimedia.org/T227785) (owner: 10Andrew Bogott) [19:35:13] (03PS2) 10Krinkle: Enable wgForceHTTPS and wgCookieSameSite='None' (Phase 4) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612396 (https://phabricator.wikimedia.org/T256095) [19:36:52] PROBLEM - termbox codfw on termbox.svc.codfw.wmnet is CRITICAL: /termbox (get rendered termbox) is CRITICAL: Test get rendered termbox returned the unexpected status 500 (expecting: 200) https://wikitech.wikimedia.org/wiki/WMDE/Wikidata/SSR_Service [19:37:03] (03PS2) 10Dzahn: releases: switch reprepro file sync to support multiple destinations [puppet] - 10https://gerrit.wikimedia.org/r/610405 (https://phabricator.wikimedia.org/T247652) [19:38:07] PROBLEM - termbox eqiad on termbox.svc.eqiad.wmnet is CRITICAL: /termbox (get rendered termbox) is CRITICAL: Test get rendered termbox returned the unexpected status 500 (expecting: 200) https://wikitech.wikimedia.org/wiki/WMDE/Wikidata/SSR_Service [19:39:28] !log cmjohnson@cumin1001 START - Cookbook sre.dns.netbox [19:39:43] Logged the 
message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:41:05] !log milimetric@deploy1001 Started deploy [analytics/refinery@de0a1f1]: Regular analytics weekly train [analytics/refinery@de0a1f1] [19:41:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:47:08] (03Abandoned) 10Krinkle: arclamp: Add support for sample_pop option to arclamp-log [puppet] - 10https://gerrit.wikimedia.org/r/606789 (https://phabricator.wikimedia.org/T255920) (owner: 10Krinkle) [19:47:46] !log milimetric@deploy1001 Finished deploy [analytics/refinery@de0a1f1]: Regular analytics weekly train [analytics/refinery@de0a1f1] (duration: 06m 41s) [19:47:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:47:57] !log milimetric@deploy1001 Started deploy [analytics/refinery@de0a1f1] (thin): Regular analytics weekly train THIN [analytics/refinery@de0a1f1] [19:48:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:48:04] !log milimetric@deploy1001 Finished deploy [analytics/refinery@de0a1f1] (thin): Regular analytics weekly train THIN [analytics/refinery@de0a1f1] (duration: 00m 07s) [19:48:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:51:44] (03PS1) 10MSantos: update proton beta instance for restbase [puppet] - 10https://gerrit.wikimedia.org/r/612406 (https://phabricator.wikimedia.org/T256795) [19:53:09] (03CR) 10Mholloway: [C: 03+1] update proton beta instance for restbase [puppet] - 10https://gerrit.wikimedia.org/r/612406 (https://phabricator.wikimedia.org/T256795) (owner: 10MSantos) [19:55:40] (03CR) 10Dzahn: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1001/23846/" [puppet] - 10https://gerrit.wikimedia.org/r/610405 (https://phabricator.wikimedia.org/T247652) (owner: 10Dzahn) [20:00:04] halfak and accraze: My dear minions, it's time we take the moon! Just kidding. Time for Services – Graphoid / Citoid / ORES deploy. 
(https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200713T2000). [20:00:26] 10Operations, 10Fundraising-Backlog, 10MediaWiki-extensions-CentralNotice, 10Traffic, and 2 others: TY pages in a subdomain of wikipedia and set hid banner cookie - https://phabricator.wikimedia.org/T251780 (10DStrine) [20:02:47] 10Operations, 10Fundraising-Backlog, 10MediaWiki-extensions-CentralNotice, 10Traffic, and 2 others: TY pages in a subdomain of wikipedia and set hid banner cookie - https://phabricator.wikimedia.org/T251780 (10DStrine) For the record: we are moving ahead with the TY page that is accessible from a wikipedi.... [20:03:03] !log rsynced reprepro data from releases1001 to releases1002, releases2002 [20:03:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:03:09] 10Operations, 10Fundraising-Backlog, 10MediaWiki-extensions-CentralNotice, 10Traffic, and 2 others: TY pages in a subdomain of wikipedia and set hid banner cookie - https://phabricator.wikimedia.org/T251780 (10mepps) As I understand this: - Confirm img tag is working as expected - Create new subdomain dona... 
[20:03:18] (03PS2) 10Dzahn: releases: sync MediaWiki security patches to multiple servers [puppet] - 10https://gerrit.wikimedia.org/r/610406 [20:04:51] 10Operations, 10Fundraising-Backlog, 10MediaWiki-extensions-CentralNotice, 10Traffic, and 2 others: TY pages in a subdomain of wikipedia and set hide banner cookie - https://phabricator.wikimedia.org/T251780 (10DStrine) [20:12:29] !log ebernhardson@deploy1001 Started deploy [wikimedia/discovery/analytics@1edde21]: airflow: ship_to_es: Implement multi-index understanding [20:12:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:12:58] !log ebernhardson@deploy1001 Finished deploy [wikimedia/discovery/analytics@1edde21]: airflow: ship_to_es: Implement multi-index understanding (duration: 00m 29s) [20:13:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:14:07] (03CR) 10Dzahn: [C: 04-1] "duplicate declaration ... https://puppet-compiler.wmflabs.org/compiler1001/23847/releases1001.eqiad.wmnet/change.releases1001.eqiad.wmnet." 
[puppet] - 10https://gerrit.wikimedia.org/r/610406 (owner: 10Dzahn) [20:15:38] (03PS3) 10Dzahn: releases: sync MediaWiki security patches to multiple servers [puppet] - 10https://gerrit.wikimedia.org/r/610406 [20:28:24] 10Operations, 10ops-eqiad, 10DC-Ops: eqiad: PDU Upgrade in C8 (July 14, 2pm-4pm UTC)) - https://phabricator.wikimedia.org/T257871 (10wiki_willy) [20:30:11] 10Operations, 10ops-eqiad, 10DC-Ops: (Need By:TBD) rack/setup/install rows C and D new PDUs - https://phabricator.wikimedia.org/T253694 (10wiki_willy) [20:30:12] 10Operations, 10ops-eqiad, 10DC-Ops: eqiad: PDU Upgrade in C8 (July 14, 2pm-4pm UTC)) - https://phabricator.wikimedia.org/T257871 (10wiki_willy) [20:41:47] (03Abandoned) 10Ottomata: Initial debian commit [debs/anaconda] (debian) - 10https://gerrit.wikimedia.org/r/594204 (https://phabricator.wikimedia.org/T251006) (owner: 10Ottomata) [20:41:49] (03CR) 10Ottomata: "Original review here:" [debs/anaconda-wmf] (debian) - 10https://gerrit.wikimedia.org/r/610880 (https://phabricator.wikimedia.org/T251006) (owner: 10Ottomata) [20:59:05] (03Abandoned) 10Andrew Bogott: neutron: enable l3_agent_only_dmz_cidr_hack in eqiad1 [puppet] - 10https://gerrit.wikimedia.org/r/585031 (https://phabricator.wikimedia.org/T247505) (owner: 10Andrew Bogott) [21:00:04] Reedy and sbassett: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) Weekly Security deployment window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200713T2100). [21:03:19] 10Operations, 10MediaWiki-General, 10Performance-Team, 10serviceops-radar, and 3 others: Move MainStash out of Redis to a simpler multi-dc aware solution - https://phabricator.wikimedia.org/T212129 (10Krinkle) [21:03:32] 10Operations, 10MediaWiki-General, 10Performance-Team, 10serviceops-radar, and 3 others: Move MainStash out of Redis to a simpler multi-dc aware solution - https://phabricator.wikimedia.org/T212129 (10Krinkle) OK. Penciling in for Q2. 
[21:11:20] (03CR) 10Dzahn: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1003/23848/" [puppet] - 10https://gerrit.wikimedia.org/r/610406 (owner: 10Dzahn) [21:25:16] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_cxserver_cluster_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [21:26:20] (03PS1) 10Jdlrobson: Drop main page special casing on all projects [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612423 (https://phabricator.wikimedia.org/T32405) [21:27:06] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [21:29:01] (03PS2) 10Jdlrobson: Drop main page special casing on all projects [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612423 (https://phabricator.wikimedia.org/T32405) [21:30:48] (03PS2) 10Ahmon Dancy: Allow aptly::repo commands to run as alternate user [puppet] - 10https://gerrit.wikimedia.org/r/611457 (https://phabricator.wikimedia.org/T250157) [21:51:21] 10Operations, 10Traffic, 10Patch-For-Review: planet.wm.org missing from planet.discovery.wmnet Subject Alternative Name - https://phabricator.wikimedia.org/T257840 (10Dzahn) a:03Dzahn [21:53:35] 10Operations, 10Traffic, 10Patch-For-Review: planet.wm.org missing from planet.discovery.wmnet Subject Alternative Name - https://phabricator.wikimedia.org/T257840 (10Dzahn) It was never expected that planet without a language prefix exists. It's just that i recently added the "blank" domain to config bec... 
[21:56:18] (03CR) 10Krinkle: [C: 03+2] Enable wgForceHTTPS and wgCookieSameSite='None' (Phase 4) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612396 (https://phabricator.wikimedia.org/T256095) (owner: 10Krinkle) [21:57:08] (03Merged) 10jenkins-bot: Enable wgForceHTTPS and wgCookieSameSite='None' (Phase 4) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/612396 (https://phabricator.wikimedia.org/T256095) (owner: 10Krinkle) [22:02:08] * Krinkle staging on mwdebug1002 [22:27:50] !log krinkle@deploy1001 Synchronized wmf-config/InitialiseSettings.php: I80ca62643f5c (duration: 00m 58s) [22:27:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:31:42] PROBLEM - wikifeeds codfw on wikifeeds.svc.codfw.wmnet is CRITICAL: /{domain}/v1/page/featured/{year}/{month}/{day} (retrieve title of the featured article for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Wikifeeds [22:31:52] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job={swagger_check_restbase_cluster_codfw,swagger_check_restbase_cluster_eqiad,swagger_check_restbase_esams,swagger_check_wikifeeds_codfw,swagger_check_wikifeeds_eqiad} site={codfw,eqiad,esams} https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [22:31:54] PROBLEM - restbase endpoints health on restbase1025 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [22:31:57] PROBLEM - restbase endpoints health on restbase2022 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [22:32:02] PROBLEM - Restbase LVS eqiad on 
restbase.svc.eqiad.wmnet is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/RESTBase [22:32:10] PROBLEM - PyBal backends health check on lvs2010 is CRITICAL: PYBAL CRITICAL - CRITICAL - apaches_80: Servers mw2255.codfw.wmnet, mw2234.codfw.wmnet, mw2271.codfw.wmnet, mw2301.codfw.wmnet, mw2236.codfw.wmnet, mw2227.codfw.wmnet, mw2197.codfw.wmnet, mw2371.codfw.wmnet, mw2331.codfw.wmnet, mw2276.codfw.wmnet, mw2196.codfw.wmnet, mw2313.codfw.wmnet, mw2312.codfw.wmnet, mw2256.codfw.wmnet, mw2229.codfw.wmnet, mw2310.codfw.wmnet, mw2 [22:32:10] mw2232.codfw.wmnet, mw2231.codfw.wmnet, mw2233.codfw.wmnet, mw2325.codfw.wmnet, mw2314.codfw.wmnet, mw2275.codfw.wmnet, mw2242.codfw.wmnet, mw2194.codfw.wmnet, mw2193.codfw.wmnet, mw2257.codfw.wmnet, mw2195.codfw.wmnet, mw2369.codfw.wmnet, mw2199.codfw.wmnet, mw2365.codfw.wmnet, mw2361.codfw.wmnet, mw2315.codfw.wmnet, mw2363.codfw.wmnet, mw2192.codfw.wmnet, mw2373.codfw.wmnet, mw2270.codfw.wmnet, mw2359.codfw.wmnet, mw2241.codfw. 
[22:32:10] fw.wmnet, mw2198.codfw.wmnet, mw2272.codfw.wmnet, mw2307.codfw.wmnet, mw2268.codfw.wmnet, mw2226.codfw.wmnet, mw2333.codfw.wmnet, mw2240.codfw.wmnet, mw2303.codfw.wmnet, mw2258.codfw.wm https://wikitech.wikimedia.org/wiki/PyBal [22:32:12] PROBLEM - wikifeeds eqiad on wikifeeds.svc.eqiad.wmnet is CRITICAL: /{domain}/v1/page/featured/{year}/{month}/{day} (retrieve title of the featured article for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Wikifeeds [22:32:14] PROBLEM - restbase endpoints health on restbase1020 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [22:32:14] PROBLEM - restbase endpoints health on restbase2021 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [22:32:18] PROBLEM - restbase endpoints health on restbase-dev1004 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [22:32:18] PROBLEM - restbase endpoints health on restbase2018 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [22:32:27] PROBLEM - restbase endpoints health on restbase2023 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [22:32:28] PROBLEM - restbase endpoints 
health on restbase1024 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [22:32:28] PROBLEM - restbase endpoints health on restbase2014 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [22:32:30] PROBLEM - restbase endpoints health on restbase2012 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [22:32:32] PROBLEM - restbase endpoints health on restbase1021 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [22:32:32] PROBLEM - restbase endpoints health on restbase1022 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [22:32:32] PROBLEM - restbase endpoints health on restbase1018 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [22:32:32] PROBLEM - restbase endpoints health on restbase1027 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [22:32:32] PROBLEM - 
restbase endpoints health on restbase2019 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [22:32:32] PROBLEM - restbase endpoints health on restbase2015 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [22:32:32] PROBLEM - restbase endpoints health on restbase2011 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [22:32:38] PROBLEM - PyBal backends health check on lvs1015 is CRITICAL: PYBAL CRITICAL - CRITICAL - apaches_80: Servers mw1371.eqiad.wmnet, mw1365.eqiad.wmnet, mw1267.eqiad.wmnet, mw1355.eqiad.wmnet, mw1333.eqiad.wmnet, mw1323.eqiad.wmnet, mw1384.eqiad.wmnet, mw1322.eqiad.wmnet, mw1327.eqiad.wmnet, mw1328.eqiad.wmnet, mw1413.eqiad.wmnet, mw1405.eqiad.wmnet, mw1351.eqiad.wmnet, mw1270.eqiad.wmnet, mw1391.eqiad.wmnet, mw1329.eqiad.wmnet, mw1 [22:32:38] mw1352.eqiad.wmnet, mw1266.eqiad.wmnet, mw1326.eqiad.wmnet, mw1268.eqiad.wmnet, mw1319.eqiad.wmnet, mw1393.eqiad.wmnet, mw1349.eqiad.wmnet, mw1324.eqiad.wmnet, mw1350.eqiad.wmnet, mw1370.eqiad.wmnet, mw1389.eqiad.wmnet, mw1331.eqiad.wmnet, mw1271.eqiad.wmnet, mw1321.eqiad.wmnet, mw1269.eqiad.wmnet, mw1403.eqiad.wmnet, mw1325.eqiad.wmnet, mw1274.eqiad.wmnet, mw1373.eqiad.wmnet, mw1411.eqiad.wmnet, mw1369.eqiad.wmnet, mw1387.eqiad. 
[22:32:38] ad.wmnet, mw1368.eqiad.wmnet, mw1409.eqiad.wmnet, mw1273.eqiad.wmnet, mw1354.eqiad.wmnet, mw1385.eqiad.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal [22:32:42] PROBLEM - PyBal backends health check on lvs1016 is CRITICAL: PYBAL CRITICAL - CRITICAL - apaches_80: Servers mw1351.eqiad.wmnet, mw1273.eqiad.wmnet, mw1371.eqiad.wmnet, mw1365.eqiad.wmnet, mw1367.eqiad.wmnet, mw1267.eqiad.wmnet, mw1322.eqiad.wmnet, mw1319.eqiad.wmnet, mw1349.eqiad.wmnet, mw1384.eqiad.wmnet, mw1327.eqiad.wmnet, mw1328.eqiad.wmnet, mw1413.eqiad.wmnet, mw1364.eqiad.wmnet, mw1272.eqiad.wmnet, mw1270.eqiad.wmnet, mw1 [22:32:42] mw1329.eqiad.wmnet, mw1269.eqiad.wmnet, mw1352.eqiad.wmnet, mw1266.eqiad.wmnet, mw1326.eqiad.wmnet, mw1321.eqiad.wmnet, mw1355.eqiad.wmnet, mw1393.eqiad.wmnet, mw1324.eqiad.wmnet, mw1350.eqiad.wmnet, mw1370.eqiad.wmnet, mw1389.eqiad.wmnet, mw1331.eqiad.wmnet, mw1271.eqiad.wmnet, mw1268.eqiad.wmnet, mw1403.eqiad.wmnet, mw1325.eqiad.wmnet, mw1373.eqiad.wmnet, mw1411.eqiad.wmnet, mw1369.eqiad.wmnet, mw1387.eqiad.wmnet, mw1353.eqiad. 
[22:32:42] ad.wmnet, mw1409.eqiad.wmnet, mw1333.eqiad.wmnet, mw1354.eqiad.wmnet, mw1385.eqiad.wmnet, mw1330.eqiad.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal [22:32:46] PROBLEM - restbase endpoints health on restbase1017 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [22:32:50] PROBLEM - restbase endpoints health on restbase2017 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [22:32:54] PROBLEM - PyBal backends health check on lvs2009 is CRITICAL: PYBAL CRITICAL - CRITICAL - apaches_80: Servers mw2255.codfw.wmnet, mw2234.codfw.wmnet, mw2271.codfw.wmnet, mw2301.codfw.wmnet, mw2236.codfw.wmnet, mw2227.codfw.wmnet, mw2197.codfw.wmnet, mw2371.codfw.wmnet, mw2331.codfw.wmnet, mw2276.codfw.wmnet, mw2196.codfw.wmnet, mw2313.codfw.wmnet, mw2312.codfw.wmnet, mw2256.codfw.wmnet, mw2229.codfw.wmnet, mw2310.codfw.wmnet, mw2 [22:32:54] mw2232.codfw.wmnet, mw2231.codfw.wmnet, mw2233.codfw.wmnet, mw2325.codfw.wmnet, mw2314.codfw.wmnet, mw2275.codfw.wmnet, mw2242.codfw.wmnet, mw2194.codfw.wmnet, mw2193.codfw.wmnet, mw2257.codfw.wmnet, mw2195.codfw.wmnet, mw2369.codfw.wmnet, mw2199.codfw.wmnet, mw2365.codfw.wmnet, mw2361.codfw.wmnet, mw2315.codfw.wmnet, mw2363.codfw.wmnet, mw2192.codfw.wmnet, mw2373.codfw.wmnet, mw2270.codfw.wmnet, mw2359.codfw.wmnet, mw2241.codfw. 
[22:32:54] fw.wmnet, mw2198.codfw.wmnet, mw2272.codfw.wmnet, mw2307.codfw.wmnet, mw2268.codfw.wmnet, mw2226.codfw.wmnet, mw2333.codfw.wmnet, mw2240.codfw.wmnet, mw2303.codfw.wmnet, mw2258.codfw.wm https://wikitech.wikimedia.org/wiki/PyBal [22:32:56] PROBLEM - restbase endpoints health on restbase1023 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [22:32:56] PROBLEM - restbase endpoints health on restbase1019 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [22:32:56] PROBLEM - restbase endpoints health on restbase1016 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [22:32:56] PROBLEM - Restbase LVS codfw on restbase.svc.codfw.wmnet is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/RESTBase [22:32:56] PROBLEM - restbase endpoints health on restbase2020 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [22:32:56] PROBLEM - restbase endpoints health on restbase2010 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [22:33:17] PROBLEM - restbase endpoints health 
on restbase2016 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [22:33:18] PROBLEM - restbase endpoints health on restbase1026 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [22:33:18] PROBLEM - restbase endpoints health on restbase-dev1005 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [22:33:18] PROBLEM - restbase endpoints health on restbase-dev1006 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [22:33:22] PROBLEM - restbase endpoints health on restbase2013 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase [22:34:50] PROBLEM - Restbase edge eqiad on text-lb.eqiad.wikimedia.org is CRITICAL: /api/rest_v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/RESTBase [22:35:26] PROBLEM - Restbase edge esams on text-lb.esams.wikimedia.org is CRITICAL: /api/rest_v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/RESTBase [22:35:37] PROBLEM - Restbase edge eqsin on 
text-lb.eqsin.wikimedia.org is CRITICAL: /api/rest_v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/RESTBase [22:38:14] PROBLEM - Restbase edge codfw on text-lb.codfw.wikimedia.org is CRITICAL: /api/rest_v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/RESTBase [22:39:32] PROBLEM - Restbase edge ulsfo on text-lb.ulsfo.wikimedia.org is CRITICAL: /api/rest_v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/RESTBase [22:42:50] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [22:44:15] (03Abandoned) 10QChris: Revert "gerrit: Add Code Review logo as favicon" [puppet] - 10https://gerrit.wikimedia.org/r/611914 (https://phabricator.wikimedia.org/T257218) (owner: 10QChris) [22:46:32] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [23:00:05] RoanKattouw, Niharika, and Urbanecm: Your horoscope predicts another unfortunate Evening backport window(Max 6 patches) deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200713T2300). 
[23:06:56] !log releases* delete /usr/local/sbin/sync-* scripts created by rsync::quickdatacopy and let puppet recreate the ones still needed [23:06:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:16:05] (03PS1) 10Andrew Bogott: wmcs domain proxy: add a fallthrough redirect for unknown .wmflabs.org domains [puppet] - 10https://gerrit.wikimedia.org/r/612442 (https://phabricator.wikimedia.org/T256276) [23:19:25] Krinkle: are you still around? [23:20:00] (03CR) 10BryanDavis: wmcs domain proxy: add a fallthrough redirect for unknown .wmflabs.org domains (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/612442 (https://phabricator.wikimedia.org/T256276) (owner: 10Andrew Bogott) [23:20:20] shortly after your deploy a bunch of appservers are failing healthchecks [23:21:14] (03PS2) 10Andrew Bogott: wmcs domain proxy: add a fallthrough redirect for unknown .wmflabs.org domains [puppet] - 10https://gerrit.wikimedia.org/r/612442 (https://phabricator.wikimedia.org/T256276) [23:23:55] (03CR) 10BryanDavis: [C: 03+1] "Untested, but the code looks correct to my eye." [puppet] - 10https://gerrit.wikimedia.org/r/612442 (https://phabricator.wikimedia.org/T256276) (owner: 10Andrew Bogott) [23:24:15] (03PS1) 10Dzahn: releases: pull MW security patches from deployment server on all servers [puppet] - 10https://gerrit.wikimedia.org/r/612445 [23:25:45] (03CR) 10jerkins-bot: [V: 04-1] releases: pull MW security patches from deployment server on all servers [puppet] - 10https://gerrit.wikimedia.org/r/612445 (owner: 10Dzahn) [23:28:01] cdanis: I am [23:29:36] I can't tell *why* Pybal's health check is failing [23:29:45] the HTTP fetch succeeds when I try it [23:29:59] it's also only a subset of appservers [23:30:22] (03PS2) 10Dzahn: releases: pull MW security patches from deployment server on all servers [puppet] - 10https://gerrit.wikimedia.org/r/612445