[00:24:33] PROBLEM - Check the last execution of netbox_ganeti_codfw_sync on netbox1001 is CRITICAL: CRITICAL: Status of the systemd unit netbox_ganeti_codfw_sync https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[00:38:52] (PS7) Andrew Bogott: Keystone/newton: install python-keystone [puppet] - https://gerrit.wikimedia.org/r/538445
[00:38:54] (PS1) Andrew Bogott: glance/newton: move to the HA-proxied listen ports [puppet] - https://gerrit.wikimedia.org/r/538457
[00:40:19] (CR) Andrew Bogott: "Apologies if you already have a patch waiting to do this :)" [puppet] - https://gerrit.wikimedia.org/r/538457 (owner: Andrew Bogott)
[00:40:22] (CR) Andrew Bogott: [C: +2] glance/newton: move to the HA-proxied listen ports [puppet] - https://gerrit.wikimedia.org/r/538457 (owner: Andrew Bogott)
[00:45:43] RECOVERY - Check the last execution of netbox_ganeti_codfw_sync on netbox1001 is OK: OK: Status of the systemd unit netbox_ganeti_codfw_sync https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[02:19:17] Operations, DBA: db1075 (s3 master) crashed - https://phabricator.wikimedia.org/T233534 (wiki_willy)
[02:20:10] Operations, ops-eqdfw, DBA: db1075 (s3 master) crashed - https://phabricator.wikimedia.org/T233534 (wiki_willy) a:Cmjohnson
[02:22:33] PROBLEM - Postgres Replication Lag on maps1002 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 65326936 and 3 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[02:25:45] RECOVERY - Postgres Replication Lag on maps1002 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 289872 and 1 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[03:07:45] (CR) Jhedden: [C: +1] "> Patch Set 1:" [puppet] - https://gerrit.wikimedia.org/r/538457 (owner: Andrew Bogott)
[04:18:45] PROBLEM - Check systemd state on an-coord1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:00:24] Operations, ops-eqdfw, DBA: db1075 (s3 master) crashed - https://phabricator.wikimedia.org/T233534 (Marostegui)
[05:00:27] Operations, DBA, Patch-For-Review: Switchover s3 primary database master db1075 -> db1078 - 24th Sept @05:00 UTC - https://phabricator.wikimedia.org/T230783 (Marostegui)
[05:06:32] Operations, ops-eqiad, DBA: db1075 (s3 master) crashed - https://phabricator.wikimedia.org/T233534 (Marostegui) p:Triage→High
[05:10:43] Operations, DBA: Batch db1074-db1079 hosts having BBU issues - https://phabricator.wikimedia.org/T233569 (Marostegui)
[05:11:13] Operations, DBA: Batch db1074-db1079 hosts having BBU issues - https://phabricator.wikimedia.org/T233569 (Marostegui)
[05:11:16] Operations, ops-eqiad, DBA: db1075 (s3 master) crashed - https://phabricator.wikimedia.org/T233534 (Marostegui)
[05:13:06] Operations, DBA, Patch-For-Review: Switchover s3 primary database master db1075 -> db1078 - 24th Sept @05:00 UTC - https://phabricator.wikimedia.org/T230783 (Marostegui) db1075 (the current master) crashed yesterday with BBU issues {T233534}. db1078 is also part of the same batch of hosts that have h...
[05:15:16] Operations, ops-eqiad, DBA: db1075 (s3 master) crashed - BBU failure - https://phabricator.wikimedia.org/T233534 (Marostegui)
[05:16:12] Operations, DBA: Batch db1074-db1079 hosts having BBU issues - https://phabricator.wikimedia.org/T233569 (Marostegui) p:Triage→Normal
[05:18:58] (PS1) Marostegui: db-eqiad,db-codfw.php: Remove db1066 from config [mediawiki-config] - https://gerrit.wikimedia.org/r/538459 (https://phabricator.wikimedia.org/T233071)
[05:19:50] (CR) Marostegui: [C: +2] db-eqiad,db-codfw.php: Remove db1066 from config [mediawiki-config] - https://gerrit.wikimedia.org/r/538459 (https://phabricator.wikimedia.org/T233071) (owner: Marostegui)
[05:20:43] (Merged) jenkins-bot: db-eqiad,db-codfw.php: Remove db1066 from config [mediawiki-config] - https://gerrit.wikimedia.org/r/538459 (https://phabricator.wikimedia.org/T233071) (owner: Marostegui)
[05:20:59] (CR) jenkins-bot: db-eqiad,db-codfw.php: Remove db1066 from config [mediawiki-config] - https://gerrit.wikimedia.org/r/538459 (https://phabricator.wikimedia.org/T233071) (owner: Marostegui)
[05:21:15] (CR) Giuseppe Lavagetto: "IMHO, this won't remove HHVM completely. I'd rather reimage the servers from scratch once the transition is completed" [puppet] - https://gerrit.wikimedia.org/r/538108 (https://phabricator.wikimedia.org/T229792) (owner: Dzahn)
[05:21:50] Operations, DBA: Batch db1074-db1079 hosts having BBU issues - https://phabricator.wikimedia.org/T233569 (Marostegui)
[05:22:15] Operations, DBA, Patch-For-Review: Decommission db1066.eqiad.wmnet - https://phabricator.wikimedia.org/T233071 (Marostegui)
[05:22:26] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Remove db1066 from config T233071 (duration: 01m 15s)
[05:22:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:22:30] T233071: Decommission db1066.eqiad.wmnet - https://phabricator.wikimedia.org/T233071
[05:23:06] Operations, ops-eqiad, DBA, Wikimedia-Incident: db1075 (s3 master) crashed - BBU failure - https://phabricator.wikimedia.org/T233534 (Marostegui)
[05:23:30] !log marostegui@deploy1001 Synchronized wmf-config/db-codfw.php: Remove db1066 from config T233071 (duration: 00m 56s)
[05:23:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:04:20] (Abandoned) Elukey: role::analytics_cluster::cordinator: add mysqldump/bacula backups [puppet] - https://gerrit.wikimedia.org/r/536177 (https://phabricator.wikimedia.org/T231208) (owner: Elukey)
[06:07:43] (PS1) Ammarpad: Disallow indexing discussion and user pages on eswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/538462 (https://phabricator.wikimedia.org/T233562)
[06:15:15] (CR) Vgutierrez: [C: +2] ATS: Unmask trafficserver.service iff it's actually being used [puppet] - https://gerrit.wikimedia.org/r/537611 (owner: Vgutierrez)
[06:16:00] (PS4) Vgutierrez: ATS: Unmask trafficserver.service iff it's actually being used [puppet] - https://gerrit.wikimedia.org/r/537611
[06:22:59] Operations, DBA, Wikimedia-Incident: Batch db1074-db1079 hosts having BBU issues - https://phabricator.wikimedia.org/T233569 (Marostegui)
[06:24:21] (PS3) Giuseppe Lavagetto: Add envoy image with TLS termination. [docker-images/production-images] - https://gerrit.wikimedia.org/r/399640
[06:28:43] (CR) Vgutierrez: [C: +2] ATS: Provide HTTPS check [puppet] - https://gerrit.wikimedia.org/r/538231 (https://phabricator.wikimedia.org/T231433) (owner: Vgutierrez)
[06:28:58] (PS4) Vgutierrez: ATS: Provide HTTPS check [puppet] - https://gerrit.wikimedia.org/r/538231 (https://phabricator.wikimedia.org/T231433)
[06:29:22] Operations, ops-eqiad, DBA, Wikimedia-Incident: db1075 (s3 master) crashed - BBU failure - https://phabricator.wikimedia.org/T233534 (Marostegui) I am starting to write the Incident Report: https://wikitech.wikimedia.org/wiki/Incident_documentation/20190923-s3_primary_db_master_crashed_-_s3_wikis...
[06:37:10] Operations, DBA, Patch-For-Review: Switchover s3 primary database master db1075 -> db1078 - 24th Sept @05:00 UTC - https://phabricator.wikimedia.org/T230783 (Marostegui) db1123 (current recentchanges, logpager etc) s3 slave is in D8, so thus not affected by the PDU maintenance, so maybe we should fai...
[06:42:00] (PS4) Vgutierrez: cache: Deploy ats-tls in the text cluster [puppet] - https://gerrit.wikimedia.org/r/537993 (https://phabricator.wikimedia.org/T231627)
[06:42:35] (PS2) Marostegui: wmnet: Update s3-master alias to point to db1123 [dns] - https://gerrit.wikimedia.org/r/538004 (https://phabricator.wikimedia.org/T230783)
[06:43:53] Operations, DBA, Patch-For-Review: Switchover s3 primary database master db1075 -> db1123 - 24th Sept @05:00 UTC - https://phabricator.wikimedia.org/T230783 (Marostegui)
[06:46:54] (PS1) Marostegui: db1123: Change binlog format to STATEMENT [puppet] - https://gerrit.wikimedia.org/r/538470 (https://phabricator.wikimedia.org/T230783)
[06:49:03] (CR) Marostegui: [C: +2] db1123: Change binlog format to STATEMENT [puppet] - https://gerrit.wikimedia.org/r/538470 (https://phabricator.wikimedia.org/T230783) (owner: Marostegui)
[06:50:56] !log marostegui@cumin1001 dbctl commit (dc=all): 'Change db1123 and db1078 roles, db1078 will serve logpager and recentchanges, db1123 will just serve general traffic', diff saved to https://phabricator.wikimedia.org/P9144 and previous config saved to /var/cache/conftool/dbconfig/20190923-065056-marostegui.json
[06:50:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:59:32] (CR) Vgutierrez: [C: +2] cache: Deploy ats-tls in the text cluster [puppet] - https://gerrit.wikimedia.org/r/537993 (https://phabricator.wikimedia.org/T231627) (owner: Vgutierrez)
[06:59:43] (PS5) Vgutierrez: cache: Deploy ats-tls in the text cluster [puppet] - https://gerrit.wikimedia.org/r/537993 (https://phabricator.wikimedia.org/T231627)
[07:03:37] (PS3) Muehlenhoff: Add component for OpenJDK 8 forwardport for Buster [puppet] - https://gerrit.wikimedia.org/r/538274
[07:06:29] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1123 to change binlog format T230783', diff saved to https://phabricator.wikimedia.org/P9145 and previous config saved to /var/cache/conftool/dbconfig/20190923-070628-marostegui.json
[07:06:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:06:33] T230783: Switchover s3 primary database master db1075 -> db1123 - 24th Sept @05:00 UTC - https://phabricator.wikimedia.org/T230783
[07:08:10] !log Stop MySQL on db1123 to reboot to change binlog format and kernel - T230783
[07:08:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:09:21] (CR) Muehlenhoff: [C: +2] Add component for OpenJDK 8 forwardport for Buster [puppet] - https://gerrit.wikimedia.org/r/538274 (owner: Muehlenhoff)
[07:10:23] (PS1) Vgutierrez: ATS: Move libhwloc5 pin to trafficserver class [puppet] - https://gerrit.wikimedia.org/r/538471 (https://phabricator.wikimedia.org/T231627)
[07:10:27] (PS2) Muehlenhoff: cas: Disable gauth for now [puppet] - https://gerrit.wikimedia.org/r/538260
[07:11:31] (PS2) Vgutierrez: ATS: Move libhwloc5 pin to trafficserver class [puppet] - https://gerrit.wikimedia.org/r/538471 (https://phabricator.wikimedia.org/T231627)
[07:12:23] (CR) Muehlenhoff: [C: +2] cas: Disable gauth for now [puppet] - https://gerrit.wikimedia.org/r/538260 (owner: Muehlenhoff)
[07:13:36] (CR) Vgutierrez: [C: +2] "pcc is happy and shows a NOOP on upload cluster: https://puppet-compiler.wmflabs.org/compiler1001/18482/" [puppet] - https://gerrit.wikimedia.org/r/538471 (https://phabricator.wikimedia.org/T231627) (owner: Vgutierrez)
[07:14:14] (PS3) Vgutierrez: ATS: Move libhwloc5 pin to trafficserver class [puppet] - https://gerrit.wikimedia.org/r/538471 (https://phabricator.wikimedia.org/T231627)
[07:15:38] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1123 after kernel and binlog format change', diff saved to https://phabricator.wikimedia.org/P9146 and previous config saved to /var/cache/conftool/dbconfig/20190923-071537-marostegui.json
[07:15:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:17:45] (Abandoned) Marostegui: mariadb: Promote db1078 to s3 master [puppet] - https://gerrit.wikimedia.org/r/538003 (https://phabricator.wikimedia.org/T230783) (owner: Marostegui)
[07:18:09] (PS2) Muehlenhoff: cas: Add service ID for debmonitor [puppet] - https://gerrit.wikimedia.org/r/538263
[07:20:04] Operations, serviceops, Beta-Cluster-reproducible, User-Joe: Update confd package - https://phabricator.wikimedia.org/T147204 (Joe) p:Low→Normal a:Joe
[07:20:12] PROBLEM - Ensure trafficserver_exporter is running for instance tls on cp5007 is CRITICAL: NRPE: Command check_trafficserver_exporter_tls not defined https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[07:20:15] (CR) Muehlenhoff: [C: +2] cas: Add service ID for debmonitor [puppet] - https://gerrit.wikimedia.org/r/538263 (owner: Muehlenhoff)
[07:21:04] ^^that's me, kinda expected
[07:24:14] PROBLEM - Freshness of OCSP Stapling files -ATS-TLS- on cp5007 is CRITICAL: NRPE: Command check_trafficserver_tls_ocsp_freshness not defined https://wikitech.wikimedia.org/wiki/HTTPS/Unified_Certificates
[07:25:45] (PS1) Marostegui: mariadb: Promote db1123 to s3 master [puppet] - https://gerrit.wikimedia.org/r/538522 (https://phabricator.wikimedia.org/T230783)
[07:25:50] RECOVERY - Ensure trafficserver_exporter is running for instance tls on cp5007 is OK: PROCS OK: 1 process with args /usr/bin/python3 /usr/bin/prometheus-trafficserver-exporter --no-procstats --no-ssl-verification --endpoint https://127.0.0.1:8443/_stats --port 9322 https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[07:26:34] RECOVERY - Freshness of OCSP Stapling files -ATS-TLS- on cp5007 is OK: OK https://wikitech.wikimedia.org/wiki/HTTPS/Unified_Certificates
[07:30:45] !log marostegui@cumin1001 dbctl commit (dc=all): 'More traffic to db1123 after kernel and binlog format change', diff saved to https://phabricator.wikimedia.org/P9147 and previous config saved to /var/cache/conftool/dbconfig/20190923-073044-marostegui.json
[07:30:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:40:21] Operations, DBA, Patch-For-Review, Wikimedia-Incident: Batch db1074-db1079 hosts having BBU issues - https://phabricator.wikimedia.org/T233569 (Marostegui)
[07:40:29] !log swift eqiad-prod: continue ms-be1027 decom - T233289
[07:40:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:40:33] T233289: Unable to power on ms-be1027 - https://phabricator.wikimedia.org/T233289
[07:41:23] (CR) Muehlenhoff: "I agree, the transition is big enough to warrant that (there's also quite some collateral changes like tmpreaper getting dropped and we ha" [puppet] - https://gerrit.wikimedia.org/r/538108 (https://phabricator.wikimedia.org/T229792) (owner: Dzahn)
[07:41:33] !log swift run swiftrepl without deletes eqiad -> codfw
[07:41:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:42:52] Operations, ops-codfw, DBA: db2127 memory issues - https://phabricator.wikimedia.org/T233184 (Marostegui) Open→Resolved a:Papaul HW logs look clean, closing this! Thanks @Papaul for catching this!
[07:44:46] RECOVERY - Check systemd state on an-coord1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:46:10] \o/
[07:46:17] :D
[07:46:46] nothing better than fresh analytics alarms on Monday morning
[07:48:58] watching icinga.wm.o/alerts can be scary sometimes
[07:52:56] Operations, Traffic: puppet restarts nginx instead of reloading it on ncredir servers - https://phabricator.wikimedia.org/T233518 (Vgutierrez) p:Triage→Normal
[07:54:51] (PS1) Filippo Giunchedi: swift: point Prometheus alerts to the their site [puppet] - https://gerrit.wikimedia.org/r/538563
[07:57:46] (PS1) Effie Mouzeli: Convert 50% of API servers to only serve PHP7.2 [puppet] - https://gerrit.wikimedia.org/r/538565 (https://phabricator.wikimedia.org/T219150)
[07:58:14] (CR) Filippo Giunchedi: [C: +2] swift: point Prometheus alerts to the their site [puppet] - https://gerrit.wikimedia.org/r/538563 (owner: Filippo Giunchedi)
[08:00:19] (CR) Giuseppe Lavagetto: [C: -1] "Cleanup: also remove those values from the hiera files for mw1347 and mw1348" [puppet] - https://gerrit.wikimedia.org/r/538565 (https://phabricator.wikimedia.org/T219150) (owner: Effie Mouzeli)
[08:00:19] PROBLEM - Check systemd state on cp2016 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:00:25] PROBLEM - Check systemd state on cp2023 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:01:05] PROBLEM - Check the last execution of netbox_ganeti_codfw_sync on netbox1001 is CRITICAL: CRITICAL: Status of the systemd unit netbox_ganeti_codfw_sync https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[08:01:27] hmm checking
[08:02:57] RECOVERY - Check systemd state on cp2016 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:04:25] RECOVERY - Check systemd state on cp2023 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:06:05] Operations, ops-eqiad, DC-Ops: hw troubleshooting: Memory correctable errors -EDAC- for elastic1029.eqiad.wmnet - https://phabricator.wikimedia.org/T233578 (Mathew.onipe)
[08:06:05] (PS2) Muehlenhoff: Add access for the Icinga replication check [puppet] - https://gerrit.wikimedia.org/r/537635
[08:07:45] (PS3) Muehlenhoff: Add access for the Icinga replication check [puppet] - https://gerrit.wikimedia.org/r/537635
[08:08:02] Operations, ops-eqiad, DC-Ops: hw troubleshooting: Memory correctable errors -EDAC- for elastic1029.eqiad.wmnet - https://phabricator.wikimedia.org/T233578 (Mathew.onipe) p:Triage→Normal
[08:10:01] (CR) Muehlenhoff: [C: +2] Add access for the Icinga replication check (1 comment) [puppet] - https://gerrit.wikimedia.org/r/537635 (owner: Muehlenhoff)
[08:11:41] RECOVERY - Check the last execution of netbox_ganeti_codfw_sync on netbox1001 is OK: OK: Status of the systemd unit netbox_ganeti_codfw_sync https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[08:12:07] PROBLEM - Check systemd state on cp1083 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:14:55] (CR) Alexandros Kosiaris: [C: -1] Add envoy image with TLS termination. (1 comment) [docker-images/production-images] - https://gerrit.wikimedia.org/r/399640 (owner: Giuseppe Lavagetto)
[08:18:35] PROBLEM - Check systemd state on an-coord1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:20:33] (PS2) Marostegui: mariadb: Promote db1123 to s3 master [puppet] - https://gerrit.wikimedia.org/r/538522 (https://phabricator.wikimedia.org/T230783)
[08:20:40] (CR) Marostegui: [C: -2] "Wait for failover day" [puppet] - https://gerrit.wikimedia.org/r/538522 (https://phabricator.wikimedia.org/T230783) (owner: Marostegui)
[08:21:20] !log marostegui@cumin1001 dbctl commit (dc=all): 'Fully repool db1123 after kernel and binlog format change', diff saved to https://phabricator.wikimedia.org/P9148 and previous config saved to /var/cache/conftool/dbconfig/20190923-082119-marostegui.json
[08:21:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:22:47] RECOVERY - Check systemd state on cp1083 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:23:45] (CR) Filippo Giunchedi: [C: +2] facilities: add phase monitoring for single phase PDUs [puppet] - https://gerrit.wikimedia.org/r/538161 (https://phabricator.wikimedia.org/T229101) (owner: Filippo Giunchedi)
[08:23:52] (PS2) Filippo Giunchedi: facilities: add phase monitoring for single phase PDUs [puppet] - https://gerrit.wikimedia.org/r/538161 (https://phabricator.wikimedia.org/T229101)
[08:24:16] !log elukey@deploy1001 Started deploy [analytics/refinery@a20a647]: Deploy python2 -> python3 fixes
[08:24:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:25:31] (PS2) Effie Mouzeli: Convert 50% of API servers to only serve PHP7.2 [puppet] - https://gerrit.wikimedia.org/r/538565 (https://phabricator.wikimedia.org/T219150)
[08:27:31] (PS1) Vgutierrez: acme_chief: Allow specifying a custom resource to get notified on cert updates [puppet] - https://gerrit.wikimedia.org/r/538567 (https://phabricator.wikimedia.org/T233518)
[08:27:33] (PS1) Vgutierrez: ncredir: Notify Exec[nginx-reload] instead of Service[nginx] on TLS material changes [puppet] - https://gerrit.wikimedia.org/r/538568 (https://phabricator.wikimedia.org/T233518)
[08:28:50] (CR) Effie Mouzeli: "> Cleanup: also remove those values from the hiera files for mw1347" [puppet] - https://gerrit.wikimedia.org/r/538565 (https://phabricator.wikimedia.org/T219150) (owner: Effie Mouzeli)
[08:30:21] (CR) jerkins-bot: [V: -1] ncredir: Notify Exec[nginx-reload] instead of Service[nginx] on TLS material changes [puppet] - https://gerrit.wikimedia.org/r/538568 (https://phabricator.wikimedia.org/T233518) (owner: Vgutierrez)
[08:31:42] !log elukey@deploy1001 Finished deploy [analytics/refinery@a20a647]: Deploy python2 -> python3 fixes (duration: 07m 26s)
[08:31:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:32:00] (PS2) Vgutierrez: ncredir: Notify Exec[nginx-reload] instead of Service[nginx] on cert changes [puppet] - https://gerrit.wikimedia.org/r/538568 (https://phabricator.wikimedia.org/T233518)
[08:35:35] (PS2) Vgutierrez: acme_chief: Allow specifying a custom resource to get notified on cert updates [puppet] - https://gerrit.wikimedia.org/r/538567 (https://phabricator.wikimedia.org/T233518)
[08:35:37] (PS3) Vgutierrez: ncredir: Notify Exec[nginx-reload] instead of Service[nginx] on cert changes [puppet] - https://gerrit.wikimedia.org/r/538568 (https://phabricator.wikimedia.org/T233518)
[08:41:40] Operations, DC-Ops, observability, Patch-For-Review, User-fgiunchedi: Phase monitoring for new PDUs - https://phabricator.wikimedia.org/T229101 (fgiunchedi) Open→Resolved a:fgiunchedi This is completed, we're monitoring the single phase in ulsfo now with same settings as codfw/eqiad!
[08:44:07] (CR) Vgutierrez: [C: +2] acme_chief: Allow specifying a custom resource to get notified on cert updates [puppet] - https://gerrit.wikimedia.org/r/538567 (https://phabricator.wikimedia.org/T233518) (owner: Vgutierrez)
[08:44:16] (PS3) Vgutierrez: acme_chief: Allow specifying a custom resource to get notified on cert updates [puppet] - https://gerrit.wikimedia.org/r/538567 (https://phabricator.wikimedia.org/T233518)
[08:48:59] (CR) Vgutierrez: [C: +2] ncredir: Notify Exec[nginx-reload] instead of Service[nginx] on cert changes [puppet] - https://gerrit.wikimedia.org/r/538568 (https://phabricator.wikimedia.org/T233518) (owner: Vgutierrez)
[08:49:02] (PS1) Effie Mouzeli: Convert 25% of API servers to only serve PHP7.2 [puppet] - https://gerrit.wikimedia.org/r/538571 (https://phabricator.wikimedia.org/T219150)
[08:49:11] (PS4) Vgutierrez: ncredir: Notify Exec[nginx-reload] instead of Service[nginx] on cert changes [puppet] - https://gerrit.wikimedia.org/r/538568 (https://phabricator.wikimedia.org/T233518)
[08:49:36] (PS2) Effie Mouzeli: Convert 25% of API servers to only serve PHP7.2 [puppet] - https://gerrit.wikimedia.org/r/538571 (https://phabricator.wikimedia.org/T219150)
[08:50:12] (PS3) Effie Mouzeli: Convert 50% of API servers to only serve PHP7.2 [puppet] - https://gerrit.wikimedia.org/r/538565 (https://phabricator.wikimedia.org/T219150)
[08:50:37] (PS1) Mathew.onipe: query_service: rename wdqs module to query_service [puppet] - https://gerrit.wikimedia.org/r/538572 (https://phabricator.wikimedia.org/T232297)
[08:50:55] (CR) Alexandros Kosiaris: "> I agree with that. Correct me if I am wrong, this patch seemed to be in the right direction (changing file permissions) although it will" [puppet] - https://gerrit.wikimedia.org/r/538239 (https://phabricator.wikimedia.org/T229209) (owner: Jcrespo)
[08:51:18] (CR) Effie Mouzeli: "we already have mw1270" [puppet] - https://gerrit.wikimedia.org/r/538571 (https://phabricator.wikimedia.org/T219150) (owner: Effie Mouzeli)
[08:51:35] (CR) jerkins-bot: [V: -1] query_service: rename wdqs module to query_service [puppet] - https://gerrit.wikimedia.org/r/538572 (https://phabricator.wikimedia.org/T232297) (owner: Mathew.onipe)
[08:51:51] (CR) Giuseppe Lavagetto: [C: +1] Convert 50% of API servers to only serve PHP7.2 [puppet] - https://gerrit.wikimedia.org/r/538565 (https://phabricator.wikimedia.org/T219150) (owner: Effie Mouzeli)
[08:54:02] Hey people. Something weird going on fermium? Messages get relayed with quite some delay today. cc mutante
[08:57:21] (PS1) Vgutierrez: acme_chief: Simplify notification handling [puppet] - https://gerrit.wikimedia.org/r/538573 (https://phabricator.wikimedia.org/T233518)
[08:57:47] (PS3) Jcrespo: backups: Change file owner of bacula storage daemon config [puppet] - https://gerrit.wikimedia.org/r/538239 (https://phabricator.wikimedia.org/T229209)
[08:58:09] (PS2) Vgutierrez: acme_chief: Simplify notification handling on acme_chief::cert [puppet] - https://gerrit.wikimedia.org/r/538573 (https://phabricator.wikimedia.org/T233518)
[08:58:49] (CR) jerkins-bot: [V: -1] backups: Change file owner of bacula storage daemon config [puppet] - https://gerrit.wikimedia.org/r/538239 (https://phabricator.wikimedia.org/T229209) (owner: Jcrespo)
[08:59:06] (CR) jerkins-bot: [V: -1] acme_chief: Simplify notification handling on acme_chief::cert [puppet] - https://gerrit.wikimedia.org/r/538573 (https://phabricator.wikimedia.org/T233518) (owner: Vgutierrez)
[08:59:16] sigh
[08:59:40] (PS4) Jcrespo: backups: Change file owner of bacula storage daemon config [puppet] - https://gerrit.wikimedia.org/r/538239 (https://phabricator.wikimedia.org/T229209)
[09:00:23] (PS3) Vgutierrez: acme_chief: Simplify notification handling on acme_chief::cert [puppet] - https://gerrit.wikimedia.org/r/538573 (https://phabricator.wikimedia.org/T233518)
[09:00:37] (CR) jerkins-bot: [V: -1] backups: Change file owner of bacula storage daemon config [puppet] - https://gerrit.wikimedia.org/r/538239 (https://phabricator.wikimedia.org/T229209) (owner: Jcrespo)
[09:02:07] (PS2) Mathew.onipe: query_service: rename wdqs module to query_service [puppet] - https://gerrit.wikimedia.org/r/538572 (https://phabricator.wikimedia.org/T232297)
[09:02:28] (CR) Vgutierrez: [C: +2] "PCC shows a NOOP on different acme-chief clients https://puppet-compiler.wmflabs.org/compiler1002/18489/ and we get rid of the temp. varia" [puppet] - https://gerrit.wikimedia.org/r/538573 (https://phabricator.wikimedia.org/T233518) (owner: Vgutierrez)
[09:02:44] (PS4) Vgutierrez: acme_chief: Simplify notification handling on acme_chief::cert [puppet] - https://gerrit.wikimedia.org/r/538573 (https://phabricator.wikimedia.org/T233518)
[09:02:48] !log Disable puppet and rolling restart php-fpm on mw[1312-1317,1339-1347]* - T219150
[09:02:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:02:51] T219150: Ramp up percentage of users on php7.2 to 100% on both API and appserver clusters - https://phabricator.wikimedia.org/T219150
[09:07:31] (PS1) Vgutierrez: install_server: Reload nginx instead of restarting it on cert updates [puppet] - https://gerrit.wikimedia.org/r/538574 (https://phabricator.wikimedia.org/T233518)
[09:08:14] (CR) jerkins-bot: [V: -1] install_server: Reload nginx instead of restarting it on cert updates [puppet] - https://gerrit.wikimedia.org/r/538574 (https://phabricator.wikimedia.org/T233518) (owner: Vgutierrez)
[09:10:18] (PS2) Vgutierrez: install_server: Reload nginx instead of restarting it on cert updates [puppet] - https://gerrit.wikimedia.org/r/538574 (https://phabricator.wikimedia.org/T233518)
[09:11:30] (CR) Alexandros Kosiaris: [C: +2] apertium-dan-nor: New upstream release [debs/contenttranslation/apertium-dan-nor] - https://gerrit.wikimedia.org/r/538379 (https://phabricator.wikimedia.org/T218184) (owner: KartikMistry)
[09:11:57] (CR) Vgutierrez: [C: +2] "pcc looks happy: https://puppet-compiler.wmflabs.org/compiler1002/18493/" [puppet] - https://gerrit.wikimedia.org/r/538574 (https://phabricator.wikimedia.org/T233518) (owner: Vgutierrez)
[09:12:00] (PS1) Ladsgroup: Set item terms on write both up to Q40Mio [mediawiki-config] - https://gerrit.wikimedia.org/r/538577 (https://phabricator.wikimedia.org/T225055)
[09:12:22] (CR) Alexandros Kosiaris: [C: +2] apertium-nno-nob: New upstream release [debs/contenttranslation/apertium-nno-nob] - https://gerrit.wikimedia.org/r/538380 (https://phabricator.wikimedia.org/T218184) (owner: KartikMistry)
[09:12:44] (CR) Alexandros Kosiaris: [C: +2] apertium-swe-dan: New upstream release [debs/contenttranslation/apertium-swe-dan] - https://gerrit.wikimedia.org/r/538381 (https://phabricator.wikimedia.org/T218184) (owner: KartikMistry)
[09:12:49] Operations, Traffic, Patch-For-Review: puppet restarts nginx instead of reloading it on ncredir servers - https://phabricator.wikimedia.org/T233518 (Vgutierrez) Open→Resolved a:Vgutierrez
[09:13:02] (CR) Alexandros Kosiaris: [C: +2] apertium-swe-nor: New upstream release [debs/contenttranslation/apertium-swe-nor] - https://gerrit.wikimedia.org/r/538382 (https://phabricator.wikimedia.org/T218184) (owner: KartikMistry)
[09:15:43] (PS5) Jcrespo: backups: Change file owner of bacula storage&director config [puppet] - https://gerrit.wikimedia.org/r/538239 (https://phabricator.wikimedia.org/T229209)
[09:16:40] (CR) jerkins-bot: [V: -1] backups: Change file owner of bacula storage&director config [puppet] - https://gerrit.wikimedia.org/r/538239 (https://phabricator.wikimedia.org/T229209) (owner: Jcrespo)
[09:17:56] (CR) Alexandros Kosiaris: [C: +1] Convert openldap/corp to profile [puppet] - https://gerrit.wikimedia.org/r/538273 (owner: Muehlenhoff)
[09:20:43] (CR) Alexandros Kosiaris: [C: +1] Add pbuilder hook to build packages with/against forward-ported JDK 8 [puppet] - https://gerrit.wikimedia.org/r/538277 (owner: Muehlenhoff)
[09:25:24] (Abandoned) Mathew.onipe: query_service: rename wdqs module to query_service [puppet] - https://gerrit.wikimedia.org/r/537008 (https://phabricator.wikimedia.org/T232297) (owner: Mathew.onipe)
[09:25:54] (PS6) Jcrespo: backups: Change file owner of bacula storage&director config [puppet] - https://gerrit.wikimedia.org/r/538239 (https://phabricator.wikimedia.org/T229209)
[09:26:01] (PS4) Mathew.onipe: query_service: change wdqs module to query_service for reusbility [puppet] - https://gerrit.wikimedia.org/r/537138 (https://phabricator.wikimedia.org/T232297)
[09:26:07] PROBLEM - Check systemd state on ms-be1040 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[09:26:45] (PS1) Vgutierrez: ATS: Reenable systed systemd_hardening on cp5001 for ats-tls [puppet] - https://gerrit.wikimedia.org/r/538580
[09:26:48] (CR) jerkins-bot: [V: -1] backups: Change file owner of bacula storage&director config [puppet] - https://gerrit.wikimedia.org/r/538239 (https://phabricator.wikimedia.org/T229209) (owner: Jcrespo)
[09:27:27] (PS2) Vgutierrez: ATS: Reenable systemd hardening on cp5001 for ats-tls [puppet] - https://gerrit.wikimedia.org/r/538580 (https://phabricator.wikimedia.org/T232298)
[09:29:20] (PS7) Jcrespo: backups: Change file owner of bacula storage&director config [puppet] - https://gerrit.wikimedia.org/r/538239 (https://phabricator.wikimedia.org/T229209)
[09:31:17] (CR) Jcrespo: [C: -1] "Blind first pass, I will later check all diffs." [puppet] - https://gerrit.wikimedia.org/r/538239 (https://phabricator.wikimedia.org/T229209) (owner: Jcrespo)
[09:34:52] (PS1) Vgutierrez: ATS: Set sysconfdir back as a read-only directory [puppet] - https://gerrit.wikimedia.org/r/538582 (https://phabricator.wikimedia.org/T232988)
[09:35:56] (CR) Vgutierrez: [C: +2] ATS: Reenable systemd hardening on cp5001 for ats-tls [puppet] - https://gerrit.wikimedia.org/r/538580 (https://phabricator.wikimedia.org/T232298) (owner: Vgutierrez)
[09:38:17] !log T218184 upload to apt.wikimedia.org/jessie-wikimedia apertium-dan-nor_1.4.0-1+wmf1, apertium-nno-nob_1.2.0-1+wmf1, apertium-swe-dan_0.8.0-2+wmf1, apertium-swe-nor_0.3.0-2+wmf1
[09:38:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:38:20] T218184: Update apertium-nno-nob, apertium-swe-dan, apertium-swe-nor and apertium-dan-nor packages - https://phabricator.wikimedia.org/T218184
[09:40:00] (CR) Muehlenhoff: [C: +1] "Looks good to me" [puppet] - https://gerrit.wikimedia.org/r/536714 (owner: Dzahn)
[09:40:51] (PS3) Mathew.onipe: query_service: rename wdqs module to query_service [puppet] - https://gerrit.wikimedia.org/r/538572 (https://phabricator.wikimedia.org/T232297)
[09:40:53] (PS5) Mathew.onipe: query_service: prepare query_service for reusbility [puppet] - https://gerrit.wikimedia.org/r/537138 (https://phabricator.wikimedia.org/T232297)
[09:42:06] (CR) Muehlenhoff: "One comment inline (and this needs SRE meeting approval or earlier signoff by Mark or Faidon to merge)" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/538048 (https://phabricator.wikimedia.org/T233189) (owner: Volans)
[09:44:05] !log mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=bnwiki --logwiki=metawiki 'Huangzonghao' 'HUANGZONGHAO' (T233585)
[09:44:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:44:08] T233585: Please unblock failed global renames - https://phabricator.wikimedia.org/T233585
[09:45:21] !log mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=loginwiki --logwiki=metawiki 'نعنوعه' 'مريانا_علي' (T233585)
[09:45:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:47:48] (CR) Muehlenhoff: profile::icinga: Add apereo_cas authenticated vhost (1 comment) [puppet] - https://gerrit.wikimedia.org/r/538019 (owner: Jbond)
[09:49:32] (PS3) Muehlenhoff: Convert openldap/corp to profile [puppet] - https://gerrit.wikimedia.org/r/538273
[09:51:43] (CR) Vgutierrez: [C: +2] ATS: Set sysconfdir back as a read-only directory [puppet] - https://gerrit.wikimedia.org/r/538582 (https://phabricator.wikimedia.org/T232988) (owner: Vgutierrez)
[09:51:50] !log stopping db2102 mariadb to recover db
[09:51:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:51:55] (PS2) Vgutierrez: ATS: Set sysconfdir back as a read-only directory [puppet] - https://gerrit.wikimedia.org/r/538582 (https://phabricator.wikimedia.org/T232988)
[09:52:53]
(03CR) 10Muehlenhoff: [C: 03+2] Convert openldap/corp to profile [puppet] - 10https://gerrit.wikimedia.org/r/538273 (owner: 10Muehlenhoff) [09:53:25] (03PS4) 10Muehlenhoff: Convert openldap/corp to profile [puppet] - 10https://gerrit.wikimedia.org/r/538273 [09:54:08] (03PS4) 10Effie Mouzeli: Convert 50% of API servers to only serve PHP7.2 [puppet] - 10https://gerrit.wikimedia.org/r/538565 (https://phabricator.wikimedia.org/T219150) [09:58:07] (03CR) 10Vgutierrez: [C: 03+2] ATS: Reenable systemd hardening on cp5001 for ats-tls [puppet] - 10https://gerrit.wikimedia.org/r/538580 (https://phabricator.wikimedia.org/T232298) (owner: 10Vgutierrez) [09:58:34] (03PS3) 10Vgutierrez: ATS: Reenable systemd hardening on cp5001 for ats-tls [puppet] - 10https://gerrit.wikimedia.org/r/538580 (https://phabricator.wikimedia.org/T232298) [10:00:39] (03CR) 10Effie Mouzeli: [V: 03+1] "Expected https://puppet-compiler.wmflabs.org/compiler1001/18498/" [puppet] - 10https://gerrit.wikimedia.org/r/538565 (https://phabricator.wikimedia.org/T219150) (owner: 10Effie Mouzeli) [10:04:31] (03CR) 10Effie Mouzeli: [V: 03+1 C: 03+2] Convert 50% of API servers to only serve PHP7.2 [puppet] - 10https://gerrit.wikimedia.org/r/538565 (https://phabricator.wikimedia.org/T219150) (owner: 10Effie Mouzeli) [10:04:35] (03PS5) 10Effie Mouzeli: Convert 50% of API servers to only serve PHP7.2 [puppet] - 10https://gerrit.wikimedia.org/r/538565 (https://phabricator.wikimedia.org/T219150) [10:07:25] (03PS5) 10Jbond: profile::icinga: Add apereo_cas authenticated vhost [puppet] - 10https://gerrit.wikimedia.org/r/538019 [10:08:16] (03CR) 10Filippo Giunchedi: initial commit (033 comments) [debs/prometheus-swagger-exporter] - 10https://gerrit.wikimedia.org/r/536376 (owner: 10Cwhite) [10:11:17] RECOVERY - Check systemd state on ms-be1040 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [10:11:19] (03PS4) 10Giuseppe Lavagetto: Add envoy image with TLS 
termination. [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/399640 [10:11:21] (03PS1) 10Giuseppe Lavagetto: Fixes to the envoy image [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/538586 [10:13:22] (03CR) 10Jbond: profile::icinga: Add apereo_cas authenticated vhost (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/538019 (owner: 10Jbond) [10:15:16] (03PS2) 10Vgutierrez: hiera: Move ats-tls from port 8443 to 443 on cp4027 [puppet] - 10https://gerrit.wikimedia.org/r/537995 (https://phabricator.wikimedia.org/T231627) [10:15:18] (03PS1) 10Vgutierrez: hiera: Move nginx from port 443 to 4443 on cp4027 [puppet] - 10https://gerrit.wikimedia.org/r/538587 (https://phabricator.wikimedia.org/T231627) [10:15:48] (03PS1) 10Muehlenhoff: openldap_corp: Hierarize existing setup to allow adding a second server pair [puppet] - 10https://gerrit.wikimedia.org/r/538588 [10:16:08] (03PS2) 10Muehlenhoff: Add pbuilder hook to build packages with/against forward-ported JDK 8 [puppet] - 10https://gerrit.wikimedia.org/r/538277 [10:17:02] (03CR) 10Muehlenhoff: [C: 03+2] Add pbuilder hook to build packages with/against forward-ported JDK 8 [puppet] - 10https://gerrit.wikimedia.org/r/538277 (owner: 10Muehlenhoff) [10:17:36] (03Abandoned) 10Vgutierrez: hiera: Move ats-tls from port 8443 to 443 on cp4027 [puppet] - 10https://gerrit.wikimedia.org/r/537995 (https://phabricator.wikimedia.org/T231627) (owner: 10Vgutierrez) [10:17:39] (03PS1) 10Vgutierrez: hiera: Move ats-tls from port 8443 to 443 on cp4027 [puppet] - 10https://gerrit.wikimedia.org/r/538589 (https://phabricator.wikimedia.org/T231627) [10:20:56] (03Abandoned) 10Jbond: Revert "puppetmaster::frontend: update web conf to use RewriteRules instead of proxypass" [puppet] - 10https://gerrit.wikimedia.org/r/532699 (owner: 10Jbond) [10:22:30] (03CR) 10Muehlenhoff: [C: 03+1] "Looks good!" 
[puppet] - 10https://gerrit.wikimedia.org/r/538019 (owner: 10Jbond) [10:23:47] (03CR) 10Filippo Giunchedi: [C: 03+2] Remove obsolete partman recipe lvm.cfg [puppet] - 10https://gerrit.wikimedia.org/r/538268 (https://phabricator.wikimedia.org/T156955) (owner: 10Muehlenhoff) [10:25:20] (03CR) 10Filippo Giunchedi: Set up scap target for deploying the phatality plugin into kibana (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/537240 (https://phabricator.wikimedia.org/T230752) (owner: 1020after4) [10:26:11] (03PS1) 10Jbond: puppetmaster1003: promote puppetmaster1003 to a real puppetmaster backend [puppet] - 10https://gerrit.wikimedia.org/r/538590 (https://phabricator.wikimedia.org/T233203) [10:27:03] (03CR) 1020after4: [C: 04-1] "That works for me as well. I'll revise the patch" [puppet] - 10https://gerrit.wikimedia.org/r/537240 (https://phabricator.wikimedia.org/T230752) (owner: 1020after4) [10:28:01] (03CR) 10Muehlenhoff: [C: 03+1] "Looks good" [puppet] - 10https://gerrit.wikimedia.org/r/538590 (https://phabricator.wikimedia.org/T233203) (owner: 10Jbond) [10:30:04] jan_drewniak: I, the Bot under the Fountain, allow thee, The Deployer, to do Wikimedia Portals Update deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190923T1030). 
[10:30:17] (PS2) Muehlenhoff: openldap_corp: Hierarize existing setup to allow adding a second server pair [puppet] - https://gerrit.wikimedia.org/r/538588
[10:33:11] (PS1) Jdrewniak: Bumping portals to master [mediawiki-config] - https://gerrit.wikimedia.org/r/538591 (https://phabricator.wikimedia.org/T128546)
[10:33:46] Operations: Create OpenJDK 8 packages for Buster - https://phabricator.wikimedia.org/T233604 (MoritzMuehlenhoff)
[10:39:05] (CR) Filippo Giunchedi: [C: -1] "See inline" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/529590 (owner: Alex Monk)
[10:39:14] (CR) Jdrewniak: [C: +2] Bumping portals to master [mediawiki-config] - https://gerrit.wikimedia.org/r/538591 (https://phabricator.wikimedia.org/T128546) (owner: Jdrewniak)
[10:40:11] (Merged) jenkins-bot: Bumping portals to master [mediawiki-config] - https://gerrit.wikimedia.org/r/538591 (https://phabricator.wikimedia.org/T128546) (owner: Jdrewniak)
[10:40:32] (CR) jenkins-bot: Bumping portals to master [mediawiki-config] - https://gerrit.wikimedia.org/r/538591 (https://phabricator.wikimedia.org/T128546) (owner: Jdrewniak)
[10:42:59] !log jdrewniak@deploy1001 Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:538591| Bumping portals to master (T128546)]] (duration: 00m 58s)
[10:43:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:43:03] T128546: [Recurring Task] Update Wikipedia and sister projects portals statistics - https://phabricator.wikimedia.org/T128546
[10:43:57] !log jdrewniak@deploy1001 Synchronized portals: Wikimedia Portals Update: [[gerrit:538591| Bumping portals to master (T128546)]] (duration: 00m 57s)
[10:43:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:46:21] (PS3) Muehlenhoff: openldap_corp: Hierarize existing setup to allow adding a second server pair [puppet] - https://gerrit.wikimedia.org/r/538588
[10:50:24] (CR) Filippo Giunchedi: [C: +1] "LGTM in the sense that the change is correct and octocatalog-diff from related task looks good AFAICT. Also easy to revert/depool puppetma" [puppet] - https://gerrit.wikimedia.org/r/538590 (https://phabricator.wikimedia.org/T233203) (owner: Jbond)
[10:53:13] (CR) Muehlenhoff: "PCC: https://puppet-compiler.wmflabs.org/compiler1001/18502/" [puppet] - https://gerrit.wikimedia.org/r/538588 (owner: Muehlenhoff)
[10:54:46] (PS3) Effie Mouzeli: Convert 25% of API servers to only serve PHP7.2 [puppet] - https://gerrit.wikimedia.org/r/538571 (https://phabricator.wikimedia.org/T219150)
[10:55:46] (PS2) Muehlenhoff: Remove obsolete partman recipe lvm.cfg [puppet] - https://gerrit.wikimedia.org/r/538268 (https://phabricator.wikimedia.org/T156955)
[10:57:48] Operations: Create OpenJDK 8 packages for Buster - https://phabricator.wikimedia.org/T233604 (elukey) Work already done to keep archives happy: https://gerrit.wikimedia.org/r/#/c/538274/ https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/538277/
[10:58:00] (PS4) Effie Mouzeli: Convert 25% of API servers to only serve PHP7.2 [puppet] - https://gerrit.wikimedia.org/r/538571 (https://phabricator.wikimedia.org/T219150)
[11:00:05] Amir1, Lucas_WMDE, awight, and Urbanecm: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) European Mid-day SWAT(Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190923T1100).
[11:00:05] awight, odder, Ammarpad, and Amir1: A patch you scheduled for European Mid-day SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[11:00:09] o/
[11:00:11] (PS6) Jbond: profile::icinga: Add apereo_cas authenticated vhost [puppet] - https://gerrit.wikimedia.org/r/538019
[11:00:52] I can deploy today, just let me know if anyone wishes to deploy their own patches.
[11:00:56] o/
[11:01:26] o/
[11:03:19] odder: I'll push your patches sequentially.
[11:03:26] (CR) Awight: [C: +2] Add localized logos for the Zulu Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/538291 (https://phabricator.wikimedia.org/T233424) (owner: Odder)
[11:03:32] (CR) Jbond: [C: +2] profile::icinga: Add apereo_cas authenticated vhost [puppet] - https://gerrit.wikimedia.org/r/538019 (owner: Jbond)
[11:03:45] (CR) Giuseppe Lavagetto: [C: -1] "Commit message is wrong." [puppet] - https://gerrit.wikimedia.org/r/538571 (https://phabricator.wikimedia.org/T219150) (owner: Effie Mouzeli)
[11:04:14] & will skip the debug step for the first patch.
[11:04:19] (Merged) jenkins-bot: Add localized logos for the Zulu Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/538291 (https://phabricator.wikimedia.org/T233424) (owner: Odder)
[11:05:26] !log uploaded openjdk 8u222-b10-1~deb10u1 to buster-wikimedia/component/jdk8 (bootstrap build, second boron build following) T233604
[11:05:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:05:30] T233604: Create OpenJDK 8 packages for Buster - https://phabricator.wikimedia.org/T233604
[11:05:55] (CR) Awight: [C: +2] Add localized logos for the Zulu Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/538293 (https://phabricator.wikimedia.org/T233424) (owner: Odder)
[11:06:16] (CR) Jcrespo: [C: +1] Remove obsolete partman recipe lvm.cfg [puppet] - https://gerrit.wikimedia.org/r/538268 (https://phabricator.wikimedia.org/T156955) (owner: Muehlenhoff)
[11:06:20] !log awight@deploy1001 Synchronized static/images/project-logos: SWAT: [[gerrit:538291|Add localized logos for the Zulu Wikipedia (T233424)]] (duration: 00m 57s)
[11:06:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:06:23] T233424: New localized logo for zu.wikipedia.org - https://phabricator.wikimedia.org/T233424
[11:06:35] (CR) jenkins-bot: Add localized logos for the Zulu Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/538291 (https://phabricator.wikimedia.org/T233424) (owner: Odder)
[11:06:46] (Merged) jenkins-bot: Add localized logos for the Zulu Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/538293 (https://phabricator.wikimedia.org/T233424) (owner: Odder)
[11:07:34] odder: Please check mwdebug1002, both patches should be applied there.
[11:08:39] (CR) jenkins-bot: Add localized logos for the Zulu Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/538293 (https://phabricator.wikimedia.org/T233424) (owner: Odder)
[11:09:20] Looks correct to me, continuing.
[11:09:31] awight: Yup, works for me, too.
[11:09:37] thanks!
[11:09:53] (PS5) Effie Mouzeli: Convert 25% of app servers to only serve PHP7.2 [puppet] - https://gerrit.wikimedia.org/r/538571 (https://phabricator.wikimedia.org/T219150)
[11:11:09] !log awight@deploy1001 Synchronized wmf-config/VariantSettings.php: SWAT: [[gerrit:538293|Add localized logos for the Zulu Wikipedia (T233424)]] (duration: 00m 56s)
[11:11:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:12:06] (CR) Effie Mouzeli: [V: +1] "Expected https://puppet-compiler.wmflabs.org/compiler1002/18506/" [puppet] - https://gerrit.wikimedia.org/r/538571 (https://phabricator.wikimedia.org/T219150) (owner: Effie Mouzeli)
[11:12:11] Is anyone here to coordinate Ammarpad's patch? If not, I will skip it. There's a dependency between multiple files, so this actually looks unsafe to me: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/538408/
[11:12:44] (CR) Awight: [C: -1] "Skipping this patch due to the unsafe dependency. Please split into a patch providing the file, and a patch with configuration to use the" [mediawiki-config] - https://gerrit.wikimedia.org/r/538408 (https://phabricator.wikimedia.org/T233104) (owner: Ammarpad)
[11:12:58] !log Disable puppet and rolling restart of php7.2-fpm on mw[1321-1333] - T219150
[11:13:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:13:01] T219150: Ramp up percentage of users on php7.2 to 100% on both API and appserver clusters - https://phabricator.wikimedia.org/T219150
[11:13:54] Amir1: Can I go ahead with your terms migration?
[11:14:03] awight: sure, not testable
[11:14:08] ack!
[11:14:17] (CR) Awight: "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/538577 (https://phabricator.wikimedia.org/T225055) (owner: Ladsgroup)
[11:15:06] (CR) Alexandros Kosiaris: [C: +1] Add envoy image with TLS termination. [docker-images/production-images] - https://gerrit.wikimedia.org/r/399640 (owner: Giuseppe Lavagetto)
[11:15:12] awight: Hm, I can't see the logo out of mwdebug1002, did you purge the logo in Varnish?
[11:15:16] awight: https://wikitech.wikimedia.org/wiki/SWAT_deploys/Deployers#Purging
[11:15:36] odder: thanks, I haven't purged. Will do!
[11:16:03] (CR) Effie Mouzeli: [V: +1 C: +2] Convert 25% of app servers to only serve PHP7.2 [puppet] - https://gerrit.wikimedia.org/r/538571 (https://phabricator.wikimedia.org/T219150) (owner: Effie Mouzeli)
[11:16:13] (PS6) Effie Mouzeli: Convert 25% of app servers to only serve PHP7.2 [puppet] - https://gerrit.wikimedia.org/r/538571 (https://phabricator.wikimedia.org/T219150)
[11:16:39] (Abandoned) Alexandros Kosiaris: prometheus, k8s: enabling services prometheus service discovery [puppet] - https://gerrit.wikimedia.org/r/529789 (owner: Fsero)
[11:16:45] awight: And it works, thanks again :)
[11:17:28] (PS1) Jbond: icinga::cas: fix incorrectr attribute name [puppet] - https://gerrit.wikimedia.org/r/538593
[11:18:04] odder: Purged.
[11:18:08] (PS2) Jbond: icinga::cas: fix incorrect attribute name [puppet] - https://gerrit.wikimedia.org/r/538593
[11:18:17] (PS3) Jbond: icinga::cas: fix incorrect attribute name [puppet] - https://gerrit.wikimedia.org/r/538593
[11:18:44] (CR) Awight: [C: +2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/538577 (https://phabricator.wikimedia.org/T225055) (owner: Ladsgroup)
[11:18:51] * awight facepalms
[11:21:08] (PS2) Awight: Set item terms on write both up to Q40Mio [mediawiki-config] - https://gerrit.wikimedia.org/r/538577 (https://phabricator.wikimedia.org/T225055) (owner: Ladsgroup)
[11:21:10] (CR) Awight: [C: +2] Set item terms on write both up to Q40Mio [mediawiki-config] - https://gerrit.wikimedia.org/r/538577 (https://phabricator.wikimedia.org/T225055) (owner: Ladsgroup)
[11:21:48] (CR) Jbond: [C: +2] icinga::cas: fix incorrect attribute name [puppet] - https://gerrit.wikimedia.org/r/538593 (owner: Jbond)
[11:21:55] (Merged) jenkins-bot: Set item terms on write both up to Q40Mio [mediawiki-config] - https://gerrit.wikimedia.org/r/538577 (https://phabricator.wikimedia.org/T225055) (owner: Ladsgroup)
[11:23:32] !log awight@deploy1001 Synchronized wmf-config/VariantSettings.php: SWAT: [[gerrit:538577|Set item terms on write both up to Q40Mio (T225055)]] (duration: 00m 55s)
[11:23:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:23:36] T225055: Switch `tmpItemTermsMigrationStages` to MIGRATION_WRITE_BOTH - https://phabricator.wikimedia.org/T225055
[11:23:43] (CR) jenkins-bot: Set item terms on write both up to Q40Mio [mediawiki-config] - https://gerrit.wikimedia.org/r/538577 (https://phabricator.wikimedia.org/T225055) (owner: Ladsgroup)
[11:27:03] Amir1: request for advice: my backport includes some non-essential i18n message changes. Is it reasonable to skip the full scap/l10n cache update, and let that go out with the train?
[11:27:33] awight: yeah, if it doesn't need the i18n, just let it be
[11:27:41] :)
[11:27:44] gladly.
[11:31:43] (PS1) Jbond: icinga::cas apply correct type checking [puppet] - https://gerrit.wikimedia.org/r/538594
[11:31:50] !log awight@deploy1001 Synchronized php-1.34.0-wmf.23/extensions/FileImporter: SWAT: [[gerrit:538566|Add change tags to all FileImport text revisions (T227849)]] (duration: 00m 57s)
[11:31:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:31:53] T227849: Guarantee that FileImporter imported pages are discoverable after a failure - https://phabricator.wikimedia.org/T227849
[11:33:21] (CR) Jbond: [C: +2] icinga::cas apply correct type checking [puppet] - https://gerrit.wikimedia.org/r/538594 (owner: Jbond)
[11:33:41] !log EU SWAT finished
[11:33:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:35:07] !log installing expat security updates
[11:35:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:35:20] Operations, Icinga, observability, User-CDanis: re-create script for manual paging - https://phabricator.wikimedia.org/T82937 (CDanis)
[11:37:23] (PS3) Ammarpad: Add localized Wikipedia wordmark for szlwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/538408 (https://phabricator.wikimedia.org/T233104)
[11:38:11] (PS2) DCausse: [cirrus] glent method 0 A/B test [mediawiki-config] - https://gerrit.wikimedia.org/r/537637 (https://phabricator.wikimedia.org/T233211)
[11:38:27] Ammarpad: Thanks, I'm still happy to deploy this today!
[11:38:29] Puppet, Patch-For-Review: Analyses octocatalog-diff output - https://phabricator.wikimedia.org/T233203 (jbond) p:Triage→Normal
[11:42:03] !log switching cp4027 from nginx to ats-tls - T231627
[11:42:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:42:07] T231627: Move cache text cluster from nginx to ats-tls - https://phabricator.wikimedia.org/T231627
[11:42:54] (CR) Vgutierrez: [C: +2] hiera: Move nginx from port 443 to 4443 on cp4027 [puppet] - https://gerrit.wikimedia.org/r/538587 (https://phabricator.wikimedia.org/T231627) (owner: Vgutierrez)
[11:42:57] (PS1) Jbond: icinga::cas add ssl_settings [puppet] - https://gerrit.wikimedia.org/r/538595
[11:43:07] (PS2) Vgutierrez: hiera: Move nginx from port 443 to 4443 on cp4027 [puppet] - https://gerrit.wikimedia.org/r/538587 (https://phabricator.wikimedia.org/T231627)
[11:43:59] (PS7) CDanis: dbctl: add set-candidate-master subcommand on instance [software/conftool] - https://gerrit.wikimedia.org/r/534819 (https://phabricator.wikimedia.org/T229677)
[11:44:01] (PS6) CDanis: dbctl: add set-note instance subcommand [software/conftool] - https://gerrit.wikimedia.org/r/534899 (https://phabricator.wikimedia.org/T229677)
[11:44:03] (PS2) CDanis: some reminders for doing the next release [software/conftool] - https://gerrit.wikimedia.org/r/535296
[11:44:19] (CR) CDanis: dbctl: add set-candidate-master subcommand on instance (1 comment) [software/conftool] - https://gerrit.wikimedia.org/r/534819 (https://phabricator.wikimedia.org/T229677) (owner: CDanis)
[11:44:23] (CR) Jbond: [C: +2] icinga::cas add ssl_settings [puppet] - https://gerrit.wikimedia.org/r/538595 (owner: Jbond)
[11:44:32] (PS2) Jbond: icinga::cas add ssl_settings [puppet] - https://gerrit.wikimedia.org/r/538595
[11:47:05] (CR) Vgutierrez: [C: +2] hiera: Move ats-tls from port 8443 to 443 on cp4027 [puppet] - https://gerrit.wikimedia.org/r/538589 (https://phabricator.wikimedia.org/T231627) (owner: Vgutierrez)
[11:47:20] (PS2) Vgutierrez: hiera: Move ats-tls from port 8443 to 443 on cp4027 [puppet] - https://gerrit.wikimedia.org/r/538589 (https://phabricator.wikimedia.org/T231627)
[11:50:15] PROBLEM - HTTPS Unified ECDSA on cp4027 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused https://wikitech.wikimedia.org/wiki/HTTPS
[11:50:25] that's expected ^^
[11:50:31] !log restarting Apache/HHVM/PHP on mw1261-mw1265 after Expat security update
[11:50:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:50:51] PROBLEM - HTTPS Unified RSA on cp4027 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused https://wikitech.wikimedia.org/wiki/HTTPS
[11:51:51] RECOVERY - HTTPS Unified ECDSA on cp4027 is OK: SSL OK - OCSP staple validity for en.wikipedia.org has 345563 seconds left:Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (ECDSA) valid until 2019-11-22 07:59:59 +0000 (expires in 59 days) https://wikitech.wikimedia.org/wiki/HTTPS
[11:52:20] jbond42: puppet is failing on icinga1001 with something related to cas
[11:52:27] RECOVERY - HTTPS Unified RSA on cp4027 is OK: SSL OK - OCSP staple validity for en.wikipedia.org has 345527 seconds left:Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (RSA) valid until 2019-11-22 07:59:59 +0000 (expires in 59 days) https://wikitech.wikimedia.org/wiki/HTTPS
[11:52:49] https://www.irccloud.com/pastebin/RtLv82Xv/
[11:53:17] jbond42: it's been failing for the last hour and something
[11:54:11] vgutierrez: yes sorry I will acknowledge it
[11:54:13] vgutierrez: ack it's WIP, see the followup commits
[11:54:46] ook... ETA? it's blocking some stuff on cp4027 on my side
[11:55:01] 10 mins?
[11:55:13] ack :)
[11:55:23] cheers I'll ping once done
[11:55:41] PROBLEM - ats-tls HTTPS en.wikipedia.org ECDSA on cp4027 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused https://wikitech.wikimedia.org/wiki/HTTPS
[11:55:57] PROBLEM - Ensure traffic_manager binds on 8443 and responds to HTTP requests on cp4027 is CRITICAL: connect to address 10.128.0.127 and port 8443: Connection refused https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[11:56:03] PROBLEM - ats-tls HTTPS en.wikipedia.org RSA on cp4027 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused https://wikitech.wikimedia.org/wiki/HTTPS
[11:56:24] ACKNOWLEDGEMENT - Ensure traffic_manager binds on 8443 and responds to HTTP requests on cp4027 is CRITICAL: connect to address 10.128.0.127 and port 8443: Connection refused Vgutierrez waiting to be able to run puppet on icinga1001 https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[11:56:24] ACKNOWLEDGEMENT - ats-tls HTTPS en.wikipedia.org ECDSA on cp4027 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused Vgutierrez waiting to be able to run puppet on icinga1001 https://wikitech.wikimedia.org/wiki/HTTPS
[11:56:24] ACKNOWLEDGEMENT - ats-tls HTTPS en.wikipedia.org RSA on cp4027 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused Vgutierrez waiting to be able to run puppet on icinga1001 https://wikitech.wikimedia.org/wiki/HTTPS
[11:59:04] (CR) CDanis: [C: +2] dbctl: add set-candidate-master subcommand on instance [software/conftool] - https://gerrit.wikimedia.org/r/534819 (https://phabricator.wikimedia.org/T229677) (owner: CDanis)
[11:59:07] (CR) CDanis: [C: +2] dbctl: add set-note instance subcommand [software/conftool] - https://gerrit.wikimedia.org/r/534899 (https://phabricator.wikimedia.org/T229677) (owner: CDanis)
[11:59:49] (PS1) Ammarpad: Add localized Wikipedia wordmark for szlwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/538597 (https://phabricator.wikimedia.org/T233104)
[12:01:51] (Merged) jenkins-bot: dbctl: add set-candidate-master subcommand on instance [software/conftool] - https://gerrit.wikimedia.org/r/534819 (https://phabricator.wikimedia.org/T229677) (owner: CDanis)
[12:01:53] (Merged) jenkins-bot: dbctl: add set-note instance subcommand [software/conftool] - https://gerrit.wikimedia.org/r/534899 (https://phabricator.wikimedia.org/T229677) (owner: CDanis)
[12:05:07] (PS4) Ammarpad: Add localized Wikipedia wordmark for szlwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/538408 (https://phabricator.wikimedia.org/T233104)
[12:05:24] !log restarting apache on bast5001 to pick up expat security update
[12:05:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:05:27] (PS5) Ammarpad: Add localized Wikipedia wordmark for szlwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/538408 (https://phabricator.wikimedia.org/T233104)
[12:06:12] (CR) Ammarpad: "> Skipping this patch due to the unsafe dependency. Please split" [mediawiki-config] - https://gerrit.wikimedia.org/r/538408 (https://phabricator.wikimedia.org/T233104) (owner: Ammarpad)
[12:09:44] (PS2) Ammarpad: Add localized Wikipedia wordmark for szlwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/538597
[12:28:07] (PS1) Jbond: icinga::cas: fix cas errors and add spec test [puppet] - https://gerrit.wikimedia.org/r/538600
[12:29:15] (CR) jerkins-bot: [V: -1] icinga::cas: fix cas errors and add spec test [puppet] - https://gerrit.wikimedia.org/r/538600 (owner: Jbond)
[12:34:34] Does anyone know who I need to ask for interface admin rights on the beta cluster wikis?
[12:34:37] (PS2) Jbond: icinga::cas: fix cas errors and add spec test [puppet] - https://gerrit.wikimedia.org/r/538600
[12:36:16] (PS1) Elukey: profile::analytics::refinery::job::druid_load: add dims to netflow [puppet] - https://gerrit.wikimedia.org/r/538603 (https://phabricator.wikimedia.org/T229682)
[12:37:22] (CR) Jbond: [C: +2] icinga::cas: fix cas errors and add spec test [puppet] - https://gerrit.wikimedia.org/r/538600 (owner: Jbond)
[12:39:45] (CR) Elukey: "Arzhel just added two new fields to the netflow stream (as described in https://phabricator.wikimedia.org/T229682#5507056). Do we need to " [puppet] - https://gerrit.wikimedia.org/r/538603 (https://phabricator.wikimedia.org/T229682) (owner: Elukey)
[12:40:34] !log rolling restart of graphoid on scb to pick up expat security update
[12:40:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:43:29] (CR) Marostegui: "<3" [software/conftool] - https://gerrit.wikimedia.org/r/534819 (https://phabricator.wikimedia.org/T229677) (owner: CDanis)
[12:44:17] (PS1) Jbond: icinga::cas correct vhost [puppet] - https://gerrit.wikimedia.org/r/538605
[12:45:20] (CR) Jbond: [C: +2] icinga::cas correct vhost [puppet] - https://gerrit.wikimedia.org/r/538605 (owner: Jbond)
[12:46:07] (PS1) Gehel: elasticsearch: decommission elastic1017 [puppet] - https://gerrit.wikimedia.org/r/538606 (https://phabricator.wikimedia.org/T230518)
[12:46:37] Operations: Create OpenJDK 8 packages for Buster - https://phabricator.wikimedia.org/T233604 (elukey)
[12:46:54] (CR) jerkins-bot: [V: -1] elasticsearch: decommission elastic1017 [puppet] - https://gerrit.wikimedia.org/r/538606 (https://phabricator.wikimedia.org/T230518) (owner: Gehel)
[12:49:04] RECOVERY - ats-tls HTTPS en.wikipedia.org RSA on cp4027 is OK: SSL OK - OCSP staple validity for en.wikipedia.org has 342130 seconds left:Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (RSA) valid until 2019-11-22 07:59:59 +0000 (expires in 59 days) https://wikitech.wikimedia.org/wiki/HTTPS
[12:49:18] yey.. icinga is back :)
[12:50:33] sorry not quite yet :S
[12:50:39] hmmm jbond42 I guess that you're aware but icinga web interface is currently down
[12:51:04] PROBLEM - Check systemd state on icinga1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:51:13] indeed, external monitoring paged for icinga
[12:52:01] yes apache is failing to start (have removed my additional vhost and still checking)
[12:53:09] RECOVERY - Check systemd state on icinga1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:53:09] ack, thanks jbond42 ! ping if help is needed
[12:53:25] ok it's back, will continue to troubleshoot
[12:53:28] thanks godog
[12:53:40] (CR) Muehlenhoff: "Looks good, one comment inline" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/538606 (https://phabricator.wikimedia.org/T230518) (owner: Gehel)
[12:55:16] (PS2) Gehel: elasticsearch: decommission elastic1017 [puppet] - https://gerrit.wikimedia.org/r/538606 (https://phabricator.wikimedia.org/T230518)
[12:56:02] RECOVERY - ats-tls HTTPS en.wikipedia.org ECDSA on cp4027 is OK: SSL OK - OCSP staple validity for en.wikipedia.org has 341713 seconds left:Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (ECDSA) valid until 2019-11-22 07:59:59 +0000 (expires in 59 days) https://wikitech.wikimedia.org/wiki/HTTPS
[12:56:06] (CR) jerkins-bot: [V: -1] elasticsearch: decommission elastic1017 [puppet] - https://gerrit.wikimedia.org/r/538606 (https://phabricator.wikimedia.org/T230518) (owner: Gehel)
[12:56:30] PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[12:57:17] (PS3) Gehel: elasticsearch: decommission elastic1017 [puppet] - https://gerrit.wikimedia.org/r/538606 (https://phabricator.wikimedia.org/T230518)
[12:58:02] _joe_, effie ^^ that's expected?
[12:59:06] <_joe_> nope
[12:59:16] <_joe_> exactly what I was hoping not to see actually
[12:59:32] <_joe_> oh it's both HHVM and php
[12:59:32] (CR) Muehlenhoff: [C: +1] elasticsearch: decommission elastic1017 [puppet] - https://gerrit.wikimedia.org/r/538606 (https://phabricator.wikimedia.org/T230518) (owner: Gehel)
[12:59:44] <_joe_> so it's not php-specific at least
[13:00:00] <_joe_> and it's the usual pattern
[13:00:10] yeah.. actually it looks like hhvm handled it slightly better
[13:00:11] <_joe_> some important memcached set of keys expired or something
[13:00:15] (CR) CDanis: [C: +2] "Half the reason I wrote this was just to not forget stuff during vacation, but I still think this is a reasonable approach in general. Le" [software/conftool] - https://gerrit.wikimedia.org/r/535296 (owner: CDanis)
[13:00:18] <_joe_> vgutierrez: because it has less load
[13:00:30] <_joe_> now most requests go to php
[13:00:40] <_joe_> also yes hhvm is a tad more efficient when dealing with memcached
[13:01:22] RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[13:03:08] (Merged) jenkins-bot: some reminders for doing the next release [software/conftool] - https://gerrit.wikimedia.org/r/535296 (owner: CDanis)
[13:03:59] (CR) Gehel: [C: +2] elasticsearch: decommission elastic1017 [puppet] - https://gerrit.wikimedia.org/r/538606 (https://phabricator.wikimedia.org/T230518) (owner: Gehel)
[13:04:20] (PS4) Gehel: elasticsearch: decommission elastic1017 [puppet] - https://gerrit.wikimedia.org/r/538606 (https://phabricator.wikimedia.org/T230518)
[13:06:54] (PS3) CDanis: dbctl: indicate failed commit in announcement [software/conftool] - https://gerrit.wikimedia.org/r/534230 (https://phabricator.wikimedia.org/T231871)
[13:07:11] Operations, ops-eqiad, DC-Ops, Discovery-Search (Current work), Patch-For-Review: elastic1017 lost network after reboot - https://phabricator.wikimedia.org/T230518 (Gehel) Steps for decommission of elastic1017: [x] - all system services confirmed offline from production use [x] - set all ic...
[13:13:21] (PS1) Jbond: icinga::cas correct paramater names [puppet] - https://gerrit.wikimedia.org/r/538612
[13:13:41] (PS2) Jbond: icinga::cas correct paramater names [puppet] - https://gerrit.wikimedia.org/r/538612
[13:16:03] (CR) Jbond: [C: +2] icinga::cas correct paramater names [puppet] - https://gerrit.wikimedia.org/r/538612 (owner: Jbond)
[13:20:21] vgutierrez: much longer than the 10 minutes I promised but icinga should be good now.
sorry for the delay [13:20:35] 10Operations, 10ops-eqiad, 10DC-Ops, 10decommission, 10Discovery-Search (Current work): elastic1017 lost network after reboot - https://phabricator.wikimedia.org/T230518 (10MoritzMuehlenhoff) [13:20:36] thx [13:21:18] !log installing qemu security updates on remaining cloudvirt hosts [13:21:20] (03PS2) 10Jbond: puppetmaster1003: promote puppetmaster1003 to a real puppetmaster backend [puppet] - 10https://gerrit.wikimedia.org/r/538590 (https://phabricator.wikimedia.org/T233203) [13:21:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:23:37] (03CR) 10Jbond: [C: 03+2] puppetmaster1003: promote puppetmaster1003 to a real puppetmaster backend [puppet] - 10https://gerrit.wikimedia.org/r/538590 (https://phabricator.wikimedia.org/T233203) (owner: 10Jbond) [13:24:58] 10Operations, 10ops-eqiad, 10DC-Ops, 10User-Zppix, 10cloud-services-team (Kanban): VMs on cloudvirt1015 crashing - bad mainboard/memory - https://phabricator.wikimedia.org/T220853 (10MoritzMuehlenhoff) What's the status, was there a reply from Dell? 
[13:29:26] PROBLEM - Check the last execution of netbox_ganeti_codfw_sync on netbox1001 is CRITICAL: CRITICAL: Status of the systemd unit netbox_ganeti_codfw_sync https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[13:36:52] (CR) Giuseppe Lavagetto: [C: +1] dbctl: indicate failed commit in announcement [software/conftool] - https://gerrit.wikimedia.org/r/534230 (https://phabricator.wikimedia.org/T231871) (owner: CDanis)
[13:39:58] RECOVERY - Check the last execution of netbox_ganeti_codfw_sync on netbox1001 is OK: OK: Status of the systemd unit netbox_ganeti_codfw_sync https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[13:41:28] Operations: ganeti netbox sync alerts are noisy - https://phabricator.wikimedia.org/T233624 (CDanis)
[13:43:49] Operations: ganeti netbox sync alerts are noisy - https://phabricator.wikimedia.org/T233624 (CDanis) I've downtimed these two alerts for a week, expires 30 Sept.
[13:45:05] (CR) Masumrezarock100: [C: +1] Add localized Wikipedia wordmark for szlwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/538597 (owner: Ammarpad)
[13:46:02] (CR) CDanis: [C: +2] dbctl: indicate failed commit in announcement [software/conftool] - https://gerrit.wikimedia.org/r/534230 (https://phabricator.wikimedia.org/T231871) (owner: CDanis)
[13:48:46] (Merged) jenkins-bot: dbctl: indicate failed commit in announcement [software/conftool] - https://gerrit.wikimedia.org/r/534230 (https://phabricator.wikimedia.org/T231871) (owner: CDanis)
[13:49:30] (PS5) Ottomata: Check that oozie is installed (not spark 1) for installing sharelib [puppet] - https://gerrit.wikimedia.org/r/532403 (https://phabricator.wikimedia.org/T229347)
[13:49:46] (CR) Ottomata: [V: +2 C: +2] Check that oozie is installed (not spark 1) for installing sharelib [puppet] - https://gerrit.wikimedia.org/r/532403 (https://phabricator.wikimedia.org/T229347) (owner: Ottomata)
[13:53:01] Operations, ops-eqiad, DBA: db1114 crashed due to memory issues (server under warranty) - https://phabricator.wikimedia.org/T229452 (Marostegui) Any ETA on when the request will be sent to Dell? Thanks!
[14:00:35] (PS1) Paladox: Merge remote-tracking branch 'origin/stable-2.15' into wmf/stable-2.15 [software/gerrit] (stable-2.15) - https://gerrit.wikimedia.org/r/538617
[14:00:53] (Abandoned) Paladox: Merge remote-tracking branch 'origin/stable-2.15' into wmf/stable-2.15 [software/gerrit] (stable-2.15) - https://gerrit.wikimedia.org/r/538617 (owner: Paladox)
[14:00:56] (PS1) Paladox: Merge remote-tracking branch 'origin/stable-2.15' into wmf/stable-2.15 [software/gerrit] (wmf/stable-2.15) - https://gerrit.wikimedia.org/r/538618
[14:02:51] (PS1) Paladox: Merge remote-tracking branch 'origin/stable-2.15' into wmf/stable-2.15 [software/gerrit] (wmf/stable-2.15) - https://gerrit.wikimedia.org/r/538619
[14:03:04] (Abandoned) Paladox: Merge remote-tracking branch 'origin/stable-2.15' into wmf/stable-2.15 [software/gerrit] (wmf/stable-2.15) - https://gerrit.wikimedia.org/r/538618 (owner: Paladox)
[14:03:58] (CR) jerkins-bot: [V: -1] Merge remote-tracking branch 'origin/stable-2.15' into wmf/stable-2.15 [software/gerrit] (wmf/stable-2.15) - https://gerrit.wikimedia.org/r/538619 (owner: Paladox)
[14:04:04] huh
[14:04:22] oh
[14:15:52] Operations, Analytics, Analytics-EventLogging, EventBus, and 3 others: Public schema.wikimedia.org endpoint for schema.svc - https://phabricator.wikimedia.org/T233630 (Ottomata)
[14:16:18] (Abandoned) Filippo Giunchedi: logstash: stop relaying to central statsd [puppet] - https://gerrit.wikimedia.org/r/535148 (https://phabricator.wikimedia.org/T205870) (owner: Filippo Giunchedi)
[14:17:39] (PS8) Jcrespo: backups: Change file owner of bacula storage&director config [puppet] - https://gerrit.wikimedia.org/r/538239 (https://phabricator.wikimedia.org/T229209)
[14:20:04] (CR) Elukey: [C: +1] "Should be good!" [puppet] - https://gerrit.wikimedia.org/r/536645 (owner: Ayounsi)
[14:22:04] Operations, Analytics, Analytics-EventLogging, EventBus, and 3 others: Public EventGate endpoint for analytics event intake - https://phabricator.wikimedia.org/T233629 (Ottomata)
[14:25:07] (PS1) Alexandros Kosiaris: rsyslog: Support adding metadata to input, default to off [puppet] - https://gerrit.wikimedia.org/r/538626 (https://phabricator.wikimedia.org/T207200)
[14:25:09] (PS1) Alexandros Kosiaris: rsyslog: populate kubernetes configuration [puppet] - https://gerrit.wikimedia.org/r/538627 (https://phabricator.wikimedia.org/T207200)
[14:26:05] (PS1) Filippo Giunchedi: netops: add proxy for ripe-atlas-tools, fix atlas user [puppet] - https://gerrit.wikimedia.org/r/538628 (https://phabricator.wikimedia.org/T232711)
[14:26:52] (CR) jerkins-bot: [V: -1] netops: add proxy for ripe-atlas-tools, fix atlas user [puppet] - https://gerrit.wikimedia.org/r/538628 (https://phabricator.wikimedia.org/T232711) (owner: Filippo Giunchedi)
[14:37:51] (PS2) Filippo Giunchedi: netops: add proxy for ripe-atlas-tools, fix atlas user [puppet] - https://gerrit.wikimedia.org/r/538628 (https://phabricator.wikimedia.org/T232711)
[14:42:02] godog: <3 awesome
[14:43:50] (PS3) Filippo Giunchedi: netops: add proxy for ripe-atlas-tools, fix atlas user [puppet] - https://gerrit.wikimedia.org/r/538628 (https://phabricator.wikimedia.org/T232711)
[14:44:53] cdanis: <3 <3 also welcome back
[14:44:57] thanks!
[14:46:21] (PS4) Filippo Giunchedi: netops: add proxy for ripe-atlas-tools, fix atlas user [puppet] - https://gerrit.wikimedia.org/r/538628 (https://phabricator.wikimedia.org/T232711)
[14:48:26] (PS1) Jbond: puppetmaster::frontend: add locale backend [puppet] - https://gerrit.wikimedia.org/r/538629
[14:50:20] (CR) Filippo Giunchedi: [C: +2] "PCC https://puppet-compiler.wmflabs.org/compiler1001/18509/cumin1001.eqiad.wmnet/" [puppet] - https://gerrit.wikimedia.org/r/538628 (https://phabricator.wikimedia.org/T232711) (owner: Filippo Giunchedi)
[14:50:30] (PS5) Filippo Giunchedi: netops: add proxy for ripe-atlas-tools, fix atlas user [puppet] - https://gerrit.wikimedia.org/r/538628 (https://phabricator.wikimedia.org/T232711)
[14:50:45] (CR) jerkins-bot: [V: -1] puppetmaster::frontend: add locale backend [puppet] - https://gerrit.wikimedia.org/r/538629 (owner: Jbond)
[14:51:29] (PS2) Jbond: puppetmaster::frontend: add locale backend [puppet] - https://gerrit.wikimedia.org/r/538629 (https://phabricator.wikimedia.org/T233203)
[14:51:55] (PS1) Jhedden: openstack: update glance keystone config for newton [puppet] - https://gerrit.wikimedia.org/r/538630
[14:52:03] (CR) Filippo Giunchedi: [V: +2 C: +2] netops: add proxy for ripe-atlas-tools, fix atlas user [puppet] - https://gerrit.wikimedia.org/r/538628 (https://phabricator.wikimedia.org/T232711) (owner: Filippo Giunchedi)
[14:56:44] (PS1) Jhedden: openstack: update neutron keystone config for newton [puppet] - https://gerrit.wikimedia.org/r/538631
[15:00:17] (CR) Jhedden: [C: +2] openstack: update neutron keystone config for newton [puppet] - https://gerrit.wikimedia.org/r/538631 (owner: Jhedden)
[15:00:27] (PS2) Jhedden: openstack: update neutron keystone config for newton [puppet] - https://gerrit.wikimedia.org/r/538631
[15:00:55] (CR) Jhedden: [C: +2] openstack: update glance keystone config for newton [puppet] - https://gerrit.wikimedia.org/r/538630 (owner: Jhedden)
[15:00:57] (PS2) Jhedden: openstack: update glance keystone config for newton [puppet] - https://gerrit.wikimedia.org/r/538630
[15:01:05] (PS3) Jhedden: openstack: update glance keystone config for newton [puppet] - https://gerrit.wikimedia.org/r/538630
[15:01:24] (CR) Andrew Bogott: [C: +1] openstack: update glance keystone config for newton [puppet] - https://gerrit.wikimedia.org/r/538630 (owner: Jhedden)
[15:01:44] (CR) Andrew Bogott: [C: +1] openstack: update neutron keystone config for newton [puppet] - https://gerrit.wikimedia.org/r/538631 (owner: Jhedden)
[15:02:34] (CR) Jhedden: [C: +2] openstack: update glance keystone config for newton [puppet] - https://gerrit.wikimedia.org/r/538630 (owner: Jhedden)
[15:02:42] (PS3) Jbond: puppetmaster::frontend: add locale backend [puppet] - https://gerrit.wikimedia.org/r/538629 (https://phabricator.wikimedia.org/T233203)
[15:02:48] (PS4) Jhedden: openstack: update glance keystone config for newton [puppet] - https://gerrit.wikimedia.org/r/538630
[15:02:59] Operations, netops, observability, Patch-For-Review: Deploy ripe-atlas-tools for ad-hoc network tests - https://phabricator.wikimedia.org/T232711 (fgiunchedi)
[15:04:57] Operations, netops, observability, Patch-For-Review: Deploy ripe-atlas-tools for ad-hoc network tests - https://phabricator.wikimedia.org/T232711 (fgiunchedi) Open→Resolved a:fgiunchedi Completed! I've updated the ripe atlas documentation at https://wikitech.wikimedia.org/wiki/RIPE_At...
[15:15:44] (CR) Muehlenhoff: [C: +1] "Looks good" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/538629 (https://phabricator.wikimedia.org/T233203) (owner: Jbond)
[15:15:46] (CR) Giuseppe Lavagetto: Add envoy image with TLS termination. (1 comment) [docker-images/production-images] - https://gerrit.wikimedia.org/r/399640 (owner: Giuseppe Lavagetto)
[15:16:22] (CR) Ayounsi: [C: +1] "New dimensions names are correct, let me know if there is anything else I should review." [puppet] - https://gerrit.wikimedia.org/r/538603 (https://phabricator.wikimedia.org/T229682) (owner: Elukey)
[15:16:47] (CR) Giuseppe Lavagetto: [V: +2 C: +2] Fixes to the envoy image [docker-images/production-images] - https://gerrit.wikimedia.org/r/538586 (owner: Giuseppe Lavagetto)
[15:17:25] (CR) Giuseppe Lavagetto: [V: +2 C: +2] Add envoy image with TLS termination. [docker-images/production-images] - https://gerrit.wikimedia.org/r/399640 (owner: Giuseppe Lavagetto)
[15:20:01] (PS4) Jbond: puppetmaster::frontend: add locale backend [puppet] - https://gerrit.wikimedia.org/r/538629 (https://phabricator.wikimedia.org/T233203)
[15:20:15] (CR) Jbond: puppetmaster::frontend: add locale backend (2 comments) [puppet] - https://gerrit.wikimedia.org/r/538629 (https://phabricator.wikimedia.org/T233203) (owner: Jbond)
[15:20:37] (CR) Muehlenhoff: [C: +1] puppetmaster::frontend: add locale backend [puppet] - https://gerrit.wikimedia.org/r/538629 (https://phabricator.wikimedia.org/T233203) (owner: Jbond)
[15:23:57] (CR) Jbond: [C: +2] puppetmaster::frontend: add locale backend [puppet] - https://gerrit.wikimedia.org/r/538629 (https://phabricator.wikimedia.org/T233203) (owner: Jbond)
[15:24:00] Operations, ops-eqiad, Analytics, Analytics-Cluster, DC-Ops: analytics1045 - RAID failure and /var/lib/hadoop/data/j can't be mounted - https://phabricator.wikimedia.org/T232069 (elukey) This host can keep running with one disk less, fixed it with https://gerrit.wikimedia.org/r/#/c/operations...
[15:26:38] Operations, Analytics, Analytics-EventLogging, EventBus, and 3 others: Public EventGate endpoint for analytics event intake - https://phabricator.wikimedia.org/T233629 (Ottomata) p:Triage→Normal
[15:26:43] Operations, Analytics, Analytics-EventLogging, EventBus, and 3 others: Public schema.wikimedia.org endpoint for schema.svc - https://phabricator.wikimedia.org/T233630 (Ottomata) p:Triage→Normal
[15:26:51] Operations, Analytics, Analytics-EventLogging, EventBus, and 3 others: Public schema.wikimedia.org endpoint for schema.svc - https://phabricator.wikimedia.org/T233630 (Ottomata) a:Ottomata→None
[15:27:21] Operations, Analytics, Analytics-EventLogging, EventBus, and 3 others: Public EventGate endpoint for analytics event intake - https://phabricator.wikimedia.org/T233629 (Ottomata) a:Ottomata→None
[15:30:27] Operations, Analytics, Traffic: Images served with text/html content type - https://phabricator.wikimedia.org/T232679 (Ottomata) Open→Declined Nuria I think we can decline this yes? Doing so, feel free to reopen if I am wrong.
[15:31:26] Operations, Icinga, netops, observability: scs monitoring missing in Icinga - https://phabricator.wikimedia.org/T233318 (fgiunchedi) Sounds great! Adding ssh + ping for starters should be quite easy in puppet
[15:33:02] (PS1) Cwhite: profile: add mmutf8fix to kafka output actions [puppet] - https://gerrit.wikimedia.org/r/538642
[15:38:25] Operations, Analytics, Traffic: Cookies and misc services caching - https://phabricator.wikimedia.org/T232453 (fdans) cc @Aklapper gasserandreas seems to be moving stuff around our board, could you take a look at it? Seems malicious.
[15:50:30] (PS1) Jhedden: openstack: add keystone URIs to neutron in newton [puppet] - https://gerrit.wikimedia.org/r/538646
[15:53:07] (PS2) Jhedden: openstack: add keystone URIs to neutron in newton [puppet] - https://gerrit.wikimedia.org/r/538646
[15:54:14] (CR) Andrew Bogott: [C: +1] openstack: add keystone URIs to neutron in newton [puppet] - https://gerrit.wikimedia.org/r/538646 (owner: Jhedden)
[15:57:57] (PS3) Jhedden: openstack: add keystone URIs to neutron in newton [puppet] - https://gerrit.wikimedia.org/r/538646
[16:00:26] (PS4) Jhedden: openstack: add keystone URIs to neutron in newton [puppet] - https://gerrit.wikimedia.org/r/538646
[16:02:53] (CR) Jhedden: [C: +2] "PCC results: https://puppet-compiler.wmflabs.org/compiler1002/18513/" [puppet] - https://gerrit.wikimedia.org/r/538646 (owner: Jhedden)
[16:03:14] (PS5) Jhedden: openstack: add keystone URIs to neutron in newton [puppet] - https://gerrit.wikimedia.org/r/538646
[16:15:52] RECOVERY - BGP status on cr2-eqiad is OK: BGP OK - up: 186, down: 5, shutdown: 2 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[16:16:57] (PS1) Urbanecm: Close bgwikinews, but allow sysops to edit [mediawiki-config] - https://gerrit.wikimedia.org/r/538662
[16:17:02] (CR) Filippo Giunchedi: [C: +1] "Good find! Should immediately mitigate at least some issues" [puppet] - https://gerrit.wikimedia.org/r/538642 (owner: Cwhite)
[16:17:31] jouncebot: next
[16:17:31] In 0 hour(s) and 42 minute(s): Wikidata Query Service weekly deploy (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190923T1700)
[16:17:39] jouncebot: now
[16:17:39] No deployments scheduled for the next 0 hour(s) and 42 minute(s)
[16:17:47] * Urbanecm is going to do a quick config deploy
[16:18:19] (PS2) Urbanecm: Close bgwikinews, but allow sysops to edit [mediawiki-config] - https://gerrit.wikimedia.org/r/538662
[16:22:15] (PS8) Andrew Bogott: Keystone/newton: install python-keystone [puppet] - https://gerrit.wikimedia.org/r/538445
[16:22:17] (PS1) Andrew Bogott: Move cloudnet2002-dev and 2003-dev to Newton [puppet] - https://gerrit.wikimedia.org/r/538667
[16:22:41] (CR) Urbanecm: [C: +2] Close bgwikinews, but allow sysops to edit [mediawiki-config] - https://gerrit.wikimedia.org/r/538662 (owner: Urbanecm)
[16:22:49] Operations, MediaWiki-Releasing, Parsoid: signatures were invalid: EXPKEYSIG 90E9F83F22250DD7 MediaWiki releases repository - https://phabricator.wikimedia.org/T225601 (fgiunchedi) >>! In T225601#5513487, @Misterms735 wrote: > @fgiunchedi I still have the problem. Whe...
[16:23:17] (CR) Andrew Bogott: [C: +2] Move cloudnet2002-dev and 2003-dev to Newton [puppet] - https://gerrit.wikimedia.org/r/538667 (owner: Andrew Bogott)
[16:23:22] (PS3) Urbanecm: Close bgwikinews, but allow sysops to edit [mediawiki-config] - https://gerrit.wikimedia.org/r/538662 (https://phabricator.wikimedia.org/T233322)
[16:23:25] (CR) Bstorm: [C: +2] tools-manifest: increase the timeout to 30s [software/tools-manifest] - https://gerrit.wikimedia.org/r/536378 (https://phabricator.wikimedia.org/T220650) (owner: Bstorm)
[16:23:32] (CR) Urbanecm: [C: +2] Close bgwikinews, but allow sysops to edit [mediawiki-config] - https://gerrit.wikimedia.org/r/538662 (https://phabricator.wikimedia.org/T233322) (owner: Urbanecm)
[16:24:21] (CR) Bstorm: [C: +2] tools-manifest: apply black formatting to webservicemonitor [software/tools-manifest] - https://gerrit.wikimedia.org/r/536377 (owner: Bstorm)
[16:24:37] (Merged) jenkins-bot: Close bgwikinews, but allow sysops to edit [mediawiki-config] - https://gerrit.wikimedia.org/r/538662 (https://phabricator.wikimedia.org/T233322) (owner: Urbanecm)
[16:25:00] (Merged) jenkins-bot: tools-manifest: apply black formatting to webservicemonitor [software/tools-manifest] - https://gerrit.wikimedia.org/r/536377 (owner: Bstorm)
[16:25:48] Operations, ops-codfw, media-storage: rack/setup/install ms-be205[1-6].codfw.wmnet - https://phabricator.wikimedia.org/T233638 (RobH) p:Triage→Normal
[16:25:57] Operations, ops-codfw, media-storage: rack/setup/install ms-be205[1-6].codfw.wmnet - https://phabricator.wikimedia.org/T233638 (RobH)
[16:26:03] !log mwscript createAndPromote.php --wiki=bgwikinews --sysop --force 'Martin Urbanec' - temporary (T233322)
[16:26:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:26:07] T233322: Close and Delete/Redirect Bulgarian Wikinews - https://phabricator.wikimedia.org/T233322
[16:26:25] (CR) jenkins-bot: Close bgwikinews, but allow sysops to edit [mediawiki-config] - https://gerrit.wikimedia.org/r/538662 (https://phabricator.wikimedia.org/T233322) (owner: Urbanecm)
[16:26:41] (CR) Bstorm: [C: +2] tools-manifest: increase the timeout to 30s [software/tools-manifest] - https://gerrit.wikimedia.org/r/536378 (https://phabricator.wikimedia.org/T220650) (owner: Bstorm)
[16:27:46] !log urbanecm@deploy1001 Synchronized dblists/closed.dblist: 84afa44: Close bgwikinews, but allow sysops to edit (T233322; 1/2) (duration: 00m 58s)
[16:27:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:29:15] !log urbanecm@deploy1001 Synchronized wmf-config/VariantSettings.php: 84afa44: Close bgwikinews, but allow sysops to edit (T233322; 2/2) (duration: 00m 56s)
[16:29:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:32:53] (PS9) Andrew Bogott: Keystone/newton: install python-keystone [puppet] - https://gerrit.wikimedia.org/r/538445
[16:32:55] (PS1) Andrew Bogott: Neutron: add newton manifests [puppet] - https://gerrit.wikimedia.org/r/538669
[16:33:10] !log Remove my temporary adminship on bgwikinews (T233322)
[16:33:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:33:14] T233322: Close and Delete/Redirect Bulgarian Wikinews - https://phabricator.wikimedia.org/T233322
[16:34:01] (CR) Andrew Bogott: [C: +2] Neutron: add newton manifests [puppet] - https://gerrit.wikimedia.org/r/538669 (owner: Andrew Bogott)
[16:46:22] !log elukey@deploy1001 Started deploy [analytics/refinery@b99647e]: (no justification provided)
[16:46:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:53:05] (CR) Herron: [C: +1] "LTTM! Recommend adding a bug to track mitigations for this issue over the longer term." [puppet] - https://gerrit.wikimedia.org/r/538642 (owner: Cwhite)
[16:53:46] !log elukey@deploy1001 Finished deploy [analytics/refinery@b99647e]: (no justification provided) (duration: 07m 24s)
[16:53:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:00:04] gehel and onimisionipe: (Dis)respected human, time to deploy Wikidata Query Service weekly deploy (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190923T1700). Please do the needful.
[17:10:01] jouncebot: no deployment today
[17:12:23] Operations, Analytics, User-Elukey: setup/install krb1001/WMF5173 - https://phabricator.wikimedia.org/T233141 (RobH) p:Triage→Normal
[17:13:59] Operations, ops-eqiad: apply hostname labels for krb1001/WMF5173 - https://phabricator.wikimedia.org/T233642 (RobH)
[17:19:02] RECOVERY - Check systemd state on an-coord1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[17:19:55] \o/
[17:21:44] (PS1) RobH: krb dns entries [dns] - https://gerrit.wikimedia.org/r/538677 (https://phabricator.wikimedia.org/T233141)
[17:22:30] (CR) RobH: [C: +2] krb dns entries [dns] - https://gerrit.wikimedia.org/r/538677 (https://phabricator.wikimedia.org/T233141) (owner: RobH)
[17:25:29] (PS1) Phamhi: monitoring: Switch a collection of sms alerts to email-only for hpham [puppet] - https://gerrit.wikimedia.org/r/538678
[17:35:48] (PS1) RobH: krb1001 install params [puppet] - https://gerrit.wikimedia.org/r/538681 (https://phabricator.wikimedia.org/T233141)
[17:36:54] Puppet: occational puppet errors: Error 500 on SERVER: Server Error: Unsupported facts format - https://phabricator.wikimedia.org/T233643 (jbond)
[17:37:12] Operations, Puppet: occational puppet errors: Error 500 on SERVER: Server Error: Unsupported facts format - https://phabricator.wikimedia.org/T233643 (jbond) p:Triage→Normal
[17:40:12] Operations, User-DannyS712: 503 Backend fetch failed - https://phabricator.wikimedia.org/T233271 (Zzuuzz) I haven't seen this error in the last couple of days. I suggest whatever was affecting checkuser has probably been resolved.
[17:40:14] (PS1) Jbond: base::puppet: add preferred_serialization_format = pson [puppet] - https://gerrit.wikimedia.org/r/538682 (https://phabricator.wikimedia.org/T233643)
[17:41:19] (CR) Dzahn: "approval is on ticket. looks good except what Moritz already said" [puppet] - https://gerrit.wikimedia.org/r/538048 (https://phabricator.wikimedia.org/T233189) (owner: Volans)
[17:41:33] Operations, Analytics, Analytics-EventLogging, EventBus, and 3 others: Public EventGate endpoint for analytics event intake - https://phabricator.wikimedia.org/T233629 (Ottomata)
[17:42:29] (PS1) Jbond: Revert "puppetmaster1003: promote puppetmaster1003 to a real puppetmaster backend" [puppet] - https://gerrit.wikimedia.org/r/538683
[17:42:42] (PS2) Jbond: Revert "puppetmaster1003: promote puppetmaster1003 to a real puppetmaster backend" [puppet] - https://gerrit.wikimedia.org/r/538683
[17:43:46] (PS3) Jbond: Revert "puppetmaster1003: promote puppetmaster1003 to a real puppetmaster backend" [puppet] - https://gerrit.wikimedia.org/r/538683 (https://phabricator.wikimedia.org/T233643)
[17:44:25] (PS1) Jbond: Revert "puppetmaster::frontend: add locale backend" [puppet] - https://gerrit.wikimedia.org/r/538684 (https://phabricator.wikimedia.org/T233643)
[17:44:27] (CR) jerkins-bot: [V: -1] Revert "puppetmaster1003: promote puppetmaster1003 to a real puppetmaster backend" [puppet] - https://gerrit.wikimedia.org/r/538683 (https://phabricator.wikimedia.org/T233643) (owner: Jbond)
[17:46:09] (PS4) Jbond: Revert "puppetmaster1003: promote puppetmaster1003 to a real puppetmaster backend" [puppet] - https://gerrit.wikimedia.org/r/538683 (https://phabricator.wikimedia.org/T233643)
[17:50:19] (PS1) Mholloway: MachineVision: Update active Handler config [mediawiki-config] - https://gerrit.wikimedia.org/r/538685 (https://phabricator.wikimedia.org/T233610)
[17:52:02] (CR) Mholloway: [V: +2 C: +2] MachineVision: Update active Handler config [mediawiki-config] - https://gerrit.wikimedia.org/r/538685 (https://phabricator.wikimedia.org/T233610) (owner: Mholloway)
[17:52:27] (CR) jenkins-bot: MachineVision: Update active Handler config [mediawiki-config] - https://gerrit.wikimedia.org/r/538685 (https://phabricator.wikimedia.org/T233610) (owner: Mholloway)
[17:52:59] (PS2) Dzahn: add fake ssl key for performance.discovery.wmnet, remove webperf [labs/private] - https://gerrit.wikimedia.org/r/538349
[17:53:28] (CR) Jdlrobson: [C: -1] "You'll need to reference the svg in config as well." [mediawiki-config] - https://gerrit.wikimedia.org/r/538597 (owner: Ammarpad)
[17:53:45] (CR) Jbond: [C: +2] base::puppet: add preferred_serialization_format = pson [puppet] - https://gerrit.wikimedia.org/r/538682 (https://phabricator.wikimedia.org/T233643) (owner: Jbond)
[17:54:09] (CR) Jbond: [C: +2] Revert "puppetmaster1003: promote puppetmaster1003 to a real puppetmaster backend" [puppet] - https://gerrit.wikimedia.org/r/538683 (https://phabricator.wikimedia.org/T233643) (owner: Jbond)
[17:54:18] (CR) Jbond: [C: +2] Revert "puppetmaster::frontend: add locale backend" [puppet] - https://gerrit.wikimedia.org/r/538684 (https://phabricator.wikimedia.org/T233643) (owner: Jbond)
[17:54:20] !log mholloway-shell@deploy1001 Synchronized wmf-config/InitialiseSettings-labs.php: MachineVision: Update active handler config (T233610) (duration: 00m 58s)
[17:54:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:54:32] T233610: Internal error from MachineVision on action=info for files: argument to RandomWikidataIdHandler must be WikidataDepictsSetter, LabelResolver given - https://phabricator.wikimedia.org/T233610
[17:54:32] (PS2) Jbond: Revert "puppetmaster::frontend: add locale backend" [puppet] - https://gerrit.wikimedia.org/r/538684 (https://phabricator.wikimedia.org/T233643)
[17:57:06] (CR) Mforns: [C: +1] "LGTM! I think this is the only thing needed." [puppet] - https://gerrit.wikimedia.org/r/538603 (https://phabricator.wikimedia.org/T229682) (owner: Elukey)
[17:57:31] I see nothing schedule on the SWAT, I'm going to deploy a mediawiki core backport right now
[17:59:39] (CR) RobH: [C: +2] krb1001 install params [puppet] - https://gerrit.wikimedia.org/r/538681 (https://phabricator.wikimedia.org/T233141) (owner: RobH)
[17:59:48] gilles: ping me once you're done, I have a security deployment to do
[17:59:50] (PS2) RobH: krb1001 install params [puppet] - https://gerrit.wikimedia.org/r/538681 (https://phabricator.wikimedia.org/T233141)
[17:59:51] !log gilles@deploy1001 Synchronized php-1.34.0-wmf.23/maintenance/purgeList.php: T233095 Make purgeList.php use getCdnUrls() (duration: 00m 56s)
[17:59:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:59:54] T233095: Significant mobile web performance regression observed at deployment of 1.34.0-wmf.22 - https://phabricator.wikimedia.org/T233095
[18:00:01] (CR) Dzahn: [V: +2 C: +2] add fake ssl key for performance.discovery.wmnet, remove webperf [labs/private] - https://gerrit.wikimedia.org/r/538349 (owner: Dzahn)
[18:00:04] MaxSem, RoanKattouw, Niharika, and Urbanecm: Time to snap out of that daydream and deploy Morning SWAT (Max 6 patches). Get on with it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190923T1800).
[18:00:05] No GERRIT patches in the queue for this window AFAICS.
[18:00:18] * Urbanecm is waiting for gilles 's permission to do the sec deploy
[18:02:05] Urbanecm: done
[18:02:09] thanks gilles
[18:02:11] (PS2) Dzahn: ssl: add certificate for performance.discovery.wmnet [puppet] - https://gerrit.wikimedia.org/r/538348
[18:03:04] !log T233095 Purge articles for all wikis: foreachwiki maintenance/purgeList.php --all --verbose
[18:03:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:04:13] (CR) Dzahn: [C: +2] ssl: add certificate for performance.discovery.wmnet [puppet] - https://gerrit.wikimedia.org/r/538348 (owner: Dzahn)
[18:07:47] Operations, netops: asw2-d2-eqiad crash - https://phabricator.wikimedia.org/T233645 (ayounsi) p:Triage→High
[18:10:50] (PS2) Jbond: base::puppet: add preferred_serialization_format = pson [puppet] - https://gerrit.wikimedia.org/r/538682 (https://phabricator.wikimedia.org/T233643)
[18:10:52] (PS1) Jbond: puppetmaster::frontend: add locale backend and promote puppetmaster1003 [puppet] - https://gerrit.wikimedia.org/r/538686 (https://phabricator.wikimedia.org/T233203)
[18:11:00] Operations, netops: asw2-d2-eqiad crash - https://phabricator.wikimedia.org/T233645 (ayounsi)
[18:13:12] !log Security deploy for T207094
[18:13:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:14:17] (PS3) Mforns: analytics::refinery::job::druid_load: Add sanitization for netflow [puppet] - https://gerrit.wikimedia.org/r/535924 (https://phabricator.wikimedia.org/T229674)
[18:14:25] (CR) jerkins-bot: [V: -1] base::puppet: add preferred_serialization_format = pson [puppet] - https://gerrit.wikimedia.org/r/538682 (https://phabricator.wikimedia.org/T233643) (owner: Jbond)
[18:14:38] (CR) jerkins-bot: [V: -1] puppetmaster::frontend: add locale backend and promote puppetmaster1003 [puppet] - https://gerrit.wikimedia.org/r/538686 (https://phabricator.wikimedia.org/T233203) (owner: Jbond)
[18:15:32] (CR) Mforns: analytics::refinery::job::druid_load: Add sanitization for netflow (1 comment) [puppet] - https://gerrit.wikimedia.org/r/535924 (https://phabricator.wikimedia.org/T229674) (owner: Mforns)
[18:16:00] (CR) Urbanecm: [C: +2] New throttle rule for Wikimedia Chile editathon [mediawiki-config] - https://gerrit.wikimedia.org/r/538134 (https://phabricator.wikimedia.org/T233378) (owner: Ammarpad)
[18:16:32] (CR) Jbond: "recheck" [puppet] - https://gerrit.wikimedia.org/r/538682 (https://phabricator.wikimedia.org/T233643) (owner: Jbond)
[18:16:43] Operations, Analytics, User-Elukey: setup/install krb1001/WMF5173 - https://phabricator.wikimedia.org/T233141 (RobH)
[18:17:17] (CR) Jhedden: [C: +1] monitoring: Switch a collection of sms alerts to email-only for hpham [puppet] - https://gerrit.wikimedia.org/r/538678 (owner: Phamhi)
[18:17:20] (PS1) Mholloway: MachineVision: Update active Handler config, take 2 [mediawiki-config] - https://gerrit.wikimedia.org/r/538687 (https://phabricator.wikimedia.org/T233610)
[18:17:49] Operations, Analytics, User-Elukey: setup/install krb1001/WMF5173 - https://phabricator.wikimedia.org/T233141 (RobH) a:RobH→elukey @ekuley, Assigning this to you since you initially requested the hardware. If someone else needs to implement, feel free to reassign or resolve this task as nee...
[18:17:51] (CR) Phamhi: [C: +2] monitoring: Switch a collection of sms alerts to email-only for hpham [puppet] - https://gerrit.wikimedia.org/r/538678 (owner: Phamhi)
[18:19:10] (Merged) jenkins-bot: New throttle rule for Wikimedia Chile editathon [mediawiki-config] - https://gerrit.wikimedia.org/r/538134 (https://phabricator.wikimedia.org/T233378) (owner: Ammarpad)
[18:19:13] (PS2) Phamhi: monitoring: Switch a collection of sms alerts to email-only for hpham [puppet] - https://gerrit.wikimedia.org/r/538678
[18:19:29] (CR) jenkins-bot: New throttle rule for Wikimedia Chile editathon [mediawiki-config] - https://gerrit.wikimedia.org/r/538134 (https://phabricator.wikimedia.org/T233378) (owner: Ammarpad)
[18:19:31] (CR) Phamhi: [V: +2 C: +2] monitoring: Switch a collection of sms alerts to email-only for hpham [puppet] - https://gerrit.wikimedia.org/r/538678 (owner: Phamhi)
[18:19:48] (PS2) Urbanecm: Disallow indexing discussion and user pages on eswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/538462 (https://phabricator.wikimedia.org/T233562) (owner: Ammarpad)
[18:20:34] (CR) Urbanecm: [C: +2] Disallow indexing discussion and user pages on eswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/538462 (https://phabricator.wikimedia.org/T233562) (owner: Ammarpad)
[18:21:14] (CR) Mholloway: [C: -1] "Will deploy after SWAT is finished." [mediawiki-config] - https://gerrit.wikimedia.org/r/538687 (https://phabricator.wikimedia.org/T233610) (owner: Mholloway)
[18:21:46] !log urbanecm@deploy1001 Synchronized wmf-config/throttle.php: SWAT: 6cb2042: New throttle rule for Wikimedia Chile editathon (T233378) (duration: 00m 56s)
[18:21:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:21:51] T233378: Lift IP limit - WMCL Editathon 2019-09-27 - https://phabricator.wikimedia.org/T233378
[18:21:56] (Merged) jenkins-bot: Disallow indexing discussion and user pages on eswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/538462 (https://phabricator.wikimedia.org/T233562) (owner: Ammarpad)
[18:22:09] Operations, MobileFrontend, Traffic, Readers-Web-Backlog (Tracking): Sections on some mobile pages are not collabsable - https://phabricator.wikimedia.org/T233373 (Jdlrobson) This is likely a caching issue. We recently moved some code around and had reports that this might not have gone as smooth...
[18:22:44] (CR) jenkins-bot: Disallow indexing discussion and user pages on eswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/538462 (https://phabricator.wikimedia.org/T233562) (owner: Ammarpad)
[18:23:09] (CR) Jbond: "recheck" [puppet] - https://gerrit.wikimedia.org/r/538686 (https://phabricator.wikimedia.org/T233203) (owner: Jbond)
[18:23:49] !log urbanecm@deploy1001 Synchronized wmf-config/VariantSettings.php: SWAT: 8f3f070: Disallow indexing discussion and user pages on eswiki (T233562) (duration: 00m 56s)
[18:23:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:23:52] T233562: Update wgNamespaceRobotPolicies on eswiki - https://phabricator.wikimedia.org/T233562
[18:25:50] (PS6) Urbanecm: Add localized Wikipedia wordmark for szlwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/538408 (https://phabricator.wikimedia.org/T233104) (owner: Ammarpad)
[18:26:06] (CR) Urbanecm: [C: +2] Add localized Wikipedia wordmark for szlwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/538597 (owner: Ammarpad)
[18:26:17] (PS7) Urbanecm: Add localized Wikipedia wordmark for szlwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/538408 (https://phabricator.wikimedia.org/T233104) (owner: Ammarpad)
[18:26:24] (CR) Urbanecm: [C: +2] Add localized Wikipedia wordmark for szlwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/538408 (https://phabricator.wikimedia.org/T233104) (owner: Ammarpad)
[18:26:26] (CR) jerkins-bot: [V: -1] puppetmaster::frontend: add locale backend and promote puppetmaster1003 [puppet] - https://gerrit.wikimedia.org/r/538686 (https://phabricator.wikimedia.org/T233203) (owner: Jbond)
[18:27:10] (Merged) jenkins-bot: Add localized Wikipedia wordmark for szlwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/538597 (owner: Ammarpad)
[18:27:22] (PS2) Jbond: puppetmaster::frontend: add locale backend and promote puppetmaster1003 [puppet] - https://gerrit.wikimedia.org/r/538686 (https://phabricator.wikimedia.org/T233203)
[18:27:27] (Merged) jenkins-bot: Add localized Wikipedia wordmark for szlwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/538408 (https://phabricator.wikimedia.org/T233104) (owner: Ammarpad)
[18:28:07] (CR) jenkins-bot: Add localized Wikipedia wordmark for szlwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/538597 (owner: Ammarpad)
[18:28:21] Urbanecm: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/538597/ is not going to work
[18:28:37] it is missing the associated config change
[18:28:47] Jdlrobson: well it's just in another patchset, see https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/538408/8/wmf-config/VariantSettings.php
[18:28:54] ah okay :)
[18:29:12] never mind then - i hadn't seen that cool
[18:29:16] I should've mentioned that in my C+2 probably
[18:30:03] (CR) jenkins-bot: Add localized Wikipedia wordmark for szlwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/538408 (https://phabricator.wikimedia.org/T233104) (owner: Ammarpad)
[18:30:15] thanks for the comment Jdlrobson !
[18:30:44] (PS3) Jbond: puppetmaster::frontend: add locale backend and promote puppetmaster1003 [puppet] - https://gerrit.wikimedia.org/r/538686 (https://phabricator.wikimedia.org/T233203)
[18:30:55] !log urbanecm@deploy1001 Synchronized static/images/mobile/copyright/wikipedia-wordmark-szl.svg: SWAT: d397f5f: Add localized Wikipedia wordmark for szlwiki (T233104) (duration: 00m 56s)
[18:30:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:30:58] T233104: Add localized Wikipedia wordmark to the Silesian (szl) mobile frontend - https://phabricator.wikimedia.org/T233104
[18:31:28] (CR) Jdlrobson: [C: -1] "I stand corrected. It's been pointed out that the config change is at https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/538" [mediawiki-config] - https://gerrit.wikimedia.org/r/538597 (owner: Ammarpad)
[18:31:54] (CR) Jdlrobson: "Thanks for taking care of this Ammarpad !" [mediawiki-config] - https://gerrit.wikimedia.org/r/538408 (https://phabricator.wikimedia.org/T233104) (owner: Ammarpad)
[18:32:24] (CR) Muehlenhoff: base::puppet: add preferred_serialization_format = pson (1 comment) [puppet] - https://gerrit.wikimedia.org/r/538682 (https://phabricator.wikimedia.org/T233643) (owner: Jbond)
[18:34:17] (CR) Mforns: Rsync analytics mediawiki history dumps to dumps.wikimedia.org (1 comment) [puppet] - https://gerrit.wikimedia.org/r/538312 (https://phabricator.wikimedia.org/T208612) (owner: Mforns)
[18:35:23] !log urbanecm@deploy1001 Synchronized wmf-config/VariantSettings.php: SWAT: be2f9d4: Add localized Wikipedia wordmark for szlwiki (T233104) (duration: 00m 55s)
[18:35:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:35:55] (CR) Jbond: base::puppet: add preferred_serialization_format = pson (1 comment) [puppet] - https://gerrit.wikimedia.org/r/538682 (https://phabricator.wikimedia.org/T233643) (owner: Jbond)
[18:36:46] (CR) Muehlenhoff: [C: +1] "Looks good" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/538682 (https://phabricator.wikimedia.org/T233643) (owner: Jbond)
[18:38:07] (CR) Jbond: base::puppet: add preferred_serialization_format = pson (1 comment) [puppet] - https://gerrit.wikimedia.org/r/538682 (https://phabricator.wikimedia.org/T233643) (owner: Jbond)
[18:38:21] Operations, MobileFrontend, Traffic, Readers-Web-Backlog (Tracking): Sections on some mobile pages are not collabsable - https://phabricator.wikimedia.org/T233373 (AntiCompositeNumber) It only occurred while logged-out. I'm not logged-out and on the mobile site often, so I haven't noticed it aga...
[18:39:40] (PS2) Dzahn: webperf: use performance.discovery instead webperf.discovery [puppet] - https://gerrit.wikimedia.org/r/538347
[18:40:11] (CR) Joal: Rsync analytics mediawiki history dumps to dumps.wikimedia.org (1 comment) [puppet] - https://gerrit.wikimedia.org/r/538312 (https://phabricator.wikimedia.org/T208612) (owner: Mforns)
[18:41:46] (PS1) Kosta Harlan: GrowthExperiments: Enable suggested edits feature flag on beta [mediawiki-config] - https://gerrit.wikimedia.org/r/538690 (https://phabricator.wikimedia.org/T232419)
[18:41:48] (PS1) Kosta Harlan: GrowthExperiments: Ensure suggested edits feature is off in prod [mediawiki-config] - https://gerrit.wikimedia.org/r/538691 (https://phabricator.wikimedia.org/T232419)
[18:42:15] (CR) Dzahn: [C: +2] "https://puppet-compiler.wmflabs.org/compiler1002/18514/" [puppet] - https://gerrit.wikimedia.org/r/538347 (owner: Dzahn)
[18:42:52] (CR) jerkins-bot: [V: -1] GrowthExperiments: Ensure suggested edits feature is off in prod [mediawiki-config] - https://gerrit.wikimedia.org/r/538691 (https://phabricator.wikimedia.org/T232419) (owner: Kosta Harlan)
[18:43:11] Operations, MobileFrontend, Traffic, Readers-Web-Backlog (Tracking): Sections on some mobile pages are not collabsable - https://phabricator.wikimedia.org/T233373 (Jdlrobson) I'd expect no reports after the end of this week. If so I think we can safely assume caching and close these tickets. Than...
[18:43:53] Operations, ops-eqiad, DBA, Wikimedia-Incident: db1075 (s3 master) crashed - BBU failure - https://phabricator.wikimedia.org/T233534 (Krenair) >>! In T233534#5514692, @Marostegui wrote: > I am starting to write the Incident Report: https://wikitech.wikimedia.org/wiki/Incident_documentation/201909...
[18:45:05] (PS1) Urbanecm: Fix: Move hiwikisource's extra namespace to extra namespace section [mediawiki-config] - https://gerrit.wikimedia.org/r/538692
[18:45:13] (PS2) Kosta Harlan: GrowthExperiments: Ensure suggested edits feature is off in prod [mediawiki-config] - https://gerrit.wikimedia.org/r/538691 (https://phabricator.wikimedia.org/T232419)
[18:45:28] (CR) Urbanecm: [C: +2] Fix: Move hiwikisource's extra namespace to extra namespace section [mediawiki-config] - https://gerrit.wikimedia.org/r/538692 (owner: Urbanecm)
[18:45:54] Operations, netops, Wikimedia-Incident: asw2-d2-eqiad crash - https://phabricator.wikimedia.org/T233645 (ayounsi)
[18:46:22] (Merged) jenkins-bot: Fix: Move hiwikisource's extra namespace to extra namespace section [mediawiki-config] - https://gerrit.wikimedia.org/r/538692 (owner: Urbanecm)
[18:46:41] (CR) jenkins-bot: Fix: Move hiwikisource's extra namespace to extra namespace section [mediawiki-config] - https://gerrit.wikimedia.org/r/538692 (owner: Urbanecm)
[18:48:19] (CR) Urbanecm: [C: +1] "LGTM!" [mediawiki-config] - https://gerrit.wikimedia.org/r/538691 (https://phabricator.wikimedia.org/T232419) (owner: Kosta Harlan)
[18:48:20] !log urbanecm@deploy1001 Synchronized wmf-config/VariantSettings.php: SWAT: 37fcbdf: Fix: Move hiwikisource extra namespace to extra namespace section (duration: 00m 56s)
[18:48:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:48:28] !log Morning SWAT done
[18:48:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:49:40] (CR) Mholloway: [V: +2 C: +2] MachineVision: Update active Handler config, take 2 [mediawiki-config] - https://gerrit.wikimedia.org/r/538687 (https://phabricator.wikimedia.org/T233610) (owner: Mholloway)
[18:50:29] (CR) jenkins-bot: MachineVision: Update active Handler config, take 2 [mediawiki-config] - https://gerrit.wikimedia.org/r/538687 (https://phabricator.wikimedia.org/T233610) (owner: Mholloway)
[18:51:14] (PS1) Muehlenhoff: Remove expiry date/contact for sukhe [puppet] - https://gerrit.wikimedia.org/r/538693
[18:51:43] (PS2) Muehlenhoff: Remove expiry date/contact for sukhe [puppet] - https://gerrit.wikimedia.org/r/538693
[18:51:44] !log mholloway-shell@deploy1001 Synchronized wmf-config/InitialiseSettings-labs.php: MachineVision: Update active handler config, take 2 (T233610) (duration: 00m 56s)
[18:51:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:51:47] T233610: Internal error from MachineVision on action=info for files: argument to RandomWikidataIdHandler must be WikidataDepictsSetter, LabelResolver given - https://phabricator.wikimedia.org/T233610
[18:52:49] thank you moritzm!
[18:54:12] (CR) Muehlenhoff: [C: +2] Remove expiry date/contact for sukhe [puppet] - https://gerrit.wikimedia.org/r/538693 (owner: Muehlenhoff)
[18:54:20] (CR) Dzahn: "on webperf1001/2001 - envoy TLS proxy gets restarted after cert chain changed - backend not switched yet" [puppet] - https://gerrit.wikimedia.org/r/538347 (owner: Dzahn)
[18:54:35] sorry for the indirect IRC highlight :-)
[18:56:05] Operations, Core Platform Team, Performance-Team, TechCom-RFC, and 6 others: Serve Main Page of WMF wikis from a consistent URL - https://phabricator.wikimedia.org/T120085 (Izno) This one will probably require a user notice before WMF rollout and maybe even a "do you guys want us to do this" ques...
[18:56:13] Operations, ops-eqiad, DBA, Wikimedia-Incident: db1075 (s3 master) crashed - BBU failure - https://phabricator.wikimedia.org/T233534 (Marostegui) >>! In T233534#5517195, @Krenair wrote: >>>! In T233534#5514692, @Marostegui wrote: >> I am starting to write the Incident Report: https://wikitech.wik...
[18:56:15] (Abandoned) Dzahn: hhvm: make it possible to let puppet completely remove hhvm [puppet] - https://gerrit.wikimedia.org/r/538108 (https://phabricator.wikimedia.org/T229792) (owner: Dzahn)
[19:03:35] (CR) Herron: [C: +1] base::puppet: add preferred_serialization_format = pson [puppet] - https://gerrit.wikimedia.org/r/538682 (https://phabricator.wikimedia.org/T233643) (owner: Jbond)
[19:04:56] Operations, Puppet, Patch-For-Review: occational puppet errors: Error 500 on SERVER: Server Error: Unsupported facts format - https://phabricator.wikimedia.org/T233643 (jbond)
[19:06:37] (CR) Dzahn: [C: +1] "@Marostegui So Gerrit is on m2 and i see in puppet mariadb templates production-m2.sql.erb. In the Gerrit section there it uses IPs of dbp" [puppet] - https://gerrit.wikimedia.org/r/535966 (https://phabricator.wikimedia.org/T222391) (owner: Dzahn)
[19:09:18] !log Going to deploy one more last-time patch
[19:09:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:09:22] (PS1) Urbanecm: Redefine hiwikisource extra namespaces [mediawiki-config] - https://gerrit.wikimedia.org/r/538697 (https://phabricator.wikimedia.org/T233365)
[19:09:31] (PS2) Jhedden: openstack: configure eqiad1 keystone for apache wsgi [puppet] - https://gerrit.wikimedia.org/r/538267 (https://phabricator.wikimedia.org/T223907)
[19:09:45] (CR) Urbanecm: [C: +2] Redefine hiwikisource extra namespaces [mediawiki-config] - https://gerrit.wikimedia.org/r/538697 (https://phabricator.wikimedia.org/T233365) (owner: Urbanecm)
[19:10:10] (CR) Jhedden: [C: +2] openstack: configure eqiad1 keystone for apache wsgi [puppet] - https://gerrit.wikimedia.org/r/538267 (https://phabricator.wikimedia.org/T223907) (owner: Jhedden)
[19:10:41] (CR) jerkins-bot: [V: -1] Redefine hiwikisource extra namespaces [mediawiki-config] - https://gerrit.wikimedia.org/r/538697 (https://phabricator.wikimedia.org/T233365) (owner: Urbanecm)
[19:11:01] (CR) jerkins-bot: [V: -1] Redefine hiwikisource extra namespaces [mediawiki-config] - https://gerrit.wikimedia.org/r/538697 (https://phabricator.wikimedia.org/T233365) (owner: Urbanecm)
[19:11:32] (CR) Urbanecm: "recheck" [mediawiki-config] - https://gerrit.wikimedia.org/r/538697 (https://phabricator.wikimedia.org/T233365) (owner: Urbanecm)
[19:11:42] (CR) Dzahn: [C: +1] "should be good to go now. cert situation cleaned up." [puppet] - https://gerrit.wikimedia.org/r/535929 (https://phabricator.wikimedia.org/T210411) (owner: Dzahn)
[19:12:11] (PS3) Dzahn: ATS: switch webperf backends to TLS and discovery name [puppet] - https://gerrit.wikimedia.org/r/535929 (https://phabricator.wikimedia.org/T210411)
[19:13:49] (CR) Urbanecm: [C: +2] Redefine hiwikisource extra namespaces [mediawiki-config] - https://gerrit.wikimedia.org/r/538697 (https://phabricator.wikimedia.org/T233365) (owner: Urbanecm)
[19:14:50] Operations, ops-eqiad, DBA, Wikimedia-Incident: db1075 (s3 master) crashed - BBU failure - https://phabricator.wikimedia.org/T233534 (Krenair) I'm wondering if an entry should be added under "Where did we get lucky?" along the lines of "I/We noticed this incident before SMS paging begun". Was th...
[19:15:04] (Merged) jenkins-bot: Redefine hiwikisource extra namespaces [mediawiki-config] - https://gerrit.wikimedia.org/r/538697 (https://phabricator.wikimedia.org/T233365) (owner: Urbanecm)
[19:15:33] (CR) Paladox: "recheck" [software/gerrit] (wmf/stable-2.15) - https://gerrit.wikimedia.org/r/538619 (owner: Paladox)
[19:16:30] (CR) jenkins-bot: Redefine hiwikisource extra namespaces [mediawiki-config] - https://gerrit.wikimedia.org/r/538697 (https://phabricator.wikimedia.org/T233365) (owner: Urbanecm)
[19:16:42] !log urbanecm@deploy1001 Synchronized wmf-config/VariantSettings.php: 2a7a125: Redefine hiwikisource extra namespaces (T233365) (duration: 00m 57s)
[19:16:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:16:45] T233365: Namespace definitions for Hindi Wikisource need to be in the ProofreadPage extension, not in VariantSettings.php - https://phabricator.wikimedia.org/T233365
[19:17:22] (CR) jerkins-bot: [V: -1] Merge remote-tracking branch 'origin/stable-2.15' into wmf/stable-2.15 [software/gerrit] (wmf/stable-2.15) - https://gerrit.wikimedia.org/r/538619 (owner: Paladox)
[19:19:44] Operations, serviceops: convert parsoid cluster from parsoid/JS to parsoid/PHP - https://phabricator.wikimedia.org/T233654 (Dzahn)
[19:29:20] Operations, Core Platform Team, Performance-Team, TechCom-RFC, and 6 others: Serve Main Page of WMF wikis from a consistent URL - https://phabricator.wikimedia.org/T120085 (Krinkle) This is still an open RFC. Consultation with the community will be part of this RFC, including asking for input and...
[19:32:03] (PS2) Herron: admin: move Papaul from datacenter-ops to ops group [puppet] - https://gerrit.wikimedia.org/r/538048 (https://phabricator.wikimedia.org/T233189) (owner: Volans)
[19:34:35] (CR) Herron: admin: move Papaul from datacenter-ops to ops group (1 comment) [puppet] - https://gerrit.wikimedia.org/r/538048 (https://phabricator.wikimedia.org/T233189) (owner: Volans)
[19:41:55] Operations, ops-eqiad: apply hostname labels for krb1001/WMF5173 - https://phabricator.wikimedia.org/T233642 (herron) p:Triage→Normal
[19:43:09] Operations, User-DannyS712: 503 Backend fetch failed - https://phabricator.wikimedia.org/T233271 (herron) p:Triage→Normal
[19:47:16] (CR) Krinkle: "@Thiemo This is actually no longer in use. It's handled at the cache proxy layer, which does handle XML-like responses already, but it has" [puppet] - https://gerrit.wikimedia.org/r/535860 (https://phabricator.wikimedia.org/T232615) (owner: Gilles)
[19:49:19] (CR) Dzahn: [C: +1] admin: move Papaul from datacenter-ops to ops group [puppet] - https://gerrit.wikimedia.org/r/538048 (https://phabricator.wikimedia.org/T233189) (owner: Volans)
[19:52:14] (CR) Dzahn: [C: +1] "already on root@ mail alias too" [puppet] - https://gerrit.wikimedia.org/r/538048 (https://phabricator.wikimedia.org/T233189) (owner: Volans)
[19:53:12] Operations, User-DannyS712: 503 Backend fetch failed - https://phabricator.wikimedia.org/T233271 (Aklapper)
[19:56:07] (CR) Mforns: Rsync analytics mediawiki history dumps to dumps.wikimedia.org (1 comment) [puppet] - https://gerrit.wikimedia.org/r/538312 (https://phabricator.wikimedia.org/T208612) (owner: Mforns)
[19:56:23] Operations, Anti-Harassment, CheckUser, MediaWiki-User-management, Traffic: Users editing from 127.0.0.1 - https://phabricator.wikimedia.org/T233657 (Anomie) I'm going to poke this at #Traffic, since it seems unlikely that 127.0.0.1 is supposed to be showing up in XFF there. Is 10.128.0.127 s...
[19:57:53] !log T233657 ✔️ cdanis@cp4027.ulsfo.wmnet ~ 🕓🍵 sudo -i depool
[19:57:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:57:56] T233657: Users editing from 127.0.0.1 - https://phabricator.wikimedia.org/T233657
[19:58:58] Operations, Anti-Harassment, CheckUser, MediaWiki-User-management, Traffic: Users editing from 127.0.0.1 - https://phabricator.wikimedia.org/T233657 (CDanis) 10.128.0.127 is cp4027 which @Vgutierrez was using to experiment with ATS terminating TLS (see also T231627) I've depooled it for now,...
[20:00:04] cscott, arlolra, subbu, bearND, halfak, and accraze: Your horoscope predicts another unfortunate Services – Parsoid / Citoid / Mobileapps / ORES / … deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190923T2000).
[20:00:19] no parsoid deploy
[20:04:57] Operations, Anti-Harassment, CheckUser, MediaWiki-User-management, Traffic: Users editing from 127.0.0.1 - https://phabricator.wikimedia.org/T233657 (Anomie) >>! In T233657#5517441, @CDanis wrote: > I've depooled it for now, which should stop this. I confirm that new entries with 127.0.0.1 h...
[20:07:49] Operations, Performance-Team, observability, serviceops: Ensure graphs used by Performance account for Varnish-to-ATS migration - https://phabricator.wikimedia.org/T233474 (Krinkle) p:Triage→Normal a:Krinkle
[20:08:17] (CR) Awight: "I also made a bad suggestion, to split the original patch. During today's EU SWAT, I scared myself out of the multiple-sync deployment be" [mediawiki-config] - https://gerrit.wikimedia.org/r/538597 (owner: Ammarpad)
[20:09:06] Operations, Traffic, Patch-For-Review: Move cache text cluster from nginx to ats-tls - https://phabricator.wikimedia.org/T231627 (CDanis) I depooled cp4027 today when {T233657} surfaced. Gonna guess that the ATS TLS termination is missing a special client IP header that nginx knows to insert?
[20:09:22] Operations, Anti-Harassment, CheckUser, MediaWiki-User-management, Traffic: Users editing from 127.0.0.1 - https://phabricator.wikimedia.org/T233657 (CDanis) Open→Resolved a:CDanis
[20:12:41] (PS1) Ottomata: camus mediawiki_events - increase map tasks to 50 and run in essential queue [puppet] - https://gerrit.wikimedia.org/r/538701
[20:14:08] (PS2) Ottomata: camus mediawiki_events - increase map tasks to 50 and run in essential queue [puppet] - https://gerrit.wikimedia.org/r/538701
[20:16:13] (PS3) Ottomata: camus mediawiki_events - increase map tasks to 50 and run in essential queue [puppet] - https://gerrit.wikimedia.org/r/538701
[20:16:19] Operations, Analytics, Traffic: Publish tls related info to webrequest via varnish - https://phabricator.wikimedia.org/T233661 (Nuria)
[20:17:12] (CR) Ottomata: [V: +2 C: +2] camus mediawiki_events - increase map tasks to 50 and run in essential queue [puppet] - https://gerrit.wikimedia.org/r/538701 (owner: Ottomata)
[20:18:30] PROBLEM - Mobileapps LVS codfw on mobileapps.svc.codfw.wmnet is CRITICAL: /{domain}/v1/page/news (get In the News content for unsupported language (with aggregated=true)) timed out before a response was received https://wikitech.wikimedia.org/wiki/Mobileapps_%28service%29
[20:19:33] (PS1) Ottomata: Fix timer interval for camus::job { 'mediawiki_events [puppet] - https://gerrit.wikimedia.org/r/538702
[20:20:12] PROBLEM - Restbase LVS codfw on restbase.svc.codfw.wmnet is CRITICAL: /en.wikipedia.org/v1/page/talk/{title} (Get structured talk page for enwiki Salt article) timed out before a response was received https://wikitech.wikimedia.org/wiki/RESTBase
[20:21:23] Operations, Wikimedia-Logstash: Logstash pipeline crashes on non-UTF8 log messages. - https://phabricator.wikimedia.org/T233662 (colewhite)
[20:21:39] (CR) Ottomata: [V: +2 C: +2] Fix timer interval for camus::job { 'mediawiki_events [puppet] - https://gerrit.wikimedia.org/r/538702 (owner: Ottomata)
[20:21:40] RECOVERY - Mobileapps LVS codfw on mobileapps.svc.codfw.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Mobileapps_%28service%29
[20:23:12] (PS2) Cwhite: profile: add mmutf8fix to kafka output actions [puppet] - https://gerrit.wikimedia.org/r/538642 (https://phabricator.wikimedia.org/T233662)
[20:23:24] RECOVERY - Restbase LVS codfw on restbase.svc.codfw.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase
[20:29:30] Operations, Wikimedia-Logstash, Patch-For-Review: Logstash pipeline crashes on non-UTF8 log messages. - https://phabricator.wikimedia.org/T233662 (colewhite) There are a few options to consider. * Mitigation in MediaWiki -- Force all logging to be UTF-8 compliant. * Mitigation in the logging pipeline...
[20:32:16] Operations, Anti-Harassment, CheckUser, MediaWiki-User-management, Traffic: Users editing from 127.0.0.1 (due to experimenting with ATS terminating TLS) - https://phabricator.wikimedia.org/T233657 (Aklapper)
[20:34:22] (PS5) 20after4: Set up scap target for deploying the phatality plugin into kibana [puppet] - https://gerrit.wikimedia.org/r/537240 (https://phabricator.wikimedia.org/T230752)
[20:35:02] (CR) jerkins-bot: [V: -1] Set up scap target for deploying the phatality plugin into kibana [puppet] - https://gerrit.wikimedia.org/r/537240 (https://phabricator.wikimedia.org/T230752) (owner: 20after4)
[20:36:40] (PS6) 20after4: Set up scap target for deploying the phatality plugin into kibana [puppet] - https://gerrit.wikimedia.org/r/537240 (https://phabricator.wikimedia.org/T230752)
[20:46:54] (CR) Paladox: "recheck" [software/gerrit] (wmf/stable-2.15) - https://gerrit.wikimedia.org/r/538619 (owner: Paladox)
[20:48:49] (PS1) Ottomata: Create new camus job for mediawiki analytics events [puppet] - https://gerrit.wikimedia.org/r/538704
[20:51:21] (CR) Ottomata: "https://puppet-compiler.wmflabs.org/compiler1002/18516/an-coord1001.eqiad.wmnet/" [puppet] - https://gerrit.wikimedia.org/r/538704 (owner: Ottomata)
[20:51:23] (CR) Ottomata: [C: +2] Create new camus job for mediawiki analytics events [puppet] - https://gerrit.wikimedia.org/r/538704 (owner: Ottomata)
[20:54:27] Operations, Performance-Team, Traffic, Patch-For-Review: Apache configuration: SVGs served by MediaWiki aren't gzipped - https://phabricator.wikimedia.org/T232615 (Krinkle)
[20:54:54] Operations, Performance-Team, Traffic, Patch-For-Review: Enable gzip compression for interface icon SVGs served by MediaWiki - https://phabricator.wikimedia.org/T232615 (Krinkle)
[20:56:31] Operations, Wikimedia-Mailing-lists, Wikispore: Wikispore mailing list - https://phabricator.wikimedia.org/T232961 (Tgr) >>! In T232961#5494391, @Pharos wrote: > While Wikimedia Space might be nice too as a supplement, it's still a niche platform, and the vast majority of potential Wikimedians intere...
[20:57:36] (PS1) Ottomata: Fix kafka.whitelist.topics for camus mediawiki_analytics_events job [puppet] - https://gerrit.wikimedia.org/r/538705
[20:57:51] (CR) Ottomata: [V: +2 C: +2] Fix kafka.whitelist.topics for camus mediawiki_analytics_events job [puppet] - https://gerrit.wikimedia.org/r/538705 (owner: Ottomata)
[21:00:04] Reedy and sbassett: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for Weekly Security deployment window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190923T2100).
[21:10:04] Operations, Fundraising-Backlog, SRE-Access-Requests: Banner History and page view data access for fundraising analysts - Jerrie and Erin - https://phabricator.wikimedia.org/T233636 (EYener)
[21:20:33] Operations, Fundraising-Backlog, SRE-Access-Requests: Banner History and page view data access for fundraising analysts - Jerrie and Erin - https://phabricator.wikimedia.org/T233636 (EYener) Thanks for the link, @Ejegg! After reviewing the documentation and other resources, I believe @jkumalah and I...
[21:22:11] (Abandoned) Andrew Bogott: Keystone/Newton: create /var/lib/keystone [puppet] - https://gerrit.wikimedia.org/r/538444 (owner: Andrew Bogott)
[21:22:27] (PS10) Andrew Bogott: Keystone/newton: install python-keystone [puppet] - https://gerrit.wikimedia.org/r/538445
[21:22:29] (PS1) Andrew Bogott: neutron-l3-agent: forward our routing hacks to Newton [puppet] - https://gerrit.wikimedia.org/r/538707 (https://phabricator.wikimedia.org/T233665)
[21:23:27] (CR) jerkins-bot: [V: -1] neutron-l3-agent: forward our routing hacks to Newton [puppet] - https://gerrit.wikimedia.org/r/538707 (https://phabricator.wikimedia.org/T233665) (owner: Andrew Bogott)
[21:27:17] (PS2) Catrope: GrowthExperiments: Enable suggested edits feature flag on beta [mediawiki-config] - https://gerrit.wikimedia.org/r/538690 (https://phabricator.wikimedia.org/T232419) (owner: Kosta Harlan)
[21:27:29] (CR) Catrope: [C: +2] GrowthExperiments: Enable suggested edits feature flag on beta [mediawiki-config] - https://gerrit.wikimedia.org/r/538690 (https://phabricator.wikimedia.org/T232419) (owner: Kosta Harlan)
[21:27:48] (PS3) Catrope: GrowthExperiments: Ensure suggested edits feature is off in prod [mediawiki-config] - https://gerrit.wikimedia.org/r/538691 (https://phabricator.wikimedia.org/T232419) (owner: Kosta Harlan)
[21:28:13] (CR) Catrope: [C: +2] GrowthExperiments: Ensure suggested edits feature is off in prod [mediawiki-config] - https://gerrit.wikimedia.org/r/538691 (https://phabricator.wikimedia.org/T232419) (owner: Kosta Harlan)
[21:28:30] Operations, Wikimedia-Mailing-lists: Close wikimediameta-l mailing list - https://phabricator.wikimedia.org/T233666 (MarcoAurelio)
[21:28:36] (Merged) jenkins-bot: GrowthExperiments: Enable suggested edits feature flag on beta [mediawiki-config] - https://gerrit.wikimedia.org/r/538690 (https://phabricator.wikimedia.org/T232419) (owner: Kosta Harlan)
[21:28:54] (CR) jenkins-bot: GrowthExperiments: Enable suggested edits feature flag on beta [mediawiki-config] - https://gerrit.wikimedia.org/r/538690 (https://phabricator.wikimedia.org/T232419) (owner: Kosta Harlan)
[21:29:44] (Merged) jenkins-bot: GrowthExperiments: Ensure suggested edits feature is off in prod [mediawiki-config] - https://gerrit.wikimedia.org/r/538691 (https://phabricator.wikimedia.org/T232419) (owner: Kosta Harlan)
[21:30:57] (CR) jenkins-bot: GrowthExperiments: Ensure suggested edits feature is off in prod [mediawiki-config] - https://gerrit.wikimedia.org/r/538691 (https://phabricator.wikimedia.org/T232419) (owner: Kosta Harlan)
[21:32:20] !log catrope@deploy1001 Synchronized wmf-config/VariantSettings.php: Syncing no-op change for T232419 (duration: 00m 57s)
[21:32:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:32:24] T232419: Newcomer tasks: suggested edits initiation and overlays - https://phabricator.wikimedia.org/T232419
[21:36:14] (PS1) MarcoAurelio: Follow-up 8f3f0705baed: add missing namespace for eswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/538709 (https://phabricator.wikimedia.org/T233562)
[21:41:46] Operations, Traffic, Patch-For-Review: Move cache text cluster from nginx to ats-tls - https://phabricator.wikimedia.org/T231627 (Vgutierrez) Thanks for the depool @CDanis, so it looks like a combination of things: nginx/ATS set `X-Client-IP` and `X-Forwarded-For`. The behavior for `X-Client-IP` is t...
[21:51:36] (PS7) 20after4: Set up scap target for deploying the phatality plugin into kibana [puppet] - https://gerrit.wikimedia.org/r/537240 (https://phabricator.wikimedia.org/T230752)
[21:58:42] Operations, Traffic: varnish-fe is handling X-Forwarded-For differently when ats is in front of it - https://phabricator.wikimedia.org/T233667 (Vgutierrez)
[21:58:55] Operations, Traffic: varnish-fe is handling X-Forwarded-For differently when ats is in front of it - https://phabricator.wikimedia.org/T233667 (Vgutierrez) p:Triage→High
[21:59:32] Operations, Traffic: varnish-fe is handling X-Forwarded-For differently when ats is in front of it - https://phabricator.wikimedia.org/T233667 (Vgutierrez)
[21:59:36] Operations, Anti-Harassment, CheckUser, MediaWiki-User-management, Traffic: Users editing from 127.0.0.1 (due to experimenting with ATS terminating TLS) - https://phabricator.wikimedia.org/T233657 (Vgutierrez)
[22:22:57] PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[22:27:51] RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[22:32:45] PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[22:34:41] PROBLEM - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is CRITICAL: cluster=cache_text site=ulsfo https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[22:35:41] PROBLEM - HTTP availability for Nginx -SSL terminators- at esams on icinga1001 is CRITICAL: cluster=cache_text site=esams https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[22:35:41] PROBLEM - HTTP availability for Nginx -SSL terminators- at eqsin on icinga1001 is CRITICAL: cluster=cache_text site=eqsin https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[22:36:21] RECOVERY - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[22:37:19] RECOVERY - HTTP availability for Nginx -SSL terminators- at esams on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[22:37:19] RECOVERY - HTTP availability for Nginx -SSL terminators- at eqsin on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[22:37:41] RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[22:41:17] (PS1) Cwhite: change EndpointMetrics from static to instance variable [software/service-checker] - https://gerrit.wikimedia.org/r/538711
[22:42:16] (CR) jerkins-bot: [V: -1] change EndpointMetrics from static to instance variable [software/service-checker] - https://gerrit.wikimedia.org/r/538711 (owner: Cwhite)
[22:44:02] (PS2) Cwhite: change EndpointMetrics from static to instance variable [software/service-checker] - https://gerrit.wikimedia.org/r/538711
[22:51:40] (CR) Cwhite: [C: +1] base::puppet: add preferred_serialization_format = pson [puppet] - https://gerrit.wikimedia.org/r/538682 (https://phabricator.wikimedia.org/T233643) (owner: Jbond)
[22:52:01] Operations, Core Platform Team, Performance-Team, TechCom-RFC, and 6 others: Serve Main Page of WMF wikis from a consistent URL - https://phabricator.wikimedia.org/T120085 (Izno) >>! In T120085#5517353, @Krinkle wrote: > This is still an open RFC. [snip] Totally missed this was in the RFCs bucke...
[23:00:04] MaxSem, RoanKattouw, Niharika, and Urbanecm: How many deployers does it take to do Evening SWAT (Max 6 patches) deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190923T2300).
[23:00:04] No GERRIT patches in the queue for this window AFAICS.
[23:43:02] !log dzahn@cumin1001 START - Cookbook sre.hosts.ipmi-password-reset
[23:43:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:43:29] !log dzahn@cumin1001 Updating IPMI password on 92 hosts - dzahn@cumin1001
[23:43:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:46:26] !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
[23:46:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:50:13] !log dzahn@cumin1001 START - Cookbook sre.hosts.ipmi-password-reset
[23:50:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:50:27] !log dzahn@cumin1001 Updating IPMI password on 92 hosts - dzahn@cumin1001
[23:50:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:53:35] (CR) Bstorm: "Documented this stuff here: https://wikitech.wikimedia.org/wiki/Wikimedia_Cloud_Services_team/EnhancementProposals/Toolforge_Kubernetes_RB" [puppet] - https://gerrit.wikimedia.org/r/537755 (https://phabricator.wikimedia.org/T227290) (owner: Bstorm)
[23:54:16] (CR) Bstorm: "Documented largely here https://wikitech.wikimedia.org/wiki/Wikimedia_Cloud_Services_team/EnhancementProposals/Toolforge_Kubernetes_RBAC_a" [puppet] - https://gerrit.wikimedia.org/r/537732 (https://phabricator.wikimedia.org/T227290) (owner: Bstorm)
[23:56:52] !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
[23:56:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log