[00:07:47] PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is CRITICAL: CRITICAL - failed 53 probes of 554 (alerts on 50) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[00:36:55] RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is OK: OK - failed 45 probes of 554 (alerts on 50) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[00:40:49] PROBLEM - Host mw1310.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[00:46:49] RECOVERY - Host mw1310.mgmt is UP: PING WARNING - Packet loss = 66%, RTA = 0.77 ms
[00:47:59] Operations, Traffic, Goal, Performance-Team (Radar), Wikimedia-Incident: Support TLSv1.3 - https://phabricator.wikimedia.org/T170567 (Krinkle) Resolved→Open Re-opening and tracking as on-going perf incident per the above. As @Gilles mentioned, it would help if we can at least isolate/...
[00:48:02] Operations, Traffic, HTTPS, Security: Investigate our mitigation strategy for HTTPS response length attacks - https://phabricator.wikimedia.org/T92298 (Krinkle)
[00:52:12] Operations, Performance-Team, Traffic, Goal, Wikimedia-Incident: Support TLSv1.3 - https://phabricator.wikimedia.org/T170567 (Krinkle)
[00:52:32] Operations, Traffic, Goal, Performance-Team (Radar), Wikimedia-Incident: Support TLSv1.3 - https://phabricator.wikimedia.org/T170567 (Krinkle)
[00:53:39] Operations, Research, Patch-For-Review: recommendation api's test on scb nodes are flapping - https://phabricator.wikimedia.org/T247732 (bmansurov) @elukey With the above patch some 503 errors will be logged correctly with informative message. I'll deploy the patch as soon as possible. I also figure...
[00:55:48] Operations, Traffic, Patch-For-Review, Wikimedia-Incident: Memory leak on ats-tls 8.0.6 - https://phabricator.wikimedia.org/T249335 (Krinkle)
[01:44:55] PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is CRITICAL: CRITICAL - failed 51 probes of 554 (alerts on 50) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[01:57:58] (CR) Andrew Bogott: [C: +2] openstack::keystone::cleanup: remove all timers [puppet] - https://gerrit.wikimedia.org/r/589877 (https://phabricator.wikimedia.org/T243418) (owner: Andrew Bogott)
[02:02:33] RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is OK: OK - failed 43 probes of 554 (alerts on 50) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[02:03:43] PROBLEM - SSH on mw1310.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[02:31:31] (PS1) Andrew Bogott: rabbitmq: use openstack_controllers instead of nova_controller [puppet] - https://gerrit.wikimedia.org/r/590813 (https://phabricator.wikimedia.org/T249941)
[02:31:33] (PS1) Andrew Bogott: openstack::designate::service: remove some unused internal variables [puppet] - https://gerrit.wikimedia.org/r/590814 (https://phabricator.wikimedia.org/T249941)
[02:31:35] (PS1) Andrew Bogott: Designate: remove use of nova_controller and nova_controller_standby [puppet] - https://gerrit.wikimedia.org/r/590815 (https://phabricator.wikimedia.org/T249941)
[02:36:50] (PS3) Andrew Bogott: Keystone: remove openstack::keystone::cleanup [puppet] - https://gerrit.wikimedia.org/r/589876 (https://phabricator.wikimedia.org/T243418)
[02:44:27] (PS2) Andrew Bogott: Designate: remove use of nova_controller and nova_controller_standby [puppet] - https://gerrit.wikimedia.org/r/590815 (https://phabricator.wikimedia.org/T249941)
[02:44:30] (CR) Andrew Bogott: [C: +2] Keystone: remove openstack::keystone::cleanup [puppet] - https://gerrit.wikimedia.org/r/589876 (https://phabricator.wikimedia.org/T243418) (owner: Andrew Bogott)
[02:50:44] (CR) Andrew Bogott: "https://puppet-compiler.wmflabs.org/compiler1003/22055/cloudservices1003.wikimedia.org/" [puppet] - https://gerrit.wikimedia.org/r/590815 (https://phabricator.wikimedia.org/T249941) (owner: Andrew Bogott)
[03:11:39] (PS1) Andrew Bogott: designate firewall: use openstack_controllers instead of nova-specific things [puppet] - https://gerrit.wikimedia.org/r/590822 (https://phabricator.wikimedia.org/T249941)
[03:11:40] (PS1) Andrew Bogott: Openstack env scripts: use keystone_api_fqdn [puppet] - https://gerrit.wikimedia.org/r/590823 (https://phabricator.wikimedia.org/T249941)
[03:11:43] (PS1) Andrew Bogott: nova common: use keystone_api_fqdn instead of keystone_host [puppet] - https://gerrit.wikimedia.org/r/590824 (https://phabricator.wikimedia.org/T249941)
[03:15:35] (CR) jerkins-bot: [V: -1] nova common: use keystone_api_fqdn instead of keystone_host [puppet] - https://gerrit.wikimedia.org/r/590824 (https://phabricator.wikimedia.org/T249941) (owner: Andrew Bogott)
[03:15:56] (CR) jerkins-bot: [V: -1] Openstack env scripts: use keystone_api_fqdn [puppet] - https://gerrit.wikimedia.org/r/590823 (https://phabricator.wikimedia.org/T249941) (owner: Andrew Bogott)
[03:22:07] (PS2) Andrew Bogott: Openstack env scripts: use keystone_api_fqdn [puppet] - https://gerrit.wikimedia.org/r/590823 (https://phabricator.wikimedia.org/T249941)
[03:22:09] (PS2) Andrew Bogott: nova common: use keystone_api_fqdn instead of keystone_host [puppet] - https://gerrit.wikimedia.org/r/590824 (https://phabricator.wikimedia.org/T249941)
[03:29:18] (CR) Andrew Bogott: [C: +2] rabbitmq: use openstack_controllers instead of nova_controller [puppet] - https://gerrit.wikimedia.org/r/590813 (https://phabricator.wikimedia.org/T249941) (owner: Andrew Bogott)
[03:33:17] (CR) Andrew Bogott: [C: +2] "no-op, as hoped. https://puppet-compiler.wmflabs.org/compiler1001/22059/" [puppet] - https://gerrit.wikimedia.org/r/590814 (https://phabricator.wikimedia.org/T249941) (owner: Andrew Bogott)
[03:38:35] (PS3) Andrew Bogott: Designate: remove use of nova_controller and nova_controller_standby [puppet] - https://gerrit.wikimedia.org/r/590815 (https://phabricator.wikimedia.org/T249941)
[03:42:30] (CR) Andrew Bogott: [C: +2] Designate: remove use of nova_controller and nova_controller_standby [puppet] - https://gerrit.wikimedia.org/r/590815 (https://phabricator.wikimedia.org/T249941) (owner: Andrew Bogott)
[03:44:14] (CR) Andrew Bogott: [C: +2] designate firewall: use openstack_controllers instead of nova-specific things [puppet] - https://gerrit.wikimedia.org/r/590822 (https://phabricator.wikimedia.org/T249941) (owner: Andrew Bogott)
[03:45:45] PROBLEM - OSPF status on cr1-codfw is CRITICAL: OSPFv2: 5/6 UP : OSPFv3: 6/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[03:45:47] PROBLEM - OSPF status on cr1-eqsin is CRITICAL: OSPFv2: 2/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[03:47:33] (CR) Andrew Bogott: [C: +2] Openstack env scripts: use keystone_api_fqdn [puppet] - https://gerrit.wikimedia.org/r/590823 (https://phabricator.wikimedia.org/T249941) (owner: Andrew Bogott)
[03:47:39] RECOVERY - OSPF status on cr1-codfw is OK: OSPFv2: 6/6 UP : OSPFv3: 6/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[03:47:41] RECOVERY - OSPF status on cr1-eqsin is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[03:50:05] PROBLEM - SSH on ganeti1006.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[03:51:28] (CR) Andrew Bogott: [C: +2] nova common: use keystone_api_fqdn instead of keystone_host [puppet] - https://gerrit.wikimedia.org/r/590824 (https://phabricator.wikimedia.org/T249941) (owner: Andrew Bogott)
[04:00:41] PROBLEM - OSPF status on cr1-codfw is CRITICAL: OSPFv2: 5/6 UP : OSPFv3: 5/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[04:00:43] PROBLEM - BFD status on cr1-eqsin is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[04:02:31] RECOVERY - OSPF status on cr1-codfw is OK: OSPFv2: 6/6 UP : OSPFv3: 6/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[04:02:35] RECOVERY - BFD status on cr1-eqsin is OK: OK: UP: 4 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[04:08:37] PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is CRITICAL: CRITICAL - failed 51 probes of 554 (alerts on 50) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[04:20:21] RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is OK: OK - failed 50 probes of 554 (alerts on 50) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[04:39:29] PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is CRITICAL: CRITICAL - failed 52 probes of 554 (alerts on 50) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[04:47:25] PROBLEM - OSPF status on cr1-codfw is CRITICAL: OSPFv2: 5/6 UP : OSPFv3: 6/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[04:47:33] PROBLEM - OSPF status on cr1-eqsin is CRITICAL: OSPFv2: 2/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[04:49:21] PROBLEM - BFD status on cr1-eqsin is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[04:51:15] RECOVERY - BFD status on cr1-eqsin is OK: OK: UP: 4 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[04:51:19] RECOVERY - OSPF status on cr1-eqsin is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[04:55:00] !log ariel@deploy1001 Started deploy [dumps/dumps@b813c8a]: no private table dumps, check for existence of 7z,bz2 page content files before dumping, various unit tests
[04:55:04] !log ariel@deploy1001 Finished deploy [dumps/dumps@b813c8a]: no private table dumps, check for existence of 7z,bz2 page content files before dumping, various unit tests (duration: 00m 04s)
[04:55:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:55:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:56:47] RECOVERY - OSPF status on cr1-codfw is OK: OSPFv2: 6/6 UP : OSPFv3: 6/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[04:57:09] RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is OK: OK - failed 38 probes of 554 (alerts on 50) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[05:06:09] RECOVERY - SSH on mw1310.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[05:12:14] (CR) Muehlenhoff: [C: +2] Add CNAME for idp-test.wikimedia.org [dns] - https://gerrit.wikimedia.org/r/589583 (https://phabricator.wikimedia.org/T233930) (owner: Muehlenhoff)
[05:15:56] (PS2) Giuseppe Lavagetto: mediawiki::php::admin: allow inspecting ini values [puppet] - https://gerrit.wikimedia.org/r/589541
[05:16:49] (CR) Giuseppe Lavagetto: [C: +2] mediawiki::php::admin: allow inspecting ini values [puppet] - https://gerrit.wikimedia.org/r/589541 (owner: Giuseppe Lavagetto)
[05:17:01] (CR) Giuseppe Lavagetto: [V: +2 C: +2] mediawiki::php::admin: allow inspecting ini values [puppet] - https://gerrit.wikimedia.org/r/589541 (owner: Giuseppe Lavagetto)
[05:22:54] (PS4) Muehlenhoff: Fix installation of graphite-web on Buster [puppet] - https://gerrit.wikimedia.org/r/589599 (https://phabricator.wikimedia.org/T247963)
[05:24:54] (CR) Muehlenhoff: [C: +2] Fix installation of graphite-web on Buster [puppet] - https://gerrit.wikimedia.org/r/589599 (https://phabricator.wikimedia.org/T247963) (owner: Muehlenhoff)
[05:24:58] (PS1) Giuseppe Lavagetto: mediawiki::php::admin: fix function name, output newlines [puppet] - https://gerrit.wikimedia.org/r/590852
[05:25:25] (CR) Giuseppe Lavagetto: [V: +2 C: +2] mediawiki::php::admin: fix function name, output newlines [puppet] - https://gerrit.wikimedia.org/r/590852 (owner: Giuseppe Lavagetto)
[05:28:08] (PS2) Muehlenhoff: Remove jessie support from osm/maps [puppet] - https://gerrit.wikimedia.org/r/583954
[05:28:45] PROBLEM - OSPF status on cr1-eqsin is CRITICAL: OSPFv2: 2/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[05:31:08] (CR) Giuseppe Lavagetto: [C: -1] "This is a matter of personal preference - I prefer you to have to add a "." at the end of your cli command than having people mistakenly r" [docker-images/docker-pkg] - https://gerrit.wikimedia.org/r/589710 (owner: Hashar)
[05:32:27] PROBLEM - BFD status on cr1-eqsin is CRITICAL: CRIT: Down: 2 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[05:32:31] RECOVERY - OSPF status on cr1-eqsin is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[05:34:21] RECOVERY - BFD status on cr1-eqsin is OK: OK: UP: 4 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[05:36:07] PROBLEM - OSPF status on cr1-codfw is CRITICAL: OSPFv2: 5/6 UP : OSPFv3: 5/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[05:38:01] RECOVERY - OSPF status on cr1-codfw is OK: OSPFv2: 6/6 UP : OSPFv3: 6/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[05:44:29] (CR) Muehlenhoff: [C: +2] Remove jessie support from osm/maps [puppet] - https://gerrit.wikimedia.org/r/583954 (owner: Muehlenhoff)
[05:50:22] !log Deploy schema change on s8 codfw - lag will show up T250060
[05:50:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:50:28] T250060: tl_namespace index on templatelinks is unique only in s8 - https://phabricator.wikimedia.org/T250060
[05:51:19] PROBLEM - OSPF status on cr1-eqsin is CRITICAL: OSPFv2: 2/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[05:51:41] RECOVERY - SSH on ganeti1006.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[05:53:09] RECOVERY - OSPF status on cr1-eqsin is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[05:53:55] !log Deploy schema change on s8 eqiad hosts T250060
[05:54:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:54:50] !log rolling restart of ats-tls in cp[3052,3054,3056,3058,3060,4028,4029,4030,4031,4032] - T249335
[05:54:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:54:55] T249335: Memory leak on ats-tls 8.0.6 - https://phabricator.wikimedia.org/T249335
[05:57:14] (CR) Muehlenhoff: [C: +2] Add acmechief config for idp-test [puppet] - https://gerrit.wikimedia.org/r/589582 (https://phabricator.wikimedia.org/T233930) (owner: Muehlenhoff)
[06:04:07] PROBLEM - OSPF status on cr1-codfw is CRITICAL: OSPFv2: 5/6 UP : OSPFv3: 6/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[06:04:17] PROBLEM - OSPF status on cr1-eqsin is CRITICAL: OSPFv2: 3/3 UP : OSPFv3: 2/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[06:05:19] PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is CRITICAL: CRITICAL - failed 58 probes of 554 (alerts on 50) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[06:07:49] RECOVERY - OSPF status on cr1-codfw is OK: OSPFv2: 6/6 UP : OSPFv3: 6/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[06:07:59] RECOVERY - OSPF status on cr1-eqsin is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[06:11:09] RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is OK: OK - failed 50 probes of 554 (alerts on 50) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[06:13:33] PROBLEM - OSPF status on cr1-eqsin is CRITICAL: OSPFv2: 2/3 UP : OSPFv3: 2/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[06:15:07] RECOVERY - Check that envoy is running on idp-test2001 is OK: OK - envoyproxy.service is active https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23Envoy
[06:15:25] RECOVERY - OSPF status on cr1-eqsin is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[06:16:21] !log installing bash updates on jessie
[06:16:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:22:52] (PS1) Marostegui: check_depooled: Include x1 [software] - https://gerrit.wikimedia.org/r/590859
[06:22:53] PROBLEM - OSPF status on cr1-eqsin is CRITICAL: OSPFv2: 2/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[06:23:58] (PS2) Marostegui: wmnet: Remove production dns entry for dbproxy1011 [dns] - https://gerrit.wikimedia.org/r/589186 (https://phabricator.wikimedia.org/T249590)
[06:24:15] (PS2) Marostegui: mariadb: Decommission dbproxy1011 [puppet] - https://gerrit.wikimedia.org/r/589185 (https://phabricator.wikimedia.org/T249590)
[06:24:27] (CR) Marostegui: [C: +2] check_depooled: Include x1 [software] - https://gerrit.wikimedia.org/r/590859 (owner: Marostegui)
[06:25:07] !log installing libxdmcp security updates on jessie
[06:25:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:29:44] (PS1) Muehlenhoff: library hint for libxdmcp [puppet] - https://gerrit.wikimedia.org/r/590860
[06:30:13] RECOVERY - OSPF status on cr1-eqsin is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[06:33:56] mmm is there maintenance for --^ ?
[06:34:19] PROBLEM - Host mw1307.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[06:35:25] so it is the Telia transit between cr1-codfw to cr1-eqsin
[06:35:58] (CR) Muehlenhoff: [C: +2] library hint for libxdmcp [puppet] - https://gerrit.wikimedia.org/r/590860 (owner: Muehlenhoff)
[06:36:44] I don't see anything in gcal or emails
[06:37:47] ah no Arzhel is already on it :)
[06:39:08] elukey: Opening a task as we speak, I have a thread with telia support since saturday 8am...
[06:39:21] noc is CCed too
[06:40:19] RECOVERY - Host mw1307.mgmt is UP: PING WARNING - Packet loss = 90%, RTA = 0.79 ms
[06:40:56] XioNoX: <3
[06:41:07] PROBLEM - OSPF status on cr1-codfw is CRITICAL: OSPFv2: 6/6 UP : OSPFv3: 5/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[06:41:15] !log execute find -mtime +30 -delete in /var/log/airflow/scheduler on an-airflow1001 to free space
[06:41:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:42:59] RECOVERY - OSPF status on cr1-codfw is OK: OSPFv2: 6/6 UP : OSPFv3: 6/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[06:44:59] !log installing python2.7 security updates on jessie
[06:45:01] PROBLEM - DPKG on scb2004 is CRITICAL: DPKG CRITICAL dpkg reports broken packages https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[06:45:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:48:24] (PS1) RhinosF1: Add 'media.api.aucklandmuseum.com' to $wgCopyUploadsDomains [mediawiki-config] - https://gerrit.wikimedia.org/r/590983
[06:55:08] (CR) Giuseppe Lavagetto: [C: +2] safe-service-restart: formatting fixed [puppet] - https://gerrit.wikimedia.org/r/588960 (owner: Giuseppe Lavagetto)
[06:58:13] PROBLEM - BFD status on cr1-codfw is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[07:00:12] (PS2) Giuseppe Lavagetto: safe-service-restart: allow for a grace period after depooling [puppet] - https://gerrit.wikimedia.org/r/588961
[07:01:53] RECOVERY - BFD status on cr1-codfw is OK: OK: UP: 13 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[07:04:43] (PS2) RhinosF1: Add 'media.api.aucklandmuseum.com' to $wgCopyUploadsDomains [mediawiki-config] - https://gerrit.wikimedia.org/r/590983
[07:05:03] (PS3) RhinosF1: Add 'media.api.aucklandmuseum.com' to $wgCopyUploadsDomains [mediawiki-config] - https://gerrit.wikimedia.org/r/590983 (https://phabricator.wikimedia.org/T250646)
[07:15:51] RECOVERY - DPKG on scb2004 is OK: All packages OK https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[07:20:58] !log Re add tl_namespace index to db1104 and db1092 - T250060
[07:21:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:21:04] T250060: tl_namespace index on templatelinks is unique only in s8 - https://phabricator.wikimedia.org/T250060
[07:22:18] <_joe_> !log restarting php-fpm on the eqiad appservers to pick up the new max_execution_time
[07:22:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:25:07] (Abandoned) Giuseppe Lavagetto: services_proxy: reduce the number of requests per connection [puppet] - https://gerrit.wikimedia.org/r/587637 (owner: Giuseppe Lavagetto)
[07:29:25] (PS1) Vgutierrez: Release 8.0.7-rc0-1wm3asan [debs/trafficserver] - https://gerrit.wikimedia.org/r/590993 (https://phabricator.wikimedia.org/T249335)
[07:33:38] Operations, ops-eqiad: db1096 management interface unresposive remotelly, likely connectivity issue - https://phabricator.wikimedia.org/T250652 (jcrespo)
[07:36:07] Operations, ops-eqiad: db1096 management interface unresposive remotelly, likely connectivity issue - https://phabricator.wikimedia.org/T250652 (jcrespo)
[07:41:03] elukey: https://phabricator.wikimedia.org/T250653
[07:46:05] ACKNOWLEDGEMENT - SSH on db1096.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds Marostegui T250652 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[07:46:05] nice!
[07:46:50] Operations, ops-eqiad: db1096 management interface unresposive remotelly, likely connectivity issue - https://phabricator.wikimedia.org/T250652 (Marostegui) p:Triage→Medium For the record this is an s5 slave.
[07:47:02] (PS1) Jcrespo: mariadb-backups: Add s2, x1 to db1095 (eqiad backup source) [puppet] - https://gerrit.wikimedia.org/r/590995 (https://phabricator.wikimedia.org/T250602)
[07:47:09] Operations, ops-eqiad, DBA: db1096 management interface unresposive remotelly, likely connectivity issue - https://phabricator.wikimedia.org/T250652 (Marostegui)
[07:53:04] (CR) Marostegui: [C: +1] "Space on db1095 looks good and I also checked db1140.yaml to make sure those rae the ones" [puppet] - https://gerrit.wikimedia.org/r/590995 (https://phabricator.wikimedia.org/T250602) (owner: Jcrespo)
[07:58:19] PROBLEM - PHP opcache health on mw1294 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health
[08:00:09] RECOVERY - PHP opcache health on mw1294 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health
[08:03:26] (PS2) Jcrespo: mariadb-backups: Add s2, x1 to db1095 (eqiad backup source) [puppet] - https://gerrit.wikimedia.org/r/590995 (https://phabricator.wikimedia.org/T250602)
[08:06:37] (CR) Jcrespo: [C: +2] mariadb-backups: Add s2, x1 to db1095 (eqiad backup source) [puppet] - https://gerrit.wikimedia.org/r/590995 (https://phabricator.wikimedia.org/T250602) (owner: Jcrespo)
[08:09:44] !log restarting s3 instance on db1095 to reduce its buffer pool T250602
[08:09:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:09:51] T250602: db1140 (backup source) crashed - https://phabricator.wikimedia.org/T250602
[08:14:04] !log Remove img_deleted column from db1089 (enwiki), db1081 (commonswiki, db1111 (wikidatawiki) - T250055
[08:14:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:14:10] T250055: Remove image.img_deleted column from production - https://phabricator.wikimedia.org/T250055
[08:16:23] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1089', diff saved to https://phabricator.wikimedia.org/P11017 and previous config saved to /var/cache/conftool/dbconfig/20200420-081623-marostegui.json
[08:16:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:19:11] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1081', diff saved to https://phabricator.wikimedia.org/P11018 and previous config saved to /var/cache/conftool/dbconfig/20200420-081911-marostegui.json
[08:19:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:20:20] !log marostegui@cumin1001 dbctl commit (dc=all): 'Temporary pool db1097:3314 into API', diff saved to https://phabricator.wikimedia.org/P11019 and previous config saved to /var/cache/conftool/dbconfig/20200420-082019-marostegui.json
[08:20:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:20:43] (PS1) Jcrespo: mariadb-backups: Move s2, x1 eqiad backups to db1095 [puppet] - https://gerrit.wikimedia.org/r/590999 (https://phabricator.wikimedia.org/T250602)
[08:22:46] (PS1) Dzahn: tlsproxy::envoy: allow limiting firewall srange [puppet] - https://gerrit.wikimedia.org/r/591000 (https://phabricator.wikimedia.org/T149804)
[08:24:00] (CR) Marostegui: [C: +1] mariadb-backups: Move s2, x1 eqiad backups to db1095 [puppet] - https://gerrit.wikimedia.org/r/590999 (https://phabricator.wikimedia.org/T250602) (owner: Jcrespo)
[08:25:03] (CR) jerkins-bot: [V: -1] tlsproxy::envoy: allow limiting firewall srange [puppet] - https://gerrit.wikimedia.org/r/591000 (https://phabricator.wikimedia.org/T149804) (owner: Dzahn)
[08:25:22] (PS2) Dzahn: tlsproxy::envoy: allow limiting firewall srange [puppet] - https://gerrit.wikimedia.org/r/591000 (https://phabricator.wikimedia.org/T149804)
[08:26:57] (PS3) Dzahn: tlsproxy::envoy: allow limiting firewall srange [puppet] - https://gerrit.wikimedia.org/r/591000 (https://phabricator.wikimedia.org/T149804)
[08:29:13] (PS2) Dzahn: ganeti: add monitoring for gnt-rapi daemon process [puppet] - https://gerrit.wikimedia.org/r/589608
[08:30:03] (PS2) Vgutierrez: Release 8.0.7-rc0-1wm3asan [debs/trafficserver] - https://gerrit.wikimedia.org/r/590993 (https://phabricator.wikimedia.org/T249335)
[08:31:04] (CR) Dzahn: "compiler looks good to me, comparing contint to mwdebug" [puppet] - https://gerrit.wikimedia.org/r/591000 (https://phabricator.wikimedia.org/T149804) (owner: Dzahn)
[08:32:14] Operations, Release-Engineering-Team-TODO, Continuous-Integration-Infrastructure (phase-out-jessie), Release-Engineering-Team (CI & Testing services): Rebuild helmfile for buster-wikimedia - https://phabricator.wikimedia.org/T250479 (JMeybohm) a:JMeybohm
[08:33:03] (CR) jerkins-bot: [V: -1] Release 8.0.7-rc0-1wm3asan [debs/trafficserver] - https://gerrit.wikimedia.org/r/590993 (https://phabricator.wikimedia.org/T249335) (owner: Vgutierrez)
[08:34:56] (CR) Dzahn: "do you want to amend it to address Subbu's comment?" [puppet] - https://gerrit.wikimedia.org/r/577656 (owner: C. Scott Ananian)
[08:35:55] !log imported helmfile 0.66.0-1+deb10u1 to main for buster-wikimedia
[08:36:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:36:34] (PS3) Vgutierrez: Release 8.0.7-rc0-1wm3asan [debs/trafficserver] - https://gerrit.wikimedia.org/r/590993 (https://phabricator.wikimedia.org/T249335)
[08:37:03] Operations, Release-Engineering-Team-TODO, Continuous-Integration-Infrastructure (phase-out-jessie), Release-Engineering-Team (CI & Testing services): Rebuild helmfile for buster-wikimedia - https://phabricator.wikimedia.org/T250479 (JMeybohm) Open→Resolved imported helmfile 0.66.0-1+deb1...
[08:37:08] Operations, Release-Engineering-Team-TODO, Continuous-Integration-Infrastructure (phase-out-jessie), Patch-For-Review, Release-Engineering-Team (CI & Testing services): Migrate contint* hosts to Buster - https://phabricator.wikimedia.org/T224591 (JMeybohm)
[08:40:07] (PS4) Dzahn: tlsproxy::envoy: allow limiting firewall srange [puppet] - https://gerrit.wikimedia.org/r/591000 (https://phabricator.wikimedia.org/T149804)
[08:42:53] (PS5) Dzahn: tlsproxy::envoy: allow limiting firewall srange [puppet] - https://gerrit.wikimedia.org/r/591000 (https://phabricator.wikimedia.org/T149804)
[08:43:05] (PS6) Dzahn: tlsproxy::envoy: allow limiting firewall srange [puppet] - https://gerrit.wikimedia.org/r/591000 (https://phabricator.wikimedia.org/T149804)
[08:45:37] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3064 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[08:46:49] !log restart ats-tls in cp3064 - T249335
[08:46:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:46:55] T249335: Memory leak on ats-tls 8.0.6 - https://phabricator.wikimedia.org/T249335
[08:47:59] (CR) Vgutierrez: [C: +1] Add metric 'purged_udp_bytes_read_total' [software/purged] - https://gerrit.wikimedia.org/r/589470 (owner: Ema)
[08:49:01] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3064 is OK: HTTP OK: HTTP/1.0 200 OK - 22663 bytes in 0.274 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[08:49:17] Operations, ops-eqiad, DBA: db1096 management interface unresposive remotelly, likely connectivity issue - https://phabricator.wikimedia.org/T250652 (jcrespo) I saw a few hosts flop down and up its host up status on icinga, all on A6: in additiona to db1096.mgmt, mw1311.mgmt and ganeti1006.mgmt CC @a...
[08:50:16] (CR) Filippo Giunchedi: [C: +1] "Yup, looks good!" [puppet] - https://gerrit.wikimedia.org/r/589703 (https://phabricator.wikimedia.org/T250401) (owner: CDanis)
[08:50:21] (CR) Giuseppe Lavagetto: [C: +1] tlsproxy::envoy: allow limiting firewall srange (1 comment) [puppet] - https://gerrit.wikimedia.org/r/591000 (https://phabricator.wikimedia.org/T149804) (owner: Dzahn)
[08:53:15] PROBLEM - SSH on mw1311.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[08:53:48] (CR) Dzahn: tlsproxy::envoy: allow limiting firewall srange (1 comment) [puppet] - https://gerrit.wikimedia.org/r/591000 (https://phabricator.wikimedia.org/T149804) (owner: Dzahn)
[08:54:09] (PS1) Elukey: profile::analytics::search::airflow: clean up old logs after 30d [puppet] - https://gerrit.wikimedia.org/r/591003
[09:00:57] (CR) Elukey: [C: +2] profile::analytics::search::airflow: clean up old logs after 30d [puppet] - https://gerrit.wikimedia.org/r/591003 (owner: Elukey)
[09:07:21] (PS1) Filippo Giunchedi: admin: add awight and wmde-fish to analytics-wmde-users [puppet] - https://gerrit.wikimedia.org/r/591005 (https://phabricator.wikimedia.org/T250364)
[09:07:28] Operations, SRE-Access-Requests: Requesting access to analytics-privatedata-users for Jim Maddock - https://phabricator.wikimedia.org/T249873 (fgiunchedi) Open→Resolved a:fgiunchedi
[09:09:47] (PS1) Elukey: profile::analytics::search::airflow: fix systemd timer [puppet] - https://gerrit.wikimedia.org/r/591006
[09:11:41] * Urbanecm claims the deployment host
[09:12:07] ^ thanks :)
[09:13:34] Operations, Release-Engineering-Team-TODO, Continuous-Integration-Infrastructure (phase-out-jessie), Release-Engineering-Team (CI & Testing services): Rebuild helmfile for buster-wikimedia - https://phabricator.wikimedia.org/T250479 (Dzahn) contint2001 now runs puppet without any errors. thank you
[09:14:13] Operations, Release-Engineering-Team-TODO, Continuous-Integration-Infrastructure (phase-out-jessie), Patch-For-Review, Release-Engineering-Team (CI & Testing services): Migrate contint* hosts to Buster - https://phabricator.wikimedia.org/T224591 (Dzahn)
[09:14:17] (CR) Elukey: [C: +2] profile::analytics::search::airflow: fix systemd timer [puppet] - https://gerrit.wikimedia.org/r/591006 (owner: Elukey)
[09:17:19] Operations, Release-Engineering-Team-TODO, Continuous-Integration-Infrastructure (phase-out-jessie), Patch-For-Review, Release-Engineering-Team (CI & Testing services): Migrate contint* hosts to Buster - https://phabricator.wikimedia.org/T224591 (Dzahn) @hashar Thanks to Janis also building h...
[09:17:40] (03CR) 10Volans: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/588961 (owner: 10Giuseppe Lavagetto) [09:18:40] (03CR) 10Ema: [C: 03+2] Add metric 'purged_udp_bytes_read_total' [software/purged] - 10https://gerrit.wikimedia.org/r/589470 (owner: 10Ema) [09:19:12] !log Security deploy for T250594 [09:19:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:19:25] 10Operations, 10DBA, 10MediaWiki-General: Evaluate and decide the future of relational datastore at WMF after the upgrade of MariaDB 10.1 is finished - https://phabricator.wikimedia.org/T193224 (10jcrespo) [09:19:59] * Urbanecm done [09:20:03] (03PS7) 10Dzahn: tlsproxy::envoy: allow limiting firewall srange [puppet] - 10https://gerrit.wikimedia.org/r/591000 (https://phabricator.wikimedia.org/T149804) [09:24:39] 10Operations, 10DBA, 10MediaWiki-General: Evaluate and decide the future of relational datastore at WMF after the upgrade of MariaDB 10.1 is finished - https://phabricator.wikimedia.org/T193224 (10jcrespo) 05Open→03Resolved a:03Marostegui As per meetings, the decision was: no decision. Closing this in... [09:24:43] 10Operations, 10Patch-For-Review: Prepare our base system layer for Debian buster - https://phabricator.wikimedia.org/T213527 (10jcrespo) [09:24:46] 10Operations, 10ops-eqiad, 10DBA: db1096 management interface unresposive remotelly, likely connectivity issue - https://phabricator.wikimedia.org/T250652 (10ayounsi) p:05Medium→03High [09:25:03] (03CR) 10Dzahn: [C: 03+2] "no change on mw hosts:" [puppet] - 10https://gerrit.wikimedia.org/r/591000 (https://phabricator.wikimedia.org/T149804) (owner: 10Dzahn) [09:25:42] 10Operations, 10DBA, 10MediaWiki-General: Evaluate and decide the future of relational datastore at WMF after the upgrade of MariaDB 10.1 is finished - https://phabricator.wikimedia.org/T193224 (10jcrespo) [09:27:59] PROBLEM - Host mw1309.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [09:28:12] eh.. 
that's the broken mgmt switch [09:28:48] 10Operations, 10ops-eqiad, 10DBA: db1096 management interface unresposive remotelly, likely connectivity issue - https://phabricator.wikimedia.org/T250652 (10ayounsi) As the mgmt switch is 9yo and there are no errors on the msw1-eqiad side I'd say let's replace it. [09:28:56] 10Operations, 10DBA, 10MediaWiki-General: Evaluate and decide the future of relational datastore at WMF after the upgrade of MariaDB 10.1 is finished - https://phabricator.wikimedia.org/T193224 (10jcrespo) [09:30:20] volans: re: open port on contint. this was now possible to close with https://gerrit.wikimedia.org/r/c/operations/puppet/+/591000 [09:30:37] so nextdiff email should show that [09:30:45] mutante: thanks, saw the CR passing by [09:30:58] thx as well [09:31:51] (03CR) 10Vgutierrez: multicast: test URL extraction (031 comment) [software/purged] - 10https://gerrit.wikimedia.org/r/589471 (owner: 10Ema) [09:32:30] (03CR) 10Vgutierrez: [C: 03+1] multicast: test URL extraction [software/purged] - 10https://gerrit.wikimedia.org/r/589471 (owner: 10Ema) [09:33:25] RECOVERY - Host mw1309.mgmt is UP: PING WARNING - Packet loss = 71%, RTA = 0.78 ms [09:33:32] (03PS1) 10Jcrespo: mariadb-backups: Allow reimage of test-s1 codfw db db2102 to buster [puppet] - 10https://gerrit.wikimedia.org/r/591008 (https://phabricator.wikimedia.org/T250666) [09:38:03] !log uRPF, sample + discard in ulsfo - T244147 [09:38:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:39:35] (03PS4) 10Dzahn: merge microsites into webserver_misc_apps [puppet] - 10https://gerrit.wikimedia.org/r/587985 (https://phabricator.wikimedia.org/T247650) [09:42:20] (03CR) 10Ema: multicast: test URL extraction (031 comment) [software/purged] - 10https://gerrit.wikimedia.org/r/589471 (owner: 10Ema) [09:42:39] 10Operations, 10ops-eqiad, 10DBA: msw1-a6-eqiad flopping up and down mgmt connections on A6 - https://phabricator.wikimedia.org/T250652 (10jcrespo) [09:42:51] 
10Operations, 10ops-eqiad, 10DBA: msw1-a6-eqiad flopping up and down mgmt connections on A6 - https://phabricator.wikimedia.org/T250652 (10jcrespo) [09:43:30] (03PS4) 10Ema: multicast: test URL extraction [software/purged] - 10https://gerrit.wikimedia.org/r/589471 [09:44:07] (03CR) 10Dzahn: [C: 03+2] merge microsites into webserver_misc_apps [puppet] - 10https://gerrit.wikimedia.org/r/587985 (https://phabricator.wikimedia.org/T247650) (owner: 10Dzahn) [09:45:44] (03CR) 10Ema: [C: 03+2] multicast: test URL extraction [software/purged] - 10https://gerrit.wikimedia.org/r/589471 (owner: 10Ema) [09:45:46] ^ merging "misc_static" into "misc_apps". reducing to just one pair of VMs instead of 2 [09:51:15] !log uRPF, sample + discard in dfw - T244147 [09:51:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:51:44] 10Operations, 10Epic: Migrate all of production metal and VMs to Buster or later - https://phabricator.wikimedia.org/T247045 (10jcrespo) [09:52:18] (03PS1) 10Elukey: profile::analytics::search::airflow: improve cleanup timer [puppet] - 10https://gerrit.wikimedia.org/r/591009 [09:53:57] RECOVERY - SSH on mw1311.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [09:54:43] (03PS2) 10Elukey: profile::analytics::search::airflow: improve cleanup timer [puppet] - 10https://gerrit.wikimedia.org/r/591009 [10:00:00] (03PS3) 10Elukey: profile::analytics::search::airflow: improve cleanup timer [puppet] - 10https://gerrit.wikimedia.org/r/591009 [10:05:24] (03CR) 10Elukey: [C: 03+2] profile::analytics::search::airflow: improve cleanup timer [puppet] - 10https://gerrit.wikimedia.org/r/591009 (owner: 10Elukey) [10:06:42] !log uRPF, sample + discard in eqord - T244147 [10:06:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:07:58] (03CR) 10Filippo Giunchedi: "Did the "systemd degraded mode" (forgot the exact name) alert fire? 
In other words was the unit marked as failed? I'm saying this because " [puppet] - 10https://gerrit.wikimedia.org/r/589608 (owner: 10Dzahn) [10:08:27] (03PS1) 10Ema: prometheus::node_vhtcpd: add 'ensure' attribute [puppet] - 10https://gerrit.wikimedia.org/r/591010 (https://phabricator.wikimedia.org/T249583) [10:08:58] !log uRPF, sample + discard in eqiad - T244147 [10:09:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:11:31] (03CR) 10Filippo Giunchedi: [C: 03+1] "LGTM" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/591010 (https://phabricator.wikimedia.org/T249583) (owner: 10Ema) [10:12:57] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] legacy ingress: propagate query string to toolforge domain [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/590671 (https://phabricator.wikimedia.org/T250625) (owner: 10BryanDavis) [10:13:15] (03PS1) 10Elukey: profile::analytics::search::airflow: import systemd timer - part 2 [puppet] - 10https://gerrit.wikimedia.org/r/591011 [10:13:21] (03Merged) 10jenkins-bot: legacy ingress: propagate query string to toolforge domain [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/590671 (https://phabricator.wikimedia.org/T250625) (owner: 10BryanDavis) [10:14:18] (03PS2) 10Elukey: profile::analytics::search::airflow: import systemd timer - part 2 [puppet] - 10https://gerrit.wikimedia.org/r/591011 [10:17:39] (03CR) 10Ayounsi: [C: 03+2] uRPF: sample and discard [homer/public] - 10https://gerrit.wikimedia.org/r/588948 (https://phabricator.wikimedia.org/T244147) (owner: 10Ayounsi) [10:18:07] (03PS2) 10Ema: prometheus::node_vhtcpd: add 'ensure' attribute [puppet] - 10https://gerrit.wikimedia.org/r/591010 (https://phabricator.wikimedia.org/T249583) [10:18:18] (03CR) 10Elukey: [C: 03+2] profile::analytics::search::airflow: import systemd timer - part 2 [puppet] - 10https://gerrit.wikimedia.org/r/591011 (owner: 10Elukey) [10:21:32] PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo 
IPv6 is CRITICAL: CRITICAL - failed 53 probes of 554 (alerts on 50) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [10:22:26] (03CR) 10Elukey: Deprecate statistics::rsync::mediawiki (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/589600 (owner: 10Elukey) [10:22:38] (03CR) 10Ema: "pcc looks fine: https://puppet-compiler.wmflabs.org/compiler1002/22073/" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/591010 (https://phabricator.wikimedia.org/T249583) (owner: 10Ema) [10:22:40] (03CR) 10Elukey: [C: 03+2] Deprecate statistics::rsync::mediawiki [puppet] - 10https://gerrit.wikimedia.org/r/589600 (owner: 10Elukey) [10:23:27] (03CR) 10Ema: [C: 03+2] prometheus::node_vhtcpd: add 'ensure' attribute [puppet] - 10https://gerrit.wikimedia.org/r/591010 (https://phabricator.wikimedia.org/T249583) (owner: 10Ema) [10:23:30] (03PS1) 10Awight: Temporarily enable event oversampling for conflicts [mediawiki-config] - 10https://gerrit.wikimedia.org/r/591017 (https://phabricator.wikimedia.org/T249616) [10:25:52] PROBLEM - Host mw1312.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [10:26:40] RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is OK: OK - failed 46 probes of 554 (alerts on 50) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [10:29:46] (03PS1) 10Giuseppe Lavagetto: mediawiki::mcrouter_wancache: add a failover route to a secondary pool [puppet] - 10https://gerrit.wikimedia.org/r/591019 (https://phabricator.wikimedia.org/T213089) [10:30:04] jan_drewniak: (Dis)respected human, time to deploy Wikimedia Portals Update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200420T1030). Please do the needful. 
[10:30:22] (03CR) 10Jbond: "LGTM you will need to manually remove the following from stat1007" [puppet] - 10https://gerrit.wikimedia.org/r/589600 (owner: 10Elukey) [10:31:13] 10Operations, 10netops: Investigate unicast RPF loose mode - https://phabricator.wikimedia.org/T244147 (10ayounsi) 05Open→03Resolved Default changed to sample + discard on all routers. [10:31:14] RECOVERY - Host mw1312.mgmt is UP: PING WARNING - Packet loss = 60%, RTA = 0.86 ms [10:31:16] 10Operations, 10netops: Investigate unicast RPF loose mode - https://phabricator.wikimedia.org/T244147 (10ayounsi) [10:31:17] elukey: just added comments on what needs cleaning up ^^ [10:31:50] jbond42: <3 [10:31:56] so rsync too [10:32:01] *rsyncd [10:32:09] as far as i can tell this is the only thing that used rsync [10:32:40] for stat1007 I am already cleaning up, will free 1.3TB from the host [10:32:44] so happy about it [10:32:49] (03CR) 10jerkins-bot: [V: 04-1] mediawiki::mcrouter_wancache: add a failover route to a secondary pool [puppet] - 10https://gerrit.wikimedia.org/r/591019 (https://phabricator.wikimedia.org/T213089) (owner: 10Giuseppe Lavagetto) [10:34:08] elukey: if you want to be cautious you can stop rsyncd before removing it. run puppet, if nothing else tries to start/enable it again you can be pretty sure. 
however the only frag in /etc/rsync.d is for udp2log so im pretty sure its safe [10:34:46] jbond42: ack +1 [10:37:18] (03CR) 10Jbond: [C: 03+1] "lgtm when telia give the ok" [homer/public] - 10https://gerrit.wikimedia.org/r/589832 (owner: 10Ayounsi) [10:37:31] !log apt-get purge rsync on mwlog* after https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/589600/ [10:37:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:41:45] (03PS1) 10Dzahn: misc_apps: setup rsync to copy microsite data for migration [puppet] - 10https://gerrit.wikimedia.org/r/591022 (https://phabricator.wikimedia.org/T247650) [10:42:39] (03CR) 10Dzahn: "> Patch Set 2:" [puppet] - 10https://gerrit.wikimedia.org/r/589608 (owner: 10Dzahn) [10:42:40] (03PS1) 10Jdrewniak: Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/591023 (https://phabricator.wikimedia.org/T128546) [10:44:40] (03CR) 10Jdrewniak: [C: 03+2] Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/591023 (https://phabricator.wikimedia.org/T128546) (owner: 10Jdrewniak) [10:45:49] (03Merged) 10jenkins-bot: Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/591023 (https://phabricator.wikimedia.org/T128546) (owner: 10Jdrewniak) [10:46:36] 10Operations, 10Graphoid, 10serviceops, 10Core Platform Team (Icebox): Undeploy graphoid - https://phabricator.wikimedia.org/T242855 (10Jseddon) a:03Seddon [10:46:45] (03PS2) 10Giuseppe Lavagetto: mediawiki::mcrouter_wancache: add a failover route to a secondary pool [puppet] - 10https://gerrit.wikimedia.org/r/591019 (https://phabricator.wikimedia.org/T213089) [10:47:57] (03PS2) 10Dzahn: misc_apps: setup rsync to copy microsite data for migration [puppet] - 10https://gerrit.wikimedia.org/r/591022 (https://phabricator.wikimedia.org/T247650) [10:48:29] !log jdrewniak@deploy1001 Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:591023| Bumping portals to master 
(563985)]] (duration: 01m 03s) [10:48:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:49:26] !log jdrewniak@deploy1001 Synchronized portals: Wikimedia Portals Update: [[gerrit:591023| Bumping portals to master (563985)]] (duration: 00m 57s) [10:49:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:49:58] RECOVERY - Stale file for node-exporter textfile in codfw on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile [10:56:29] (03CR) 10Dzahn: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1001/22076/" [puppet] - 10https://gerrit.wikimedia.org/r/591022 (https://phabricator.wikimedia.org/T247650) (owner: 10Dzahn) [10:57:40] RECOVERY - Stale file for node-exporter textfile in esams on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile [11:00:04] Amir1, Lucas_WMDE, awight, and Urbanecm: How many deployers does it take to do European Mid-day SWAT(Max 6 patches) deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200420T1100). [11:00:04] awight: A patch you scheduled for European Mid-day SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [11:00:12] I can self-deploy these. 
[11:02:05] (03PS3) 10Giuseppe Lavagetto: mediawiki::mcrouter_wancache: add a failover route to a secondary pool [puppet] - 10https://gerrit.wikimedia.org/r/591019 (https://phabricator.wikimedia.org/T213089) [11:02:21] !log bromine/vega: stop rsyncd which was removed from puppet [11:02:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:05:57] !log rsyncing static-bugzilla files from bromine to miscweb1002 (T247650) [11:06:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:06:04] T247650: replace bromine and vega with buster VMs - https://phabricator.wikimedia.org/T247650 [11:06:52] (03CR) 10Giuseppe Lavagetto: "This change generates this configuration on mwdebug1001:" [puppet] - 10https://gerrit.wikimedia.org/r/591019 (https://phabricator.wikimedia.org/T213089) (owner: 10Giuseppe Lavagetto) [11:10:09] (03PS1) 10Ema: Add Host regex filtering [software/purged] - 10https://gerrit.wikimedia.org/r/591024 (https://phabricator.wikimedia.org/T249583) [11:10:40] PROBLEM - SSH on mw1309.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [11:10:59] (03PS1) 10Dzahn: miscapps: allow rsyncing app data to multiple destination hosts [puppet] - 10https://gerrit.wikimedia.org/r/591025 (https://phabricator.wikimedia.org/T247650) [11:16:32] (03PS1) 10Arturo Borrero Gonzalez: toolforge: add wmcs-package-build.py script [puppet] - 10https://gerrit.wikimedia.org/r/591026 (https://phabricator.wikimedia.org/T249837) [11:17:40] PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is CRITICAL: CRITICAL - failed 52 probes of 554 (alerts on 50) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [11:19:17] (03CR) 10Awight: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/591017 
(https://phabricator.wikimedia.org/T249616) (owner: 10Awight) [11:19:30] (03PS2) 10Dzahn: miscapps: allow rsyncing app data to multiple destination hosts [puppet] - 10https://gerrit.wikimedia.org/r/591025 (https://phabricator.wikimedia.org/T247650) [11:20:22] (03Merged) 10jenkins-bot: Temporarily enable event oversampling for conflicts [mediawiki-config] - 10https://gerrit.wikimedia.org/r/591017 (https://phabricator.wikimedia.org/T249616) (owner: 10Awight) [11:20:33] (03PS7) 10Jbond: apereo_cas: update templates login page [software/cas-overlay-template] - 10https://gerrit.wikimedia.org/r/587538 (https://phabricator.wikimedia.org/T233939) [11:21:26] 10Operations, 10Patch-For-Review, 10User-jbond: Wikimedia theme for SSO login page - https://phabricator.wikimedia.org/T233939 (10jbond) screen shot for [[ https://gerrit.wikimedia.org/r/c/operations/software/cas-overlay-template/+/587538/7 | PS7 ]] {F31768111} [11:25:22] !log awight@deploy1001 Synchronized php-1.35.0-wmf.28/extensions/TwoColConflict: SWAT: [[gerrit:591016|Configurable EditStepAttempt oversampling for conflicts (T249616)]] (duration: 01m 03s) [11:25:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:25:32] T249616: Prepare metrics to answer usability questions - https://phabricator.wikimedia.org/T249616 [11:27:06] (03CR) 10Dzahn: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1002/22080/miscweb2002.codfw.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/591025 (https://phabricator.wikimedia.org/T247650) (owner: 10Dzahn) [11:27:27] !log awight@deploy1001 Synchronized wmf-config/CommonSettings.php: SWAT: [[gerrit:591017|Temporarily enable event oversampling for conflicts (T249616)]] (duration: 01m 00s) [11:27:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:29:20] RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is OK: OK - failed 41 probes of 554 (alerts on 50) - https://atlas.ripe.net/measurements/1791309/#!map 
https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [11:31:26] (03PS8) 10Jbond: apereo_cas: update templates login page [software/cas-overlay-template] - 10https://gerrit.wikimedia.org/r/587538 (https://phabricator.wikimedia.org/T233939) [11:31:40] ema: ok to push ATS config changes? do we still watch for that bug? [11:32:16] (03PS2) 10Arturo Borrero Gonzalez: toolforge: add wmcs-package-build.py script [puppet] - 10https://gerrit.wikimedia.org/r/591026 (https://phabricator.wikimedia.org/T249837) [11:32:38] (03CR) 10Jbond: "thanks updated" (032 comments) [software/cas-overlay-template] - 10https://gerrit.wikimedia.org/r/587538 (https://phabricator.wikimedia.org/T233939) (owner: 10Jbond) [11:32:41] (03PS1) 10Dzahn: ATS: switch backend for static-bugzilla to misc-apps [puppet] - 10https://gerrit.wikimedia.org/r/591027 (https://phabricator.wikimedia.org/T247650) [11:32:45] (03CR) 10Volans: [C: 04-1] "As far as I know the RAPI process is active only on the master node for each cluster." [puppet] - 10https://gerrit.wikimedia.org/r/589608 (owner: 10Dzahn) [11:34:52] mutante: go ahead, the bug is annoying and still there but there's no action required on our side in case it happens [11:35:16] (03PS1) 10MSantos: Re-enable OSM replication [puppet] - 10https://gerrit.wikimedia.org/r/591028 (https://phabricator.wikimedia.org/T249086) [11:35:37] ema: ack, thx [11:36:08] (03CR) 10Dzahn: [C: 03+2] ATS: switch backend for static-bugzilla to misc-apps [puppet] - 10https://gerrit.wikimedia.org/r/591027 (https://phabricator.wikimedia.org/T247650) (owner: 10Dzahn) [11:36:34] 10Operations, 10Analytics, 10Traffic: Create replacement for Varnishkafka - https://phabricator.wikimedia.org/T237993 (10ema) >>! In T237993#6066376, @elukey wrote: > There is currently too much data that flows to kafka, for cp3050 we have 36GB * 12 partitions for a single day, definitely too much. How much... 
[11:38:43] (03CR) 10Volans: [C: 04-1] "If we want to check that RAPI works for a cluster I think we should just connect to it instead of checking the process itself." [puppet] - 10https://gerrit.wikimedia.org/r/589608 (owner: 10Dzahn) [11:40:48] (03CR) 10Jcrespo: "TODO (also): Remove db1140:s2 and db1140:x1 from tendril, zarcillo; add the 2 new instances." [puppet] - 10https://gerrit.wikimedia.org/r/590995 (https://phabricator.wikimedia.org/T250602) (owner: 10Jcrespo) [11:41:35] !log puppetmaster - revoking cert for webserver-misc-apps.discovery.wmnet and recreating it with additional static microsite names (T247650) [11:41:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:41:41] T247650: replace bromine and vega with buster VMs - https://phabricator.wikimedia.org/T247650 [11:42:29] (03CR) 10Jcrespo: "> Patch Set 2:" [puppet] - 10https://gerrit.wikimedia.org/r/590995 (https://phabricator.wikimedia.org/T250602) (owner: 10Jcrespo) [11:42:39] (03PS1) 10Arturo Borrero Gonzalez: d/changelog: generate entry for 0.68 stretch [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/591029 [11:47:35] 10Operations, 10Traffic, 10Performance-Team (Radar): Support brotli compression - https://phabricator.wikimedia.org/T137979 (10Gilles) Linkedin saw a 2-6% improvement on page load time: https://engineering.linkedin.com/blog/2017/05/boosting-site-speed-using-brotli-compression [11:47:38] (03PS1) 10Dzahn: ssl: update certificate for webserver-misc-apps [puppet] - 10https://gerrit.wikimedia.org/r/591031 (https://phabricator.wikimedia.org/T247650) [11:48:53] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] d/changelog: generate entry for 0.68 stretch [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/591029 (owner: 10Arturo Borrero Gonzalez) [11:49:07] 10Operations, 10Traffic, 10Performance-Team (Radar): Support brotli compression - https://phabricator.wikimedia.org/T137979 (10Gilles) @ema I assume that ATS frontends as currently 
deployed support Brotli, right? [11:49:37] (03Merged) 10jenkins-bot: d/changelog: generate entry for 0.68 stretch [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/591029 (owner: 10Arturo Borrero Gonzalez) [11:50:17] (03CR) 10Dzahn: [C: 03+2] "openssl x509 -in webserver-misc-apps.discovery.wmnet.crt -text -noout | grep DNS" [puppet] - 10https://gerrit.wikimedia.org/r/591031 (https://phabricator.wikimedia.org/T247650) (owner: 10Dzahn) [11:53:50] (03PS3) 10Arturo Borrero Gonzalez: toolforge: add wmcs-package-build.py script [puppet] - 10https://gerrit.wikimedia.org/r/591026 (https://phabricator.wikimedia.org/T249837) [11:55:02] (03PS4) 10Arturo Borrero Gonzalez: toolforge: add wmcs-package-build.py script [puppet] - 10https://gerrit.wikimedia.org/r/591026 (https://phabricator.wikimedia.org/T249837) [11:57:20] (03CR) 10Dzahn: "> Patch Set 2:" [puppet] - 10https://gerrit.wikimedia.org/r/589608 (owner: 10Dzahn) [12:06:55] (03CR) 10DCausse: [C: 03+1] elasticsearch::instance: avoid UseCMSInitiatingOccupancyOnly [puppet] - 10https://gerrit.wikimedia.org/r/589472 (https://phabricator.wikimedia.org/T231517) (owner: 10Elukey) [12:09:28] PROBLEM - Unmerged changes on repository puppet on labtestpuppetmaster2001 is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet, ref HEAD..origin/production). https://wikitech.wikimedia.org/wiki/Monitoring/unmerged_changes [12:11:21] !log Running `REINDEX DATABASE gis` in maps2004.codfw.wmnet (which is depooled at the moment) [12:11:24] RECOVERY - SSH on mw1309.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [12:11:24] ^ gehel [12:11:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:11:37] mateusbs17: thanks! [12:13:10] RECOVERY - Unmerged changes on repository puppet on labtestpuppetmaster2001 is OK: No changes to merge. 
https://wikitech.wikimedia.org/wiki/Monitoring/unmerged_changes [12:17:00] (03PS1) 10Jbond: ferm_status: store each chain in its own hash [puppet] - 10https://gerrit.wikimedia.org/r/591036 (https://phabricator.wikimedia.org/T206951) [12:19:17] (03CR) 10Muehlenhoff: [C: 03+1] "Looks good to me!" [software/cas-overlay-template] - 10https://gerrit.wikimedia.org/r/587538 (https://phabricator.wikimedia.org/T233939) (owner: 10Jbond) [12:20:48] (03CR) 10Muehlenhoff: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/591005 (https://phabricator.wikimedia.org/T250364) (owner: 10Filippo Giunchedi) [12:21:19] (03CR) 10Jbond: [C: 03+2] ferm_status: store each chain in its own hash [puppet] - 10https://gerrit.wikimedia.org/r/591036 (https://phabricator.wikimedia.org/T206951) (owner: 10Jbond) [12:28:26] PROBLEM - OSPF status on cr1-eqsin is CRITICAL: OSPFv2: 2/3 UP : OSPFv3: 2/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status [12:28:36] PROBLEM - OSPF status on cr1-codfw is CRITICAL: OSPFv2: 5/6 UP : OSPFv3: 5/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status [12:29:36] (03PS1) 10Hashar: contint: ignore Docker partitions disk check [puppet] - 10https://gerrit.wikimedia.org/r/591037 [12:29:38] (03PS1) 10Hashar: contint: move Docker out of / on contint2001 [puppet] - 10https://gerrit.wikimedia.org/r/591038 [12:30:28] RECOVERY - OSPF status on cr1-codfw is OK: OSPFv2: 6/6 UP : OSPFv3: 6/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status [12:31:45] !log remove all disabled BGP neighbors on cr2-esams [12:31:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:31:51] (03CR) 10Filippo Giunchedi: [C: 03+2] admin: add awight and wmde-fish to analytics-wmde-users [puppet] - 10https://gerrit.wikimedia.org/r/591005 (https://phabricator.wikimedia.org/T250364) (owner: 10Filippo Giunchedi) [12:32:08] RECOVERY - OSPF status on cr1-eqsin is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP 
https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status [12:34:05] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to analytics-wmde-users for awight - https://phabricator.wikimedia.org/T250364 (10fgiunchedi) 05Open→03Resolved a:03fgiunchedi Access granted, will be live in ~half an hour [12:34:10] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to analytics-wmde-users for Christoph Jauera - https://phabricator.wikimedia.org/T250362 (10fgiunchedi) 05Open→03Resolved a:03fgiunchedi Access granted, will be live in ~half an hour [12:40:33] !log remove all disabled terms from cr2-eqiad [12:40:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:41:30] 10Operations, 10Release-Engineering-Team-TODO, 10Continuous-Integration-Infrastructure (phase-out-jessie), 10Release-Engineering-Team (CI & Testing services): Rebuild helmfile for buster-wikimedia - https://phabricator.wikimedia.org/T250479 (10hashar) Well done! 
Thank you [12:42:52] (03PS1) 10Seddon: Undeploying graphoid on beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/591040 [12:43:54] (03CR) 10jerkins-bot: [V: 04-1] Undeploying graphoid on beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/591040 (owner: 10Seddon) [12:44:44] PROBLEM - SSH on db1096.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [12:45:17] (03CR) 10Marostegui: [C: 04-1] "Missing db2100" [puppet] - 10https://gerrit.wikimedia.org/r/591008 (https://phabricator.wikimedia.org/T250666) (owner: 10Jcrespo) [12:47:00] (03PS2) 10Hashar: contint: ignore more Docker partitions disk checks [puppet] - 10https://gerrit.wikimedia.org/r/591037 (https://phabricator.wikimedia.org/T224591) [12:47:02] (03PS2) 10Hashar: contint: move Docker data out of / on contint2001 [puppet] - 10https://gerrit.wikimedia.org/r/591038 (https://phabricator.wikimedia.org/T224591) [12:48:04] (03PS2) 10Seddon: Undeploying graphoid on beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/591040 [12:48:18] (03CR) 10Jcrespo: "> Patch Set 1: Code-Review-1" [puppet] - 10https://gerrit.wikimedia.org/r/591008 (https://phabricator.wikimedia.org/T250666) (owner: 10Jcrespo) [12:49:47] (03PS3) 10Seddon: Undeploying graphoid on beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/591040 [12:50:15] (03CR) 10Filippo Giunchedi: [C: 04-1] "LGTM overall, see inline" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/587811 (https://phabricator.wikimedia.org/T199236) (owner: 10Cwhite) [12:50:20] (03PS2) 10Jcrespo: mariadb-backups: Move s2, x1 eqiad backups to db1095 [puppet] - 10https://gerrit.wikimedia.org/r/590999 (https://phabricator.wikimedia.org/T250602) [12:50:22] (03PS2) 10Jcrespo: mariadb-backups: Allow reimage of test-s1 codfw db db2102 to buster [puppet] - 10https://gerrit.wikimedia.org/r/591008 (https://phabricator.wikimedia.org/T250666) [12:50:35] (03CR) 10Hashar: "On 
contint1001, /mnt/docker is a mount and exists." [puppet] - 10https://gerrit.wikimedia.org/r/591038 (https://phabricator.wikimedia.org/T224591) (owner: 10Hashar) [12:51:02] (03PS5) 10Arturo Borrero Gonzalez: toolforge: add wmcs-package-build.py script [puppet] - 10https://gerrit.wikimedia.org/r/591026 (https://phabricator.wikimedia.org/T249837) [12:51:05] (03CR) 10Jcrespo: [C: 03+2] mariadb-backups: Move s2, x1 eqiad backups to db1095 [puppet] - 10https://gerrit.wikimedia.org/r/590999 (https://phabricator.wikimedia.org/T250602) (owner: 10Jcrespo) [12:51:49] (03CR) 10Filippo Giunchedi: [C: 03+1] smart: abstract parsing from data gathering and add tests [puppet] - 10https://gerrit.wikimedia.org/r/587816 (https://phabricator.wikimedia.org/T199236) (owner: 10Cwhite) [12:52:01] (03CR) 10Jcrespo: [V: 03+2 C: 03+2] mariadb-backups: Move s2, x1 eqiad backups to db1095 [puppet] - 10https://gerrit.wikimedia.org/r/590999 (https://phabricator.wikimedia.org/T250602) (owner: 10Jcrespo) [12:52:20] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] toolforge: add wmcs-package-build.py script [puppet] - 10https://gerrit.wikimedia.org/r/591026 (https://phabricator.wikimedia.org/T249837) (owner: 10Arturo Borrero Gonzalez) [12:52:34] 10Operations, 10Release-Engineering-Team-TODO, 10Continuous-Integration-Infrastructure (phase-out-jessie), 10Patch-For-Review, 10Release-Engineering-Team (CI & Testing services): Migrate contint* hosts to Buster - https://phabricator.wikimedia.org/T224591 (10hashar) Over the week-end I have noticed a fe... 
[12:53:39] (03CR) 10Marostegui: [C: 03+1] mariadb-backups: Allow reimage of test-s1 codfw db db2102 to buster [puppet] - 10https://gerrit.wikimedia.org/r/591008 (https://phabricator.wikimedia.org/T250666) (owner: 10Jcrespo) [12:54:36] (03CR) 10Jcrespo: [C: 03+2] mariadb-backups: Allow reimage of test-s1 codfw db db2102 to buster [puppet] - 10https://gerrit.wikimedia.org/r/591008 (https://phabricator.wikimedia.org/T250666) (owner: 10Jcrespo) [12:54:38] (03PS4) 10Seddon: Undeploying graphoid on beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/591040 [12:56:37] (03PS1) 10Vgutierrez: ATS: Stop ats-tls from adding the Age header [puppet] - 10https://gerrit.wikimedia.org/r/591044 [12:57:36] (03PS1) 10Paladox: Allow Project owners to push signed tags [software/tools-webservice] (refs/meta/config) - 10https://gerrit.wikimedia.org/r/591045 [12:57:48] (03CR) 10Filippo Giunchedi: [C: 03+1] smart: add tests for _parse_smart_info and _parse_smart_attributes [puppet] - 10https://gerrit.wikimedia.org/r/587877 (https://phabricator.wikimedia.org/T199236) (owner: 10Cwhite) [12:58:06] (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] Allow Project owners to push signed tags [software/tools-webservice] (refs/meta/config) - 10https://gerrit.wikimedia.org/r/591045 (owner: 10Paladox) [12:58:17] (03CR) 10Paladox: [V: 03+2 C: 03+2] Allow Project owners to push signed tags [software/tools-webservice] (refs/meta/config) - 10https://gerrit.wikimedia.org/r/591045 (owner: 10Paladox) [12:59:18] (03CR) 10Vgutierrez: "pcc looks happy: https://puppet-compiler.wmflabs.org/compiler1003/22081/" [puppet] - 10https://gerrit.wikimedia.org/r/591044 (owner: 10Vgutierrez) [13:02:55] (03CR) 10Elukey: [C: 03+2] elasticsearch::instance: avoid UseCMSInitiatingOccupancyOnly [puppet] - 10https://gerrit.wikimedia.org/r/589472 (https://phabricator.wikimedia.org/T231517) (owner: 10Elukey) [13:03:03] (03CR) 10Muehlenhoff: contint: move Docker data out of / on contint2001 (032 comments) [puppet] - 
10https://gerrit.wikimedia.org/r/591038 (https://phabricator.wikimedia.org/T224591) (owner: 10Hashar) [13:07:57] (03CR) 10Filippo Giunchedi: [C: 03+1] smart: simplify PD [puppet] - 10https://gerrit.wikimedia.org/r/588515 (https://phabricator.wikimedia.org/T199236) (owner: 10Cwhite) [13:11:19] (03CR) 10Filippo Giunchedi: "LGTM" (031 comment) [software/puppet-compiler] - 10https://gerrit.wikimedia.org/r/588736 (https://phabricator.wikimedia.org/T250169) (owner: 10Jbond) [13:11:36] PROBLEM - PHP opcache health on mwdebug1002 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:14:32] 10Operations, 10ops-eqiad, 10DBA, 10DC-Ops: db1140 (backup source) crashed - https://phabricator.wikimedia.org/T250602 (10jcrespo) a:05jcrespo→03wiki_willy Service failover done, this is now 100% on #dc-ops side for vendor handling, as per description and initial triage by Manuel. No production service... [13:17:47] (03CR) 10Filippo Giunchedi: pcc templates: refactor templates to make them more DRY (032 comments) [software/puppet-compiler] - 10https://gerrit.wikimedia.org/r/588735 (https://phabricator.wikimedia.org/T250169) (owner: 10Jbond) [13:18:05] (03CR) 10Reedy: [C: 03+2] Undeploying graphoid on beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/591040 (owner: 10Seddon) [13:18:14] (03PS1) 10Jcrespo: mariadb-backups: Enable notifications on db1095 after failover from db1140 [puppet] - 10https://gerrit.wikimedia.org/r/591047 (https://phabricator.wikimedia.org/T250602) [13:18:25] !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db1081 after schema change, restore db1097:3314 original weights', diff saved to https://phabricator.wikimedia.org/P11021 and previous config saved to /var/cache/conftool/dbconfig/20200420-131823-marostegui.json [13:18:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:19:12] (03Merged) 10jenkins-bot: Undeploying 
graphoid on beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/591040 (owner: 10Seddon) [13:20:52] RECOVERY - PHP opcache health on mwdebug1002 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:28:03] (03PS1) 10Jbond: ferm_status: use ip_network to normalise src and dst addresses [puppet] - 10https://gerrit.wikimedia.org/r/591049 (https://phabricator.wikimedia.org/T206951) [13:30:45] !log reedy@deploy1001 Synchronized wmf-config/InitialiseSettings-labs.php: Undeploying graphoid on beta (duration: 01m 07s) [13:30:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:31:33] (03CR) 10Jbond: [C: 03+2] ferm_status: use ip_network to normalise src and dst addresses [puppet] - 10https://gerrit.wikimedia.org/r/591049 (https://phabricator.wikimedia.org/T206951) (owner: 10Jbond) [13:37:04] (03CR) 10Ema: [C: 03+1] ATS: Stop ats-tls from adding the Age header [puppet] - 10https://gerrit.wikimedia.org/r/591044 (owner: 10Vgutierrez) [13:37:17] 10Operations, 10ops-eqiad, 10DBA: msw1-a6-eqiad flopping up and down mgmt connections on A6 - https://phabricator.wikimedia.org/T250652 (10Cmjohnson) I have a spare Netgear switch already in storage, I will request access to the cage and complete this on Tuesday 4/21 @wiki_willy can we order a replacement t... 
[13:37:24] (03PS4) 10Giuseppe Lavagetto: mediawiki::mcrouter_wancache: add a failover route to a secondary pool [puppet] - 10https://gerrit.wikimedia.org/r/591019 (https://phabricator.wikimedia.org/T213089) [13:37:36] (03CR) 10Vgutierrez: [C: 03+2] ATS: Stop ats-tls from adding the Age header [puppet] - 10https://gerrit.wikimedia.org/r/591044 (owner: 10Vgutierrez) [13:45:28] RECOVERY - SSH on db1096.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [13:46:21] (03CR) 10Elukey: [C: 03+1] "IPs looks good, https://puppet-compiler.wmflabs.org/compiler1001/22082/ looks good, I think that we can try!" [puppet] - 10https://gerrit.wikimedia.org/r/591019 (https://phabricator.wikimedia.org/T213089) (owner: 10Giuseppe Lavagetto) [13:50:11] !log Deploy schema change on codfw master - T250055 [13:50:16] (03PS3) 10Jbond: pcc templates: add cli instructions to template footer [software/puppet-compiler] - 10https://gerrit.wikimedia.org/r/588736 (https://phabricator.wikimedia.org/T250169) [13:50:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:50:18] T250055: Remove image.img_deleted column from production - https://phabricator.wikimedia.org/T250055 [13:50:32] (03CR) 10Jbond: "updated thanks" (031 comment) [software/puppet-compiler] - 10https://gerrit.wikimedia.org/r/588736 (https://phabricator.wikimedia.org/T250169) (owner: 10Jbond) [13:50:34] (03CR) 10Giuseppe Lavagetto: [C: 03+2] mediawiki::mcrouter_wancache: add a failover route to a secondary pool [puppet] - 10https://gerrit.wikimedia.org/r/591019 (https://phabricator.wikimedia.org/T213089) (owner: 10Giuseppe Lavagetto) [13:50:49] (03PS5) 10Giuseppe Lavagetto: mediawiki::mcrouter_wancache: add a failover route to a secondary pool [puppet] - 10https://gerrit.wikimedia.org/r/591019 (https://phabricator.wikimedia.org/T213089) [13:53:32] (03PS1) 10Dzahn: ATS: switch backends for annual.wm.org and 15.wp.org [puppet] - 
10https://gerrit.wikimedia.org/r/591052 [13:55:46] 10Operations, 10ops-eqiad, 10DBA, 10DC-Ops, 10Patch-For-Review: db1140 (backup source) crashed - https://phabricator.wikimedia.org/T250602 (10Cmjohnson) a:05wiki_willy→03Cmjohnson @Marostegui @jynus Before I can start a ticket with HPE some local troubleshooting has to be done. [13:56:26] (03CR) 10Filippo Giunchedi: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/589400 (owner: 10Ottomata) [13:56:43] 10Operations, 10ops-eqiad, 10DC-Ops, 10serviceops: scb1001: Memory correctable errors -EDAC- - https://phabricator.wikimedia.org/T250482 (10Cmjohnson) this is a good candidate for a replacement. [13:58:16] 10Operations, 10ops-eqiad, 10DC-Ops: Interface errors on asw2-c-eqiad - ge-3/0/9 (pc1009) - https://phabricator.wikimedia.org/T250257 (10Cmjohnson) 05Open→03Resolved No new framing errors, resolving this [13:58:26] (03CR) 10Ottomata: [C: 03+2] Refactor logstash::input::kafka to DRY ssl_truststore_location logic [puppet] - 10https://gerrit.wikimedia.org/r/589400 (owner: 10Ottomata) [13:58:34] (03PS8) 10Ottomata: Refactor logstash::input::kafka to DRY ssl_truststore_location logic [puppet] - 10https://gerrit.wikimedia.org/r/589400 [14:00:08] 10Operations, 10ops-eqiad, 10DC-Ops, 10serviceops: scb1001: Memory correctable errors -EDAC- - https://phabricator.wikimedia.org/T250482 (10MoritzMuehlenhoff) These servers will be removed from service soon (when the services currently running on it are completely moved to Kubernetes). @akosiaris can best... 
[14:02:19] (03CR) 10Filippo Giunchedi: [C: 04-1] pcc templates: add cli instructions to template footer (031 comment) [software/puppet-compiler] - 10https://gerrit.wikimedia.org/r/588736 (https://phabricator.wikimedia.org/T250169) (owner: 10Jbond) [14:06:42] !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db1089 after schema change', diff saved to https://phabricator.wikimedia.org/P11022 and previous config saved to /var/cache/conftool/dbconfig/20200420-140642-marostegui.json [14:06:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:06:47] (03CR) 10Ottomata: "Let's hold off on this for now." (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/589597 (https://phabricator.wikimedia.org/T116719) (owner: 10Ottomata) [14:09:37] jouncebot: next [14:09:37] In 2 hour(s) and 50 minute(s): Wikidata Query Service weekly deploy (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200420T1700) [14:10:17] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1127 for upgrade', diff saved to https://phabricator.wikimedia.org/P11023 and previous config saved to /var/cache/conftool/dbconfig/20200420-141017-marostegui.json [14:10:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:10:44] !log Upgrade db1127 [14:10:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:13:18] !log Upgrade db2131 [14:13:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:17:11] !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db1127 after upgrade', diff saved to https://phabricator.wikimedia.org/P11025 and previous config saved to /var/cache/conftool/dbconfig/20200420-141711-marostegui.json [14:17:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:18:00] !log Upgrade dbstore1005 [14:18:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:21:46] (03PS1) 10Jhedden: cloudvps: Add 
new role for metricsinfra [puppet] - 10https://gerrit.wikimedia.org/r/591053 (https://phabricator.wikimedia.org/T250206) [14:22:03] (03PS2) 10CDanis: full deployment of nic_saturation_exporter [puppet] - 10https://gerrit.wikimedia.org/r/589703 (https://phabricator.wikimedia.org/T250401) [14:22:05] (03PS1) 10CDanis: nic-saturation-exporter: skip virtual interfaces [puppet] - 10https://gerrit.wikimedia.org/r/591054 (https://phabricator.wikimedia.org/T250401) [14:23:49] 10Operations, 10Traffic, 10Goal, 10Performance-Team (Radar), 10Wikimedia-Incident: Support TLSv1.3 - https://phabricator.wikimedia.org/T170567 (10ema) >>! In T170567#6069972, @Krinkle wrote: > Re-opening and tracking as on-going perf incident per the above. As @Gilles mentioned, it would help if we can a... [14:24:01] (03CR) 10Jhedden: [C: 03+2] cloudvps: Add new role for metricsinfra [puppet] - 10https://gerrit.wikimedia.org/r/591053 (https://phabricator.wikimedia.org/T250206) (owner: 10Jhedden) [14:24:50] !log Upgrade db2101 [14:24:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:28:29] !log Upgrade db2096 (x1 codfw master) [14:28:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:28:53] (03CR) 10RLazarus: [C: 03+1] "👍" [puppet] - 10https://gerrit.wikimedia.org/r/591054 (https://phabricator.wikimedia.org/T250401) (owner: 10CDanis) [14:29:35] 10Operations, 10Traffic, 10Performance-Team (Radar): Support brotli compression - https://phabricator.wikimedia.org/T137979 (10ema) >>! In T137979#6071140, @Gilles wrote: > @ema I assume that ATS frontends as currently deployed support Brotli, right? We would need to enable the [[https://docs.trafficserver.... 
[14:31:27] (03PS4) 10Jbond: pcc templates: add cli instructions to template footer [software/puppet-compiler] - 10https://gerrit.wikimedia.org/r/588736 (https://phabricator.wikimedia.org/T250169) [14:31:35] (03CR) 10Jbond: "thanks updated" (032 comments) [software/puppet-compiler] - 10https://gerrit.wikimedia.org/r/588735 (https://phabricator.wikimedia.org/T250169) (owner: 10Jbond) [14:35:41] (03CR) 10Jbond: "see inline" (031 comment) [software/puppet-compiler] - 10https://gerrit.wikimedia.org/r/588736 (https://phabricator.wikimedia.org/T250169) (owner: 10Jbond) [14:35:51] (03CR) 10CDanis: [C: 03+2] nic-saturation-exporter: skip virtual interfaces [puppet] - 10https://gerrit.wikimedia.org/r/591054 (https://phabricator.wikimedia.org/T250401) (owner: 10CDanis) [14:36:05] (03CR) 10CDanis: [C: 03+2] full deployment of nic_saturation_exporter [puppet] - 10https://gerrit.wikimedia.org/r/589703 (https://phabricator.wikimedia.org/T250401) (owner: 10CDanis) [14:45:20] (03CR) 10Thiemo Kreuz (WMDE): [C: 03+1] "Just to confirm this. 
:-)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/591017 (https://phabricator.wikimedia.org/T249616) (owner: 10Awight) [14:50:46] 10Operations, 10Cognate, 10ContentTranslation, 10DBA, and 10 others: Restart extension1 (x1) database primary master (db1120) - https://phabricator.wikimedia.org/T250701 (10Marostegui) [14:51:44] 10Operations, 10Puppet, 10DBA, 10User-jbond: DB: perform rolling restart of mariadb daemons to pick up CA changes - https://phabricator.wikimedia.org/T239791 (10Marostegui) [14:52:24] PROBLEM - PHP opcache health on wtp2003 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:52:32] PROBLEM - PHP opcache health on wtp2001 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:53:40] !log depool wdqs1006 as it stopped updating [14:53:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:54:17] !log restart blazegraph on wdqs1006 [14:54:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:54:50] PROBLEM - PHP opcache health on wtp2007 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:55:36] (03CR) 10Filippo Giunchedi: pcc templates: add cli instructions to template footer (031 comment) [software/puppet-compiler] - 10https://gerrit.wikimedia.org/r/588736 (https://phabricator.wikimedia.org/T250169) (owner: 10Jbond) [14:55:38] PROBLEM - PHP opcache health on wtp2016 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:56:02] PROBLEM - PHP opcache health on wtp2019 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% 
https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:56:10] PROBLEM - PHP opcache health on wtp2009 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:56:10] PROBLEM - PHP opcache health on wtp2006 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:56:15] cdanis: If you're around, would you mind switching over to codfw again for wdqs? [14:56:30] PROBLEM - PHP opcache health on wtp2005 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:56:42] (03CR) 10Filippo Giunchedi: [C: 03+1] pcc templates: add cli instructions to template footer [software/puppet-compiler] - 10https://gerrit.wikimedia.org/r/588736 (https://phabricator.wikimedia.org/T250169) (owner: 10Jbond) [14:56:43] addshore: sure, is eqiad lagged again? [14:56:50] PROBLEM - PHP opcache health on wtp2018 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:56:52] yup https://grafana.wikimedia.org/d/000000489/wikidata-query-service?panelId=8&fullscreen&orgId=1&refresh=1m [14:56:53] ah i see you just depooled 1006 [14:56:54] PROBLEM - PHP opcache health on wtp2011 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:57:03] !log cdanis@cumin1001 conftool action : set/pooled=false; selector: dnsdisc=wdqs,name=eqiad [14:57:03] only by 2-4 hours this time .... 
[14:57:04] PROBLEM - PHP opcache health on wtp2015 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:57:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:57:12] ty! [14:57:28] PROBLEM - PHP opcache health on wtp2010 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:58:02] PROBLEM - PHP opcache health on wtp2008 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [14:58:40] PROBLEM - PHP opcache health on wtp2012 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [15:00:00] PROBLEM - PHP opcache health on wtp2004 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [15:03:14] PROBLEM - SSH on mw1307.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [15:03:38] PROBLEM - PHP opcache health on wtp2002 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [15:03:42] (03PS1) 10Andrew Bogott: mariadb wmcs ferm: use openstack_controllers instead of nova_controller etc. 
[puppet] - 10https://gerrit.wikimedia.org/r/591062 (https://phabricator.wikimedia.org/T249941) [15:03:44] (03PS1) 10Andrew Bogott: mariadb wmcs ferm: update ::wmcs profiles to use types and lookup() [puppet] - 10https://gerrit.wikimedia.org/r/591063 (https://phabricator.wikimedia.org/T249941) [15:03:47] (03PS1) 10Andrew Bogott: mariadb wmcs ferm: remove references to osm_host and cloudweb_dev_hosts [puppet] - 10https://gerrit.wikimedia.org/r/591064 [15:03:49] (03PS1) 10Andrew Bogott: mariadb wmcs ferm: add ipv6 firewall rules [puppet] - 10https://gerrit.wikimedia.org/r/591065 [15:04:10] (03PS5) 10Jbond: pcc templates: add cli instructions to template footer [software/puppet-compiler] - 10https://gerrit.wikimedia.org/r/588736 (https://phabricator.wikimedia.org/T250169) [15:05:03] 10Operations, 10netops, 10observability: Investigate Juniper structured logs - https://phabricator.wikimedia.org/T250703 (10ayounsi) p:05Triage→03Low [15:06:04] PROBLEM - PHP opcache health on wtp2017 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [15:06:31] 10Operations, 10ops-codfw: Degraded RAID on restbase2014 - https://phabricator.wikimedia.org/T250050 (10Papaul) @Eevans /dev/sdc has been replaced. 
Let me know if you have any questions [15:07:10] (03PS1) 10Jbond: ferm-status: update __str__ function to deal with dict rules [puppet] - 10https://gerrit.wikimedia.org/r/591079 [15:10:27] !log Upgrade db2079 [15:10:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:13:10] (03CR) 10Jbond: [C: 03+2] ferm-status: update __str__ function to deal with dict rules [puppet] - 10https://gerrit.wikimedia.org/r/591079 (owner: 10Jbond) [15:13:30] (03CR) 10Filippo Giunchedi: [C: 03+1] "LGTM" [software/puppet-compiler] - 10https://gerrit.wikimedia.org/r/588736 (https://phabricator.wikimedia.org/T250169) (owner: 10Jbond) [15:16:55] 10Operations, 10Traffic, 10Performance-Team (Radar): Edge cache response time per server should be monitored - https://phabricator.wikimedia.org/T238086 (10Gilles) a:03Gilles [15:17:12] 10Operations, 10Traffic, 10Performance-Team (Radar): Edge cache response time per server should be monitored - https://phabricator.wikimedia.org/T238086 (10Gilles) p:05Medium→03High [15:17:14] (03CR) 10Andrew Bogott: "pcc results for this whole patchset: https://puppet-compiler.wmflabs.org/compiler1001/22083/db1133.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/591062 (https://phabricator.wikimedia.org/T249941) (owner: 10Andrew Bogott) [15:17:28] (03CR) 10Andrew Bogott: "pcc results for this whole patchset: https://puppet-compiler.wmflabs.org/compiler1001/22083/db1133.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/591063 (https://phabricator.wikimedia.org/T249941) (owner: 10Andrew Bogott) [15:17:39] (03CR) 10Andrew Bogott: "pcc results for this whole patchset: https://puppet-compiler.wmflabs.org/compiler1001/22083/db1133.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/591064 (owner: 10Andrew Bogott) [15:17:53] (03CR) 10Andrew Bogott: "pcc results for this whole patchset: https://puppet-compiler.wmflabs.org/compiler1001/22083/db1133.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/591065 (owner: 
10Andrew Bogott) [15:18:31] (03PS1) 10Andrew Bogott: profile::openstack::codfw1dev::db: tidy up firewall rules [puppet] - 10https://gerrit.wikimedia.org/r/591082 (https://phabricator.wikimedia.org/T249941) [15:19:54] RECOVERY - PHP opcache health on wtp2019 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [15:21:54] RECOVERY - PHP opcache health on wtp2001 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [15:23:34] RECOVERY - PHP opcache health on wtp2003 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [15:23:38] RECOVERY - PHP opcache health on wtp2009 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [15:24:33] (03PS1) 10Urbanecm: Initial configuration for gomwiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/591084 (https://phabricator.wikimedia.org/T249506) [15:25:40] (03CR) 10jerkins-bot: [V: 04-1] Initial configuration for gomwiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/591084 (https://phabricator.wikimedia.org/T249506) (owner: 10Urbanecm) [15:26:06] RECOVERY - PHP opcache health on wtp2012 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [15:30:50] PROBLEM - mediawiki originals uploads -hourly- for eqiad on icinga1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe1005:9112 job=statsd_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=eqiad [15:30:54] (03PS2) 10Urbanecm: Initial configuration for gomwiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/591084 (https://phabricator.wikimedia.org/T249506) [15:32:08] PROBLEM - 
mediawiki originals uploads -hourly- for codfw on icinga1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe2005:9112 job=statsd_exporter site=codfw https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=codfw [15:32:54] (03PS1) 10Andrew Bogott: cloud-vps puppetmasters: use openstack_controllers instead of nova_controller [puppet] - 10https://gerrit.wikimedia.org/r/591086 (https://phabricator.wikimedia.org/T249941) [15:33:06] !log mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user=Victorgrigas /home/urbanecm/upload (T250687) [15:33:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:33:12] RECOVERY - PHP opcache health on wtp2007 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [15:33:13] T250687: Server side upload for Victorgrigas - https://phabricator.wikimedia.org/T250687 [15:35:12] RECOVERY - PHP opcache health on wtp2018 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [15:35:33] jouncebot: next [15:35:34] In 1 hour(s) and 24 minute(s): Wikidata Query Service weekly deploy (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200420T1700) [15:36:22] (03CR) 10jerkins-bot: [V: 04-1] cloud-vps puppetmasters: use openstack_controllers instead of nova_controller [puppet] - 10https://gerrit.wikimedia.org/r/591086 (https://phabricator.wikimedia.org/T249941) (owner: 10Andrew Bogott) [15:36:26] RECOVERY - PHP opcache health on wtp2008 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [15:37:04] RECOVERY - PHP opcache health on wtp2011 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [15:37:15] 10Operations, 
10ops-codfw, 10cloud-services-team (Hardware): (Need by: TBD) rack/setup/install cloudcontrol2004-dev - https://phabricator.wikimedia.org/T250708 (10Papaul) [15:37:40] RECOVERY - PHP opcache health on wtp2016 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [15:38:32] RECOVERY - PHP opcache health on wtp2005 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [15:39:04] RECOVERY - PHP opcache health on wtp2015 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [15:39:12] (03PS2) 10Andrew Bogott: cloud-vps puppetmasters: use openstack_controllers instead of nova_controller [puppet] - 10https://gerrit.wikimedia.org/r/591086 (https://phabricator.wikimedia.org/T249941) [15:39:49] 10Operations, 10ops-codfw, 10cloud-services-team (Hardware): (Need by: TBD) rack/setup/install cloudcontrol2004-dev - https://phabricator.wikimedia.org/T250708 (10Papaul) p:05Triage→03Medium [15:41:05] (03PS2) 10Andrew Bogott: profile::openstack::codfw1dev::db: tidy up firewall rules [puppet] - 10https://gerrit.wikimedia.org/r/591082 (https://phabricator.wikimedia.org/T249941) [15:41:07] (03PS3) 10Andrew Bogott: cloud-vps puppetmasters: use openstack_controllers instead of nova_controller [puppet] - 10https://gerrit.wikimedia.org/r/591086 (https://phabricator.wikimedia.org/T249941) [15:41:41] 10Operations, 10serviceops: upgrade people.wikimedia.org backend to buster - https://phabricator.wikimedia.org/T247649 (10Dzahn) p:05Triage→03Medium [15:41:47] 10Operations, 10serviceops: replace bromine and vega with buster VMs - https://phabricator.wikimedia.org/T247650 (10Dzahn) p:05Triage→03Medium [15:42:04] RECOVERY - PHP opcache health on wtp2002 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [15:42:04] RECOVERY - PHP opcache health on 
wtp2004 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [15:42:26] (03CR) 10jerkins-bot: [V: 04-1] cloud-vps puppetmasters: use openstack_controllers instead of nova_controller [puppet] - 10https://gerrit.wikimedia.org/r/591086 (https://phabricator.wikimedia.org/T249941) (owner: 10Andrew Bogott) [15:42:51] 10Operations, 10ops-codfw, 10cloud-services-team (Hardware): (Need by: TBD) rack/setup/install cloudcontrol2004-dev - https://phabricator.wikimedia.org/T250708 (10Papaul) [15:44:13] 10Operations, 10ops-codfw, 10cloud-services-team (Hardware): (Need by: TBD) rack/setup/install cloudcontrol2004-dev - https://phabricator.wikimedia.org/T250708 (10Papaul) [15:44:45] (03CR) 10jerkins-bot: [V: 04-1] cloud-vps puppetmasters: use openstack_controllers instead of nova_controller [puppet] - 10https://gerrit.wikimedia.org/r/591086 (https://phabricator.wikimedia.org/T249941) (owner: 10Andrew Bogott) [15:45:00] RECOVERY - PHP opcache health on wtp2010 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [15:45:44] (03PS4) 10Andrew Bogott: cloud-vps puppetmasters: use openstack_controllers instead of nova_controller [puppet] - 10https://gerrit.wikimedia.org/r/591086 (https://phabricator.wikimedia.org/T249941) [15:46:20] RECOVERY - PHP opcache health on wtp2017 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [15:48:38] (03PS5) 10Andrew Bogott: cloud-vps puppetmasters: use openstack_controllers instead of nova_controller [puppet] - 10https://gerrit.wikimedia.org/r/591086 (https://phabricator.wikimedia.org/T249941) [15:50:41] 10Operations, 10ops-codfw: Degraded RAID on restbase2014 - https://phabricator.wikimedia.org/T250050 (10Eevans) >>! In T250050#6071807, @Papaul wrote: > @Eevans /dev/sdc has been replaced. 
Let me know if you have any questions Thanks @Papaul; /cc @hnowlan [15:51:51] (03CR) 10jerkins-bot: [V: 04-1] cloud-vps puppetmasters: use openstack_controllers instead of nova_controller [puppet] - 10https://gerrit.wikimedia.org/r/591086 (https://phabricator.wikimedia.org/T249941) (owner: 10Andrew Bogott) [15:52:28] (03CR) 10Andrew Bogott: "I am the typo king" [puppet] - 10https://gerrit.wikimedia.org/r/591086 (https://phabricator.wikimedia.org/T249941) (owner: 10Andrew Bogott) [15:52:54] 10Operations, 10observability: run nic_saturation_exporter on all physical hosts - https://phabricator.wikimedia.org/T250401 (10CDanis) [15:52:58] RECOVERY - PHP opcache health on wtp2006 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [15:53:37] 10Operations, 10LDAP-Access-Requests: LDAP/NDA Access Request for mshaver - https://phabricator.wikimedia.org/T250430 (10MNoorWMF) yes please, i'm trying to move forward with mshaver as the default across all platforms. [15:53:51] (03PS6) 10Andrew Bogott: cloud-vps puppetmasters: use openstack_controllers instead of nova_controller [puppet] - 10https://gerrit.wikimedia.org/r/591086 (https://phabricator.wikimedia.org/T249941) [15:54:03] 10Operations, 10ops-eqiad, 10DBA: msw1-a6-eqiad flopping up and down mgmt connections on A6 - https://phabricator.wikimedia.org/T250652 (10wiki_willy) a:03Cmjohnson [15:55:33] 10Operations, 10ops-eqiad, 10DBA: msw1-a6-eqiad flopping up and down mgmt connections on A6 - https://phabricator.wikimedia.org/T250652 (10wiki_willy) @cmjohnson - we have a refresh for the eqiad management switches scheduled to be ordered this quarter, so I'll check with Rob to see when those are coming in.... 
[15:57:01] 10Operations, 10ops-eqiad, 10DBA: msw1-a6-eqiad flopping up and down mgmt connections on A6 - https://phabricator.wikimedia.org/T250652 (10RobH) T249048 was approved last Friday (today being Monday), and my plan is to place the info into Coupa later today for ordering. I don't think we'll need another task... [15:58:19] (03PS7) 10Andrew Bogott: cloud-vps puppetmasters: use openstack_controllers instead of nova_controller [puppet] - 10https://gerrit.wikimedia.org/r/591086 (https://phabricator.wikimedia.org/T249941) [16:01:17] (03CR) 10Andrew Bogott: "pcc diff: https://puppet-compiler.wmflabs.org/compiler1002/22090/" [puppet] - 10https://gerrit.wikimedia.org/r/591086 (https://phabricator.wikimedia.org/T249941) (owner: 10Andrew Bogott) [16:01:25] (03CR) 10Andrew Bogott: [C: 03+2] profile::openstack::codfw1dev::db: tidy up firewall rules [puppet] - 10https://gerrit.wikimedia.org/r/591082 (https://phabricator.wikimedia.org/T249941) (owner: 10Andrew Bogott) [16:01:32] (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] cloud-vps puppetmasters: use openstack_controllers instead of nova_controller [puppet] - 10https://gerrit.wikimedia.org/r/591086 (https://phabricator.wikimedia.org/T249941) (owner: 10Andrew Bogott) [16:02:04] PROBLEM - PHP opcache health on mwdebug1001 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [16:07:31] (03PS1) 10Andrew Bogott: wmcs pdns recursor profiles: remove nova_controller params [puppet] - 10https://gerrit.wikimedia.org/r/591090 [16:09:18] RECOVERY - PHP opcache health on mwdebug1001 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [16:09:20] RECOVERY - Device not healthy -SMART- on restbase2014 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/SMART%23Alerts https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=restbase2014&var-datasource=codfw+prometheus/ops [16:09:27] (03CR) 10Andrew Bogott: "confirmed harmless in https://puppet-compiler.wmflabs.org/compiler1002/22091/" [puppet] - 10https://gerrit.wikimedia.org/r/591090 (owner: 10Andrew Bogott) [16:12:09] 10Operations, 10Traffic: Implement TTL cap for ats-be - https://phabricator.wikimedia.org/T249627 (10ema) 05Open→03Resolved [16:12:15] (03CR) 10Andrew Bogott: [C: 03+2] wmcs pdns recursor profiles: remove nova_controller params [puppet] - 10https://gerrit.wikimedia.org/r/591090 (owner: 10Andrew Bogott) [16:21:08] 10Operations, 10CommRel-Specialists-Support (Apr-Jun-2020): CommRel support for FY2019-2020 Q4 DC switchover - https://phabricator.wikimedia.org/T244808 (10Elitre) FYI @Trizek-WMF , @Johan [16:22:03] (03PS1) 10Andrew Bogott: neutron common profiles: remove nova_controller params [puppet] - 10https://gerrit.wikimedia.org/r/591091 [16:23:39] (03PS1) 10Vgutierrez: ATS: Enable SSL_OP_PRIORITIZE_CHACHA on ats-tls [puppet] - 10https://gerrit.wikimedia.org/r/591092 [16:24:18] (03PS2) 10Vgutierrez: ATS: Enable SSL_OP_PRIORITIZE_CHACHA on ats-tls [puppet] - 10https://gerrit.wikimedia.org/r/591092 [16:24:57] (03PS2) 10Andrew Bogott: neutron common profiles: remove nova_controller params [puppet] - 10https://gerrit.wikimedia.org/r/591091 [16:24:59] (03PS1) 10Andrew Bogott: clouddb2001-dev: remove a stray ) [puppet] - 10https://gerrit.wikimedia.org/r/591093 [16:28:11] (03CR) 10Vgutierrez: [C: 03+1] "pcc seems happy: https://puppet-compiler.wmflabs.org/compiler1002/22092/" [puppet] - 10https://gerrit.wikimedia.org/r/591092 (owner: 10Vgutierrez) [16:29:50] (03Abandoned) 10Andrew Bogott: neutron common profiles: remove nova_controller params [puppet] - 10https://gerrit.wikimedia.org/r/591091 (owner: 10Andrew Bogott) [16:29:58] (03CR) 10Andrew Bogott: [C: 03+2] clouddb2001-dev: remove a stray ) [puppet] - 
10https://gerrit.wikimedia.org/r/591093 (owner: 10Andrew Bogott) [16:41:29] (03PS1) 10Andrew Bogott: clouddb2001-dev: make a ferm rule a () list [puppet] - 10https://gerrit.wikimedia.org/r/591100 [16:45:04] (03CR) 10Andrew Bogott: [C: 03+2] clouddb2001-dev: make a ferm rule a () list [puppet] - 10https://gerrit.wikimedia.org/r/591100 (owner: 10Andrew Bogott) [16:50:50] RECOVERY - mediawiki originals uploads -hourly- for codfw on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=codfw [16:51:22] RECOVERY - mediawiki originals uploads -hourly- for eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=eqiad [16:52:56] (03CR) 10Ema: [C: 03+1] ATS: Enable SSL_OP_PRIORITIZE_CHACHA on ats-tls [puppet] - 10https://gerrit.wikimedia.org/r/591092 (owner: 10Vgutierrez) [16:54:04] (03PS1) 10Andrew Bogott: Neutron common: use openstack_controllers instead of nova_controller params [puppet] - 10https://gerrit.wikimedia.org/r/591102 (https://phabricator.wikimedia.org/T249941) [16:56:50] (03PS2) 10Andrew Bogott: Neutron common: use openstack_controllers instead of nova_controller params [puppet] - 10https://gerrit.wikimedia.org/r/591102 (https://phabricator.wikimedia.org/T249941) [16:56:52] (03CR) 10jerkins-bot: [V: 04-1] Neutron common: use openstack_controllers instead of nova_controller params [puppet] - 10https://gerrit.wikimedia.org/r/591102 (https://phabricator.wikimedia.org/T249941) (owner: 10Andrew Bogott) [16:59:49] (03CR) 10jerkins-bot: [V: 04-1] Neutron common: use openstack_controllers instead of nova_controller params [puppet] - 10https://gerrit.wikimedia.org/r/591102 (https://phabricator.wikimedia.org/T249941) (owner: 10Andrew Bogott) 
[17:00:05] gehel and onimisionipe: Dear deployers, time to do the Wikidata Query Service weekly deploy deploy. Dont look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200420T1700). [17:01:23] (03PS3) 10Andrew Bogott: Neutron common: use openstack_controllers instead of nova_controller params [puppet] - 10https://gerrit.wikimedia.org/r/591102 (https://phabricator.wikimedia.org/T249941) [17:04:43] (03CR) 10jerkins-bot: [V: 04-1] Neutron common: use openstack_controllers instead of nova_controller params [puppet] - 10https://gerrit.wikimedia.org/r/591102 (https://phabricator.wikimedia.org/T249941) (owner: 10Andrew Bogott) [17:04:59] (03PS4) 10Andrew Bogott: Neutron common: use openstack_controllers instead of nova_controller params [puppet] - 10https://gerrit.wikimedia.org/r/591102 (https://phabricator.wikimedia.org/T249941) [17:06:50] (03CR) 10Andrew Bogott: "pcc results at" [puppet] - 10https://gerrit.wikimedia.org/r/591102 (https://phabricator.wikimedia.org/T249941) (owner: 10Andrew Bogott) [17:07:22] PROBLEM - OSPF status on cr1-codfw is CRITICAL: OSPFv2: 5/6 UP : OSPFv3: 5/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status [17:07:24] PROBLEM - OSPF status on cr1-eqsin is CRITICAL: OSPFv2: 2/3 UP : OSPFv3: 2/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status [17:09:12] RECOVERY - OSPF status on cr1-codfw is OK: OSPFv2: 6/6 UP : OSPFv3: 6/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status [17:09:14] RECOVERY - OSPF status on cr1-eqsin is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status [17:12:34] (03PS1) 10Ema: purged: raise 'frontend_workers' [puppet] - 10https://gerrit.wikimedia.org/r/591108 (https://phabricator.wikimedia.org/T249583) [17:26:46] Urbanecm: I rebased and 'tweaked' https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/521181 – OK for me to merge? 
[17:27:04] James_F: absolutely! [17:27:13] (03PS31) 10Jforrester: Test if 2x logo version is 2 times bigger than 1x logo version [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521181 (https://phabricator.wikimedia.org/T211413) (owner: 10Urbanecm) [17:27:19] Cool. [17:27:21] thanks [17:27:24] (03CR) 10Jforrester: [C: 03+2] Test if 2x logo version is 2 times bigger than 1x logo version [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521181 (https://phabricator.wikimedia.org/T211413) (owner: 10Urbanecm) [17:27:28] Thanks for caring. :-) [17:27:40] Quite a few failing logos, sadly. [17:28:13] (03Merged) 10jenkins-bot: Test if 2x logo version is 2 times bigger than 1x logo version [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521181 (https://phabricator.wikimedia.org/T211413) (owner: 10Urbanecm) [17:28:37] James_F: is there a follow up task or something? :-) [17:28:45] (for the failing ones) [17:28:55] Urbanecm: No, just an ugly list of exceptions in the test. [17:29:12] Including almost all the wikiquotes. [17:29:38] I'll create one then [17:29:49] Thanks. [17:32:45] James_F: T250731 [17:32:46] T250731: Change HD logos with incorrect size to match expectations - https://phabricator.wikimedia.org/T250731 [17:33:07] Thank you. [17:34:11] 10Operations, 10MediaWiki-Debug-Logger, 10Wikimedia-Logstash, 10observability, 10Patch-For-Review: MediaWiki logging & encryption - https://phabricator.wikimedia.org/T126989 (10Krinkle) >>! In T126989#5076715, @gerritbot wrote: > Change 498106 **merged** by Filippo Giunchedi: > [mediawiki/core@master] mo... [17:34:28] 10Operations, 10Analytics, 10Wikimedia-Logstash, 10observability: Retire udp2log: onboard its producers and consumers to the logging pipeline - https://phabricator.wikimedia.org/T205856 (10Krinkle) >>! In T126989#5076715, @gerritbot wrote: > Change 498106 **merged** by Filippo Giunchedi: > [mediawiki/core@... 
[17:34:29] yw [17:35:01] 10Operations, 10Analytics, 10Wikimedia-Logstash, 10observability, 10Performance-Team (Radar): Retire udp2log: onboard its producers and consumers to the logging pipeline - https://phabricator.wikimedia.org/T205856 (10Krinkle) [17:38:51] 10Operations, 10Discovery-Search, 10Elasticsearch, 10Wikimedia-Logstash, 10observability: Migrate Elasticsearch from deprecated Gelf logstash input to rsyslog Kafka logging pipeline - https://phabricator.wikimedia.org/T225125 (10Gehel) a:05Mathew.onipe→03None [17:40:07] 10Operations, 10Discovery-Search, 10SDC General, 10Structured Data Engineering, and 2 others: Create CQS puppet configs by applying query_service module - https://phabricator.wikimedia.org/T237089 (10Gehel) a:05Mathew.onipe→03None [17:47:27] (03PS2) 10Jforrester: Do not update the globals cache file while opcache needs regeneration [mediawiki-config] - 10https://gerrit.wikimedia.org/r/575469 (https://phabricator.wikimedia.org/T236104) (owner: 10Giuseppe Lavagetto) [17:48:47] (03CR) 10Jforrester: [C: 03+1] "> Patch Set 1:" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/575469 (https://phabricator.wikimedia.org/T236104) (owner: 10Giuseppe Lavagetto) [17:52:31] (03PS1) 10Ottomata: eventgate-logging-external - configure kaios_app.error stream [deployment-charts] - 10https://gerrit.wikimedia.org/r/591118 (https://phabricator.wikimedia.org/T250177) [17:57:16] Krinkle: You around? Going to deploy https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/575469 and would appreciate a sanity-check. [17:59:38] PROBLEM - SSH on mw1308.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [18:00:04] RoanKattouw, Niharika, and Urbanecm: How many deployers does it take to do Morning SWAT(Max 6 patches) deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200420T1800). 
[18:00:04] kemayo: A patch you scheduled for Morning SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [18:00:21] Someday I hope to actually break them. [18:00:33] Hey Kemayo, I'll deploy. [18:00:35] hehe [18:00:51] James_F: 👍🏻 [18:00:51] (03PS2) 10Jforrester: DiscussionTools EditAttemptStepSamplingRate increase for some wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/589641 (https://phabricator.wikimedia.org/T250086) (owner: 10DLynch) [18:00:52] James_F: LGTM still :) [18:01:00] double-checked and triple-checked [18:01:11] 🤞🏽 [18:01:19] (03CR) 10Jforrester: [C: 03+2] DiscussionTools EditAttemptStepSamplingRate increase for some wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/589641 (https://phabricator.wikimedia.org/T250086) (owner: 10DLynch) [18:01:35] Krinkle: Excellent. Thanks! [18:01:46] PROBLEM - SSH on ganeti1006.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [18:01:56] (03Merged) 10jenkins-bot: DiscussionTools EditAttemptStepSamplingRate increase for some wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/589641 (https://phabricator.wikimedia.org/T250086) (owner: 10DLynch) [18:02:59] Kemayo: Testable? It's on mwdebug1001. [18:03:23] James_F: It's out and having the expected effect. [18:04:05] Cool.
[18:05:03] !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: DiscussionTools: EditAttemptStepSamplingRate increase for some wikis T250086 (duration: 01m 10s) [18:05:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:05:10] T250086: Add config override for instrumentation sampling rate - https://phabricator.wikimedia.org/T250086 [18:06:10] !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 01m 02s) [18:06:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:06:14] Kemayo: Done. Please shout if there are issues. [18:06:22] James_F: Will do, thanks! [18:06:43] (03CR) 10Jforrester: [C: 03+2] Do not update the globals cache file while opcache needs regeneration [mediawiki-config] - 10https://gerrit.wikimedia.org/r/575469 (https://phabricator.wikimedia.org/T236104) (owner: 10Giuseppe Lavagetto) [18:07:36] (03Merged) 10jenkins-bot: Do not update the globals cache file while opcache needs regeneration [mediawiki-config] - 10https://gerrit.wikimedia.org/r/575469 (https://phabricator.wikimedia.org/T236104) (owner: 10Giuseppe Lavagetto) [18:08:43] (03CR) 10Jason Linehan: [C: 03+1] "Looks good to me" [deployment-charts] - 10https://gerrit.wikimedia.org/r/591118 (https://phabricator.wikimedia.org/T250177) (owner: 10Ottomata) [18:10:39] James_F: perhaps you want to do https://gerrit.wikimedia.org/r/c/mediawiki/extensions/MassMessage/+/591089 as well? Causes #wikimedia-production-error (though not a UBN) [18:11:59] !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 01m 02s) [18:12:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:12:10] *fixes, obv [18:12:48] OK, I feel moderately confident about this change.
[18:14:21] !log jforrester@deploy1001 Synchronized wmf-config/CommonSettings.php: T236104 Wait to update the globals cache file for opcache regeneration (duration: 01m 02s) [18:14:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:14:27] T236104: Cache of wmf-config/InitialiseSettings often 1 step behind - https://phabricator.wikimedia.org/T236104 [18:14:44] Urbanecm: Sure. [18:15:06] thanks [18:17:55] (03PS3) 10Jforrester: The official name of Parsoid is 'Parsoid' [mediawiki-config] - 10https://gerrit.wikimedia.org/r/583394 (owner: 10C. Scott Ananian) [18:18:08] (03CR) 10Jforrester: [C: 03+2] The official name of Parsoid is 'Parsoid' [mediawiki-config] - 10https://gerrit.wikimedia.org/r/583394 (owner: 10C. Scott Ananian) [18:18:20] (03CR) 10Ottomata: [C: 03+2] eventgate-logging-external - configure kaios_app.error stream [deployment-charts] - 10https://gerrit.wikimedia.org/r/591118 (https://phabricator.wikimedia.org/T250177) (owner: 10Ottomata) [18:19:15] (03Merged) 10jenkins-bot: The official name of Parsoid is 'Parsoid' [mediawiki-config] - 10https://gerrit.wikimedia.org/r/583394 (owner: 10C. Scott Ananian) [18:19:43] !log otto@deploy1001 helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'production' . [18:19:43] !log otto@deploy1001 helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'canary' . 
[18:19:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:19:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:20:50] !log jforrester@deploy1001 Synchronized wmf-config/CommonSettings.php: Adjust dummy name of fake Parsoid extension to just 'Parsoid' (duration: 01m 01s) [18:20:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:21:07] (03PS3) 10Jforrester: Link to the phab task for VE/Parsoid being disabled on wikitech [mediawiki-config] - 10https://gerrit.wikimedia.org/r/583395 (owner: 10C. Scott Ananian) [18:21:15] (03CR) 10Jforrester: [C: 03+2] Link to the phab task for VE/Parsoid being disabled on wikitech [mediawiki-config] - 10https://gerrit.wikimedia.org/r/583395 (owner: 10C. Scott Ananian) [18:21:31] !log otto@deploy1001 helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'production' . [18:21:31] !log otto@deploy1001 helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'canary' . [18:21:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:21:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:22:25] (03Merged) 10jenkins-bot: Link to the phab task for VE/Parsoid being disabled on wikitech [mediawiki-config] - 10https://gerrit.wikimedia.org/r/583395 (owner: 10C. Scott Ananian) [18:22:58] !log otto@deploy1001 helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'production' . [18:22:58] !log otto@deploy1001 helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'canary' . 
[18:23:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:23:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:27:27] !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Adjust Parsoid/VE disable comment for wikitechwiki (duration: 01m 02s) [18:27:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:28:08] (03PS1) 10Bstorm: toolforge-k8s: add a puppetized checkout of maintain-kubeusers [puppet] - 10https://gerrit.wikimedia.org/r/591121 [18:31:24] PROBLEM - mediawiki originals uploads -hourly- for codfw on icinga1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe2005:9112 job=statsd_exporter site=codfw https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=codfw [18:31:28] (03PS1) 10Jforrester: [testwiki] Force videojs-only mode for TimedMediaHandler [mediawiki-config] - 10https://gerrit.wikimedia.org/r/591124 (https://phabricator.wikimedia.org/T248418) [18:31:54] PROBLEM - mediawiki originals uploads -hourly- for eqiad on icinga1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe1005:9112 job=statsd_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=eqiad [18:33:34] (03CR) 10Jforrester: [C: 03+2] [testwiki] Force videojs-only mode for TimedMediaHandler [mediawiki-config] - 10https://gerrit.wikimedia.org/r/591124 (https://phabricator.wikimedia.org/T248418) (owner: 10Jforrester) [18:34:29] (03Merged) 10jenkins-bot: [testwiki] Force videojs-only mode for TimedMediaHandler [mediawiki-config] - 10https://gerrit.wikimedia.org/r/591124 (https://phabricator.wikimedia.org/T248418) (owner: 10Jforrester) [18:36:06] !log jforrester@deploy1001 Synchronized 
php-1.35.0-wmf.28/extensions/MassMessage/includes/SpecialEditMassMessageList.php: T250710 Follow-up 95c772864: Fix RevisionRecord calls that differ from Revision (duration: 01m 02s) [18:36:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:36:12] T250710: Call to undefined method MediaWiki\Revision\RevisionStoreRecord::getTitle() - https://phabricator.wikimedia.org/T250710 [18:38:29] !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: T248418 [testwiki] Force videojs-only mode for TimedMediaHandler (duration: 01m 01s) [18:38:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:38:36] T248418: Roll out videojs as the only video/audio player on all Wikimedia wikis - https://phabricator.wikimedia.org/T248418 [18:39:21] !log disabling puppet on all mcrouter hosts for cert renewal T248093 [18:39:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:39:27] T248093: Renew certs for mcrouter on all application servers. 
- https://phabricator.wikimedia.org/T248093 [18:39:44] (03PS1) 10Andrew Bogott: Neutron metadata agent: use keystone_api_fqdn for the metadata api [puppet] - 10https://gerrit.wikimedia.org/r/591125 (https://phabricator.wikimedia.org/T249941) [18:42:50] (03CR) 10jerkins-bot: [V: 04-1] Neutron metadata agent: use keystone_api_fqdn for the metadata api [puppet] - 10https://gerrit.wikimedia.org/r/591125 (https://phabricator.wikimedia.org/T249941) (owner: 10Andrew Bogott) [18:44:52] (03PS2) 10Andrew Bogott: Neutron metadata agent: use keystone_api_fqdn for the metadata api [puppet] - 10https://gerrit.wikimedia.org/r/591125 (https://phabricator.wikimedia.org/T249941) [18:48:21] (03CR) 10Andrew Bogott: [C: 03+2] Neutron common: use openstack_controllers instead of nova_controller params [puppet] - 10https://gerrit.wikimedia.org/r/591102 (https://phabricator.wikimedia.org/T249941) (owner: 10Andrew Bogott) [18:52:22] (03CR) 10Andrew Bogott: "pcc results: https://puppet-compiler.wmflabs.org/compiler1002/22102/" [puppet] - 10https://gerrit.wikimedia.org/r/591125 (https://phabricator.wikimedia.org/T249941) (owner: 10Andrew Bogott) [18:54:47] (03PS7) 10Dr0ptp4kt: Add Druid support for event.editattemptstep [puppet] - 10https://gerrit.wikimedia.org/r/587984 (https://phabricator.wikimedia.org/T249945) [18:56:35] 10Operations, 10ops-codfw, 10ops-eqiad, 10ops-eqsin, and 2 others: Audit & update spares part tracking for all sites - https://phabricator.wikimedia.org/T243450 (10Papaul) [18:56:52] (03PS1) 10Andrew Bogott: wmcs openstack haproxy: use openstack_controllers instead of nova_controller [puppet] - 10https://gerrit.wikimedia.org/r/591133 (https://phabricator.wikimedia.org/T249941) [19:02:26] RECOVERY - SSH on ganeti1006.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [19:03:56] (03CR) 10Andrew Bogott: "pcc results https://puppet-compiler.wmflabs.org/compiler1003/22104/" [puppet] - 
10https://gerrit.wikimedia.org/r/591133 (https://phabricator.wikimedia.org/T249941) (owner: 10Andrew Bogott) [19:06:20] RECOVERY - SSH on mw1307.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [19:07:10] (03PS1) 10Andrew Bogott: profile::openstack::codfw1dev::keystone::fernet_keys: remove an unused param [puppet] - 10https://gerrit.wikimedia.org/r/591141 [19:07:43] (03CR) 10Jhedden: [C: 03+1] Neutron metadata agent: use keystone_api_fqdn for the metadata api [puppet] - 10https://gerrit.wikimedia.org/r/591125 (https://phabricator.wikimedia.org/T249941) (owner: 10Andrew Bogott) [19:09:11] (03CR) 10Andrew Bogott: [C: 03+2] Neutron metadata agent: use keystone_api_fqdn for the metadata api [puppet] - 10https://gerrit.wikimedia.org/r/591125 (https://phabricator.wikimedia.org/T249941) (owner: 10Andrew Bogott) [19:12:45] (03CR) 10Andrew Bogott: [C: 03+2] profile::openstack::codfw1dev::keystone::fernet_keys: remove an unused param [puppet] - 10https://gerrit.wikimedia.org/r/591141 (owner: 10Andrew Bogott) [19:17:50] (03PS1) 10EBernhardson: Revert "cirrus: redirect more_like to codfw to rebuild query cache" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/591146 [19:17:52] (03PS8) 10Andrew Bogott: cloud-vps puppetmasters: use openstack_controllers instead of nova_controller [puppet] - 10https://gerrit.wikimedia.org/r/591086 (https://phabricator.wikimedia.org/T249941) [19:27:30] (03PS1) 10Andrew Bogott: Openstack: remove config for version Pike [puppet] - 10https://gerrit.wikimedia.org/r/591149 (https://phabricator.wikimedia.org/T249058) [19:32:27] (03CR) 10Andrew Bogott: [C: 03+2] Openstack: remove config for version Pike [puppet] - 10https://gerrit.wikimedia.org/r/591149 (https://phabricator.wikimedia.org/T249058) (owner: 10Andrew Bogott) [19:35:10] (03PS2) 10Jforrester: Revert "cirrus: redirect more_like to codfw to rebuild query cache" [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/591146 (owner: 10EBernhardson) [19:37:14] (03CR) 10Jforrester: [C: 03+2] Revert "cirrus: redirect more_like to codfw to rebuild query cache" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/591146 (owner: 10EBernhardson) [19:38:24] (03Merged) 10jenkins-bot: Revert "cirrus: redirect more_like to codfw to rebuild query cache" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/591146 (owner: 10EBernhardson) [19:40:36] !log mcrouter certs renewed on puppetmaster1001; puppet re-enabled on mcrouter hosts and will update certs naturally over the next 30m T248093 [19:40:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:40:43] T248093: Renew certs for mcrouter on all application servers. - https://phabricator.wikimedia.org/T248093 [19:42:04] (03PS3) 10Jforrester: [huwiki] Set wgFlaggedRevsOverride back to true, test period completed [mediawiki-config] - 10https://gerrit.wikimedia.org/r/496205 (https://phabricator.wikimedia.org/T210224) (owner: 10Mahveotm) [19:43:18] (03CR) 10Jforrester: "I've re-done this patch to apply to current master." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/496205 (https://phabricator.wikimedia.org/T210224) (owner: 10Mahveotm) [19:43:45] !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: cirrus: Move more_like from codfw back to eqiad, rebuild complete (duration: 01m 03s) [19:43:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:45:35] !log jforrester@deploy1001 Synchronized wmf-config/PoolCounterSettings.php: Revert CirrusSearch-MoreLike pool conter numbers now rebuild is done (duration: 01m 01s) [19:45:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:49:30] (03PS1) 10Jdlrobson: Update project wordmarks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/591152 (https://phabricator.wikimedia.org/T249047) [19:50:12] PROBLEM - Widespread puppet agent failures- no resources reported on icinga1001 is CRITICAL: 0.01018 ge 0.01 https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/yOxVDGvWk/puppet [19:50:36] RECOVERY - mediawiki originals uploads -hourly- for eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=eqiad [19:51:58] RECOVERY - mediawiki originals uploads -hourly- for codfw on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=codfw [19:53:20] o/ if any ops can re enable traffic to eqiad wdqs that would be great! 
[19:53:21] rzl: looks like there's a bunch of puppet failures on appserver and api_appserver [19:53:28] oh hia cdanis :P [19:53:33] !log cdanis@cumin1001 conftool action : set/pooled=true; selector: dnsdisc=wdqs,name=eqiad [19:53:34] cdanis: thanks, looking [19:53:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:53:48] !log pool wdqs1006 again (caught up) [19:53:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:54:00] ty [19:54:11] addshore: if that server is going to keep being problematic, you could lower its weight some [19:54:21] (assuming fewer reads will help it keep up with writes) [19:54:42] > Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error while evaluating a Function Call, secret(): invalid secret mcrouter/mw2352.codfw.wmnet/mw2352.codfw.wmnet.crt.pem (file: /etc/puppet/modules/profile/manifests/mediawiki/mcrouter_wancache.pp, line: 96, column: 24) on node mw2352.codfw.wmnet [19:54:47] that's definitely my change, reverting it now [19:55:05] Not sure if it would help much, it's happening to all of them apparently (deadlock), I'd have to check with discovery! [19:56:20] ack [19:56:38] rzl: lmk if you need help [19:56:59] nah easy revert but thanks [19:57:21] it turns out /etc/cergen/mcrouter.manifests.d/mediawiki-hosts.certs.yaml was lacking some hosts, so instead of getting their certs renewed, they were deleted [19:58:02] 😬 [19:58:56] * halfak waits patiently for the service deployment window. [19:59:06] (03PS1) 10Andrew Bogott: Remove nova_api_host params from places where it wasn't doing anything [puppet] - 10https://gerrit.wikimedia.org/r/591155 (https://phabricator.wikimedia.org/T249941) [19:59:07] We have an ORES change going out.
[19:59:11] (03PS1) 10Andrew Bogott: nova fullstack: use openstack_controllers instead of nova_api_host [puppet] - 10https://gerrit.wikimedia.org/r/591156 (https://phabricator.wikimedia.org/T249941) [19:59:13] (03PS1) 10Andrew Bogott: nova.conf: remove ec2_dmz_host and ec2_url [puppet] - 10https://gerrit.wikimedia.org/r/591157 [19:59:15] (03PS1) 10Andrew Bogott: wmcs: remove nova_api_host [puppet] - 10https://gerrit.wikimedia.org/r/591158 (https://phabricator.wikimedia.org/T249941) [20:00:05] halfak and accraze: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for Services – Graphoid / Citoid / ORES deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200420T2000). [20:00:30] Here we go. Deploying ORES. [20:00:35] (03PS2) 10Jdlrobson: Update project wordmarks and icons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/591152 (https://phabricator.wikimedia.org/T249047) [20:00:35] !log halfak@deploy1001 Started deploy [ores/deploy@514f94a]: T250536 [20:00:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:00:41] (03PS1) 10Jdlrobson: Add project taglines [mediawiki-config] - 10https://gerrit.wikimedia.org/r/591160 (https://phabricator.wikimedia.org/T249047) [20:00:41] T250536: Mid-April 2020 ORES deployment - https://phabricator.wikimedia.org/T250536 [20:00:43] (03CR) 10Nuria: [C: 03+1] Add Druid support for event.editattemptstep [puppet] - 10https://gerrit.wikimedia.org/r/587984 (https://phabricator.wikimedia.org/T249945) (owner: 10Dr0ptp4kt) [20:01:08] RECOVERY - SSH on mw1308.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [20:02:37] jouncebot: next [20:02:37] In 0 hour(s) and 57 minute(s): Weekly Security deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200420T2100) [20:02:39] cdanis: revert committed, thanks for the ping -- should I manually run puppet on the failed 
hosts or just let them succeed on the next cycle? [20:02:53] rzl: if they're still serving fine, whatever [20:03:05] you know the cumin oneliner right? [20:03:23] I'd have to look up the flag names but I know the basics [20:03:26] sudo cumin -b15 'A:appserver' 'run-puppet-agent -q --failed-only' [20:03:42] C:mcrouter in this case, just in case it comes up [20:03:45] but, v helpful thanks [20:03:51] TIL --failed-only [20:04:02] very useful (also written down on [[wikitech:cumin]]) [20:04:38] :) [20:08:41] so it looks like the root cause is when we add new mcrouter hosts, we create certs for them but we don't add them to that config file, so cergen doesn't re-create them the next time around [20:09:00] I'll figure out whether this is a procedure problem or an automation problem and then retry tomorrow or next week [20:09:12] 10Operations, 10ops-eqiad: restbase1025 reported DIMM issues in getsel - https://phabricator.wikimedia.org/T250027 (10Cmjohnson) Ticket opened with Dell, SR1023451111 [20:12:13] (03CR) 10Andrew Bogott: [C: 03+2] Remove nova_api_host params from places where it wasn't doing anything [puppet] - 10https://gerrit.wikimedia.org/r/591155 (https://phabricator.wikimedia.org/T249941) (owner: 10Andrew Bogott) [20:14:40] !log halfak@deploy1001 Finished deploy [ores/deploy@514f94a]: T250536 (duration: 14m 06s) [20:14:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:14:46] T250536: Mid-April 2020 ORES deployment - https://phabricator.wikimedia.org/T250536 [20:14:57] (03CR) 10Andrew Bogott: [C: 03+2] nova fullstack: use openstack_controllers instead of nova_api_host [puppet] - 10https://gerrit.wikimedia.org/r/591156 (https://phabricator.wikimedia.org/T249941) (owner: 10Andrew Bogott) [20:15:34] (03CR) 10Andrew Bogott: [C: 03+2] nova.conf: remove ec2_dmz_host and ec2_url [puppet] - 10https://gerrit.wikimedia.org/r/591157 (owner: 10Andrew Bogott) [20:16:49] (03CR) 10Andrew Bogott: [C: 03+2] wmcs: remove nova_api_host [puppet] - 
10https://gerrit.wikimedia.org/r/591158 (https://phabricator.wikimedia.org/T249941) (owner: 10Andrew Bogott) [20:17:11] (03CR) 10VolkerE: [C: 04-1] apereo_cas: update templates login page (031 comment) [software/cas-overlay-template] - 10https://gerrit.wikimedia.org/r/587538 (https://phabricator.wikimedia.org/T233939) (owner: 10Jbond) [20:21:45] Looks like all is fine with ORES. [20:28:58] RECOVERY - Widespread puppet agent failures- no resources reported on icinga1001 is OK: (C)0.01 ge (W)0.006 ge 0.003181 https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/yOxVDGvWk/puppet [20:30:13] (03PS1) 10Andrew Bogott: nova.conf: remove a couple of unused settings [puppet] - 10https://gerrit.wikimedia.org/r/591166 [20:30:15] (03PS1) 10Andrew Bogott: nova.conf: remove [glance] settings [puppet] - 10https://gerrit.wikimedia.org/r/591167 (https://phabricator.wikimedia.org/T249941) [20:31:06] (03CR) 10Andrew Bogott: [C: 03+2] nova.conf: remove a couple of unused settings [puppet] - 10https://gerrit.wikimedia.org/r/591166 (owner: 10Andrew Bogott) [20:34:28] (03CR) 10Andrew Bogott: [C: 03+2] nova.conf: remove [glance] settings [puppet] - 10https://gerrit.wikimedia.org/r/591167 (https://phabricator.wikimedia.org/T249941) (owner: 10Andrew Bogott) [20:34:36] PROBLEM - Logstash Elasticsearch indexing errors on icinga1001 is CRITICAL: 8.154 ge 8 https://wikitech.wikimedia.org/wiki/Logstash%23Indexing_errors https://logstash.wikimedia.org/goto/1cee1f1b5d4e6c5e06edb3353a2a4b83 https://grafana.wikimedia.org/dashboard/db/logstash [20:44:48] (03PS1) 10Andrew Bogott: keystone: remove glance_api firewall rule [puppet] - 10https://gerrit.wikimedia.org/r/591173 (https://phabricator.wikimedia.org/T249941) [20:44:50] (03PS1) 10Andrew Bogott: wmcs hiera: remove glance_host values [puppet] - 10https://gerrit.wikimedia.org/r/591174 (https://phabricator.wikimedia.org/T249941) [20:48:00] (03CR) 10Andrew Bogott: [C: 03+2] keystone: remove glance_api 
firewall rule [puppet] - https://gerrit.wikimedia.org/r/591173 (https://phabricator.wikimedia.org/T249941) (owner: Andrew Bogott) [20:48:13] (CR) Andrew Bogott: [C: +2] wmcs hiera: remove glance_host values [puppet] - https://gerrit.wikimedia.org/r/591174 (https://phabricator.wikimedia.org/T249941) (owner: Andrew Bogott) [20:53:11] Operations, LDAP-Access-Requests: LDAP access to the wmf group for Sam Walton - https://phabricator.wikimedia.org/T250189 (CDanis) Hi Sam, Looks like you already have an account on wikitech (`Samwalton`), so I've given that account membership in the `wmf` group. You should have Superset access with it... [20:55:27] (PS1) CDanis: admin: add samwalton to ldap_only_users [puppet] - https://gerrit.wikimedia.org/r/591177 (https://phabricator.wikimedia.org/T250189) [20:56:18] PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (bad URL) timed out before a response was received https://wikitech.wikimedia.org/wiki/Citoid [20:57:41] Operations, serviceops: Renew certs for mcrouter on all application servers. - https://phabricator.wikimedia.org/T248093 (RLazarus) The renewal script works as expected, but the procedure as written caused problems because not every mcrouter host is listed in `/etc/cergen/mcrouter.manifests.d/mediawiki-h... [21:00:00] RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Citoid [21:00:04] Reedy and sbassett: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for Weekly Security deployment window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200420T2100). 
[21:01:21] (CR) CDanis: [C: +2] admin: add samwalton to ldap_only_users [puppet] - https://gerrit.wikimedia.org/r/591177 (https://phabricator.wikimedia.org/T250189) (owner: CDanis) [21:02:11] Operations, LDAP-Access-Requests, Patch-For-Review: LDAP access to the wmf group for Sam Walton - https://phabricator.wikimedia.org/T250189 (CDanis) Open→Resolved a:CDanis Please re-open if this doesn't work for you! [21:06:09] (CR) BryanDavis: [C: +1] "Sounds reasonable to me. The only other way I could think to manage this would be with scap3, but that doesn't seem necessary at this time" [puppet] - https://gerrit.wikimedia.org/r/591121 (owner: Bstorm) [21:17:41] (CR) Bstorm: "> Patch Set 1: Code-Review+1" [puppet] - https://gerrit.wikimedia.org/r/591121 (owner: Bstorm) [21:17:58] (CR) Bstorm: [C: +2] toolforge-k8s: add a puppetized checkout of maintain-kubeusers [puppet] - https://gerrit.wikimedia.org/r/591121 (owner: Bstorm) [21:21:33] James_F thanks for backporting the MassMessage fix [21:21:52] Here for SWAT once I find the right password for wiki [21:23:40] DannyS712: Happy to help. Thanks for fixing [21:24:24] no problem; I know we're moving fast trying to fully hard deprecate the Revision class before 1.35; I'm just glad it was noticed by me reading through the code rather than a user being impacted [21:25:28] Yeah. 
[21:31:02] (PS1) Bstorm: toolforge-k8s: make sure directory exists before checking out to it [puppet] - https://gerrit.wikimedia.org/r/591184 [21:36:29] (PS4) RhinosF1: Add 'media.api.aucklandmuseum.com' to $wgCopyUploadsDomains [mediawiki-config] - https://gerrit.wikimedia.org/r/590983 (https://phabricator.wikimedia.org/T250646) [21:44:24] RECOVERY - Logstash Elasticsearch indexing errors on icinga1001 is OK: (C)8 ge (W)1 ge 0.09583 https://wikitech.wikimedia.org/wiki/Logstash%23Indexing_errors https://logstash.wikimedia.org/goto/1cee1f1b5d4e6c5e06edb3353a2a4b83 https://grafana.wikimedia.org/dashboard/db/logstash [21:48:59] (CR) Bstorm: [C: +2] toolforge-k8s: make sure directory exists before checking out to it [puppet] - https://gerrit.wikimedia.org/r/591184 (owner: Bstorm) [21:53:43] (PS1) Papaul: DNS: Add mgmt and production DNS for cloudcontrol2004-dev [dns] - https://gerrit.wikimedia.org/r/591194 [21:54:10] (CR) jerkins-bot: [V: -1] DNS: Add mgmt and production DNS for cloudcontrol2004-dev [dns] - https://gerrit.wikimedia.org/r/591194 (owner: Papaul) [22:07:53] (PS2) Papaul: DNS: Add mgmt and production DNS for cloudcontrol2004-dev [dns] - https://gerrit.wikimedia.org/r/591194 [22:24:19] (PS1) Jhedden: cloudvps: metricsinfra add prometheus alert manager and email notifications [puppet] - https://gerrit.wikimedia.org/r/591202 (https://phabricator.wikimedia.org/T250206) [22:27:52] (CR) Jhedden: cloudvps: metricsinfra add prometheus alert manager and email notifications (2 comments) [puppet] - https://gerrit.wikimedia.org/r/591202 (https://phabricator.wikimedia.org/T250206) (owner: Jhedden) [22:29:02] (CR) Jhedden: cloudvps: metricsinfra add prometheus alert manager and email notifications (1 comment) [puppet] - https://gerrit.wikimedia.org/r/591202 (https://phabricator.wikimedia.org/T250206) (owner: Jhedden) [22:43:40] (CR) Alex Monk: cloudvps: metricsinfra add prometheus alert manager 
and email notifications (2 comments) [puppet] - https://gerrit.wikimedia.org/r/591202 (https://phabricator.wikimedia.org/T250206) (owner: Jhedden) [22:44:24] (PS2) Jdlrobson: Add project taglines [mediawiki-config] - https://gerrit.wikimedia.org/r/591160 (https://phabricator.wikimedia.org/T249047) [23:00:04] RoanKattouw, Niharika, and Urbanecm: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for Evening SWAT(Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200420T2300). [23:00:04] RhinosF1, ebernhardson, and Jdlrobson: A patch you scheduled for Evening SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [23:00:18] * RhinosF1 here, can I go first? [23:01:01] o/ [23:02:18] * RhinosF1 eyes around for a deployer [23:06:15] RoanKattouw: are you around [23:06:25] Yes, sorry, I can do the SWAT [23:06:48] Thanks! [23:07:06] (CR) Catrope: [C: +2] Add 'media.api.aucklandmuseum.com' to $wgCopyUploadsDomains [mediawiki-config] - https://gerrit.wikimedia.org/r/590983 (https://phabricator.wikimedia.org/T250646) (owner: RhinosF1) [23:07:29] thanks RoanKattouw [23:08:09] (Merged) jenkins-bot: Add 'media.api.aucklandmuseum.com' to $wgCopyUploadsDomains [mediawiki-config] - https://gerrit.wikimedia.org/r/590983 (https://phabricator.wikimedia.org/T250646) (owner: RhinosF1) [23:09:57] RhinosF1: Your change is on mwdebug1002, please test there [23:10:23] on it [23:10:27] ebernhardson: You around for SWAT? 
You scheduled a change moving more_like traffic back to eqiad [23:11:31] (PS3) Catrope: Update project wordmarks and icons [mediawiki-config] - https://gerrit.wikimedia.org/r/591152 (https://phabricator.wikimedia.org/T249047) (owner: Jdlrobson) [23:12:51] RoanKattouw: LGTM [23:13:25] (CR) Catrope: [C: +2] Update project wordmarks and icons [mediawiki-config] - https://gerrit.wikimedia.org/r/591152 (https://phabricator.wikimedia.org/T249047) (owner: Jdlrobson) [23:14:23] !log catrope@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Add media.api.aucklandmuseum.com to $wgCopyUploadsDomains (T250646) (duration: 01m 08s) [23:14:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:14:30] (Merged) jenkins-bot: Update project wordmarks and icons [mediawiki-config] - https://gerrit.wikimedia.org/r/591152 (https://phabricator.wikimedia.org/T249047) (owner: Jdlrobson) [23:14:31] T250646: Add media.api.aucklandmuseum.com to $wgCopyUploadsDomains - https://phabricator.wikimedia.org/T250646 [23:15:26] Jdlrobson: On mwdebug1002, please test [23:15:35] RoanKattouw: ty, night! [23:16:07] k this will take a bit of time [23:16:11] will let you know when all good [23:17:11] Yeah it's a complex one, and deploying it will be two syncs with a cache purge in between I think [23:17:31] (sync the images dir, purge cache for the URLs that already existed, sync InitialiseSettings) [23:22:25] RoanKattouw: just checking one logo [23:22:26] almost done [23:24:46] (CR) Jhedden: cloudvps: metricsinfra add prometheus alert manager and email notifications (2 comments) [puppet] - https://gerrit.wikimedia.org/r/591202 (https://phabricator.wikimedia.org/T250206) (owner: Jhedden) [23:25:03] RoanKattouw: we're good. 
Please sync :) [23:25:05] (PS2) Jhedden: cloudvps: metricsinfra add prometheus alert manager and email notifications [puppet] - https://gerrit.wikimedia.org/r/591202 (https://phabricator.wikimedia.org/T250206) [23:26:00] (CR) Jhedden: cloudvps: metricsinfra add prometheus alert manager and email notifications (1 comment) [puppet] - https://gerrit.wikimedia.org/r/591202 (https://phabricator.wikimedia.org/T250206) (owner: Jhedden) [23:27:17] !log catrope@deploy1001 Synchronized static/images/mobile/: Update project wordmarks and icons (T249047) (duration: 01m 02s) [23:27:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:27:24] T249047: [Site Config] Make new logos available in production in preparation for T246170 - https://phabricator.wikimedia.org/T249047 [23:29:04] !log catrope@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Update project wordmarks and icons (T249047) (duration: 01m 01s) [23:29:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:29:48] PROBLEM - Host mw1309.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [23:33:28] RECOVERY - Host mw1309.mgmt is UP: PING WARNING - Packet loss = 77%, RTA = 1.59 ms [23:38:27] thx RoanKattouw for the deploy [23:38:44] (As always! :)) [23:53:17] Operations, MediaWiki-Shell, Wikimedia-General-or-Unknown, Security: Securing external binaries run by MediaWiki - https://phabricator.wikimedia.org/T172584 (Reedy)