[00:16:03] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s3 on db2098 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 958.67 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_slave
[00:30:03] <wikibugs>	 (03PS2) 10Bstorm: host monitoring: add optional contact group for mgmt interfaces [puppet] - 10https://gerrit.wikimedia.org/r/543916 (https://phabricator.wikimedia.org/T223458)
[00:35:27] <wikibugs>	 (03CR) 10CRusnov: [C: 03+1] "This LGTM, would like additional sign-off." [puppet] - 10https://gerrit.wikimedia.org/r/543252 (https://phabricator.wikimedia.org/T235458) (owner: 10Brian Wolff)
[00:37:51] <wikibugs>	 10Operations, 10ops-eqiad: Degraded RAID on cloudvirt1018 - https://phabricator.wikimedia.org/T230575 (10Bstorm) T229156 according to that ticket, this is the disk that came from Dell for that ticket.
[00:45:39] <wikibugs>	 (03PS1) 10CRusnov: mailman: add alias and redirect for multimedia-team [puppet] - 10https://gerrit.wikimedia.org/r/545122 (https://phabricator.wikimedia.org/T235550)
[00:47:12] <wikibugs>	 (03CR) 10CRusnov: "We shall need this merged shortly after the rename is executed." [puppet] - 10https://gerrit.wikimedia.org/r/545122 (https://phabricator.wikimedia.org/T235550) (owner: 10CRusnov)
[00:51:29] <paladox>	 Hmm gerrit-replication seems down?
[00:51:30] <paladox>	 Oh!
[00:51:32] <paladox>	 Misplet 
[00:54:22] <wikibugs>	 (03PS1) 10CRusnov: netbox: Enable CSV dump rotations. [puppet] - 10https://gerrit.wikimedia.org/r/545123
[01:02:40] <wikibugs>	 10Operations, 10ops-eqiad: Degraded RAID on cloudvirt1018 - https://phabricator.wikimedia.org/T230575 (10Bstorm) There are two disks in there with the larger size.  {F30874039}  Note that the size matches this: T229156#5399581 -- which is this disk replaced in the last ticket.  This suggests the disk was repla...
[01:09:36] <wikibugs>	 10Operations, 10ops-eqiad: Degraded RAID on cloudvirt1018 - https://phabricator.wikimedia.org/T230575 (10Bstorm) Service request 986376069 does not show anything terribly useful. Since T229156 shows the disk at its current size, I have to imagine that Dell sent us larger disks during that request.  I thought t...
[01:11:23] <wikibugs>	 10Operations, 10ops-eqiad: Degraded RAID on cloudvirt1018 - https://phabricator.wikimedia.org/T230575 (10Bstorm) @wiki_willy I think we need to follow up with Dell about that.  They should have some kind of tracking on the disk serial numbers, etc. that they have been sending us, right?
[01:50:37] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s3 on db2098 is OK: OK slave_sql_lag Replication lag: 0.00 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_slave
[02:01:22] <wikibugs>	 (03PS3) 10Huji: Change the language of Votewiki to Persian (fa) temporarily for the annual ArbCom elections [mediawiki-config] - 10https://gerrit.wikimedia.org/r/544995 (https://phabricator.wikimedia.org/T230614) (owner: 104nn1l2)
[02:01:41] <wikibugs>	 (03CR) 10Huji: [C: 03+1] "This can be merged now." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/544995 (https://phabricator.wikimedia.org/T230614) (owner: 104nn1l2)
[02:02:05] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Change the language of Votewiki to Persian (fa) temporarily for the annual ArbCom elections [mediawiki-config] - 10https://gerrit.wikimedia.org/r/544995 (https://phabricator.wikimedia.org/T230614) (owner: 104nn1l2)
[02:13:37] <icinga-wm>	 PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 45 probes of 470 (alerts on 35) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts
[02:15:40] <wikibugs>	 (03CR) 10DannyS712: "Error: /src/wmf-config/InitialiseSettings.php should not be executable" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/544995 (https://phabricator.wikimedia.org/T230614) (owner: 104nn1l2)
[02:19:11] <icinga-wm>	 RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 25 probes of 470 (alerts on 35) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts
[03:42:50] <wikibugs>	 (03PS1) 10Vgutierrez: hiera: Move nginx from port 443 to port 4443 on cp3047 [puppet] - 10https://gerrit.wikimedia.org/r/545127 (https://phabricator.wikimedia.org/T231433)
[03:42:52] <wikibugs>	 (03PS1) 10Vgutierrez: hiera: Move ats-tls from port 8443 to port 443 on cp3047 [puppet] - 10https://gerrit.wikimedia.org/r/545128 (https://phabricator.wikimedia.org/T231433)
[03:43:59] <vgutierrez>	 !log Switch from nginx to ats-tls on cp3047 - T231433
[03:44:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:44:04] <stashbot>	 T231433: Move cache upload cluster from nginx to ats-tls - https://phabricator.wikimedia.org/T231433
[03:44:33] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+2] hiera: Move nginx from port 443 to port 4443 on cp3047 [puppet] - 10https://gerrit.wikimedia.org/r/545127 (https://phabricator.wikimedia.org/T231433) (owner: 10Vgutierrez)
[03:46:23] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[03:46:24] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+2] hiera: Move ats-tls from port 8443 to port 443 on cp3047 [puppet] - 10https://gerrit.wikimedia.org/r/545128 (https://phabricator.wikimedia.org/T231433) (owner: 10Vgutierrez)
[03:49:37] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[03:52:48] <wikibugs>	 10Operations, 10Traffic, 10Patch-For-Review: Move cache upload cluster from nginx to ats-tls - https://phabricator.wikimedia.org/T231433 (10Vgutierrez)
[04:11:34] <wikibugs>	 (03PS1) 10Vgutierrez: hiera: Set nginx on port 4443 for cache upload on esams [puppet] - 10https://gerrit.wikimedia.org/r/545129 (https://phabricator.wikimedia.org/T231433)
[04:11:36] <wikibugs>	 (03PS1) 10Vgutierrez: hiera: Set ats-tls on port 443 for cache upload nodes on esams [puppet] - 10https://gerrit.wikimedia.org/r/545130 (https://phabricator.wikimedia.org/T231433)
[04:18:48] <vgutierrez>	 !log Switch from nginx to ats-tls on cp3049 - T231433
[04:18:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:18:53] <stashbot>	 T231433: Move cache upload cluster from nginx to ats-tls - https://phabricator.wikimedia.org/T231433
[04:19:08] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+2] "pcc shows a NOOP for the whole cluster and the expected changes on cp3049: https://puppet-compiler.wmflabs.org/compiler1001/18969/" [puppet] - 10https://gerrit.wikimedia.org/r/545129 (https://phabricator.wikimedia.org/T231433) (owner: 10Vgutierrez)
[04:21:13] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+2] "pcc shows a NOOP for the whole cluster and the expected changes for cp3049: https://puppet-compiler.wmflabs.org/compiler1002/18970/" [puppet] - 10https://gerrit.wikimedia.org/r/545130 (https://phabricator.wikimedia.org/T231433) (owner: 10Vgutierrez)
[04:27:16] <wikibugs>	 10Operations, 10Traffic, 10Patch-For-Review: Move cache upload cluster from nginx to ats-tls - https://phabricator.wikimedia.org/T231433 (10Vgutierrez)
[04:30:12] <wikibugs>	 (03PS1) 10CRusnov: coherence: Check unracked devices for connected console ports [software/netbox-reports] - 10https://gerrit.wikimedia.org/r/545132
[04:30:16] <wikibugs>	 (03PS1) 10Vgutierrez: hiera: Move nginx from port 443 to port 4443 on cp2024 [puppet] - 10https://gerrit.wikimedia.org/r/545133 (https://phabricator.wikimedia.org/T231433)
[04:30:18] <wikibugs>	 (03PS1) 10Vgutierrez: hiera: Move ats-tls from port 8443 to port 443 on cp2024 [puppet] - 10https://gerrit.wikimedia.org/r/545134 (https://phabricator.wikimedia.org/T231433)
[04:30:48] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] coherence: Check unracked devices for connected console ports [software/netbox-reports] - 10https://gerrit.wikimedia.org/r/545132 (owner: 10CRusnov)
[04:30:58] <vgutierrez>	 !log Switch from nginx to ats-tls on cp2024 - T231433
[04:31:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:31:02] <stashbot>	 T231433: Move cache upload cluster from nginx to ats-tls - https://phabricator.wikimedia.org/T231433
[04:31:30] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+2] hiera: Move nginx from port 443 to port 4443 on cp2024 [puppet] - 10https://gerrit.wikimedia.org/r/545133 (https://phabricator.wikimedia.org/T231433) (owner: 10Vgutierrez)
[04:32:46] <wikibugs>	 10Operations: Puppet breakage in automation-framework VMs - https://phabricator.wikimedia.org/T234452 (10crusnov) This should be fixed now.
[04:35:10] <wikibugs>	 (03PS2) 10Vgutierrez: hiera: Move ats-tls from port 8443 to port 443 on cp2024 [puppet] - 10https://gerrit.wikimedia.org/r/545134 (https://phabricator.wikimedia.org/T231433)
[04:35:12] <wikibugs>	 (03PS1) 10Vgutierrez: hiera: Move cp2024.yaml to the proper directory [puppet] - 10https://gerrit.wikimedia.org/r/545136 (https://phabricator.wikimedia.org/T231433)
[04:36:09] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+2] hiera: Move cp2024.yaml to the proper directory [puppet] - 10https://gerrit.wikimedia.org/r/545136 (https://phabricator.wikimedia.org/T231433) (owner: 10Vgutierrez)
[04:37:56] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+2] hiera: Move ats-tls from port 8443 to port 443 on cp2024 [puppet] - 10https://gerrit.wikimedia.org/r/545134 (https://phabricator.wikimedia.org/T231433) (owner: 10Vgutierrez)
[04:43:26] <wikibugs>	 10Operations, 10Traffic, 10Patch-For-Review: Move cache upload cluster from nginx to ats-tls - https://phabricator.wikimedia.org/T231433 (10Vgutierrez)
[04:50:19] <wikibugs>	 (03PS1) 10Vgutierrez: hiera: Set nginx on port 4443 for cache upload on codfw [puppet] - 10https://gerrit.wikimedia.org/r/545138 (https://phabricator.wikimedia.org/T231433)
[04:50:21] <wikibugs>	 (03PS1) 10Vgutierrez: hiera: Set ats-tls on port 443 for cache upload nodes on codfw [puppet] - 10https://gerrit.wikimedia.org/r/545139 (https://phabricator.wikimedia.org/T231433)
[04:58:06] <vgutierrez>	 !log Switch from nginx to ats-tls on cp2026 - T231433
[04:58:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:58:10] <stashbot>	 T231433: Move cache upload cluster from nginx to ats-tls - https://phabricator.wikimedia.org/T231433
[04:58:16] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+2] "pcc shows a NOOP for the whole cluster and the expected changes on cp2026: https://puppet-compiler.wmflabs.org/compiler1002/18971/" [puppet] - 10https://gerrit.wikimedia.org/r/545138 (https://phabricator.wikimedia.org/T231433) (owner: 10Vgutierrez)
[04:59:59] <wikibugs>	 (03CR) 10Vgutierrez: "pcc shows a NOOP for the whole cluster and the expected changes on cp2026: https://puppet-compiler.wmflabs.org/compiler1002/18972/" [puppet] - 10https://gerrit.wikimedia.org/r/545139 (https://phabricator.wikimedia.org/T231433) (owner: 10Vgutierrez)
[05:00:10] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+2] hiera: Set ats-tls on port 443 for cache upload nodes on codfw [puppet] - 10https://gerrit.wikimedia.org/r/545139 (https://phabricator.wikimedia.org/T231433) (owner: 10Vgutierrez)
[05:00:49] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db2084:3314 after compression', diff saved to https://phabricator.wikimedia.org/P9420 and previous config saved to /var/cache/conftool/dbconfig/20191022-050048-marostegui.json
[05:00:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:02:06] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db2089:3315 for compression T235599', diff saved to https://phabricator.wikimedia.org/P9421 and previous config saved to /var/cache/conftool/dbconfig/20191022-050204-marostegui.json
[05:02:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:02:09] <stashbot>	 T235599: Recompress special slaves across eqiad and codfw - https://phabricator.wikimedia.org/T235599
[05:05:14] <wikibugs>	 10Operations, 10Traffic, 10Patch-For-Review: Move cache upload cluster from nginx to ats-tls - https://phabricator.wikimedia.org/T231433 (10Vgutierrez)
[05:06:28] <wikibugs>	 (03PS1) 10Marostegui: report_users: Remove dbproxy1004,dbproxy1009 [software] - 10https://gerrit.wikimedia.org/r/545142 (https://phabricator.wikimedia.org/T231280)
[05:07:57] <wikibugs>	 (03PS1) 10Vgutierrez: hiera: Move nginx from port 443 to port 4443 on cp1088 [puppet] - 10https://gerrit.wikimedia.org/r/545143 (https://phabricator.wikimedia.org/T231433)
[05:07:59] <wikibugs>	 (03PS1) 10Vgutierrez: hiera: Move ats-tls from port 8443 to port 443 on cp1088 [puppet] - 10https://gerrit.wikimedia.org/r/545144 (https://phabricator.wikimedia.org/T231433)
[05:08:01] <vgutierrez>	 !log Switch from nginx to ats-tls on cp1088 - T231433
[05:08:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:08:05] <stashbot>	 T231433: Move cache upload cluster from nginx to ats-tls - https://phabricator.wikimedia.org/T231433
[05:08:34] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+2] hiera: Move nginx from port 443 to port 4443 on cp1088 [puppet] - 10https://gerrit.wikimedia.org/r/545143 (https://phabricator.wikimedia.org/T231433) (owner: 10Vgutierrez)
[05:09:10] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] report_users: Remove dbproxy1004,dbproxy1009 [software] - 10https://gerrit.wikimedia.org/r/545142 (https://phabricator.wikimedia.org/T231280) (owner: 10Marostegui)
[05:09:34] <wikibugs>	 (03Merged) 10jenkins-bot: report_users: Remove dbproxy1004,dbproxy1009 [software] - 10https://gerrit.wikimedia.org/r/545142 (https://phabricator.wikimedia.org/T231280) (owner: 10Marostegui)
[05:10:02] <wikibugs>	 (03PS4) 10Marostegui: db-eqiad.php: Temporary pool pc1010 in pc1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/542890 (https://phabricator.wikimedia.org/T227142)
[05:10:10] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+2] hiera: Move ats-tls from port 8443 to port 443 on cp1088 [puppet] - 10https://gerrit.wikimedia.org/r/545144 (https://phabricator.wikimedia.org/T231433) (owner: 10Vgutierrez)
[05:14:50] <icinga-wm>	 PROBLEM - Ensure trafficserver_exporter is running for instance tls on cp2025 is CRITICAL: PROCS CRITICAL: 0 processes with args /usr/bin/python3 /usr/bin/prometheus-trafficserver-exporter --no-procstats --no-ssl-verification --endpoint https://127.0.0.1:443/_stats --port 9322 https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[05:14:56] <wikibugs>	 10Operations, 10ops-codfw, 10DC-Ops, 10decommission: Decommission db2060.codfw.wmnet - https://phabricator.wikimedia.org/T231625 (10Marostegui)
[05:15:04] <icinga-wm>	 PROBLEM - HTTPS Unified RSA on cp2025 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused https://wikitech.wikimedia.org/wiki/HTTPS
[05:15:20] <icinga-wm>	 PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp2025 is CRITICAL: connect to address 10.192.48.29 and port 9322: Connection refused https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[05:15:36] <icinga-wm>	 PROBLEM - Check systemd state on cp2025 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:15:52] <wikibugs>	 (03PS1) 10Marostegui: wmnet: Remove db2060 DNS production entries [dns] - 10https://gerrit.wikimedia.org/r/545150 (https://phabricator.wikimedia.org/T231625)
[05:16:36] <logmsgbot>	 !log marostegui@cumin1001 START - Cookbook sre.hosts.decommission
[05:16:38] <wikibugs>	 10Operations, 10Traffic: Move cache upload cluster from nginx to ats-tls - https://phabricator.wikimedia.org/T231433 (10Vgutierrez)
[05:16:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:16:43] <wikibugs>	 (03PS1) 10Marostegui: site.pp: Remove puppet references for db2060 [puppet] - 10https://gerrit.wikimedia.org/r/545151 (https://phabricator.wikimedia.org/T231625)
[05:16:46] <logmsgbot>	 !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
[05:16:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:16:54] <wikibugs>	 10Operations, 10ops-codfw, 10DC-Ops, 10decommission, 10Patch-For-Review: Decommission db2060.codfw.wmnet - https://phabricator.wikimedia.org/T231625 (10ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by marostegui@cumin1001 for hosts: `db2060.codfw.wmnet` -  db2060.codfw.wmnet (**PASS**)...
[05:17:28] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] site.pp: Remove puppet references for db2060 [puppet] - 10https://gerrit.wikimedia.org/r/545151 (https://phabricator.wikimedia.org/T231625) (owner: 10Marostegui)
[05:17:49] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] wmnet: Remove db2060 DNS production entries [dns] - 10https://gerrit.wikimedia.org/r/545150 (https://phabricator.wikimedia.org/T231625) (owner: 10Marostegui)
[05:18:12] <wikibugs>	 10Operations, 10ops-codfw, 10DC-Ops, 10decommission: Decommission db2060.codfw.wmnet - https://phabricator.wikimedia.org/T231625 (10Marostegui) a:05RobH→03Papaul
[05:18:27] <wikibugs>	 10Operations, 10ops-codfw, 10DC-Ops, 10decommission: Decommission db2060.codfw.wmnet - https://phabricator.wikimedia.org/T231625 (10Marostegui) Host ready for on-site and switch disablement steps
[05:18:50] <icinga-wm>	 PROBLEM - ats-tls HTTPS en.wikipedia.org RSA on cp2025 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused https://wikitech.wikimedia.org/wiki/HTTPS
[05:19:12] <icinga-wm>	 PROBLEM - ats-tls HTTPS en.wikipedia.org ECDSA on cp2025 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused https://wikitech.wikimedia.org/wiki/HTTPS
[05:19:23] <vgutierrez>	 hmm that's not expected
[05:19:25] * vgutierrez checking
[05:20:15] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops, 10Patch-For-Review: a6-eqiad pdu refresh (Tuesday 10/22 @11am UTC) - https://phabricator.wikimedia.org/T227142 (10Marostegui)
[05:20:17] <vgutierrez>	 wonderful.. I didn't have cp2025 listed on T231433
[05:20:18] <stashbot>	 T231433: Move cache upload cluster from nginx to ats-tls - https://phabricator.wikimedia.org/T231433
[05:20:42] <vgutierrez>	 !log depooling cp2025 to fix ATS/nginx configuration - T231433
[05:20:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:21:15] <wikibugs>	 (03PS1) 10Gergő Tisza: Set GrowthExperiments task suggester config on beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/545155 (https://phabricator.wikimedia.org/T234426)
[05:22:04] <icinga-wm>	 RECOVERY - ats-tls HTTPS en.wikipedia.org RSA on cp2025 is OK: SSL OK - OCSP staple validity for en.wikipedia.org has 345584 seconds left:Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (RSA) valid until 2019-11-22 07:59:59 +0000 (expires in 31 days) https://wikitech.wikimedia.org/wiki/HTTPS
[05:22:22] <icinga-wm>	 RECOVERY - Ensure trafficserver_exporter is running for instance tls on cp2025 is OK: PROCS OK: 1 process with args /usr/bin/python3 /usr/bin/prometheus-trafficserver-exporter --no-procstats --no-ssl-verification --endpoint https://127.0.0.1:443/_stats --port 9322 https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[05:22:30] <icinga-wm>	 RECOVERY - ats-tls HTTPS en.wikipedia.org ECDSA on cp2025 is OK: SSL OK - OCSP staple validity for en.wikipedia.org has 345558 seconds left:Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (ECDSA) valid until 2019-11-22 07:59:59 +0000 (expires in 31 days) https://wikitech.wikimedia.org/wiki/HTTPS
[05:22:40] <icinga-wm>	 RECOVERY - HTTPS Unified RSA on cp2025 is OK: SSL OK - OCSP staple validity for en.wikipedia.org has 345548 seconds left:Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (RSA) valid until 2019-11-22 07:59:59 +0000 (expires in 31 days) https://wikitech.wikimedia.org/wiki/HTTPS
[05:22:58] <icinga-wm>	 RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp2025 is OK: HTTP OK: HTTP/1.0 200 OK - 19521 bytes in 0.119 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[05:23:50] <icinga-wm>	 RECOVERY - Check systemd state on cp2025 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:24:27] <vgutierrez>	 !log repooling cp2025 - T231433
[05:24:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:24:59] <wikibugs>	 10Operations, 10Traffic: Move cache upload cluster from nginx to ats-tls - https://phabricator.wikimedia.org/T231433 (10Vgutierrez)
[05:26:37] <wikibugs>	 10Operations, 10Performance-Team, 10Wikimedia-General-or-Unknown, 10serviceops: Investigate recurrent latency spikes for the MediaWiki appservers - https://phabricator.wikimedia.org/T235872 (10jijiki)
[05:27:10] <wikibugs>	 (03PS1) 10Vgutierrez: hiera: Set nginx on port 4443 for cache upload on eqiad [puppet] - 10https://gerrit.wikimedia.org/r/545156 (https://phabricator.wikimedia.org/T231433)
[05:27:12] <wikibugs>	 (03PS1) 10Vgutierrez: hiera: Set ats-tls on port 443 for cache upload nodes on eqiad [puppet] - 10https://gerrit.wikimedia.org/r/545157 (https://phabricator.wikimedia.org/T231433)
[05:28:51] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: Parsoid/PHP: Load the extension on all Parsoid nodes (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/544878 (https://phabricator.wikimedia.org/T235898) (owner: 10Mobrovac)
[05:28:56] <wikibugs>	 10Operations, 10ops-esams, 10DC-Ops: ESAMS Refresh/Rebuild (October 2019) - https://phabricator.wikimedia.org/T235805 (10Papaul)
[05:31:13] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 04-1] "As I already suggested previously, this is the wrong approach." [puppet] - 10https://gerrit.wikimedia.org/r/544864 (https://phabricator.wikimedia.org/T229792) (owner: 10Effie Mouzeli)
[05:32:02] <wikibugs>	 10Operations, 10ops-esams, 10DC-Ops: ESAMS Refresh/Rebuild (October 2019) - https://phabricator.wikimedia.org/T235805 (10ayounsi)
[05:32:05] <wikibugs>	 (03PS2) 10Vgutierrez: hiera: Set ats-tls on port 443 for cache upload nodes on eqiad [puppet] - 10https://gerrit.wikimedia.org/r/545157 (https://phabricator.wikimedia.org/T231433)
[05:32:21] <wikibugs>	 (03PS1) 10Marostegui: mariadb: Set db1070 to spare [puppet] - 10https://gerrit.wikimedia.org/r/545158 (https://phabricator.wikimedia.org/T235464)
[05:32:58] <wikibugs>	 (03PS1) 10Marostegui: db-eqiad,db-codfw.php: Remove db1070 from config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/545159 (https://phabricator.wikimedia.org/T235464)
[05:33:47] <vgutierrez>	 !log Switch from nginx to ats-tls on cp1090 - T231433
[05:33:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:33:51] <stashbot>	 T231433: Move cache upload cluster from nginx to ats-tls - https://phabricator.wikimedia.org/T231433
[05:33:55] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+2] "pcc shows a NOOP on the whole cluster and the expected changes on cp1090: https://puppet-compiler.wmflabs.org/compiler1002/18973/" [puppet] - 10https://gerrit.wikimedia.org/r/545156 (https://phabricator.wikimedia.org/T231433) (owner: 10Vgutierrez)
[05:34:56] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] db-eqiad,db-codfw.php: Remove db1070 from config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/545159 (https://phabricator.wikimedia.org/T235464) (owner: 10Marostegui)
[05:35:40] <wikibugs>	 (03Merged) 10jenkins-bot: db-eqiad,db-codfw.php: Remove db1070 from config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/545159 (https://phabricator.wikimedia.org/T235464) (owner: 10Marostegui)
[05:35:51] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+2] "pcc shows a generalized NOOP and the expected changes on cp1076 and cp1090: https://puppet-compiler.wmflabs.org/compiler1001/18974/" [puppet] - 10https://gerrit.wikimedia.org/r/545157 (https://phabricator.wikimedia.org/T231433) (owner: 10Vgutierrez)
[05:39:01] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-codfw.php: Remove db1070 from config T235464 (duration: 00m 53s)
[05:39:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:39:06] <stashbot>	 T235464: decommission db1070.eqiad.wmnet - https://phabricator.wikimedia.org/T235464
[05:40:01] <marostegui>	 !log Remove db1070 from tendril and zarcillo - T235464
[05:40:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:40:09] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Remove db1070 from config T235464 (duration: 00m 51s)
[05:40:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:41:55] <marostegui>	 !log Stop mysql on db1070 - T235464
[05:41:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:42:43] <wikibugs>	 10Operations, 10DBA: decommission db1070.eqiad.wmnet - https://phabricator.wikimedia.org/T235464 (10Marostegui)
[05:43:30] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] mariadb: Set db1070 to spare [puppet] - 10https://gerrit.wikimedia.org/r/545158 (https://phabricator.wikimedia.org/T235464) (owner: 10Marostegui)
[05:47:42] <wikibugs>	 (03PS1) 10Vgutierrez: hiera: Unify common ats-tls settings for cache upload [puppet] - 10https://gerrit.wikimedia.org/r/545162 (https://phabricator.wikimedia.org/T231433)
[05:47:44] <wikibugs>	 10Operations, 10Traffic, 10Patch-For-Review: Move cache upload cluster from nginx to ats-tls - https://phabricator.wikimedia.org/T231433 (10Vgutierrez)
[05:47:47] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 03+1] spec: remove hhvm references from tests [puppet] - 10https://gerrit.wikimedia.org/r/544847 (https://phabricator.wikimedia.org/T229792) (owner: 10Effie Mouzeli)
[05:48:00] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Remove db1070 from config T235464', diff saved to https://phabricator.wikimedia.org/P9422 and previous config saved to /var/cache/conftool/dbconfig/20191022-054759-marostegui.json
[05:48:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:48:04] <stashbot>	 T235464: decommission db1070.eqiad.wmnet - https://phabricator.wikimedia.org/T235464
[05:51:52] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1096 for PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9423 and previous config saved to /var/cache/conftool/dbconfig/20191022-055151-marostegui.json
[05:51:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:52:59] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+2] "pcc shows a NOOP across cache upload nodes on every DC: https://puppet-compiler.wmflabs.org/compiler1001/18975/" [puppet] - 10https://gerrit.wikimedia.org/r/545162 (https://phabricator.wikimedia.org/T231433) (owner: 10Vgutierrez)
[05:54:39] <wikibugs>	 10Operations, 10Traffic, 10Patch-For-Review: Move cache upload cluster from nginx to ats-tls - https://phabricator.wikimedia.org/T231433 (10Vgutierrez) 05Open→03Resolved
[05:54:44] <wikibugs>	 10Operations, 10Traffic, 10Goal, 10Patch-For-Review, 10Performance-Team (Radar): Support TLSv1.3 - https://phabricator.wikimedia.org/T170567 (10Vgutierrez)
[05:54:48] <wikibugs>	 10Operations, 10Traffic: Get rid of nginx puppetization for cache upload - https://phabricator.wikimedia.org/T236120 (10Vgutierrez)
[05:57:15] <wikibugs>	 (03PS5) 10Marostegui: db-eqiad.php: Temporary pool pc1010 in pc1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/542890 (https://phabricator.wikimedia.org/T227142)
[05:57:21] <wikibugs>	 (03CR) 10Marostegui: db-eqiad.php: Temporary pool pc1010 in pc1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/542890 (https://phabricator.wikimedia.org/T227142) (owner: 10Marostegui)
[05:57:30] <wikibugs>	 (03Abandoned) 10Vgutierrez: Testing buffer_upload experimental plugin - do not merge [debs/trafficserver] - 10https://gerrit.wikimedia.org/r/543271 (https://phabricator.wikimedia.org/T234887) (owner: 10Vgutierrez)
[06:02:44] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+2] ATS: Enable reloading global lua script [puppet] - 10https://gerrit.wikimedia.org/r/543022 (https://phabricator.wikimedia.org/T233274) (owner: 10Vgutierrez)
[06:07:24] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+2] ATS: Use a common base path for /etc/ssl and /etc/acmecerts certs [puppet] - 10https://gerrit.wikimedia.org/r/544151 (https://phabricator.wikimedia.org/T234803) (owner: 10Vgutierrez)
[06:32:00] <vgutierrez>	 !log rolling restart of ats-tls - T233274 T234803
[06:32:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:32:05] <stashbot>	 T234803: Provide an easy way of picking the traffic serving TLS certificate used by ATS - https://phabricator.wikimedia.org/T234803
[06:32:06] <stashbot>	 T233274: ATS lua script reload doesn't work as expected - https://phabricator.wikimedia.org/T233274
[06:41:11] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Temporary pool pc1010 in pc1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/542890 (https://phabricator.wikimedia.org/T227142) (owner: 10Marostegui)
[06:41:50] <wikibugs>	 (03Merged) 10jenkins-bot: db-eqiad.php: Temporary pool pc1010 in pc1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/542890 (https://phabricator.wikimedia.org/T227142) (owner: 10Marostegui)
[06:43:11] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool pc1010 T227142 (duration: 00m 52s)
[06:43:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:43:14] <stashbot>	 T227142: a6-eqiad pdu refresh (Tuesday 10/22 @11am UTC) - https://phabricator.wikimedia.org/T227142
[06:47:28] <wikibugs>	 (03CR) 10Kosta Harlan: [C: 04-1] Set GrowthExperiments task suggester config on beta (032 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/545155 (https://phabricator.wikimedia.org/T234426) (owner: 10Gergő Tisza)
[06:51:18] <wikibugs>	 (03PS1) 10Giuseppe Lavagetto: Add parsoid-php to the discovery records to switchover [cookbooks] - 10https://gerrit.wikimedia.org/r/545167
[06:53:21] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 03+1] "I think btw it should be possible to use debdeploy and debmonitor for eevans as well. LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/544966 (https://phabricator.wikimedia.org/T200803) (owner: 10Alexandros Kosiaris)
[06:53:38] <wikibugs>	 (03PS2) 10Gergő Tisza: Set GrowthExperiments task suggester config on beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/545155 (https://phabricator.wikimedia.org/T234426)
[06:55:37] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: "I think the point set forward by eevans makes sense. Unless we have a way to manually trigger the process, we might be better off writing " [puppet] - 10https://gerrit.wikimedia.org/r/544964 (https://phabricator.wikimedia.org/T235675) (owner: 10Alexandros Kosiaris)
[06:57:01] <wikibugs>	 (03CR) 10Kosta Harlan: [C: 03+1] Set GrowthExperiments task suggester config on beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/545155 (https://phabricator.wikimedia.org/T234426) (owner: 10Gergő Tisza)
[07:02:45] <wikibugs>	 (03CR) 10ArielGlenn: "I looked at removal scripts for the relevant packages and it looks ok. We could do a test on a snapshot host as soon as one becomes idle, w" [puppet] - 10https://gerrit.wikimedia.org/r/544864 (https://phabricator.wikimedia.org/T229792) (owner: 10Effie Mouzeli)
[07:03:55] <wikibugs>	 (03CR) 10Effie Mouzeli: "> As I already suggested previously, this is the wrong approach." [puppet] - 10https://gerrit.wikimedia.org/r/544864 (https://phabricator.wikimedia.org/T229792) (owner: 10Effie Mouzeli)
[07:04:51] <wikibugs>	 (03CR) 10Effie Mouzeli: "> I looked at removal scripts for the relevant packages and it looks" [puppet] - 10https://gerrit.wikimedia.org/r/544864 (https://phabricator.wikimedia.org/T229792) (owner: 10Effie Mouzeli)
[07:14:11] <wikibugs>	 (03PS2) 10Muehlenhoff: Update microcode check [puppet] - 10https://gerrit.wikimedia.org/r/544944 (https://phabricator.wikimedia.org/T235250)
[07:17:59] <moritzm>	 !log installing tcpdump security updates
[07:18:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:34:42] <wikibugs>	 (03CR) 10Elukey: "Does it need to be on stat1007? In theory a cleaner solution, in my opinion, would be a Ganeti VM in the analytics VLAN dedicated to this " [puppet] - 10https://gerrit.wikimedia.org/r/544989 (owner: 10EBernhardson)
[07:36:42] <wikibugs>	 (03PS2) 10DCausse: Bump experimental-highlighter to 5.6.4.1 [software/elasticsearch/plugins] - 10https://gerrit.wikimedia.org/r/543188 (https://phabricator.wikimedia.org/T236123)
[07:37:04] <wikibugs>	 10Operations, 10Performance-Team, 10Traffic: Can't load flame or coal graphs on performance.wikimedia.org (HTTP 502) - https://phabricator.wikimedia.org/T236102 (10Gilles) Confirmed on WMCS:  ` HTTP/2 502  date: Tue, 22 Oct 2019 07:34:28 GMT content-type: text/html server: ATS/8.0.5 cache-control: no-store c...
[07:38:18] <wikibugs>	 (03PS4) 10Elukey: swap: Redirect stderr to /dev/null to prevent cronspam [puppet] - 10https://gerrit.wikimedia.org/r/543866 (https://phabricator.wikimedia.org/T132324) (owner: 10Jcrespo)
[07:40:14] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] "I keep postponing this due to other tasks and time off work, so let's merge this and stop cronspam, I'll come back to it :)" [puppet] - 10https://gerrit.wikimedia.org/r/543866 (https://phabricator.wikimedia.org/T132324) (owner: 10Jcrespo)
[07:40:28] <wikibugs>	 10Operations: Puppet breakage in automation-framework VMs - https://phabricator.wikimedia.org/T234452 (10Volans) >>! In T234452#5593612, @crusnov wrote: > This should be fixed now.  Are you sure? Puppet is still broken on all of them AFAICT (I just checked randomly some of them). This is on the puppetmaster: ` T...
[07:41:06] <wikibugs>	 10Operations, 10Performance-Team, 10Traffic: Can't load flame or coal graphs on performance.wikimedia.org (HTTP 502) - https://phabricator.wikimedia.org/T236102 (10ema) The certificate for performance.discovery.wmnet does not include performance.wikimedia.org in SubjectAltName, hence ATS fails to connect to...
[07:48:26] <wikibugs>	 (03PS1) 10Ema: ssl: re-issue cert for performance.discovery.wmnet [puppet] - 10https://gerrit.wikimedia.org/r/545203 (https://phabricator.wikimedia.org/T210411)
[07:48:51] <wikibugs>	 (03PS4) 10Urbanecm: Change the language of Votewiki to Persian (fa) temporarily for the annual ArbCom elections [mediawiki-config] - 10https://gerrit.wikimedia.org/r/544995 (https://phabricator.wikimedia.org/T230614) (owner: 104nn1l2)
[07:49:39] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Change the language of Votewiki to Persian (fa) temporarily for the annual ArbCom elections [mediawiki-config] - 10https://gerrit.wikimedia.org/r/544995 (https://phabricator.wikimedia.org/T230614) (owner: 104nn1l2)
[07:50:20] <wikibugs>	 (03CR) 10Ema: [C: 03+2] ssl: re-issue cert for performance.discovery.wmnet [puppet] - 10https://gerrit.wikimedia.org/r/545203 (https://phabricator.wikimedia.org/T210411) (owner: 10Ema)
[07:53:49] <marostegui>	 !log Stop MySQL on db1116 pc1007 db1096:3315, db1096:3316 for PDU maintenance T227142
[07:53:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:53:53] <stashbot>	 T227142: a6-eqiad pdu refresh (Tuesday 10/22 @11am UTC) - https://phabricator.wikimedia.org/T227142
[07:54:20] <wikibugs>	 (03PS5) 10Urbanecm: Change the language of Votewiki to Persian (fa) temporarily for the annual ArbCom elections [mediawiki-config] - 10https://gerrit.wikimedia.org/r/544995 (https://phabricator.wikimedia.org/T230614) (owner: 104nn1l2)
[07:56:32] <wikibugs>	 10Operations, 10Traffic: ATS lua script reload doesn't work as expected - https://phabricator.wikimedia.org/T233274 (10Vgutierrez) 05Open→03Resolved a:03Vgutierrez
[07:56:56] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: LVS: add config for parsoid-php service (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/543243 (https://phabricator.wikimedia.org/T233654) (owner: 10Dzahn)
[07:58:06] <icinga-wm>	 PROBLEM - MariaDB Slave IO: pc1 on pc2007 is CRITICAL: CRITICAL slave_io_state Slave_IO_Running: No, Errno: 2003, Errmsg: error reconnecting to master repl@pc1007.eqiad.wmnet:3306 - retry-time: 60 maximum-retries: 86400 message: Can't connect to MySQL server on pc1007.eqiad.wmnet (111 Connection refused) https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_slave
[07:58:26] <icinga-wm>	 PROBLEM - MariaDB Slave IO: pc1 on pc2010 is CRITICAL: CRITICAL slave_io_state Slave_IO_Running: No, Errno: 2003, Errmsg: error reconnecting to master repl@pc1007.eqiad.wmnet:3306 - retry-time: 60 maximum-retries: 86400 message: Can't connect to MySQL server on pc1007.eqiad.wmnet (111 Connection refused) https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_slave
[07:58:31] <marostegui>	 ^ me
[07:58:35] <marostegui>	 I will silence those
[08:03:48] <wikibugs>	 10Operations, 10Traffic: Trigger envoy reload upon TLS certificate update - https://phabricator.wikimedia.org/T236125 (10ema)
[08:04:00] <wikibugs>	 10Operations, 10Traffic: Trigger envoy reload upon TLS certificate update - https://phabricator.wikimedia.org/T236125 (10ema) p:05Triage→03Normal
[08:04:35] <wikibugs>	 (03PS1) 10Vgutierrez: acme_chief: Grant access to all cp nodes to the unified cert [puppet] - 10https://gerrit.wikimedia.org/r/545204 (https://phabricator.wikimedia.org/T234803)
[08:05:40] <marostegui>	 !log Stop MySQL on labsdb1012 for PDU work T227142
[08:05:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:05:44] <stashbot>	 T227142: a6-eqiad pdu refresh (Tuesday 10/22 @11am UTC) - https://phabricator.wikimedia.org/T227142
[08:07:33] <wikibugs>	 (03CR) 10Ema: [C: 03+1] acme_chief: Grant access to all cp nodes to the unified cert [puppet] - 10https://gerrit.wikimedia.org/r/545204 (https://phabricator.wikimedia.org/T234803) (owner: 10Vgutierrez)
[08:08:06] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+2] acme_chief: Grant access to all cp nodes to the unified cert [puppet] - 10https://gerrit.wikimedia.org/r/545204 (https://phabricator.wikimedia.org/T234803) (owner: 10Vgutierrez)
[08:09:05] <wikibugs>	 10Operations, 10Performance-Team, 10Traffic, 10Patch-For-Review: Can't load flame or coal graphs on performance.wikimedia.org (HTTP 502) - https://phabricator.wikimedia.org/T236102 (10ema) 05Open→03Resolved a:03ema Done, thanks for the bug report @ori!
[08:09:42] <wikibugs>	 (03Abandoned) 10Vgutierrez: package_builder: Fix debhelper dependencies on stretch [puppet] - 10https://gerrit.wikimedia.org/r/533896 (owner: 10Vgutierrez)
[08:18:18] <wikibugs>	 (03PS5) 10Giuseppe Lavagetto: blubberoid: Add TLS termination [deployment-charts] - 10https://gerrit.wikimedia.org/r/544774 (https://phabricator.wikimedia.org/T210411)
[08:18:20] <wikibugs>	 (03PS5) 10Giuseppe Lavagetto: scaffold: Add option for TLS termination [deployment-charts] - 10https://gerrit.wikimedia.org/r/543854 (https://phabricator.wikimedia.org/T236008)
[08:18:22] <wikibugs>	 (03PS4) 10Giuseppe Lavagetto: scaffold: only expose one port as a service by default [deployment-charts] - 10https://gerrit.wikimedia.org/r/544629
[08:22:53] <wikibugs>	 (03PS1) 10Vgutierrez: ATS: Reload TLS material on acme_chief::cert updates [puppet] - 10https://gerrit.wikimedia.org/r/545206 (https://phabricator.wikimedia.org/T234803)
[08:23:37] <wikibugs>	 (03PS1) 10Ema: kibana: add TLS termination with envoy [puppet] - 10https://gerrit.wikimedia.org/r/545207 (https://phabricator.wikimedia.org/T210411)
[08:24:04] <wikibugs>	 (03CR) 10Muehlenhoff: "Wrt the concern about picking an explicit version; we can also set this via ListShellHook (as already done for "elastic" and "elastic55")" [puppet] - 10https://gerrit.wikimedia.org/r/544964 (https://phabricator.wikimedia.org/T235675) (owner: 10Alexandros Kosiaris)
[08:24:27] <wikibugs>	 (03PS17) 10Jcrespo: bacula: Create new backup jobs status check for icinga [puppet] - 10https://gerrit.wikimedia.org/r/544220 (https://phabricator.wikimedia.org/T234900)
[08:25:39] <wikibugs>	 (03PS1) 10Vgutierrez: ATS: Deploy acme-chief version of the unified certificate globally [puppet] - 10https://gerrit.wikimedia.org/r/545208 (https://phabricator.wikimedia.org/T234803)
[08:28:25] <wikibugs>	 (03CR) 10Volans: [C: 03+1] "LGTM python wise, I'll leave it to you for the flag logic" [puppet] - 10https://gerrit.wikimedia.org/r/544944 (https://phabricator.wikimedia.org/T235250) (owner: 10Muehlenhoff)
[08:29:35] <wikibugs>	 (03PS1) 10Ema: Add kibana-ssl LVS service [puppet] - 10https://gerrit.wikimedia.org/r/545209 (https://phabricator.wikimedia.org/T210411)
[08:30:23] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "Thanks, merging. The changes to the flag logic are based on my tests on servers on the fleet, there'll be a few tweaks for the blacklist, " [puppet] - 10https://gerrit.wikimedia.org/r/544944 (https://phabricator.wikimedia.org/T235250) (owner: 10Muehlenhoff)
[08:30:49] <wikibugs>	 (03PS3) 10Muehlenhoff: Update microcode check [puppet] - 10https://gerrit.wikimedia.org/r/544944 (https://phabricator.wikimedia.org/T235250)
[08:32:39] <wikibugs>	 10Operations: Track remaining jessie systems in production - https://phabricator.wikimedia.org/T224549 (10MoritzMuehlenhoff)
[08:33:30] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Update microcode check [puppet] - 10https://gerrit.wikimedia.org/r/544944 (https://phabricator.wikimedia.org/T235250) (owner: 10Muehlenhoff)
[08:39:24] <wikibugs>	 (03PS2) 10Vgutierrez: ATS,tlsproxy: Reload TLS material on acme_chief::cert updates [puppet] - 10https://gerrit.wikimedia.org/r/545206 (https://phabricator.wikimedia.org/T234803)
[08:39:26] <wikibugs>	 (03PS2) 10Vgutierrez: ATS: Deploy acme-chief version of the unified certificate globally [puppet] - 10https://gerrit.wikimedia.org/r/545208 (https://phabricator.wikimedia.org/T234803)
[08:40:22] <wikibugs>	 (03PS18) 10Jcrespo: bacula: Create new backup jobs status check for icinga [puppet] - 10https://gerrit.wikimedia.org/r/544220 (https://phabricator.wikimedia.org/T234900)
[08:45:38] <wikibugs>	 (03CR) 10Jbond: [C: 03+1] "lgtm minor nit" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/543916 (https://phabricator.wikimedia.org/T223458) (owner: 10Bstorm)
[08:45:46] <wikibugs>	 (03PS3) 10Vgutierrez: ATS: Deploy acme-chief version of the unified certificate globally [puppet] - 10https://gerrit.wikimedia.org/r/545208 (https://phabricator.wikimedia.org/T234803)
[08:48:46] <wikibugs>	 (03PS2) 10Ema: kibana: add TLS termination with envoy [puppet] - 10https://gerrit.wikimedia.org/r/545207 (https://phabricator.wikimedia.org/T210411)
[08:48:55] <wikibugs>	 (03CR) 10Volans: [C: 04-1] "Requires first a netbox deploy." (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/545123 (owner: 10CRusnov)
[08:51:58] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 03+2] logstash: config readable by logstash only by default [puppet] - 10https://gerrit.wikimedia.org/r/544218 (https://phabricator.wikimedia.org/T235891) (owner: 10Filippo Giunchedi)
[08:52:09] <wikibugs>	 (03PS3) 10Ema: kibana: add TLS termination with envoy [puppet] - 10https://gerrit.wikimedia.org/r/545207 (https://phabricator.wikimedia.org/T210411)
[08:52:11] <wikibugs>	 (03PS2) 10Filippo Giunchedi: logstash: config readable by logstash only by default [puppet] - 10https://gerrit.wikimedia.org/r/544218 (https://phabricator.wikimedia.org/T235891)
[08:58:25] <wikibugs>	 (03CR) 10Ema: [C: 03+2] kibana: add TLS termination with envoy [puppet] - 10https://gerrit.wikimedia.org/r/545207 (https://phabricator.wikimedia.org/T210411) (owner: 10Ema)
[09:04:04] <wikibugs>	 (03Abandoned) 10Vgutierrez: acme_chief cloud: Ensure that python3-designateclient is installed [puppet] - 10https://gerrit.wikimedia.org/r/528624 (owner: 10Vgutierrez)
[09:05:23] <wikibugs>	 (03PS3) 10Arturo Borrero Gonzalez: wikimedia.cloud: add initial zone file [dns] - 10https://gerrit.wikimedia.org/r/544175 (https://phabricator.wikimedia.org/T235846)
[09:09:15] <wikibugs>	 (03CR) 10Volans: "See comment inline" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/545094 (owner: 10Dzahn)
[09:10:52] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Change weights to x100 on s4 codfw - T231018', diff saved to https://phabricator.wikimedia.org/P9424 and previous config saved to /var/cache/conftool/dbconfig/20191022-091051-marostegui.json
[09:10:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:10:56] <stashbot>	 T231018: specify group (api/vslow/etc) weights in terms of 0..100 instead of 0..1 - https://phabricator.wikimedia.org/T231018
[09:13:28] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Change weights to x100 on s4 eqiad - T231018', diff saved to https://phabricator.wikimedia.org/P9425 and previous config saved to /var/cache/conftool/dbconfig/20191022-091327-marostegui.json
[09:13:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:14:08] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] "LGTM. Minor comment inline." (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/545072 (https://phabricator.wikimedia.org/T235863) (owner: 10Jhedden)
[09:16:04] <wikibugs>	 10Operations, 10Traffic: Elevated 502s observed in ulsfo - https://phabricator.wikimedia.org/T236130 (10fgiunchedi)
[09:17:24] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] wikimedia.cloud: add initial zone file [dns] - 10https://gerrit.wikimedia.org/r/544175 (https://phabricator.wikimedia.org/T235846) (owner: 10Arturo Borrero Gonzalez)
[09:20:07] <wikibugs>	 (03CR) 10Jbond: [C: 03+1] "LGTM, minor comment" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/544990 (owner: 10EBernhardson)
[09:22:23] <wikibugs>	 (03PS3) 10DCausse: Bump experimental-highlighter to 6.5.4.1 [software/elasticsearch/plugins] - 10https://gerrit.wikimedia.org/r/543188 (https://phabricator.wikimedia.org/T236123)
[09:23:03] <wikibugs>	 (03CR) 10Volans: "recheck" [software/netbox-reports] - 10https://gerrit.wikimedia.org/r/545132 (owner: 10CRusnov)
[09:25:26] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: "> Wrt the concern about picking an explicit version; we can also set" [puppet] - 10https://gerrit.wikimedia.org/r/544964 (https://phabricator.wikimedia.org/T235675) (owner: 10Alexandros Kosiaris)
[09:29:09] <wikibugs>	 (03PS6) 10Giuseppe Lavagetto: blubberoid: Add TLS termination [deployment-charts] - 10https://gerrit.wikimedia.org/r/544774 (https://phabricator.wikimedia.org/T210411)
[09:29:11] <wikibugs>	 (03PS6) 10Giuseppe Lavagetto: scaffold: Add option for TLS termination [deployment-charts] - 10https://gerrit.wikimedia.org/r/543854 (https://phabricator.wikimedia.org/T236008)
[09:29:13] <wikibugs>	 (03PS5) 10Giuseppe Lavagetto: scaffold: only expose one port as a service by default [deployment-charts] - 10https://gerrit.wikimedia.org/r/544629
[09:30:56] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[09:32:00] <jynus>	 there was a spike
[09:32:30] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[09:32:48] <jynus>	  Error from line 130 of /srv/mediawiki/php-1.35.0-wmf.2/extensions/Graph/includes/ApiGraph.php: Call to a member function getExtensionData() on boolean
[09:33:03] <jynus>	 (exception) 
[09:33:34] <jynus>	 ^heads up to releng to notify owners if it is not tracked already
[09:34:29] <jynus>	 oh, it is tracked already: https://phabricator.wikimedia.org/T235356
[09:41:14] <icinga-wm>	 PROBLEM - HTTP availability for Varnish at eqiad on icinga1001 is CRITICAL: job=varnish-upload site=eqiad https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[09:42:50] <icinga-wm>	 RECOVERY - HTTP availability for Varnish at eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[09:45:32] <wikibugs>	 (03PS2) 10Ema: Add kibana-ssl LVS service [puppet] - 10https://gerrit.wikimedia.org/r/545209 (https://phabricator.wikimedia.org/T210411)
[09:48:55] <wikibugs>	 (03PS3) 10Ema: Add kibana-ssl LVS service [puppet] - 10https://gerrit.wikimedia.org/r/545209 (https://phabricator.wikimedia.org/T210411)
[09:49:31] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+1] Add kibana-ssl LVS service [puppet] - 10https://gerrit.wikimedia.org/r/545209 (https://phabricator.wikimedia.org/T210411) (owner: 10Ema)
[09:51:02] <wikibugs>	 (03CR) 10Ema: [C: 03+2] Add kibana-ssl LVS service [puppet] - 10https://gerrit.wikimedia.org/r/545209 (https://phabricator.wikimedia.org/T210411) (owner: 10Ema)
[09:53:01] <ema>	 lvs1016: restart pybal to add new service kibana-ssl T210411
[09:53:02] <stashbot>	 T210411: Applayer services without TLS - https://phabricator.wikimedia.org/T210411
[09:53:57] <vgutierrez>	 ema: I think you missed the log cmd
[09:54:03] <ema>	 ah!
[09:54:09] <ema>	 !log lvs1016: restart pybal to add new service kibana-ssl T210411
[09:54:11] <vgutierrez>	 <3
[09:54:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:54:14] <ema>	 thanks!
[09:54:51] <ema>	 !log lvs2006: restart pybal to add new service kibana-ssl T210411
[09:54:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:56:37] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+2] ATS,tlsproxy: Reload TLS material on acme_chief::cert updates [puppet] - 10https://gerrit.wikimedia.org/r/545206 (https://phabricator.wikimedia.org/T234803) (owner: 10Vgutierrez)
[09:56:45] <wikibugs>	 (03CR) 10Vgutierrez: ATS,tlsproxy: Reload TLS material on acme_chief::cert updates [puppet] - 10https://gerrit.wikimedia.org/r/545206 (https://phabricator.wikimedia.org/T234803) (owner: 10Vgutierrez)
[09:57:14] <icinga-wm>	 PROBLEM - Confd template for /srv/config-master/pybal/eqiad/kibana on puppetmaster1001 is CRITICAL: Compilation of file /srv/config-master/pybal/eqiad/kibana is broken https://wikitech.wikimedia.org/wiki/Confd
[09:57:27] <ema>	 uh
[09:57:48] <icinga-wm>	 PROBLEM - PyBal IPVS diff check on lvs1016 is CRITICAL: CRITICAL: Services known to PyBal but not to IPVS: set([10.2.2.33:443]) https://wikitech.wikimedia.org/wiki/PyBal
[09:57:58] <icinga-wm>	 PROBLEM - Confd template for /srv/config-master/pybal/codfw/kibana on puppetmaster1001 is CRITICAL: Compilation of file /srv/config-master/pybal/codfw/kibana is broken https://wikitech.wikimedia.org/wiki/Confd
[09:58:04] <icinga-wm>	 PROBLEM - PyBal IPVS diff check on lvs2006 is CRITICAL: CRITICAL: Services known to PyBal but not to IPVS: set([10.2.1.33:443]) https://wikitech.wikimedia.org/wiki/PyBal
[09:58:29] <ema>	 _joe_: can you see anything strange with /srv/config-master/pybal/eqiad/kibana on puppetmaster1001?
[09:59:06] <_joe_>	 ema: /var/log/confd.log says anything?
[09:59:41] <_joe_>	 [invalid]: server pool cannot be empty!
[09:59:46] <_joe_>	 pool some servers :P
[10:00:01] <_joe_>	 they're all in status pooled=inactive
[10:00:03] <vgutierrez>	 new service.. everything is depooled :)
[10:00:09] <ema>	 ah, right
[10:00:17] <ema>	 the error is a bit misleading
[10:00:19] <_joe_>	 nothing to worry about too much :P
[10:00:22] <_joe_>	 what error?
[10:00:39] <ema>	 the one saying that compilation is broken
[10:00:51] <ema>	 it made me think of a syntax error in the template
[10:00:55] <_joe_>	 it is true that the compilation is broken, it doesn't pass the verification step
[10:01:11] <_joe_>	 well an error can be due to either the template or the data right?
[10:02:11] <logmsgbot>	 !log ema@puppetmaster1001 conftool action : set/pooled=yes; selector: service=kibana-ssl
[10:02:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:03:22] <icinga-wm>	 RECOVERY - PyBal IPVS diff check on lvs1016 is OK: OK: no difference between hosts in IPVS/PyBal https://wikitech.wikimedia.org/wiki/PyBal
[10:03:40] <icinga-wm>	 RECOVERY - PyBal IPVS diff check on lvs2006 is OK: OK: no difference between hosts in IPVS/PyBal https://wikitech.wikimedia.org/wiki/PyBal
[10:04:41] <ema>	 _joe_: indeed. Thanks!
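The exchange above distinguishes two ways a compiled config can fail: a bad template, or valid template with data that does not pass verification (here, a pool with no pooled servers). A minimal Python sketch of that distinction follows. It is not confd or conftool code; the function name `compile_pool`, both exception classes, and the server dictionary shape are hypothetical and exist only for illustration.

```python
from string import Template


class TemplateError(Exception):
    """The template itself could not be rendered (hypothetical class)."""


class DataError(Exception):
    """Rendering worked, but the result fails verification (hypothetical class)."""


def compile_pool(template, servers):
    """Render one line per pooled server, then verify the pool is not empty."""
    # Only servers marked pooled=yes are rendered; inactive ones are skipped.
    pooled = [s for s in servers if s.get("pooled") == "yes"]
    try:
        lines = [Template(template).substitute(s) for s in pooled]
    except (KeyError, ValueError) as exc:
        # A placeholder that is missing from the data or malformed is a
        # template-side problem.
        raise TemplateError("cannot render template: %s" % exc) from exc
    # Verification step: the template is fine, but the data leaves no servers.
    if not lines:
        raise DataError("server pool cannot be empty!")
    return "\n".join(lines)
```

With every server in `pooled=inactive`, as for a newly created service, this sketch raises `DataError` even though the template is syntactically fine, which matches the situation discussed above.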
[10:07:11] <wikibugs>	 (03PS3) 10Vgutierrez: ATS,tlsproxy: Reload TLS material on acme_chief::cert updates [puppet] - 10https://gerrit.wikimedia.org/r/545206 (https://phabricator.wikimedia.org/T234803)
[10:07:13] <wikibugs>	 (03PS4) 10Vgutierrez: ATS: Deploy acme-chief version of the unified certificate globally [puppet] - 10https://gerrit.wikimedia.org/r/545208 (https://phabricator.wikimedia.org/T234803)
[10:10:01] <ema>	 is some manual intervention required now to make confd happy again on the puppetmaster?
[10:11:10] <wikibugs>	 (03CR) 10Hashar: [C: 03+1] "And from https://phabricator.wikimedia.org/T208566#5371633  that will let us do some cleanup after that, notably get rid of rake_modules/f" [puppet] - 10https://gerrit.wikimedia.org/r/526104 (https://phabricator.wikimedia.org/T228657) (owner: 10Alexandros Kosiaris)
[10:12:46] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+2] ATS,tlsproxy: Reload TLS material on acme_chief::cert updates [puppet] - 10https://gerrit.wikimedia.org/r/545206 (https://phabricator.wikimedia.org/T234803) (owner: 10Vgutierrez)
[10:14:49] <ema>	 !log puppetmaster1001: rm /var/run/confd-template/.kibana-ssl*.err to make confd icinga check happy T210411 
[10:14:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:14:53] <stashbot>	 T210411: Applayer services without TLS - https://phabricator.wikimedia.org/T210411
[10:15:21] <wikibugs>	 10Operations, 10Traffic: Trigger envoy reload upon TLS certificate update - https://phabricator.wikimedia.org/T236125 (10Joe) Given we have the hot-restarter now, that's probably a good idea.
[10:15:26] <icinga-wm>	 RECOVERY - Confd template for /srv/config-master/pybal/eqiad/kibana on puppetmaster1001 is OK: No errors detected https://wikitech.wikimedia.org/wiki/Confd
[10:15:26] <icinga-wm>	 RECOVERY - Confd template for /srv/config-master/pybal/codfw/kibana on puppetmaster1001 is OK: No errors detected https://wikitech.wikimedia.org/wiki/Confd
[10:15:27] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: a6-eqiad pdu refresh (Tuesday 10/22 @11am UTC) - https://phabricator.wikimedia.org/T227142 (10Marostegui) The following hosts are ready for this maintenance * pc1007 * labsdb1012 * db1116 * db1096 * dbproxy1013 * db1066. Note ** this host is powered OFF as it is ready to...
[10:18:37] <ema>	 !log lvs1015: restart pybal to add new service kibana-ssl T210411
[10:18:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:21:21] <ema>	 !log lvs2003: restart pybal to add new service kibana-ssl T210411
[10:21:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:21:25] <stashbot>	 T210411: Applayer services without TLS - https://phabricator.wikimedia.org/T210411
[10:28:23] <wikibugs>	 (03PS1) 10Ema: kibana: add discovery configuration [puppet] - 10https://gerrit.wikimedia.org/r/545225
[10:32:26] <jynus>	 !log shutting down db1115 in preparation for PDU maintenance, this will make tendril and dbtree unavailable for 2 hours T227142
[10:32:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:32:30] <stashbot>	 T227142: a6-eqiad pdu refresh (Tuesday 10/22 @11am UTC) - https://phabricator.wikimedia.org/T227142
[10:32:54] <wikibugs>	 (03PS2) 10Ema: kibana: add discovery configuration [puppet] - 10https://gerrit.wikimedia.org/r/545225
[10:35:51] <wikibugs>	 (03CR) 10Ema: [C: 03+2] kibana: add discovery configuration [puppet] - 10https://gerrit.wikimedia.org/r/545225 (owner: 10Ema)
[10:35:56] <wikibugs>	 (03CR) 10Volans: [C: 03+1] "LGTM once the new discovery record is live." [cookbooks] - 10https://gerrit.wikimedia.org/r/545167 (owner: 10Giuseppe Lavagetto)
[10:37:47] <wikibugs>	 (03CR) 10Filippo Giunchedi: "FWIW I'm still seeing some warnings from throttle from time to time, I'm guessing when we take down one of logstash frontends and more mes" [puppet] - 10https://gerrit.wikimedia.org/r/543904 (owner: 10Herron)
[10:37:51] <logmsgbot>	 !log ema@puppetmaster1001 conftool action : set/pooled=true; selector: dnsdisc=kibana
[10:37:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:39:30] <icinga-wm>	 PROBLEM - HTTP-dbtree on dbmonitor2001 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 281 bytes in 0.150 second response time https://wikitech.wikimedia.org/wiki/Dbtree.wikimedia.org
[10:40:20] <jynus>	 ^expected, downtimed, but I was too late; it was already on soft
[10:43:18] <wikibugs>	 (03PS1) 10Ema: kibana: add discovery record [dns] - 10https://gerrit.wikimedia.org/r/545232 (https://phabricator.wikimedia.org/T227432)
[10:46:17] <wikibugs>	 (03CR) 10Ema: [C: 03+2] kibana: add discovery record [dns] - 10https://gerrit.wikimedia.org/r/545232 (https://phabricator.wikimedia.org/T227432) (owner: 10Ema)
[10:48:28] <wikibugs>	 (03PS1) 10Filippo Giunchedi: logstash: remove deprecated elasticsearch options [puppet] - 10https://gerrit.wikimedia.org/r/545236 (https://phabricator.wikimedia.org/T235891)
[10:48:52] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "Looks fine (sans John's comment), but that changes existing sudo rules, so needs approval in the next SRE meeting, maybe Guillaume can tak" [puppet] - 10https://gerrit.wikimedia.org/r/544990 (owner: 10EBernhardson)
[10:53:06] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: a6-eqiad pdu refresh (Tuesday 10/22 @11am UTC) - https://phabricator.wikimedia.org/T227142 (10jcrespo) db1115 is now down, I took the opportunity to upgrade all its system packages, but didn't touch mariadb.
[10:54:12] <logmsgbot>	 !log jmm@cumin2001 START - Cookbook sre.hosts.downtime
[10:54:13] <logmsgbot>	 !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[10:54:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:54:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:54:49] <wikibugs>	 (03PS1) 10Ema: Revert "kibana: add discovery record" [dns] - 10https://gerrit.wikimedia.org/r/545241
[10:55:26] <moritzm>	 !log rebooting rpki2001 for some microcode tests
[10:55:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:55:51] <wikibugs>	 (03CR) 10Ema: [C: 03+2] Revert "kibana: add discovery record" [dns] - 10https://gerrit.wikimedia.org/r/545241 (owner: 10Ema)
[10:58:47] <wikibugs>	 (03CR) 10Huji: [C: 03+1] Change the language of Votewiki to Persian (fa) temporarily for the annual ArbCom elections [mediawiki-config] - 10https://gerrit.wikimedia.org/r/544995 (https://phabricator.wikimedia.org/T230614) (owner: 104nn1l2)
[11:00:04] <jouncebot>	 Amir1, Lucas_WMDE, awight, and Urbanecm: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for European Mid-day SWAT(Max 6 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20191022T1100).
[11:00:04] <jouncebot>	 MatmaRex: A patch you scheduled for European Mid-day SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[11:00:17] <MatmaRex>	 hello
[11:00:39] <Lucas_WMDE>	 o/
[11:00:47] <Urbanecm>	 I can SWAT today!
[11:01:04] <icinga-wm>	 PROBLEM - HTTP availability for Varnish at ulsfo on icinga1001 is CRITICAL: job=varnish-text site=ulsfo https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[11:01:58] <Urbanecm>	 MatmaRex: +2'ed your backports, will let you know once they're ready to test
[11:02:17] <MatmaRex>	 Urbanecm: hi, thanks. i'm trying out the same patch we reverted yesterday, i've been told this time it will *really* work ;)
[11:02:24] <icinga-wm>	 RECOVERY - HTTP availability for Varnish at ulsfo on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[11:02:40] <wikibugs>	 (03PS19) 10Jcrespo: bacula: Create new backup jobs status check for icinga [puppet] - 10https://gerrit.wikimedia.org/r/544220 (https://phabricator.wikimedia.org/T234900)
[11:02:41] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: a6-eqiad pdu refresh (Tuesday 10/22 @11am UTC) - https://phabricator.wikimedia.org/T227142 (10Jclark-ctr) starting PDU Maintenance
[11:02:44] <wikibugs>	 (03PS1) 10Jcrespo: mariadb/backups: Prepare dbmonitor[12]001 to reimage to buster [puppet] - 10https://gerrit.wikimedia.org/r/545246 (https://phabricator.wikimedia.org/T224589)
[11:02:48] <Urbanecm>	 MatmaRex: cool!
[11:03:49] <wikibugs>	 (03PS1) 10Muehlenhoff: Fix detection of virtual hosts in microcode code [puppet] - 10https://gerrit.wikimedia.org/r/545247
[11:04:31] <wikibugs>	 (03CR) 10Muehlenhoff: mariadb/backups: Prepare dbmonitor[12]001 to reimage to buster (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/545246 (https://phabricator.wikimedia.org/T224589) (owner: 10Jcrespo)
[11:05:08] <icinga-wm>	 PROBLEM - Confd template for /var/lib/gdnsd/discovery-kibana.state on authdns1001 is CRITICAL: File not found: /var/lib/gdnsd/discovery-kibana.state https://wikitech.wikimedia.org/wiki/Confd
[11:05:25] <wikibugs>	 (03CR) 10Jcrespo: [C: 04-1] mariadb/backups: Prepare dbmonitor[12]001 to reimage to buster (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/545246 (https://phabricator.wikimedia.org/T224589) (owner: 10Jcrespo)
[11:05:36] <Urbanecm>	 MatmaRex: fyi, I'm going to deploy a config patch while waiting on CI
[11:05:55] <MatmaRex>	 okay
[11:06:03] <wikibugs>	 (03CR) 10Urbanecm: [C: 03+2] Change the language of Votewiki to Persian (fa) temporarily for the annual ArbCom elections [mediawiki-config] - 10https://gerrit.wikimedia.org/r/544995 (https://phabricator.wikimedia.org/T230614) (owner: 104nn1l2)
[11:06:36] <wikibugs>	 (03PS2) 10Jcrespo: mariadb/backups: Prepare dbmonitor[12]001 to reimage to buster [puppet] - 10https://gerrit.wikimedia.org/r/545246 (https://phabricator.wikimedia.org/T224589)
[11:06:52] <wikibugs>	 (03Merged) 10jenkins-bot: Change the language of Votewiki to Persian (fa) temporarily for the annual ArbCom elections [mediawiki-config] - 10https://gerrit.wikimedia.org/r/544995 (https://phabricator.wikimedia.org/T230614) (owner: 104nn1l2)
[11:06:54] <wikibugs>	 (03CR) 10Jcrespo: "Thank you very much for the catch, Moritz!" [puppet] - 10https://gerrit.wikimedia.org/r/545246 (https://phabricator.wikimedia.org/T224589) (owner: 10Jcrespo)
[11:07:10] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Fix detection of virtual hosts in microcode code [puppet] - 10https://gerrit.wikimedia.org/r/545247 (owner: 10Muehlenhoff)
[11:09:38] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "LGTM!" [puppet] - 10https://gerrit.wikimedia.org/r/545246 (https://phabricator.wikimedia.org/T224589) (owner: 10Jcrespo)
[11:09:57] <wikibugs>	 (03CR) 10Effie Mouzeli: [C: 03+2] spec: remove hhvm references from tests [puppet] - 10https://gerrit.wikimedia.org/r/544847 (https://phabricator.wikimedia.org/T229792) (owner: 10Effie Mouzeli)
[11:09:59] <logmsgbot>	 !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: 0593f34: Change the language of Votewiki to Persian (fa) temporarily for the annual ArbCom elections (T230614) (duration: 00m 54s)
[11:09:59] <wikibugs>	 (03CR) 10Jcrespo: "@Marostegui From IRC: <jynus> what would you think about me doing T224589 while you monitor the PDU stuff?" [puppet] - 10https://gerrit.wikimedia.org/r/545246 (https://phabricator.wikimedia.org/T224589) (owner: 10Jcrespo)
[11:10:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:10:03] <stashbot>	 T230614: Carry out the 2019 fawiki elections on votewiki - https://phabricator.wikimedia.org/T230614
[11:10:31] <jynus>	 ^it is all bots all the way down
[11:10:49] * jynus prepares for when bots will take over my job
[11:11:19] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: a6-eqiad pdu refresh (Tuesday 10/22 @11am UTC) - https://phabricator.wikimedia.org/T227142 (10aborrero)
[11:11:49] <hauskater>	 hope you have some pension plan :P
[11:12:33] <jynus>	 Step 1: automate my job away. Step 2: ?. Step 3: Profit!
[11:13:08] <godog>	 jynus marostegui FYI with db1115 down/unavailable then /usr/local/sbin/mysqld_exporter_config.py doesn't work on prometheus hosts, noticed via puppet failures
[11:13:18] <godog>	 pymysql.err.OperationalError: (2003, "Can't connect to MySQL server on 'db1115.eqiad.wmnet' ([Errno 111] Connection refused)")
[11:13:21] <jynus>	 yeah
[11:13:27] <jynus>	 that is expected, but shouldn't alert?
[11:13:36] <jynus>	 does it create issues on prometheus?
[11:14:08] <jynus>	 oh, I see, it is the puppet thing
[11:14:25] <godog>	 no immediate issue afaict no, but obviously the db targets are not being updated if needed
[11:14:26] <jynus>	 but it doesn't "break" puppet, right?
[11:14:32] <godog>	 it does
[11:14:36] <jynus>	 oh?
[11:14:41] <jynus>	 I think that was on purpose
[11:14:44] <godog>	 as in the puppet run fails, because the exec fails
[11:14:45] <jynus>	 to notice it
[11:14:53] <godog>	 yeah I think this is the correct behaviour
[11:14:55] <jynus>	 but it doesn't blank the config
[11:15:01] <jynus>	 is what I meant?
[11:15:12] <wikibugs>	 (03PS6) 10Urbanecm: Allow certain users to create account at closed wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/542755 (https://phabricator.wikimedia.org/T222117)
[11:15:29] <godog>	 yeah the config is fine
[11:15:38] <jynus>	 I will ack the error with a timeout
[11:15:49] <jynus>	 It creates a warning so I didn't notice it
[11:15:58] <jynus>	 in the future we will have ha for that db
[11:16:14] <Urbanecm>	 MatmaRex: your commits are ready at mwdebug1001, please test and let me know
[11:16:15] <jynus>	 but we cannot right now because tendril
[11:17:11] <godog>	 ack, thanks jynus 
[11:17:26] <jynus>	 sorry for the issues, I didn't remember that
[11:17:37] <jynus>	 I will document db1115 dependencies on wikitech
[11:17:39] <godog>	 np, no issues actually
[11:17:47] <jynus>	 "it works" as intended
[11:17:49] <jynus>	 :-D
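
The behaviour godog and jynus describe above (the exec fails loudly so the Puppet run fails, but the existing config is not blanked) can be sketched as follows. This is a minimal illustration, not the actual /usr/local/sbin/mysqld_exporter_config.py: refresh_targets, fetch_targets and the file layout are hypothetical names, and a plain ConnectionRefusedError stands in for the pymysql error quoted in the log.

```python
import os
import sys
import tempfile


def refresh_targets(fetch_targets, path):
    """Rewrite `path` with fresh targets; keep the old file if the fetch fails.

    `fetch_targets` is a hypothetical callable returning a list of strings.
    Returns a process-style exit code: 0 on success, 1 on failure.
    """
    try:
        lines = fetch_targets()
    except OSError as exc:  # ConnectionRefusedError is a subclass of OSError
        # Report the failure to the caller, but do not touch the existing file.
        print(f"cannot refresh {path}: {exc}", file=sys.stderr)
        return 1

    # Write to a temporary file in the same directory, then swap it in
    # atomically so a crash mid-write cannot leave a truncated config.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        handle.write("\n".join(lines) + "\n")
    os.replace(tmp_path, path)
    return 0
```

A non-zero return here plays the role of the failing exec: the operator sees the failure, while the last good target list stays in place until the source database is reachable again.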
[11:18:01] <MatmaRex>	 Urbanecm: looking
[11:18:09] <Urbanecm>	 thanks
[11:19:23] <MatmaRex>	 Urbanecm: are you sure? i'm still seeing the old JS code
[11:19:39] <Urbanecm>	 MatmaRex: verifying, give me a moment
[11:21:22] <hashar>	 jouncebot: now
[11:21:22] <jouncebot>	 For the next 0 hour(s) and 38 minute(s): European Mid-day SWAT(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20191022T1100)
[11:21:29] <Urbanecm>	 MatmaRex: made a mistake, it should be fine now
[11:21:32] <wikibugs>	 (03CR) 10Marostegui: [C: 03+1] mariadb/backups: Prepare dbmonitor[12]001 to reimage to buster [puppet] - 10https://gerrit.wikimedia.org/r/545246 (https://phabricator.wikimedia.org/T224589) (owner: 10Jcrespo)
[11:22:14] <icinga-wm>	 PROBLEM - Confd template for /var/lib/gdnsd/discovery-kibana.state on authdns2001 is CRITICAL: File not found: /var/lib/gdnsd/discovery-kibana.state https://wikitech.wikimedia.org/wiki/Confd
[11:22:40] <MatmaRex>	 Urbanecm: thanks, seems to be working as expected now. give me a minute more to test the logging stuff
[11:22:47] <Urbanecm>	 MatmaRex: sure
[11:24:28] <marostegui>	 godog: thanks for the ping, I didn't remember that either
[11:25:51] <MatmaRex>	 Urbanecm: everything looks good!
[11:26:03] <Urbanecm>	 MatmaRex: good! Going to sync 'em all!
[11:28:20] <logmsgbot>	 !log urbanecm@deploy1001 Synchronized php-1.35.0-wmf.2/extensions/VisualEditor/: SWAT: 2bc4420 (T235707); 680a98b (T233320); d83265d (T234564) (duration: 00m 53s)
[11:28:34] <Urbanecm>	 MatmaRex: synced!
[11:28:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:28:35] <stashbot>	 T233320: VisualEditor <-> RESTBase communication and ETags - https://phabricator.wikimedia.org/T233320
[11:28:35] <stashbot>	 T234564: Logstash discards messages from MediaWiki if they contain uncommon keys in the $context array - https://phabricator.wikimedia.org/T234564
[11:28:36] <stashbot>	 T235707: VE not successfully loading on pages with video or audio embeds: "TypeError: href is null" from ve.dm.MWInternalLinkAnnotation.js:141:2 - https://phabricator.wikimedia.org/T235707
[11:29:13] <MatmaRex>	 Urbanecm: thank you
[11:29:16] <wikibugs>	 (03PS1) 10Muehlenhoff: Check for both kvm/qemu in systemd-detec-virt [puppet] - 10https://gerrit.wikimedia.org/r/545254
[11:29:21] <Urbanecm>	 you're welcome
[11:29:24] <Urbanecm>	 !log EU SWAT done
[11:29:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:31:19] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Check for both kvm/qemu in systemd-detec-virt [puppet] - 10https://gerrit.wikimedia.org/r/545254 (owner: 10Muehlenhoff)
[11:34:38] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1105:3311, db1105:3312 for firmware upgrade T235877', diff saved to https://phabricator.wikimedia.org/P9428 and previous config saved to /var/cache/conftool/dbconfig/20191022-113437-marostegui.json
[11:34:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:34:42] <stashbot>	 T235877: db1105 rebooted itself - https://phabricator.wikimedia.org/T235877
[11:34:56] <hauskater>	 Daimona: can https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/SocialProfile/+/545003/ be un-WIP?
[11:35:06] <hauskater>	 I can re+2 after that
[11:35:09] <marostegui>	 !log Stop MySQL on db1105:3311, db1105:3312 for firmware upgrade - T235877
[11:35:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:35:30] <Daimona>	 hauskater: Oh, sure. I forgot to.
[11:35:54] <hauskater>	 +2
[11:35:57] <hauskater>	 again
[11:36:36] <icinga-wm>	 PROBLEM - Confd template for /var/lib/gdnsd/discovery-kibana.state on multatuli is CRITICAL: File not found: /var/lib/gdnsd/discovery-kibana.state https://wikitech.wikimedia.org/wiki/Confd
[11:45:50] <wikibugs>	 10Operations, 10Gerrit: Editing in Gerrit isn't saved after the update/migration to gerrit1001 - https://phabricator.wikimedia.org/T236143 (10MoritzMuehlenhoff)
[11:47:17] <wikibugs>	 10Operations, 10Gerrit, 10Release-Engineering-Team-TODO, 10serviceops, and 2 others: Gerrit Hardware Upgrade (+ upgrade from jessie to stretch or buster) - https://phabricator.wikimedia.org/T222391 (10MoritzMuehlenhoff) Nothing critical, but this happens after the update/migration: https://phabricator.wiki...
[11:48:14] <wikibugs>	 10Operations, 10Patch-For-Review: Migrate dbmonitor hosts to Stretch/Buster - https://phabricator.wikimedia.org/T224589 (10jcrespo) a:03jcrespo
[11:48:17] <wikibugs>	 (03PS2) 10Muehlenhoff: Check for both kvm/qemu in systemd-detec-virt [puppet] - 10https://gerrit.wikimedia.org/r/545254
[11:51:01] <wikibugs>	 (03PS3) 10Jcrespo: mariadb/backups: Prepare dbmonitor[12]001 to reimage to buster [puppet] - 10https://gerrit.wikimedia.org/r/545246 (https://phabricator.wikimedia.org/T224589)
[11:51:02] <hashar>	 !log Restarted CI Jenkins on  contint1001
[11:51:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:51:24] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] toolforge: rename k8s::apilb role/profile to k8s::haproxy [puppet] - 10https://gerrit.wikimedia.org/r/544191 (https://phabricator.wikimedia.org/T234037) (owner: 10Arturo Borrero Gonzalez)
[11:52:21] <wikibugs>	 (03CR) 10Jcrespo: [C: 03+2] mariadb/backups: Prepare dbmonitor[12]001 to reimage to buster [puppet] - 10https://gerrit.wikimedia.org/r/545246 (https://phabricator.wikimedia.org/T224589) (owner: 10Jcrespo)
[11:54:21] <wikibugs>	 (03PS3) 10Muehlenhoff: Check for both kvm/qemu in systemd-detec-virt [puppet] - 10https://gerrit.wikimedia.org/r/545254
[11:57:17] <liw>	 !log starting to cut branch for train 1.35-wmf.3
[11:57:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:00:04] <jouncebot>	 Deploy window Pre MediaWiki train sanity break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20191022T1200)
[12:02:00] <wikibugs>	 10Operations, 10MediaWiki-Maintenance-scripts, 10cloud-services-team (Kanban): processEchoEmailBatch.php failing for labtestwiki - https://phabricator.wikimedia.org/T236145 (10Marostegui)
[12:12:23] <wikibugs>	 (03PS1) 10Awight: Reference Previews: full beta deployment [mediawiki-config] - 10https://gerrit.wikimedia.org/r/545260 (https://phabricator.wikimedia.org/T235083)
[12:14:59] <jynus>	 !log reimage to buster dbmonitor2001.wikimedia.org T224589
[12:15:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:15:03] <stashbot>	 T224589: Migrate dbmonitor hosts to Stretch/Buster - https://phabricator.wikimedia.org/T224589
[12:15:20] <wikibugs>	 (03PS1) 10Paladox: Test inline editor [puppet] - 10https://gerrit.wikimedia.org/r/545262
[12:15:48] <wikibugs>	 (03PS2) 10Paladox: Test inline editor [puppet] - 10https://gerrit.wikimedia.org/r/545262
[12:16:31] <wikibugs>	 (03Abandoned) 10Paladox: Test inline editor [puppet] - 10https://gerrit.wikimedia.org/r/545262 (owner: 10Paladox)
[12:19:36] <wikibugs>	 10Operations, 10ops-eqiad, 10Discovery-Search (Current work): Degraded RAID on elastic1046 - https://phabricator.wikimedia.org/T228606 (10Cmjohnson) @wiki_willy can you order a new disk?  I tried logging in but being prompted for a password so I cannot get disk info.
[12:19:49] <wikibugs>	 10Operations, 10ops-eqiad, 10Discovery-Search (Current work): Degraded RAID on elastic1046 - https://phabricator.wikimedia.org/T228606 (10Cmjohnson) a:05Cmjohnson→03wiki_willy
[12:20:35] <wikibugs>	 10Operations, 10ops-eqiad, 10DBA, 10DC-Ops: db1105 rebooted itself - https://phabricator.wikimedia.org/T235877 (10Cmjohnson) Updated all F/W on db1105 - Raid -Bios - Backplane - Idrac
[12:20:53] <wikibugs>	 (03PS1) 10Marostegui: Revert "db-eqiad.php: Temporary pool pc1010 in pc1" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/545264
[12:22:10] <wikibugs>	 10Operations, 10ops-eqiad, 10DBA, 10DC-Ops: db1105 rebooted itself - https://phabricator.wikimedia.org/T235877 (10Marostegui) Thank you Chris!
[12:22:26] <icinga-wm>	 RECOVERY - MariaDB Slave IO: pc1 on pc2010 is OK: OK slave_io_state Slave_IO_Running: Yes https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_slave
[12:22:49] <wikibugs>	 10Operations, 10DBA: Reboot, upgrade firmware and kernel of db1096-db1106, db2071-db2092 - https://phabricator.wikimedia.org/T216240 (10Marostegui)
[12:22:56] <icinga-wm>	 RECOVERY - MariaDB Slave IO: pc1 on pc2007 is OK: OK slave_io_state Slave_IO_Running: Yes https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_slave
[12:23:24] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: a6-eqiad pdu refresh (Tuesday 10/22 @11am UTC) - https://phabricator.wikimedia.org/T227142 (10Jclark-ctr) a:05Cmjohnson→03RobH finished PDU Maintenance . Netbox updated with new PDU
[12:23:43] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] Revert "db-eqiad.php: Temporary pool pc1010 in pc1" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/545264 (owner: 10Marostegui)
[12:24:32] <wikibugs>	 (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Temporary pool pc1010 in pc1" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/545264 (owner: 10Marostegui)
[12:25:56] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool pc1007 after PDU maintenance T227142 (duration: 00m 50s)
[12:26:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:26:01] <stashbot>	 T227142: a6-eqiad pdu refresh (Tuesday 10/22 @11am UTC) - https://phabricator.wikimedia.org/T227142
[12:27:38] <logmsgbot>	 !log jmm@cumin2001 START - Cookbook sre.hosts.downtime
[12:27:39] <logmsgbot>	 !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[12:27:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:27:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:28:59] <marostegui>	 !log Compress db1096:3315 
[12:29:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:29:40] <moritzm>	 !log rebooting miscweb2001 for some microcode tests
[12:29:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:30:33] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db2089:3315', diff saved to https://phabricator.wikimedia.org/P9429 and previous config saved to /var/cache/conftool/dbconfig/20191022-123032-marostegui.json
[12:30:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:32:58] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1105:3312 and db1105:3311 after on-site maintenance T235877', diff saved to https://phabricator.wikimedia.org/P9430 and previous config saved to /var/cache/conftool/dbconfig/20191022-123257-marostegui.json
[12:33:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:33:03] <stashbot>	 T235877: db1105 rebooted itself - https://phabricator.wikimedia.org/T235877
[12:33:50] <icinga-wm>	 RECOVERY - HTTP-dbtree on dbmonitor2001 is OK: HTTP OK: HTTP/1.1 200 OK - 10975 bytes in 7.304 second response time https://wikitech.wikimedia.org/wiki/Dbtree.wikimedia.org
[12:37:59] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1096:3316 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9431 and previous config saved to /var/cache/conftool/dbconfig/20191022-123757-marostegui.json
[12:38:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:43:00] <icinga-wm>	 PROBLEM - Check the Netbox report librenms for fail status. on netbox1001 is CRITICAL: librenms.LibreNMS CRITICAL https://wikitech.wikimedia.org/wiki/Netbox%23Reports
[12:43:36] <wikibugs>	 (03PS1) 10Muehlenhoff: Extend black list for unsupported CPUs [puppet] - 10https://gerrit.wikimedia.org/r/545267
[12:44:57] <wikibugs>	 10Operations, 10MediaWiki-extensions-PdfHandler, 10Multimedia: Error creating PDF on Commons: "convert: no decode delegate for this image format" (fixed in GS 9.07) - https://phabricator.wikimedia.org/T50007 (10Seb35) 05Resolved→03Open I reopen this task with a better proposed resolution, and possibly al...
[12:45:25] <wikibugs>	 (03PS1) 10Cmjohnson: Adding mgmt dns for dumpsdata1003 [dns] - 10https://gerrit.wikimedia.org/r/545268 (https://phabricator.wikimedia.org/T234076)
[12:46:01] <wikibugs>	 10Operations, 10Traffic, 10Patch-For-Review: ATS-tls nodes on the text cluster have a slightly higher rate of failed fetches on varnish-fe - https://phabricator.wikimedia.org/T234887 (10jijiki) It appears we are having fetch errors, possibly due to timeouts as well mostly on two servers where we have enabled...
[12:46:10] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'More traffic to db1096:3316 db1105:3311 instance db1105:3312 after PDU and on-site maintenance', diff saved to https://phabricator.wikimedia.org/P9432 and previous config saved to /var/cache/conftool/dbconfig/20191022-124607-marostegui.json
[12:46:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:49:05] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Extend black list for unsupported CPUs [puppet] - 10https://gerrit.wikimedia.org/r/545267 (owner: 10Muehlenhoff)
[12:54:38] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'More traffic to db1096:3316 db1105:3311 instance db1105:3312 after PDU and on-site maintenance', diff saved to https://phabricator.wikimedia.org/P9433 and previous config saved to /var/cache/conftool/dbconfig/20191022-125435-marostegui.json
[12:54:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:58:12] <wikibugs>	 (03PS1) 10Ayounsi: Depool esams for onsite work [dns] - 10https://gerrit.wikimedia.org/r/545270
[13:00:04] <jouncebot>	 liw and brennen: Time to snap out of that daydream and deploy Mediawiki train - European Version. Get on with it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20191022T1300).
[13:01:52] <liw>	 branch cutting is still running
[13:04:13] <wikibugs>	 (03PS2) 10CDanis: Depool esams for onsite work [dns] - 10https://gerrit.wikimedia.org/r/545270 (https://phabricator.wikimedia.org/T235805) (owner: 10Ayounsi)
[13:04:21] <wikibugs>	 (03CR) 10Ema: [C: 03+1] Depool esams for onsite work [dns] - 10https://gerrit.wikimedia.org/r/545270 (https://phabricator.wikimedia.org/T235805) (owner: 10Ayounsi)
[13:05:26] <wikibugs>	 10Operations, 10Traffic, 10Patch-For-Review: ATS-tls nodes on the text cluster have a slightly higher rate of failed fetches on varnish-fe - https://phabricator.wikimedia.org/T234887 (10Vgutierrez) I highly suspect that's related to stricter timeouts on ats-be compared to varnish-be and atls-tls, that would...
[13:05:36] <wikibugs>	 (03CR) 10Ayounsi: [C: 03+2] Depool esams for onsite work [dns] - 10https://gerrit.wikimedia.org/r/545270 (https://phabricator.wikimedia.org/T235805) (owner: 10Ayounsi)
[13:05:57] <logmsgbot>	 !log marostegui@cumin1001 dbctl commit (dc=all): 'Fully repool db1096:3316 db1105:3311 db1105:3312 after PDU and on-site maintenance', diff saved to https://phabricator.wikimedia.org/P9434 and previous config saved to /var/cache/conftool/dbconfig/20191022-130556-marostegui.json
[13:06:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:06:13] <XioNoX>	 !log depool esams for onsite work - T235805
[13:06:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:06:19] <stashbot>	 T235805: ESAMS Refresh/Rebuild (October 2019) - https://phabricator.wikimedia.org/T235805
[13:06:32] <wikibugs>	 10Operations, 10DBA: Reboot, upgrade firmware and kernel of db1096-db1106, db2071-db2092 - https://phabricator.wikimedia.org/T216240 (10Marostegui)
[13:06:34] <wikibugs>	 10Operations, 10ops-eqiad, 10DBA, 10DC-Ops: db1105 rebooted itself - https://phabricator.wikimedia.org/T235877 (10Marostegui) 05Open→03Resolved Host fully repooled in production. Thanks Chris!
[13:10:36] <wikibugs>	 (03PS1) 10Jcrespo: dbmonitor: Install the right apache module packages for >jessie [puppet] - 10https://gerrit.wikimedia.org/r/545273 (https://phabricator.wikimedia.org/T224589)
[13:13:41] <logmsgbot>	 !log ayounsi@cumin1001 START - Cookbook sre.hosts.downtime
[13:13:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:13:50] <logmsgbot>	 !log ayounsi@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[13:13:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:13:55] <wikibugs>	 10Operations, 10ops-esams, 10DC-Ops: ESAMS Refresh/Rebuild (October 2019) - https://phabricator.wikimedia.org/T235805 (10ops-monitoring-bot) Icinga downtime for 2:00:00 set by ayounsi@cumin1001 on 28 host(s) and their services with reason: Onsite work (asw) ` bast3002.wikimedia.org,cp[3007-3008,3010,3030,303...
[13:15:16] <icinga-wm>	 PROBLEM - Varnish traffic drop between 30min ago and now at esams on icinga1001 is CRITICAL: 53.83 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[13:18:41] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/545273 (https://phabricator.wikimedia.org/T224589) (owner: 10Jcrespo)
[13:24:09] <icinga-wm>	 PROBLEM - Kartotherian LVS eqiad #page on kartotherian.svc.eqiad.wmnet is CRITICAL: /v4/marker/pin-m-fuel+ffffff.png (Untitled test) timed out before a response was received: /v4/marker/pin-m-fuel+ffffff@2x.png (scaled pushpin marker with an icon) timed out before a response was received https://wikitech.wikimedia.org/wiki/Maps%23Kartotherian
[13:24:12] <wikibugs>	 (03CR) 10Jcrespo: [C: 03+2] dbmonitor: Install the right apache module packages for >jessie [puppet] - 10https://gerrit.wikimedia.org/r/545273 (https://phabricator.wikimedia.org/T224589) (owner: 10Jcrespo)
[13:24:35] <godog>	 womp womp kartotherian
[13:24:41] <gehel>	 ^ looking at kartotherian
[13:24:52] <godog>	 thanks gehel !
[13:25:02] <jynus>	 kartotherian
[13:25:05] <onimisionipe>	 hmm
[13:25:09] <cdanis>	 all maps machines in eqiad have been pegged on CPU for ~20 minutes
[13:25:14] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs1016 is CRITICAL: PYBAL CRITICAL - CRITICAL - kartotherian-ssl_443: Servers maps1004.eqiad.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[13:25:20] <cdanis>	 network traffic increased as well
[13:25:56] <icinga-wm>	 PROBLEM - HTTP availability for Varnish at esams on icinga1001 is CRITICAL: job=varnish-upload site=esams https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[13:25:57] <gehel>	 tile generation is at its usual rate
[13:26:26] <onimisionipe>	 load is super high
[13:26:46] <wikibugs>	 (03PS1) 10Lars Wirzenius: Group0 to 1.35.0-wmf.3 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/545281
[13:26:51] <onimisionipe>	 seems our friend is back
[13:27:02] <icinga-wm>	 PROBLEM - Maps HTTPS on maps1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Maps/RunBook
[13:27:58] <gehel>	 lot of errors similar to `[2019-10-22T13:27:24.731Z] ERROR: kartotherian/12073 on maps1003: Bad geojson - unknown type object (err.levelPath=error)
[13:27:58] <gehel>	     Error: Bad geojson - unknown type object
[13:27:58] <gehel>	 `
[13:28:30] <logmsgbot>	 !log liw@deploy1001 Started scap: testwiki to php-1.34.0-wmf.3 and rebuild l10n cache
[13:28:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:28:34] <icinga-wm>	 RECOVERY - Maps HTTPS on maps1002 is OK: HTTP OK: HTTP/1.1 200 OK - 1329 bytes in 6.199 second response time https://wikitech.wikimedia.org/wiki/Maps/RunBook
[13:29:57] <wikibugs>	 (03CR) 10Ottomata: "If we do put it on a Ganeti VM, Could/should search use their own MySQL instance there instead of the analytics-meta one?" [puppet] - 10https://gerrit.wikimedia.org/r/544989 (owner: 10EBernhardson)
[13:30:48] <liw>	 I seem to be running late with the group0 deployment; should I overrun the train time slot or pause and continue later?
[13:30:54] <liw>	 thcipriani, ^
[13:31:26] <wikibugs>	 (03CR) 10Ottomata: cumin: update which server is the kafka-main canary (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/545094 (owner: 10Dzahn)
[13:31:38] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs1016 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[13:32:06] <ema>	 hitrate slowly recovering https://grafana.wikimedia.org/d/000000500/varnish-caching?refresh=15m&orgId=1&from=now-3h&to=now&var-cluster=cache_text&var-cluster=cache_upload&var-site=codfw&var-site=eqiad&var-site=ulsfo&var-site=eqsin&var-status=1&var-status=2&var-status=3&var-status=4&var-status=5
[13:32:07] <thcipriani>	 liw: I would overrun the timeslot. Especially since nothing is scheduled for the next 2 hours or so
[13:32:53] <liw>	 ack, thanks
[13:32:55] <icinga-wm>	 RECOVERY - Kartotherian LVS eqiad #page on kartotherian.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Maps%23Kartotherian
[13:33:30] <icinga-wm>	 PROBLEM - Maps HTTPS on maps1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Maps/RunBook
[13:33:58] <icinga-wm>	 PROBLEM - HTTP availability for Varnish at esams on icinga1001 is CRITICAL: job=varnish-upload site=esams https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[13:34:49] <ema>	 ignore the esams availability alert, the DC is depooled 
[13:35:10] <logmsgbot>	 !log liw@deploy1001 scap failed: CalledProcessError Command '/usr/local/bin/mwscript rebuildLocalisationCache.php --wiki="labtestwiki" --outdir="/tmp/scap_l10n_2419219323" --threads=30 --lang en  --quiet' returned non-zero exit status 1 (duration: 06m 40s)
[13:35:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:35:18] <volans>	 ema: would be nice if the check could check for depooled DCs
[13:35:23] <wikibugs>	 10Operations, 10Performance-Team, 10serviceops: Increased latency in POST requests - https://phabricator.wikimedia.org/T235755 (10Gilles) https://grafana.wikimedia.org/d/000000580/apache-backend-timing?orgId=1&from=now-30d&to=now  {F30875423, size=full}  Is this still an issue? The above distribution of back...
[13:35:32] <wikibugs>	 10Operations, 10Patch-For-Review: Migrate dbmonitor hosts to Stretch/Buster - https://phabricator.wikimedia.org/T224589 (10jcrespo) 2 blockers:  * Exec of `/usr/sbin/a2enmod php7.0` fails, as ther right module would be php7.3- No support for buster on the http module? `Httpd/Httpd::Mod_conf[php7.0]/Exec[ensure...
[13:36:38] <icinga-wm>	 RECOVERY - Maps HTTPS on maps1002 is OK: HTTP OK: HTTP/1.1 200 OK - 1329 bytes in 5.853 second response time https://wikitech.wikimedia.org/wiki/Maps/RunBook
[13:36:58] <gehel>	 again, we don't have much more traffic than usual
[13:37:10] <icinga-wm>	 RECOVERY - HTTP availability for Varnish at esams on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[13:37:12] <gehel>	 so probably again some specific queries going wild
[13:37:48] <icinga-wm>	 PROBLEM - Kartotherian LVS eqiad #page on kartotherian.svc.eqiad.wmnet is CRITICAL: /v4/marker/pin-m-fuel+ffffff@2x.png (scaled pushpin marker with an icon) timed out before a response was received https://wikitech.wikimedia.org/wiki/Maps%23Kartotherian
[13:38:15] <liw>	 hm, scap synced testwiki version change, but https://test.wikipedia.org/wiki/Special:Version still shows -wmf.2, not -wmf.3
[13:38:17] <wikibugs>	 10Operations, 10Patch-For-Review: Migrate dbmonitor hosts to Stretch/Buster - https://phabricator.wikimedia.org/T224589 (10Joe) >>! In T224589#5595135, @jcrespo wrote: > 2 blockers: >  > * Exec of `/usr/sbin/a2enmod php7.0` fails, as ther right module would be php7.3- No support for buster on the http module?...
[13:38:30] <liw>	 thcipriani, hashar, help?
[13:38:47] <volans>	 liw: there was a failed log above ~3m ago
[13:38:56] <gehel>	 !log silencing LVS check for kartotherian (we know there is an issue) - T236163
[13:38:56] <wikibugs>	 10Operations, 10Maps: Maps servers overloaded in eqiad - https://phabricator.wikimedia.org/T236163 (10Gehel)
[13:38:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:38:59] <stashbot>	 T236163: Maps servers overloaded in eqiad - https://phabricator.wikimedia.org/T236163
[13:39:12] <liw>	 dang, I should not have /ignored so much...
[13:39:19] <icinga-wm>	 RECOVERY - Kartotherian LVS eqiad #page on kartotherian.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Maps%23Kartotherian
[13:39:23] <liw>	 volans, can you copy it for me?
[13:39:25] <hashar>	 deploy1001:/srv/mediawiki-staging$ grep '"testwiki"' wikiversions.json 
[13:39:25] <hashar>	     "testwiki": "php-1.35.0-wmf.2",
[13:39:46] <_joe_>	 gehel: to be frank, either some engineering time is spent on maps, or I will disable paging
[13:39:47] <hashar>	 $ grep wmf.3 wikiversions.json 
[13:39:47] <hashar>	     "labtestwiki": "php-1.35.0-wmf.3",
[13:39:53] <hashar>	 so you changed wikitech :D
[13:40:01] <gehel>	 _joe_: I agree!
[13:40:04] <_joe_>	 I think the last time someone invested engineering time on maps was... the last outage
[13:40:10] <_joe_>	 us that time too
[13:40:22] <hashar>	 well not wikitech sorry
[13:40:31] <hashar>	 liw: you changed the wrong wiki "labtestwiki"  instead of "testwiki" :]
[13:40:38] <liw>	 hashar, oh crap
[13:40:56] <hashar>	 not a big deal
[13:40:57] <liw>	 hashar, I fix wikiversions and re-run?
[13:41:00] <hashar>	 so yeah fix it
[13:41:07] <hashar>	 then I think it is scap sync-wikiversions
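
The check hashar ran with grep above can also be sketched in Python. This assumes wikiversions.json is a flat JSON object mapping wiki names to version strings, as the grep output suggests; wikis_on_version and the sample data are hypothetical and only mirror the two entries quoted in the log.

```python
import json


def wikis_on_version(wikiversions_text, version):
    """Return the sorted wiki names whose configured version equals `version`."""
    mapping = json.loads(wikiversions_text)
    return sorted(wiki for wiki, current in mapping.items() if current == version)


# Sample data mirroring the two lines from the grep output in the log.
sample = json.dumps({
    "testwiki": "php-1.35.0-wmf.2",
    "labtestwiki": "php-1.35.0-wmf.3",
})

# Lists which wikis were actually moved to the new branch.
print(wikis_on_version(sample, "php-1.35.0-wmf.3"))
```

Listing wikis by version before syncing makes a swapped key such as labtestwiki for testwiki visible right away.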
[13:41:18] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs1016 is CRITICAL: PYBAL CRITICAL - CRITICAL - kartotherian-ssl_443: Servers maps1002.eqiad.wmnet, maps1004.eqiad.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[13:42:00] <icinga-wm>	 PROBLEM - HTTP availability for Varnish at esams on icinga1001 is CRITICAL: job=varnish-upload site=esams https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[13:42:10] <_joe_>	 gehel: ok lemme do it in a few then
[13:42:10] <wikibugs>	 10Operations, 10Maps: Maps servers overloaded in eqiad - https://phabricator.wikimedia.org/T236163 (10Gehel)
[13:42:57] <gehel>	 _joe_: we probably don't have a good way to selectively silence some of the pybal checks...
[13:43:07] <wikibugs>	 (03PS1) 10Jcrespo: dbmonitor: Deploy git repo as mwdeploy, otherwise no write permission [puppet] - 10https://gerrit.wikimedia.org/r/545282 (https://phabricator.wikimedia.org/T224589)
[13:43:16] <_joe_>	 gehel: we do
[13:43:20] <liw>	 hashar, I run "scap sync-wikiversions" not "scap sync..." as the train page says?
[13:44:30] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs1016 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[13:45:58] <liw>	 thcipriani, ^ see q for h.ashar
[13:46:29] <thcipriani>	 liw: /me looks
[13:46:48] <icinga-wm>	 PROBLEM - HTTP availability for Varnish at esams on icinga1001 is CRITICAL: job=varnish-upload site=esams https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[13:46:58] <wikibugs>	 (03PS1) 10Giuseppe Lavagetto: lvs: do not page on karthoterian unavailability [puppet] - 10https://gerrit.wikimedia.org/r/545285
[13:47:06] <_joe_>	 gehel: ^^
[13:47:36] <wikibugs>	 (03CR) 10Gehel: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/545285 (owner: 10Giuseppe Lavagetto)
[13:47:48] <XioNoX>	 that's probably because esams is depooled
[13:48:24] <icinga-wm>	 RECOVERY - HTTP availability for Varnish at esams on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[13:48:31] <thcipriani>	 liw: oh, I see scap failed. Which server did it fail on?
[13:48:53] <icinga-wm>	 ACKNOWLEDGEMENT - Confd template for /var/lib/gdnsd/discovery-kibana.state on authdns1001 is CRITICAL: File not found: /var/lib/gdnsd/discovery-kibana.state Ema Known, to be fixed after esams repool https://wikitech.wikimedia.org/wiki/Confd
[13:48:53] <icinga-wm>	 ACKNOWLEDGEMENT - Confd template for /var/lib/gdnsd/discovery-kibana.state on authdns2001 is CRITICAL: File not found: /var/lib/gdnsd/discovery-kibana.state Ema Known, to be fixed after esams repool https://wikitech.wikimedia.org/wiki/Confd
[13:49:23] <wikibugs>	 (03CR) 10CDanis: [C: 03+1] lvs: do not page on karthoterian unavailability [puppet] - 10https://gerrit.wikimedia.org/r/545285 (owner: 10Giuseppe Lavagetto)
[13:49:58] <icinga-wm>	 PROBLEM - HTTP availability for Varnish at esams on icinga1001 is CRITICAL: job=varnish-upload site=esams https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[13:50:08] <liw>	 thcipriani, I don't know, I had the bots on ignore :( (unignored now, but the channel is difficult to follow)
[13:50:47] <gehel>	 according to turnilo, there does not seem to be a specific URL that times out: 
[13:50:47] <gehel>	 https://turnilo.wikimedia.org/#webrequest_sampled_128/4/N4IgbglgzgrghgGwgLzgFwgewHYgFwhLYCmAtAMYAWcATmiADQgYC2xyOx+IAomuQHoAqgBUAwoxAAzCAjTEaUfAG1QaAJ4AHLgVZcmNYlO4B9E3sl6ASnGwBzYkryqQUNLXoEATAAYAjACcpH4+pF5eIj4A7Hg+PrE+AHRxPgBaksTYACbcvoHBoeEifgDMCQnJcekAvgC61QxqWjquaDQQ9pKGxgQw7SaUmG6ScOQYONwdkmCIMI4qICxwmlCJAO4QANYQbFkQcImYNHYgtUzYmJ5SiFDEDU3a3G7tnQZG3JRoaJombugwSiYo3GuAIUyYMwQcycyhAABYvAFTudLvhrghbnUmFBN
[13:50:47] <gehel>	 Eg0DCHi1nh0Tkw9mxsFAsKCQH0ICZNOhKJIoEdPKBuh8IPjLM0ngoIPMyRBDGNqdwso5yJk9q8QNp2pgcgQQA1CDtufhsDAEAh7sw+bodvoQOTMlSJgQzBYmHYaLYdbRuepuAAFEQAVgAsiy2fgOe8reZjbzHgQzZTxcLRSDuHAoNLsiTVUwkCxNXhtbqsa4BfNnAGpApMtKuTymFIjkt2Qaw6ajHAdfQIbMWinay02PG+lwc5oOiQsgARY2RnAws7ygfELIAZT9BEo3MBhGIDmy/tNo4tNLpDKZkjTGY9Lah+azCCY1CgADkdQg0Tc7leIHZKEg354L9UgA=
[13:50:56] <gehel>	 ofc, crappy urls
[13:51:08] <thcipriani>	 liw: should say in your console
[13:51:35] <liw>	 thcipriani, which console, where?
[13:51:55] <cdanis>	 gehel: https://w.wiki is good for this
[13:52:24] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs1015 is CRITICAL: PYBAL CRITICAL - CRITICAL - kartotherian-ssl_443: Servers maps1003.eqiad.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[13:52:31] <gehel>	 according to turnilo, there does not seem to be a specific URL that times out: https://w.wiki/AaP
[13:52:35] <andrewbogott>	 !log restarted slapd on ldap-eqiad-replica01
[13:52:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:52:42] <gehel>	 cdanis: thx!
[13:52:52] <cdanis>	 gehel: also wasn't the page after 13:00?
[13:53:12] <icinga-wm>	 RECOVERY - HTTP availability for Varnish at esams on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[13:53:19] <librenms-wmf>	 04Critical Alert for device cr1-eqiad.wikimedia.org - Primary outbound port utilisation over 80%
[13:53:37] <gehel>	 cdanis: right! data not yet in turnilo it seems
[13:54:00] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs1015 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[13:54:03] <cdanis>	 yeah the webrequest table has an hour or two delay
[13:54:46] <icinga-wm>	 PROBLEM - HTTP availability for Varnish at esams on icinga1001 is CRITICAL: job=varnish-upload site=esams https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[13:55:02] <gehel>	 looks like a bunch of requests similar to https://maps.wikimedia.org/osm-liber/%7Bz%7D/%7Bx%7D/%7By%7D.png are taking more time than expected
[13:56:45] <gehel>	 looks like a client failing to process its placeholders
[13:59:07] <bblack>	 so we fixed something similar before
[13:59:11] <bblack>	 with some regex filtering
[13:59:34] <icinga-wm>	 RECOVERY - HTTP availability for Varnish at esams on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[13:59:49] <gehel>	 bblack: yep
[14:00:36] <bblack>	 what's new?
[14:00:56] <gehel>	 looks like mateusbs17 has a fix ready to be deployed
[14:01:17] <gehel>	 which pushes back the input validation in kartotherian
[14:01:36] <gehel>	 and hopefully is more exhaustive than what we currently have in varnish
[14:01:43] <wikibugs>	 (03PS1) 10Jcrespo: dbmonitor: Install the right apache modules for buster [puppet] - 10https://gerrit.wikimedia.org/r/545286 (https://phabricator.wikimedia.org/T224589)
[14:04:02] <wikibugs>	 10Operations, 10Patch-For-Review: Migrate dbmonitor hosts to Stretch/Buster - https://phabricator.wikimedia.org/T224589 (10jcrespo) Thanks, joe, I didn't see your comment so it took me more time than I thought to find it. The above 2 patches should fix it?
[14:04:19] <liw>	 thcipriani, thanks for the help
[14:04:33] <wikibugs>	 (03PS1) 10Ema: kibana: add discovery record [dns] - 10https://gerrit.wikimedia.org/r/545287 (https://phabricator.wikimedia.org/T227432)
[14:04:35] <liw>	 https://phabricator.wikimedia.org/T236166 - reported, train can't continue to group0 before this is fixed
[14:05:01] <wikibugs>	 (03PS1) 10BBlack: Move GeoDNS default from eqiad to codfw [dns] - 10https://gerrit.wikimedia.org/r/545288 (https://phabricator.wikimedia.org/T235805)
[14:05:21] <thcipriani>	 liw: happy to help. it's a strange issue
[14:06:00] <icinga-wm>	 PROBLEM - HTTP availability for Varnish at esams on icinga1001 is CRITICAL: job=varnish-upload site=esams https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[14:06:26] <wikibugs>	 (03CR) 10BBlack: [C: 03+2] Move GeoDNS default from eqiad to codfw [dns] - 10https://gerrit.wikimedia.org/r/545288 (https://phabricator.wikimedia.org/T235805) (owner: 10BBlack)
[14:06:30] <XioNoX>	 gehel: are things under control? Waiting to do my maintenance
[14:06:54] <gehel>	 XioNoX: not entirely under control yet, but we have a good idea of how to fix it
[14:07:36] <gehel>	 your maintenance will probably not make things worse, but the noise from maps might hide some other problems
[14:07:36] <icinga-wm>	 RECOVERY - HTTP availability for Varnish at esams on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[14:08:19] <librenms-wmf>	 04̶C̶r̶i̶t̶i̶c̶a̶l Device cr1-eqiad.wikimedia.org recovered from Primary outbound port utilisation over 80%
[14:08:30] <XioNoX>	 thinking of the opposite, noise from my esams maintenance might flood irc
[14:08:37] <wikibugs>	 (03PS1) 10Jbond: CI rspec: update puppet version used in spec tests [puppet] - 10https://gerrit.wikimedia.org/r/545289 (https://phabricator.wikimedia.org/T228657)
[14:09:12] <icinga-wm>	 PROBLEM - HTTP availability for Varnish at esams on icinga1001 is CRITICAL: job=varnish-upload site=esams https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[14:09:39] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] CI rspec: update puppet version used in spec tests [puppet] - 10https://gerrit.wikimedia.org/r/545289 (https://phabricator.wikimedia.org/T228657) (owner: 10Jbond)
[14:09:41] <wikibugs>	 (03CR) 10BBlack: [C: 03+1] kibana: add discovery record [dns] - 10https://gerrit.wikimedia.org/r/545287 (https://phabricator.wikimedia.org/T227432) (owner: 10Ema)
[14:10:50] <icinga-wm>	 RECOVERY - HTTP availability for Varnish at esams on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[14:11:21] <gehel>	 XioNoX: we should have a patch deployed in ~15 minutes
[14:11:31] <XioNoX>	 ok, thx
[14:11:42] <gehel>	 XioNoX: don't let maps stop you, you can't make the situation worse :)
[14:12:26] <icinga-wm>	 PROBLEM - HTTP availability for Varnish at esams on icinga1001 is CRITICAL: job=varnish-upload site=esams https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[14:13:34] <XioNoX>	 !log restart asw-esams for onsite work
[14:13:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:13:53] <wikibugs>	 (03PS2) 10Jbond: CI rspec: update puppet version used in spec tests [puppet] - 10https://gerrit.wikimedia.org/r/545289 (https://phabricator.wikimedia.org/T228657)
[14:14:01] <godog>	 I'll silence the availability alerts for esams
[14:16:12] <icinga-wm>	 RECOVERY - Varnish traffic drop between 30min ago and now at esams on icinga1001 is OK: (C)60 le (W)70 le 70.47 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[14:16:19] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] CI rspec: update puppet version used in spec tests [puppet] - 10https://gerrit.wikimedia.org/r/545289 (https://phabricator.wikimedia.org/T228657) (owner: 10Jbond)
[14:16:34] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs1015 is CRITICAL: PYBAL CRITICAL - CRITICAL - kartotherian-ssl_443: Servers maps1004.eqiad.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[14:17:54] <icinga-wm>	 PROBLEM - Host lvs3001.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[14:17:54] <icinga-wm>	 PROBLEM - Host lvs3002.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[14:18:04] <icinga-wm>	 PROBLEM - Host lvs3003.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[14:18:04] <icinga-wm>	 PROBLEM - Host lvs3004.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[14:18:06] <icinga-wm>	 PROBLEM - Host maerlant.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[14:18:16] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs1015 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[14:18:20] <icinga-wm>	 PROBLEM - Host ns2-v4 is DOWN: PING CRITICAL - Packet loss = 100%
[14:18:50] <icinga-wm>	 PROBLEM - Host cp3040.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[14:19:04] <icinga-wm>	 PROBLEM - Router interfaces on cr2-knams is CRITICAL: CRITICAL: host 91.198.174.246, interfaces up: 39, down: 4, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[14:19:20] <icinga-wm>	 PROBLEM - Host multatuli.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[14:19:28] <icinga-wm>	 PROBLEM - Host cp3010.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[14:19:28] <icinga-wm>	 PROBLEM - Host cp3032.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[14:19:48] <icinga-wm>	 PROBLEM - Host nescio.mgmt is DOWN: CRITICAL - Time to live exceeded (10.21.0.111)
[14:20:44] <icinga-wm>	 RECOVERY - HTTP availability for Varnish at esams on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[14:20:46] <icinga-wm>	 RECOVERY - Router interfaces on cr2-knams is OK: OK: host 91.198.174.246, interfaces up: 59, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[14:21:08] <icinga-wm>	 RECOVERY - Host ns2-v4 is UP: PING OK - Packet loss = 0%, RTA = 0.26 ms
[14:22:53] <wikibugs>	 (03PS3) 10Jbond: CI rspec: update puppet version used in spec tests [puppet] - 10https://gerrit.wikimedia.org/r/545289 (https://phabricator.wikimedia.org/T228657)
[14:23:36] <icinga-wm>	 RECOVERY - Host lvs3001.mgmt is UP: PING OK - Packet loss = 0%, RTA = 84.09 ms
[14:23:36] <icinga-wm>	 RECOVERY - Host lvs3002.mgmt is UP: PING OK - Packet loss = 0%, RTA = 84.48 ms
[14:23:46] <volans>	 XioNoX: {done} but I guess slightly too late
[14:23:46] <icinga-wm>	 RECOVERY - Host lvs3003.mgmt is UP: PING OK - Packet loss = 0%, RTA = 84.46 ms
[14:23:46] <icinga-wm>	 RECOVERY - Host lvs3004.mgmt is UP: PING OK - Packet loss = 0%, RTA = 84.82 ms
[14:23:48] <icinga-wm>	 RECOVERY - Host maerlant.mgmt is UP: PING OK - Packet loss = 0%, RTA = 84.67 ms
[14:24:32] <icinga-wm>	 PROBLEM - Varnish traffic drop between 30min ago and now at esams on icinga1001 is CRITICAL: 29.13 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[14:24:32] <icinga-wm>	 RECOVERY - Host cp3040.mgmt is UP: PING OK - Packet loss = 0%, RTA = 84.31 ms
[14:24:54] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs1016 is CRITICAL: PYBAL CRITICAL - CRITICAL - kartotherian-ssl_443: Servers maps1003.eqiad.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[14:25:00] <icinga-wm>	 RECOVERY - Host multatuli.mgmt is UP: PING OK - Packet loss = 0%, RTA = 84.24 ms
[14:25:08] <icinga-wm>	 RECOVERY - Host cp3010.mgmt is UP: PING OK - Packet loss = 0%, RTA = 84.05 ms
[14:25:08] <icinga-wm>	 RECOVERY - Host cp3032.mgmt is UP: PING OK - Packet loss = 0%, RTA = 84.08 ms
[14:25:09] <XioNoX>	 volans: nop, will need a restart soon ish again
[14:25:22] <icinga-wm>	 PROBLEM - BFD status on cr2-knams is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[14:25:28] <icinga-wm>	 RECOVERY - Host nescio.mgmt is UP: PING OK - Packet loss = 0%, RTA = 84.65 ms
[14:25:30] <volans>	 ok they are downtimed for 2h, lmk if you need more XioNoX 
[14:26:17] <XioNoX>	 thx
[14:29:40] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs1015 is CRITICAL: PYBAL CRITICAL - CRITICAL - kartotherian-ssl_443: Servers maps1001.eqiad.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[14:29:44] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs1016 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[14:31:16] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs1015 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[14:33:20] <wikibugs>	 10Operations, 10SRE-Access-Requests, 10Security-Team, 10Patch-For-Review: (2019-09) Create secteam groups in admin.yaml and define permissions - https://phabricator.wikimedia.org/T223463 (10sbassett) a:05sbassett→03chasemp
[14:33:34] <wikibugs>	 (03PS1) 10BBlack: Move most North American traffic westwards [dns] - 10https://gerrit.wikimedia.org/r/545294 (https://phabricator.wikimedia.org/T235805)
[14:34:08] <wikibugs>	 (03CR) 10Jcrespo: "Ignore the mysql package, that is supposed to be deleted as soon as it goes unused: https://phabricator.wikimedia.org/T162070" [puppet] - 10https://gerrit.wikimedia.org/r/545289 (https://phabricator.wikimedia.org/T228657) (owner: 10Jbond)
[14:35:56] <wikibugs>	 (03CR) 10BBlack: [C: 03+2] Move most North American traffic westwards [dns] - 10https://gerrit.wikimedia.org/r/545294 (https://phabricator.wikimedia.org/T235805) (owner: 10BBlack)
[14:37:40] <wikibugs>	 (03CR) 10Jbond: [C: 03+1] Extend wmf-userschema for additional MFA options: [puppet] - 10https://gerrit.wikimedia.org/r/543402 (owner: 10Muehlenhoff)
[14:39:26] <wikibugs>	 (03PS1) 10Ayounsi: Revert "Depool esams for onsite work" [dns] - 10https://gerrit.wikimedia.org/r/545298
[14:39:54] <icinga-wm>	 RECOVERY - BFD status on cr2-knams is OK: OK: UP: 8 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[14:40:28] <wikibugs>	 10Operations, 10LDAP-Access-Requests, 10SRE-Access-Requests, 10Scoring-platform-team: Grant LDAP groups and deployment shell access to Kevin Bazira - https://phabricator.wikimedia.org/T234209 (10Halfak) Yes he is staff.  He'll be pulling data from MariaDB for use training ORES models.
[14:42:07] <wikibugs>	 (03CR) 10Filippo Giunchedi: "LGTM overall, see inline. Also tests should be included for sth this critical IMHO" (034 comments) [puppet] - 10https://gerrit.wikimedia.org/r/544220 (https://phabricator.wikimedia.org/T234900) (owner: 10Jcrespo)
[14:42:59] <logmsgbot>	 !log mbsantos@deploy1001 Started deploy [kartotherian/deploy@85ea6e1]: Deploy kartotherian 1.1.5-wmf.0
[14:43:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:43:09] <mateusbs17>	 gehel ^
[14:43:31] <gehel>	 mateusbs17: kool, let's see if the load goes down...
[14:45:41] <logmsgbot>	 !log mbsantos@deploy1001 Finished deploy [kartotherian/deploy@85ea6e1]: Deploy kartotherian 1.1.5-wmf.0 (duration: 02m 44s)
[14:45:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:47:14] <icinga-wm>	 PROBLEM - HTTP availability for Varnish at eqiad on icinga1001 is CRITICAL: job=varnish-upload site=eqiad https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[14:48:48] <liw>	 the blocker got downgraded, continuing to deploy to group0
[14:48:50] <icinga-wm>	 RECOVERY - HTTP availability for Varnish at eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[14:50:10] <icinga-wm>	 RECOVERY - Varnish traffic drop between 30min ago and now at esams on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[14:50:35] <logmsgbot>	 !log liw@deploy1001 Started scap: testwiki to php-1.35.0-wmf.3 and rebuild l10n cache
[14:50:38] <wikibugs>	 (03PS3) 10Mforns: analytics::refinery::job::data_purge: Add timer to delete old MWH dumps [puppet] - 10https://gerrit.wikimedia.org/r/539151 (https://phabricator.wikimedia.org/T208612)
[14:50:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:52:36] <bblack>	 !log stopping puppet and pybal on lvs1014 (upload+maps traffic to 1016)
[14:52:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:53:08] <wikibugs>	 (03CR) 10Jcrespo: "> when a failed backup is detected log it" [puppet] - 10https://gerrit.wikimedia.org/r/544220 (https://phabricator.wikimedia.org/T234900) (owner: 10Jcrespo)
[14:55:47] <gehel>	 new kartotherian version deployed, load on maps servers is going down
[14:56:37] <wikibugs>	 10Operations, 10MediaWiki-Maintenance-scripts, 10cloud-services-team (Kanban): processEchoEmailBatch.php failing for labtestwiki - https://phabricator.wikimedia.org/T236145 (10Andrew) This problem went away after   ` bblack> Brandon Black !log stopping puppet and pybal on lvs1014 (upload+maps traffic to 1016) `
[14:57:28] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs1014 is CRITICAL: PYBAL CRITICAL - Bad Response from pybal: 500 Cant connect to localhost:9090 https://wikitech.wikimedia.org/wiki/PyBal
[14:58:10] <icinga-wm>	 PROBLEM - pybal on lvs1014 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 0 (root), args /usr/sbin/pybal https://wikitech.wikimedia.org/wiki/PyBal
[14:58:54] <bblack>	 ^ known, from my log above
[15:01:22] <logmsgbot>	 !log jmm@cumin2001 START - Cookbook sre.hosts.downtime
[15:01:23] <logmsgbot>	 !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[15:01:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:01:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:02:00] <icinga-wm>	 PROBLEM - PyBal connections to etcd on lvs1014 is CRITICAL: CRITICAL: 0 connections established with conf1004.eqiad.wmnet:4001 (min=24) https://wikitech.wikimedia.org/wiki/PyBal
[15:02:13] <wikibugs>	 10Operations, 10MediaWiki-Maintenance-scripts, 10cloud-services-team (Kanban): processEchoEmailBatch.php failing for labtestwiki - https://phabricator.wikimedia.org/T236145 (10Andrew) I anticipate that this will be resolved by https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/543664/
[15:03:49] <moritzm>	 !log rebooting kafka-main1005 for microcode debugging
[15:03:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:04:57] <liw>	 scap tells me: 15:04:12 Check 'Logstash Error rate for mw1263.eqiad.wmnet' failed: ERROR: 50% OVER_THRESHOLD (Avg. Error rate: Before: 0.03, After: 2.00, Threshold: 1.00)
[15:07:57] <_joe_>	 liw please stop deploying anything
[15:08:03] <_joe_>	 moritzm: likewise, please
[15:08:30] <liw>	 _joe_, scap sync to testwiki is running, shall I abort?
[15:09:04] <_joe_>	 liw: no of course but that seems like it's rolling back already given you had a failure surge?
[15:09:12] <wikibugs>	 10Operations, 10Maps: Maps servers overloaded in eqiad - https://phabricator.wikimedia.org/T236163 (10MSantos) 05Open→03Resolved a:03MSantos Deployed new version of kartotherian fixed it https://gerrit.wikimedia.org/r/c/maps/kartotherian/deploy/+/545299
[15:09:35] <liw>	 _joe_, yeah
[15:09:42] <wikibugs>	 (03CR) 10Marostegui: [C: 03+1] dbmonitor: Deploy git repo as mwdeploy, otherwise no write permission [puppet] - 10https://gerrit.wikimedia.org/r/545282 (https://phabricator.wikimedia.org/T224589) (owner: 10Jcrespo)
[15:09:57] <_joe_>	 ok so I was asking to not do further deployments for now
[15:10:07] <wikibugs>	 (03CR) 10Marostegui: [C: 03+1] dbmonitor: Install the right apache modules for buster [puppet] - 10https://gerrit.wikimedia.org/r/545286 (https://phabricator.wikimedia.org/T224589) (owner: 10Jcrespo)
[15:10:28] <wikibugs>	 (03PS2) 10CRusnov: coherence: Check unracked devices for connected console ports [software/netbox-reports] - 10https://gerrit.wikimedia.org/r/545132
[15:10:30] <bblack>	 !log re-enabling lvs1014 pybal/puppet
[15:10:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:10:54] <icinga-wm>	 RECOVERY - pybal on lvs1014 is OK: PROCS OK: 1 process with UID = 0 (root), args /usr/sbin/pybal https://wikitech.wikimedia.org/wiki/PyBal
[15:11:34] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "LGTM" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/545286 (https://phabricator.wikimedia.org/T224589) (owner: 10Jcrespo)
[15:11:50] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs1014 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[15:13:10] <icinga-wm>	 RECOVERY - PyBal connections to etcd on lvs1014 is OK: OK: 24 connections established with conf1004.eqiad.wmnet:4001 (min=24) https://wikitech.wikimedia.org/wiki/PyBal
[15:13:34] <bblack>	 !log re-disabling lvs1014 ...
[15:13:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:17:20] <icinga-wm>	 PROBLEM - pybal on lvs1014 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 0 (root), args /usr/sbin/pybal https://wikitech.wikimedia.org/wiki/PyBal
[15:18:18] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs1014 is CRITICAL: PYBAL CRITICAL - Bad Response from pybal: 500 Cant connect to localhost:9090 https://wikitech.wikimedia.org/wiki/PyBal
[15:20:32] <XioNoX>	 !log rollback ns2 redirect
[15:20:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:21:10] <wikibugs>	 10Operations, 10LDAP-Access-Requests, 10SRE-Access-Requests, 10Scoring-platform-team: Grant LDAP groups and deployment shell access to Kevin Bazira - https://phabricator.wikimedia.org/T234209 (10Nuria) Ok, approved for wmf, analytics-privatedata-users, statistics-privatedata-users on my end
[15:22:00] <icinga-wm>	 PROBLEM - PyBal connections to etcd on lvs1014 is CRITICAL: CRITICAL: 0 connections established with conf1004.eqiad.wmnet:4001 (min=24) https://wikitech.wikimedia.org/wiki/PyBal
[15:24:24] <wikibugs>	 (03CR) 10ArielGlenn: "Woo hoo! :)" [dns] - 10https://gerrit.wikimedia.org/r/545268 (https://phabricator.wikimedia.org/T234076) (owner: 10Cmjohnson)
[15:24:59] <wikibugs>	 (03CR) 10Giuseppe Lavagetto: [C: 03+1] Revert "Depool esams for onsite work" [dns] - 10https://gerrit.wikimedia.org/r/545298 (owner: 10Ayounsi)
[15:25:05] <wikibugs>	 (03CR) 10Ayounsi: [C: 03+2] Revert "Depool esams for onsite work" [dns] - 10https://gerrit.wikimedia.org/r/545298 (owner: 10Ayounsi)
[15:25:09] <wikibugs>	 (03PS2) 10Ayounsi: Revert "Depool esams for onsite work" [dns] - 10https://gerrit.wikimedia.org/r/545298
[15:26:30] <XioNoX>	 !log repool esams
[15:26:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:28:14] <logmsgbot>	 !log liw@deploy1001 Finished scap: testwiki to php-1.35.0-wmf.3 and rebuild l10n cache (duration: 37m 39s)
[15:28:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:36:34] <icinga-wm>	 PROBLEM - Varnish traffic drop between 30min ago and now at eqiad on icinga1001 is CRITICAL: 49.19 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[15:40:07] <bblack>	 !log rebooting lvs1014
[15:40:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:41:22] <icinga-wm>	 PROBLEM - Host lvs1014 is DOWN: PING CRITICAL - Packet loss = 100%
[15:42:28] <icinga-wm>	 RECOVERY - Host lvs1014 is UP: PING WARNING - Packet loss = 64%, RTA = 0.33 ms
[15:45:42] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs1014 is CRITICAL: PYBAL CRITICAL - Bad Response from pybal: 500 Cant connect to localhost:9090 https://wikitech.wikimedia.org/wiki/PyBal
[15:46:22] <icinga-wm>	 PROBLEM - pybal on lvs1014 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 0 (root), args /usr/sbin/pybal https://wikitech.wikimedia.org/wiki/PyBal
[15:47:29] <bblack>	 !log enable pybal+puppet on rebooted lvs1014
[15:47:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:47:58] <icinga-wm>	 RECOVERY - pybal on lvs1014 is OK: PROCS OK: 1 process with UID = 0 (root), args /usr/sbin/pybal https://wikitech.wikimedia.org/wiki/PyBal
[15:48:52] <wikibugs>	 10Operations, 10Analytics, 10SRE-Access-Requests: SSH access for Lex Nasser, analytics intern - https://phabricator.wikimedia.org/T235688 (10RStallman-legalteam) NDA is signed and on file. Thanks!
[15:48:56] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs1014 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[15:50:02] <icinga-wm>	 RECOVERY - PyBal connections to etcd on lvs1014 is OK: OK: 24 connections established with conf1004.eqiad.wmnet:4001 (min=24) https://wikitech.wikimedia.org/wiki/PyBal
[15:52:53] <wikibugs>	 10Operations, 10Gerrit: Editing in Gerrit isn't saved after the update/migration to gerrit1001 - https://phabricator.wikimedia.org/T236143 (10MoritzMuehlenhoff) This happened earlier in the day, but I cannot currently reproduce it with a freshly created patch.
[15:57:35] <wikibugs>	 (03PS1) 10BBlack: Depool esams to test lvs1014 state [dns] - 10https://gerrit.wikimedia.org/r/545312
[15:58:04] <wikibugs>	 (03CR) 10BBlack: [C: 03+2] Depool esams to test lvs1014 state [dns] - 10https://gerrit.wikimedia.org/r/545312 (owner: 10BBlack)
[15:58:32] <bblack>	 !log depooling esams temporarily to test traffic scenario on lvs1014
[15:58:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:59:49] <wikibugs>	 10Operations, 10Analytics, 10SRE-Access-Requests: SSH access for Lex Nasser, analytics intern - https://phabricator.wikimedia.org/T235688 (10Nuria) Thank you! @lexnasser: please ping @Dzahn with your e-mail address/user password for wikitech
[16:00:05] <jouncebot>	 godog and _joe_: Dear deployers, time to do the Puppet SWAT(Max 6 patches) deploy. Dont look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20191022T1600).
[16:00:05] <jouncebot>	 No GERRIT patches in the queue for this window AFAICS.
[16:05:04] <icinga-wm>	 PROBLEM - Confd template for /var/lib/gdnsd/discovery-kibana.state on multatuli is CRITICAL: File not found: /var/lib/gdnsd/discovery-kibana.state https://wikitech.wikimedia.org/wiki/Confd
[16:05:38] <wikibugs>	 (03CR) 10Jcrespo: dbmonitor: Install the right apache modules for buster (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/545286 (https://phabricator.wikimedia.org/T224589) (owner: 10Jcrespo)
[16:07:12] <icinga-wm>	 RECOVERY - Varnish traffic drop between 30min ago and now at eqiad on icinga1001 is OK: (C)60 le (W)70 le 100 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[16:07:32] <wikibugs>	 (03CR) 10Volans: [C: 03+1] "LGTM" [software/netbox-reports] - 10https://gerrit.wikimedia.org/r/545132 (owner: 10CRusnov)
[16:10:12] <icinga-wm>	 PROBLEM - Check the Netbox report librenms for fail status. on netbox1001 is CRITICAL: librenms.LibreNMS CRITICAL https://wikitech.wikimedia.org/wiki/Netbox%23Reports
[16:10:33] <wikibugs>	 10Operations, 10ops-eqiad, 10Discovery-Search (Current work): Degraded RAID on elastic1046 - https://phabricator.wikimedia.org/T228606 (10wiki_willy)
[16:10:36] <icinga-wm>	 PROBLEM - Varnish traffic drop between 30min ago and now at esams on icinga1001 is CRITICAL: 38.49 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[16:11:14] <wikibugs>	 10Operations, 10ops-eqiad, 10Discovery-Search (Current work): Degraded RAID on elastic1046 - https://phabricator.wikimedia.org/T228606 (10wiki_willy) Procurement task created for Rob to order replacement drive.  Thanks, Willy
[16:13:24] <hashar>	 We are going to stop gerrit
[16:13:26] <hashar>	 jouncebot: now
[16:13:26] <jouncebot>	 For the next 0 hour(s) and 46 minute(s): Puppet SWAT(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20191022T1600)
[16:14:04] <thcipriani>	 !log stopping gerrit to run a fix for T222391
[16:14:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:14:10] <stashbot>	 T222391: Gerrit Hardware Upgrade (+ upgrade from jessie to stretch or buster) - https://phabricator.wikimedia.org/T222391
[16:15:17] <wikibugs>	 10Operations, 10ops-eqiad, 10Discovery-Search (Current work): Degraded RAID on elastic1046 - https://phabricator.wikimedia.org/T228606 (10wiki_willy) a:05wiki_willy→03Jclark-ctr
[16:17:52] <icinga-wm>	 PROBLEM - Check systemd state on gerrit1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:18:16] <icinga-wm>	 PROBLEM - SSH access on gerrit1001 is CRITICAL: connect to address 208.80.154.137 and port 29418: Connection refused https://wikitech.wikimedia.org/wiki/Gerrit
[16:18:20] <icinga-wm>	 PROBLEM - Gerrit JSON on gerrit.wikimedia.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - page size 1529 too small - 1529 bytes in 0.011 second response time https://wikitech.wikimedia.org/wiki/Gerrit%23Monitoring
[16:18:26] <icinga-wm>	 PROBLEM - Check systemd state on contint2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:18:54] <icinga-wm>	 PROBLEM - Gerrit Health Check on gerrit.wikimedia.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1529 bytes in 0.008 second response time https://gerrit.wikimedia.org/r/config/server/healthcheck%7Estatus
[16:18:56] <icinga-wm>	 PROBLEM - Check systemd state on deploy2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:19:10] <icinga-wm>	 PROBLEM - Check systemd state on deploy1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:19:26] <icinga-wm>	 PROBLEM - gerrit process on gerrit1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java .*-jar /var/lib/gerrit2/review_site/bin/gerrit.war daemon -d /var/lib/gerrit2/review_site https://wikitech.wikimedia.org/wiki/Gerrit
[16:19:44] <icinga-wm>	 PROBLEM - Check systemd state on contint1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:20:18] <icinga-wm>	 ACKNOWLEDGEMENT - Check systemd state on gerrit1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. daniel_zahn WIP https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:20:18] <icinga-wm>	 ACKNOWLEDGEMENT - SSH access on gerrit1001 is CRITICAL: connect to address 208.80.154.137 and port 29418: Connection refused daniel_zahn WIP https://wikitech.wikimedia.org/wiki/Gerrit
[16:20:18] <icinga-wm>	 ACKNOWLEDGEMENT - gerrit process on gerrit1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java .*-jar /var/lib/gerrit2/review_site/bin/gerrit.war daemon -d /var/lib/gerrit2/review_site daniel_zahn WIP https://wikitech.wikimedia.org/wiki/Gerrit
[16:20:52] <thcipriani>	 !log restarting gerrit
[16:20:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:21:04] <icinga-wm>	 RECOVERY - gerrit process on gerrit1001 is OK: PROCS OK: 1 process with regex args ^/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java .*-jar /var/lib/gerrit2/review_site/bin/gerrit.war daemon -d /var/lib/gerrit2/review_site https://wikitech.wikimedia.org/wiki/Gerrit
[16:21:06] <icinga-wm>	 RECOVERY - Check systemd state on gerrit1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:21:30] <icinga-wm>	 RECOVERY - SSH access on gerrit1001 is OK: SSH OK - GerritCodeReview_2.15.14-16-g855b179b5f (SSHD-CORE-1.6.0) (protocol 2.0) https://wikitech.wikimedia.org/wiki/Gerrit
[16:21:32] <icinga-wm>	 RECOVERY - Gerrit JSON on gerrit.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 27019 bytes in 0.075 second response time https://wikitech.wikimedia.org/wiki/Gerrit%23Monitoring
[16:21:34] <mutante>	 !log running puppet on deployment servers
[16:21:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:21:40] <icinga-wm>	 RECOVERY - Check systemd state on contint2001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:22:06] <icinga-wm>	 RECOVERY - Gerrit Health Check on gerrit.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 864 bytes in 0.044 second response time https://gerrit.wikimedia.org/r/config/server/healthcheck%7Estatus
[16:22:10] <icinga-wm>	 RECOVERY - Check systemd state on deploy2001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:22:22] <icinga-wm>	 RECOVERY - Check systemd state on deploy1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:22:58] <icinga-wm>	 RECOVERY - Check systemd state on contint1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:25:12] <wikibugs>	 (03PS2) 10Dzahn: cumin: update which server is the kafka-main canary [puppet] - 10https://gerrit.wikimedia.org/r/545094
[16:25:25] <wikibugs>	 (03CR) 10Dzahn: cumin: update which server is the kafka-main canary (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/545094 (owner: 10Dzahn)
[16:28:18] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] mariadb/ferm_misc: allow moscovium to connect to rt database [puppet] - 10https://gerrit.wikimedia.org/r/544079 (https://phabricator.wikimedia.org/T180641) (owner: 10Dzahn)
[16:28:26] <wikibugs>	 (03PS3) 10Dzahn: mariadb/ferm_misc: allow moscovium to connect to rt database [puppet] - 10https://gerrit.wikimedia.org/r/544079 (https://phabricator.wikimedia.org/T180641)
[16:40:25] <wikibugs>	 (03PS1) 10Dzahn: site: turn cobalt into a spare system (Do not merge) [puppet] - 10https://gerrit.wikimedia.org/r/545328
[16:40:28] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: b1-eqiad pdu refresh (Thursday 10/10 @11am UTC) - https://phabricator.wikimedia.org/T227536 (10RobH) 05Open→03Resolved a:05RobH→03None
[16:40:31] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: Install new PDUs in rows A/B (Top level tracking task) - https://phabricator.wikimedia.org/T226778 (10RobH)
[16:40:44] <wikibugs>	 10Operations, 10ops-codfw, 10SRE-swift-storage, 10User-fgiunchedi: rack/setup/install ms-be205[1-6].codfw.wmnet - https://phabricator.wikimedia.org/T233638 (10fgiunchedi) 05Open→03Resolved This is completed, hosts are fully in service now.
[16:42:35] <wikibugs>	 10Operations, 10serviceops: decom cobalt - https://phabricator.wikimedia.org/T236187 (10Dzahn)
[16:42:44] <wikibugs>	 10Operations, 10serviceops: decom cobalt - https://phabricator.wikimedia.org/T236187 (10Dzahn) 05Open→03Stalled p:05Triage→03Normal
[16:42:50] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: a1-eqiad pdu refresh (Tuesday 10/15 @11am UTC) - https://phabricator.wikimedia.org/T226782 (10RobH) a:05RobH→03Jclark-ctr My understanding of this task state is as follows:  * @Jclark-ctr had to emergency swap out ps1-a1-eqiad due to a failure * he left the old ps2-a1...
[16:43:10] <wikibugs>	 (03PS2) 10Dzahn: site: turn cobalt into a spare system (Do not merge) [puppet] - 10https://gerrit.wikimedia.org/r/545328 (https://phabricator.wikimedia.org/T236187)
[16:43:24] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: a1-eqiad pdu refresh (Tuesday 10/15 @11am UTC) - https://phabricator.wikimedia.org/T226782 (10wiki_willy) @RobH - ps2 was swapped last Tuesday on 10/15
[16:44:07] <wikibugs>	 (03PS1) 10Dzahn: ci: remove cobalt from firewall rules [puppet] - 10https://gerrit.wikimedia.org/r/545330 (https://phabricator.wikimedia.org/T236187)
[16:47:49] <wikibugs>	 (03PS1) 10Dzahn: mariadb: remove cobalt from ferm_misc rules [puppet] - 10https://gerrit.wikimedia.org/r/545333 (https://phabricator.wikimedia.org/T236187)
[16:47:51] <wikibugs>	 (03PS1) 10Dzahn: acme_chief: remove cobalt from authorized hosts [puppet] - 10https://gerrit.wikimedia.org/r/545334 (https://phabricator.wikimedia.org/T236187)
[16:47:53] <wikibugs>	 (03PS1) 10Dzahn: gerrit: remove cobalt from ssh known_hosts file [puppet] - 10https://gerrit.wikimedia.org/r/545335 (https://phabricator.wikimedia.org/T236187)
[16:47:55] <wikibugs>	 (03PS1) 10Dzahn: install_server: remove cobalt from DHCP and partman [puppet] - 10https://gerrit.wikimedia.org/r/545336 (https://phabricator.wikimedia.org/T236187)
[16:48:56] <wikibugs>	 (03PS1) 10BBlack: Revert "Move most North American traffic westwards" [dns] - 10https://gerrit.wikimedia.org/r/545338
[16:48:58] <wikibugs>	 (03PS1) 10BBlack: Revert "Move GeoDNS default from eqiad to codfw" [dns] - 10https://gerrit.wikimedia.org/r/545339
[16:49:00] <wikibugs>	 (03PS1) 10BBlack: Revert "Depool esams to test lvs1014 state" [dns] - 10https://gerrit.wikimedia.org/r/545340
[16:49:02] <wikibugs>	 (03PS1) 10RobH: setting new pdu models [puppet] - 10https://gerrit.wikimedia.org/r/545337 (https://phabricator.wikimedia.org/T227142)
[16:50:03] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] setting new pdu models [puppet] - 10https://gerrit.wikimedia.org/r/545337 (https://phabricator.wikimedia.org/T227142) (owner: 10RobH)
[16:50:43] <wikibugs>	 (03CR) 10BBlack: [C: 03+2] Revert "Move most North American traffic westwards" [dns] - 10https://gerrit.wikimedia.org/r/545338 (owner: 10BBlack)
[16:50:49] <wikibugs>	 (03CR) 10BBlack: [C: 03+2] Revert "Move GeoDNS default from eqiad to codfw" [dns] - 10https://gerrit.wikimedia.org/r/545339 (owner: 10BBlack)
[16:51:40] <bblack>	 !log geodns: moving all "normal" eqiad traffic back to eqiad (in addition to the esams-diverted traffic which is still pointed mostly at eqiad right now)
[16:51:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:52:34] <wikibugs>	 (03PS1) 10Dzahn: gerrit: change gerrit master_host to gerrit1001, remove duplicate [puppet] - 10https://gerrit.wikimedia.org/r/545342 (https://phabricator.wikimedia.org/T222391)
[16:55:17] <wikibugs>	 (03PS2) 10RobH: setting new pdu models [puppet] - 10https://gerrit.wikimedia.org/r/545337 (https://phabricator.wikimedia.org/T227142)
[16:55:23] <icinga-wm>	 ACKNOWLEDGEMENT - SSH mw1290.mgmt on mw1290.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds daniel_zahn https://phabricator.wikimedia.org/T234153 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[16:56:05] <wikibugs>	 (03CR) 10RobH: [C: 03+2] setting new pdu models [puppet] - 10https://gerrit.wikimedia.org/r/545337 (https://phabricator.wikimedia.org/T227142) (owner: 10RobH)
[16:56:53] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: a1-eqiad pdu refresh (Tuesday 10/15 @11am UTC) - https://phabricator.wikimedia.org/T226782 (10RobH) 05Open→03Resolved Ok, just logged in and confirmed the ps1 sees ps2.  the rest was already configured from our deployment of ps1 except the model hadn't been updated....
[16:56:55] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: Install new PDUs in rows A/B (Top level tracking task) - https://phabricator.wikimedia.org/T226778 (10RobH)
[16:57:09] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: a1-eqiad pdu refresh (Tuesday 10/15 @11am UTC) - https://phabricator.wikimedia.org/T226782 (10RobH)
[17:00:04] <jouncebot>	 cscott, arlolra, subbu, halfak, and accraze: How many deployers does it take to do Services – Graphoid / Parsoid / Citoid / ORES deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20191022T1700).
[17:01:34] <wikibugs>	 10Operations, 10DC-Ops, 10serviceops: mw1252 - Memory correctable errors -EDAC- - https://phabricator.wikimedia.org/T236190 (10Dzahn)
[17:03:04] <icinga-wm>	 PROBLEM - Varnish traffic drop between 30min ago and now at codfw on icinga1001 is CRITICAL: 45.35 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[17:03:32] <wikibugs>	 10Operations, 10DC-Ops, 10serviceops: mw1252 - Memory correctable errors -EDAC- - https://phabricator.wikimedia.org/T236190 (10Dzahn) nothing special in SEL   ` /admin1-> racadm getsel Record:      1 Date/Time:   11/12/2014 09:37:12 Source:      system Severity:    Ok Description: Log cleared. --------------...
[17:04:03] <icinga-wm>	 ACKNOWLEDGEMENT - Memory correctable errors -EDAC- on mw1252 is CRITICAL: 4 ge 4 daniel_zahn https://phabricator.wikimedia.org/T236190 https://wikitech.wikimedia.org/wiki/Monitoring/Memory%23Memory_correctable_errors_-EDAC- https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=mw1252&var-datasource=eqiad+prometheus/ops
[17:04:56] <wikibugs>	 10Operations, 10ops-eqiad: Heating alerts for mw servers in eqiad - https://phabricator.wikimedia.org/T149287 (10Dzahn) please add mw1252 to the list (T236190)
[17:05:39] <wikibugs>	 10Operations, 10DC-Ops, 10serviceops: mw1252 - Memory correctable errors -EDAC- - https://phabricator.wikimedia.org/T236190 (10Dzahn) Support expiry date  Nov. 15, 2017    so i guess we won't fix these anymore.
[17:09:22] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: a6-eqiad pdu refresh (Tuesday 10/22 @11am UTC) - https://phabricator.wikimedia.org/T227142 (10RobH)
[17:10:37] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: a6-eqiad pdu refresh (Tuesday 10/22 @11am UTC) - https://phabricator.wikimedia.org/T227142 (10RobH) a:05RobH→03Jclark-ctr @wiki_willy requested I step in and setup the software side of things, but cannot do so as serial to this PDU isn't currently working.  Can you tr...
[17:13:19] <librenms-wmf>	 04Critical Alert for device cr1-eqiad.wikimedia.org - Primary outbound port utilisation over 80%
[17:14:52] <icinga-wm>	 RECOVERY - Varnish traffic drop between 30min ago and now at esams on icinga1001 is OK: (C)60 le (W)70 le 70.81 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[17:15:55] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: a6-eqiad pdu refresh (Tuesday 10/22 @11am UTC) - https://phabricator.wikimedia.org/T227142 (10RobH) the icinga downtime was set to expire in less than an hour, so I've extended it until 2300 GMT.
[17:17:23] <icinga-wm>	 RECOVERY - ps1-a1-eqiad-infeed-load-tower-A-phase-Z on ps1-a1-eqiad is OK: SNMP OK - ps1-a1-eqiad-infeed-load-tower-A-phase-Z 421 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[17:17:55] <icinga-wm>	 RECOVERY - ps1-a1-eqiad-infeed-load-tower-B-phase-X on ps1-a1-eqiad is OK: SNMP OK - ps1-a1-eqiad-infeed-load-tower-B-phase-X 150 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[17:17:57] <icinga-wm>	 RECOVERY - ps1-a1-eqiad-infeed-load-tower-B-phase-Y on ps1-a1-eqiad is OK: SNMP OK - ps1-a1-eqiad-infeed-load-tower-B-phase-Y 355 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[17:19:15] <logmsgbot>	 !log arlolra@deploy1001 Started deploy [parsoid/deploy@4c64c9c]: Updating Parsoid to cf01d91
[17:19:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:19:59] <wikibugs>	 (03CR) 10BBlack: [C: 03+2] Revert "Depool esams to test lvs1014 state" [dns] - 10https://gerrit.wikimedia.org/r/545340 (owner: 10BBlack)
[17:20:43] <bblack>	 !log geodns: re-pooling esams (at this point, we're entirely back in our "normal" state of affairs)
[17:20:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:23:49] <wikibugs>	 (03PS8) 10Krinkle: [WIP] Convert frankenstein vendor/ into thin local lib/ [mediawiki-config] - 10https://gerrit.wikimedia.org/r/542658
[17:23:55] <wikibugs>	 (03CR) 10Krinkle: "Rebased to resolve composer.lock conflict." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/542658 (owner: 10Krinkle)
[17:26:52] <logmsgbot>	 !log arlolra@deploy1001 Finished deploy [parsoid/deploy@4c64c9c]: Updating Parsoid to cf01d91 (duration: 07m 37s)
[17:26:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:32:39] <icinga-wm>	 RECOVERY - ps1-a1-eqiad-infeed-load-tower-A-phase-X on ps1-a1-eqiad is OK: SNMP OK - ps1-a1-eqiad-infeed-load-tower-A-phase-X 95 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[17:33:20] <librenms-wmf>	 04̶C̶r̶i̶t̶i̶c̶a̶l Device cr1-eqiad.wikimedia.org recovered from Primary outbound port utilisation over 80%
[17:33:37] <icinga-wm>	 PROBLEM - Varnish traffic drop between 30min ago and now at eqiad on icinga1001 is CRITICAL: 58.98 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[17:34:35] <icinga-wm>	 RECOVERY - ps1-a1-eqiad-infeed-load-tower-A-phase-Y on ps1-a1-eqiad is OK: SNMP OK - ps1-a1-eqiad-infeed-load-tower-A-phase-Y 296 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[17:36:07] <icinga-wm>	 PROBLEM - Check the Netbox report librenms for fail status. on netbox1001 is CRITICAL: librenms.LibreNMS CRITICAL https://wikitech.wikimedia.org/wiki/Netbox%23Reports
[17:36:27] <icinga-wm>	 RECOVERY - ps1-a1-eqiad-infeed-load-tower-B-phase-Z on ps1-a1-eqiad is OK: SNMP OK - ps1-a1-eqiad-infeed-load-tower-B-phase-Z 373 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[17:36:35] <icinga-wm>	 RECOVERY - Varnish traffic drop between 30min ago and now at codfw on icinga1001 is OK: (C)60 le (W)70 le 79.15 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[17:36:58] <sbassett>	 Hey all (liw thcipriani) - I wanted to do a sec-deploy for T234450 to wmf.2 and wmf.3.  Any current issues where I shouldn't?
[17:37:17] <liw>	 sbassett, yes
[17:37:29] <sbassett>	 liw: Ok, train stuff?
[17:37:46] <thcipriani>	 sbassett: I don't believe wmf.3 made it anywhere just yet. Backporting to wmf.3 is probably sufficient in that case.
[17:37:47] <arlolra>	 !log Updated Parsoid to cf01d91 (T234057, T234768, T235296, T235684, T235563)
[17:37:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:37:56] <stashbot>	 T235563: Link prefix differences between Parsoid/JS & Parsoid/PHP - https://phabricator.wikimedia.org/T235563
[17:37:57] <stashbot>	 T234057: Get rid of hybrid testing code - https://phabricator.wikimedia.org/T234057
[17:37:57] <stashbot>	 T235296: Parsoid/PHP adds a data-sort-value="", lang="" whereas Parsoid/JS doesn't - https://phabricator.wikimedia.org/T235296
[17:37:57] <stashbot>	 T234768: Create Balinese Wikipedia - https://phabricator.wikimedia.org/T234768
[17:37:57] <stashbot>	 T235684: id and fallback id differences - https://phabricator.wikimedia.org/T235684
[17:38:20] <thcipriani>	 sbassett: that is, it's not deployed so best not to scap sync-file or whatever for that branch, but patching on disk is fine
[17:38:51] <liw>	 correct, wmf.3 didn't get even to testwiki today
[17:39:15] <sbassett>	 thcipriani: Core patch - can I drop it in /srv/patches in the relevant wmf.2 and wmf.3 dirs and just deploy to wmf.2?  Or should I just not do anything with wmf.3 right now?
[17:41:00] <brennen>	 sbassett thcipriani liw:  my current thinking is that i will go ahead with train during the american deploy window @ 19:00 UTC.
[17:41:38] <thcipriani>	 sbassett: if you could apply the patch to wmf.3 so we don't have to do it later, that'd be good
[17:41:44] <brennen>	 ^
[17:43:07] <wikibugs>	 (03PS1) 10Paladox: Revert "gerrit: enable jgit gc" [puppet] - 10https://gerrit.wikimedia.org/r/545351 (https://phabricator.wikimedia.org/T236114)
[17:44:39] <sbassett>	 brennen thcipriani: ok, got thumbs up from _security for now.  I'll plan to patch and deploy to wmf.2 and just patch for wmf.3.  Sound good?
[17:44:52] <wikibugs>	 (03CR) 10Thcipriani: [C: 03+1] "Let's do this so we don't lose any data in T236114" [puppet] - 10https://gerrit.wikimedia.org/r/545351 (https://phabricator.wikimedia.org/T236114) (owner: 10Paladox)
[17:45:14] <wikibugs>	 (03CR) 10Dzahn: "https://puppet-compiler.wmflabs.org/compiler1001/18989/" [puppet] - 10https://gerrit.wikimedia.org/r/545342 (https://phabricator.wikimedia.org/T222391) (owner: 10Dzahn)
[17:45:19] <brennen>	 sbassett: sounds good
[17:46:50] <wikibugs>	 (03CR) 10Paladox: [C: 03+1] gerrit: change gerrit master_host to gerrit1001, remove duplicate [puppet] - 10https://gerrit.wikimedia.org/r/545342 (https://phabricator.wikimedia.org/T222391) (owner: 10Dzahn)
[17:50:50] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] Revert "gerrit: enable jgit gc" [puppet] - 10https://gerrit.wikimedia.org/r/545351 (https://phabricator.wikimedia.org/T236114) (owner: 10Paladox)
[17:51:46] <logmsgbot>	 !log mholloway-shell@deploy1001 Started deploy [mobileapps/deploy@b4c484a]: Build structured talk pages by walking the DOM (T235213)
[17:51:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:51:50] <stashbot>	 T235213: Optimize talk endpoint performance - https://phabricator.wikimedia.org/T235213
[17:54:14] <mutante>	 !log restarting gerrit to disable jgit gc (T236114)
[17:54:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:54:17] <stashbot>	 T236114: check and fix some Gerrit revs - https://phabricator.wikimedia.org/T236114
[17:57:00] <logmsgbot>	 !log mholloway-shell@deploy1001 Finished deploy [mobileapps/deploy@b4c484a]: Build structured talk pages by walking the DOM (T235213) (duration: 05m 14s)
[17:57:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:57:04] <stashbot>	 T235213: Optimize talk endpoint performance - https://phabricator.wikimedia.org/T235213
[17:57:13] <sbassett>	 !Deployed security fix for T234450 to wmf.2
[17:57:25] <sbassett>	 crap
[17:57:34] <sbassett>	 !log Deployed security fix for T234450 to wmf.2
[17:57:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:59:43] <sbassett>	 !log Uploaded and applied (but did not deploy per releng) security fix for T234450 to wmf.3
[17:59:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:00:04] <jouncebot>	 MaxSem, RoanKattouw, Niharika, and Urbanecm: #bothumor I � Unicode. All rise for Morning SWAT(Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20191022T1800).
[18:00:05] <jouncebot>	 No GERRIT patches in the queue for this window AFAICS.
[18:04:49] <icinga-wm>	 RECOVERY - Varnish traffic drop between 30min ago and now at eqiad on icinga1001 is OK: (C)60 le (W)70 le 82.47 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[18:06:28] <wikibugs>	 (03CR) 10Dzahn: Parsoid/PHP: Load the extension on all Parsoid nodes (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/544878 (https://phabricator.wikimedia.org/T235898) (owner: 10Mobrovac)
[18:06:50] <wikibugs>	 (03PS2) 10Dzahn: Add Mon (mnw) language [dns] - 10https://gerrit.wikimedia.org/r/544325 (https://phabricator.wikimedia.org/T235739) (owner: 10Jon Harald Søby)
[18:07:26] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] "approved by langcom - https://meta.wikimedia.org/wiki/Requests_for_new_languages/Wikipedia_Mon" [dns] - 10https://gerrit.wikimedia.org/r/544325 (https://phabricator.wikimedia.org/T235739) (owner: 10Jon Harald Søby)
[18:08:17] <wikibugs>	 10Operations, 10observability, 10serviceops, 10Performance-Team (Radar): Messages in Logstash from php-fatal-error.php are missing from type:mediawiki/channel:fatal - https://phabricator.wikimedia.org/T234283 (10Krinkle)
[18:09:29] <mutante>	 !log DNS - added new Wikipedia language "mnw" (Mon) T235739 - a language spoken in Myanmar
[18:09:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:09:33] <stashbot>	 T235739: Create Mon Wikipedia - https://phabricator.wikimedia.org/T235739
[18:15:47] <wikibugs>	 (03PS1) 10Paladox: Revert "Revert "gerrit: enable jgit gc"" [puppet] - 10https://gerrit.wikimedia.org/r/545367
[18:16:02] <wikibugs>	 (03CR) 10Paladox: [C: 04-1] "Needs Thcipriani say so (and +1)" [puppet] - 10https://gerrit.wikimedia.org/r/545367 (owner: 10Paladox)
[18:24:51] <wikibugs>	 (03CR) 10Subramanya Sastry: [C: 03+1] Parsoid/PHP: Load the extension on all Parsoid nodes (032 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/544878 (https://phabricator.wikimedia.org/T235898) (owner: 10Mobrovac)
[18:28:30] <wikibugs>	 10Operations, 10Gerrit, 10Release-Engineering-Team-TODO, 10serviceops, and 2 others: Gerrit Hardware Upgrade (+ upgrade from jessie to stretch or buster) - https://phabricator.wikimedia.org/T222391 (10Dzahn)
[18:34:29] <wikibugs>	 10Operations, 10serviceops, 10Patch-For-Review: decom cobalt - https://phabricator.wikimedia.org/T236187 (10Dzahn)
[18:34:36] <wikibugs>	 10Operations, 10Gerrit, 10Release-Engineering-Team-TODO, 10serviceops, and 2 others: Gerrit Hardware Upgrade (+ upgrade from jessie to stretch or buster) - https://phabricator.wikimedia.org/T222391 (10Dzahn)
[18:37:17] <wikibugs>	 (03PS1) 10Ppchelko: Varnish: don't decode/encode slashes for core REST API paths. [puppet] - 10https://gerrit.wikimedia.org/r/545369 (https://phabricator.wikimedia.org/T235779)
[18:48:20] <wikibugs>	 (03CR) 10Dzahn: [C: 04-1] "per IRC, we should make a parameter instead to include the migration/rsync stuff or not" [puppet] - 10https://gerrit.wikimedia.org/r/545342 (https://phabricator.wikimedia.org/T222391) (owner: 10Dzahn)
[18:48:45] <icinga-wm>	 PROBLEM - Check the Netbox report librenms for fail status. on netbox1001 is CRITICAL: librenms.LibreNMS CRITICAL https://wikitech.wikimedia.org/wiki/Netbox%23Reports
[18:49:21] <wikibugs>	 (03Abandoned) 10Dzahn: Revert "gerrit::migration: switch master to gerrit1001" [puppet] - 10https://gerrit.wikimedia.org/r/545084 (owner: 10Dzahn)
[18:59:15] <wikibugs>	 (03CR) 10Mobrovac: "A couple of minors, otherwise lgtm" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/545369 (https://phabricator.wikimedia.org/T235779) (owner: 10Ppchelko)
[18:59:39] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1002/18990/" [puppet] - 10https://gerrit.wikimedia.org/r/545066 (https://phabricator.wikimedia.org/T191183) (owner: 10Paladox)
[19:00:05] <jouncebot>	 brennen: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for Mediawiki train - American Version deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20191022T1900).
[19:00:45] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: update puppet for new PDU models - https://phabricator.wikimedia.org/T233129 (10RobH) 05Open→03Resolved Please note this is now a checkbox on all PDU upgrade tasks, so I'm resolving this task.
[19:00:47] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops: Install new PDUs in rows A/B (Top level tracking task) - https://phabricator.wikimedia.org/T226778 (10RobH)
[19:03:10] <wikibugs>	 (03PS2) 10Ppchelko: Varnish: don't decode/encode slashes for core REST API paths. [puppet] - 10https://gerrit.wikimedia.org/r/545369 (https://phabricator.wikimedia.org/T235779)
[19:03:33] <brennen>	 !log proceeding with train for 1.35.0-wmf.3
[19:03:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:03:44] <wikibugs>	 (CR) jerkins-bot: [V: -1] Varnish: don't decode/encode slashes for core REST API paths. [puppet] - https://gerrit.wikimedia.org/r/545369 (https://phabricator.wikimedia.org/T235779) (owner: Ppchelko)
[19:05:00] <wikibugs>	 (PS3) Ppchelko: Varnish: don't decode/encode slashes for core REST API paths [puppet] - https://gerrit.wikimedia.org/r/545369 (https://phabricator.wikimedia.org/T235779)
[19:05:25] <wikibugs>	 (CR) Brennen Bearnes: [C: +2] Group0 to 1.35.0-wmf.3 [mediawiki-config] - https://gerrit.wikimedia.org/r/545281 (owner: Lars Wirzenius)
[19:05:42] <wikibugs>	 (CR) Ppchelko: Varnish: don't decode/encode slashes for core REST API paths (2 comments) [puppet] - https://gerrit.wikimedia.org/r/545369 (https://phabricator.wikimedia.org/T235779) (owner: Ppchelko)
[19:06:14] <wikibugs>	 (Merged) jenkins-bot: Group0 to 1.35.0-wmf.3 [mediawiki-config] - https://gerrit.wikimedia.org/r/545281 (owner: Lars Wirzenius)
[19:06:32] <wikibugs>	 Operations, Gerrit, Traffic, Patch-For-Review: Enable avatars in gerrit - https://phabricator.wikimedia.org/T191183 (Dzahn) ^ The reason to merge this was not a comment on the general question to enable avatars. The reason was that during T222391 we noticed an undesirable dependency. During a Ger...
[19:07:51] <wikibugs>	 Operations, Gerrit, Release-Engineering-Team-TODO, serviceops, and 2 others: Gerrit Hardware Upgrade (+ upgrade from jessie to stretch or buster) - https://phabricator.wikimedia.org/T222391 (Dzahn)
[19:09:44] <wikibugs>	 Operations, Gerrit, Release-Engineering-Team-TODO, serviceops, and 2 others: Gerrit Hardware Upgrade (+ upgrade from jessie to stretch or buster) - https://phabricator.wikimedia.org/T222391 (Dzahn) This is mostly done and all boxes are checked.  Though only really closing it after:  T236114  is r...
[19:10:58] <wikibugs>	 (PS2) Jhedden: openstack: patch python-designateclient header values [puppet] - https://gerrit.wikimedia.org/r/545072 (https://phabricator.wikimedia.org/T235863)
[19:11:25] <wikibugs>	 (CR) Jhedden: openstack: patch python-designateclient header values (1 comment) [puppet] - https://gerrit.wikimedia.org/r/545072 (https://phabricator.wikimedia.org/T235863) (owner: Jhedden)
[19:13:32] <wikibugs>	 (CR) Jhedden: [C: +2] openstack: patch python-designateclient header values [puppet] - https://gerrit.wikimedia.org/r/545072 (https://phabricator.wikimedia.org/T235863) (owner: Jhedden)
[19:16:34] <wikibugs>	 (CR) Mobrovac: [C: +1] "Applied in beta already, works." [puppet] - https://gerrit.wikimedia.org/r/545369 (https://phabricator.wikimedia.org/T235779) (owner: Ppchelko)
[19:22:52] <wikibugs>	 (PS1) Dzahn: gerrit: increase heap_size from 20G to 32G [puppet] - https://gerrit.wikimedia.org/r/545381 (https://phabricator.wikimedia.org/T225166)
[19:23:25] <wikibugs>	 (CR) jerkins-bot: [V: -1] gerrit: increase heap_size from 20G to 32G [puppet] - https://gerrit.wikimedia.org/r/545381 (https://phabricator.wikimedia.org/T225166) (owner: Dzahn)
[19:24:07] <wikibugs>	 (CR) Bstorm: host monitoring: add optional contact group for mgmt interfaces (1 comment) [puppet] - https://gerrit.wikimedia.org/r/543916 (https://phabricator.wikimedia.org/T223458) (owner: Bstorm)
[19:25:41] <icinga-wm>	 PROBLEM - Work requests waiting in Zuul Gearman server on contint1001 is CRITICAL: CRITICAL: 57.14% of data above the critical threshold [140.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[19:27:00] <wikibugs>	 (CR) Paladox: [C: +1] "Apart from jenkins error, +1" [puppet] - https://gerrit.wikimedia.org/r/545381 (https://phabricator.wikimedia.org/T225166) (owner: Dzahn)
[19:27:22] <logmsgbot>	 !log brennen@deploy1001 rebuilt and synchronized wikiversions files: group0 to 1.35.0-wmf.3
[19:27:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:31:39] <icinga-wm>	 PROBLEM - SSH druid1004.mgmt on druid1004.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[19:38:30] <wikibugs>	 Operations, ops-eqiad, DC-Ops: a6-eqiad pdu refresh (Tuesday 10/22 @11am UTC) - https://phabricator.wikimedia.org/T227142 (Cmjohnson)
[19:40:22] <wikibugs>	 (PS2) Dzahn: gerrit: increase heap_size from 20G to 32G [puppet] - https://gerrit.wikimedia.org/r/545381 (https://phabricator.wikimedia.org/T225166)
[19:41:39] <icinga-wm>	 RECOVERY - Work requests waiting in Zuul Gearman server on contint1001 is OK: OK: Less than 30.00% above the threshold [90.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[19:42:48] <hashar>	 !log gerrit1001: apt install colordiff # T236114
[19:42:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:42:53] <stashbot>	 T236114: check and fix some Gerrit revs - https://phabricator.wikimedia.org/T236114
[19:43:18] <wikibugs>	 (PS3) Dzahn: gerrit: increase heap_size from 20G to 32G [puppet] - https://gerrit.wikimedia.org/r/545381 (https://phabricator.wikimedia.org/T225166)
[19:43:53] <wikibugs>	 (CR) jerkins-bot: [V: -1] gerrit: increase heap_size from 20G to 32G [puppet] - https://gerrit.wikimedia.org/r/545381 (https://phabricator.wikimedia.org/T225166) (owner: Dzahn)
[19:44:51] <wikibugs>	 (PS4) Paladox: gerrit: increase heap_size from 20G to 32G [puppet] - https://gerrit.wikimedia.org/r/545381 (https://phabricator.wikimedia.org/T225166) (owner: Dzahn)
[19:45:25] <wikibugs>	 (CR) jerkins-bot: [V: -1] gerrit: increase heap_size from 20G to 32G [puppet] - https://gerrit.wikimedia.org/r/545381 (https://phabricator.wikimedia.org/T225166) (owner: Dzahn)
[19:45:33] <wikibugs>	 (PS1) Hashar: gerrit: add colordiff package [puppet] - https://gerrit.wikimedia.org/r/545384 (https://phabricator.wikimedia.org/T236114)
[19:46:07] <wikibugs>	 (PS5) Paladox: gerrit: increase heap_size from 20G to 32G [puppet] - https://gerrit.wikimedia.org/r/545381 (https://phabricator.wikimedia.org/T225166) (owner: Dzahn)
[19:46:41] <wikibugs>	 (CR) Dzahn: [C: +2] gerrit: add colordiff package [puppet] - https://gerrit.wikimedia.org/r/545384 (https://phabricator.wikimedia.org/T236114) (owner: Hashar)
[19:46:51] <wikibugs>	 (CR) Bstorm: [C: +2] host monitoring: add optional contact group for mgmt interfaces [puppet] - https://gerrit.wikimedia.org/r/543916 (https://phabricator.wikimedia.org/T223458) (owner: Bstorm)
[19:51:00] <hashar>	 mutante: thanks :)
[19:51:26] <wikibugs>	 (CR) BPirkle: rename service definition (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/544199 (https://phabricator.wikimedia.org/T222851) (owner: Eevans)
[19:51:26] <mutante>	 yw hashar, thanks for hard work on fixes
[19:54:18] <wikibugs>	 Operations, ops-eqiad, DC-Ops, Data-Services, and 2 others: Decommission labstore100[123] and their disk shelves - https://phabricator.wikimedia.org/T187456 (RobH) irc update with john:  These are going to take WEEKS to wipe, and are all old hdd.  Rather than tie up that much onsite time swapping...
[19:56:36] <wikibugs>	 (PS1) BBlack: geodns: eqiad non-primary for all public users [dns] - https://gerrit.wikimedia.org/r/545385 (https://phabricator.wikimedia.org/T235805)
[20:01:03] <wikibugs>	 (CR) CRusnov: [C: +2] coherence: Check unracked devices for connected console ports [software/netbox-reports] - https://gerrit.wikimedia.org/r/545132 (owner: CRusnov)
[20:03:25] <wikibugs>	 (PS1) Bstorm: monitoring: set wmcs servers to email when mgmt interfaces fail [puppet] - https://gerrit.wikimedia.org/r/545386 (https://phabricator.wikimedia.org/T223458)
[20:05:21] <wikibugs>	 Operations, Traffic: Elevated 502s observed in ulsfo - https://phabricator.wikimedia.org/T236130 (colewhite) Of interest: all have user agent FortiGate (FortiOS 5.0) and [[ https://logstash.wikimedia.org/goto/3fa7d259cc2043eb0b56a6ae5e89298f | have appeared near simultaneously from a number of sources gl...
[20:05:41] <wikibugs>	 Operations, Traffic: Elevated 502s observed in ulsfo - https://phabricator.wikimedia.org/T236130 (colewhite) p:Triage→Normal
[20:07:10] <wikibugs>	 Operations, MediaWiki-Maintenance-scripts, cloud-services-team (Kanban): processEchoEmailBatch.php failing for labtestwiki - https://phabricator.wikimedia.org/T236145 (colewhite) p:Triage→Normal
[20:09:01] <wikibugs>	 Operations, Gerrit: Editing in Gerrit isn't saved after the update/migration to gerrit1001 - https://phabricator.wikimedia.org/T236143 (colewhite) p:Triage→Normal
[20:09:34] <mutante>	 !log gerrit1001 - mkdir /srv/gerrit/cobalt/git - rsyncing /srv/gerrit/git from cobalt to /srv/gerrit/cobalt/git/ on gerrit1001 (T236114)
[20:09:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:09:39] <stashbot>	 T236114: check and fix some Gerrit revs - https://phabricator.wikimedia.org/T236114
[20:09:55] <wikibugs>	 Operations, Traffic: interface-rps.py should have a flag to avoid CPU0 - https://phabricator.wikimedia.org/T236208 (BBlack) p:Triage→Normal
[20:18:46] <wikibugs>	 Operations, Analytics, LDAP-Access-Requests, wikimediafoundation.org: WikimediaFoundation.org analytics access for CherRaye Glenn - https://phabricator.wikimedia.org/T236209 (EdErhart-WMF)
[20:22:17] <wikibugs>	 Operations, Analytics, LDAP-Access-Requests, wikimediafoundation.org: WikimediaFoundation.org analytics access for CherRaye Glenn - https://phabricator.wikimedia.org/T236209 (Nuria) Is  CherRaye Glenn a contractor? If so when does the contract expire?
[20:26:02] <wikibugs>	 (PS1) Dzahn: admins: add shell account for Lex Nasser [puppet] - https://gerrit.wikimedia.org/r/545388 (https://phabricator.wikimedia.org/T235688)
[20:29:52] <wikibugs>	 (PS2) Dzahn: admins: add shell account for Lex Nasser [puppet] - https://gerrit.wikimedia.org/r/545388 (https://phabricator.wikimedia.org/T235688)
[20:31:50] <icinga-wm>	 RECOVERY - SSH druid1004.mgmt on druid1004.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[20:49:18] <wikibugs>	 (CR) Mathew.onipe: query_service: prepare query_service for reusbility (1 comment) [puppet] - https://gerrit.wikimedia.org/r/537138 (https://phabricator.wikimedia.org/T232297) (owner: Mathew.onipe)
[20:51:15] <wikibugs>	 (PS2) Bstorm: wiki replicas: Add the labsdb1012 replica to maintain_dbusers [puppet] - https://gerrit.wikimedia.org/r/543924 (https://phabricator.wikimedia.org/T235791)
[20:58:50] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[20:59:08] <wikibugs>	 (PS2) Bstorm: monitoring: set wmcs servers to email when mgmt interfaces fail [puppet] - https://gerrit.wikimedia.org/r/545386 (https://phabricator.wikimedia.org/T223458)
[21:00:04] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[21:03:12] <wikibugs>	 (CR) Catrope: [C: +2] Set GrowthExperiments task suggester config on beta [mediawiki-config] - https://gerrit.wikimedia.org/r/545155 (https://phabricator.wikimedia.org/T234426) (owner: Gergő Tisza)
[21:03:57] <wikibugs>	 (Merged) jenkins-bot: Set GrowthExperiments task suggester config on beta [mediawiki-config] - https://gerrit.wikimedia.org/r/545155 (https://phabricator.wikimedia.org/T234426) (owner: Gergő Tisza)
[21:04:18] <wikibugs>	 (CR) Bstorm: [C: +2] wiki replicas: Add the labsdb1012 replica to maintain_dbusers [puppet] - https://gerrit.wikimedia.org/r/543924 (https://phabricator.wikimedia.org/T235791) (owner: Bstorm)
[21:06:18] <wikibugs>	 Operations, Analytics, LDAP-Access-Requests, wikimediafoundation.org: WikimediaFoundation.org analytics access for CherRaye Glenn - https://phabricator.wikimedia.org/T236209 (Varnent) >>! In T236209#5596762, @Nuria wrote: > Is  CherRaye Glenn a contractor? If so when does the contract expire?  Ch...
[21:06:52] <wikibugs>	 Operations, Analytics, LDAP-Access-Requests, SRE-Access-Requests, wikimediafoundation.org: WikimediaFoundation.org analytics access for CherRaye Glenn - https://phabricator.wikimedia.org/T236209 (Nuria)
[21:08:45] <wikibugs>	 Operations, Analytics, LDAP-Access-Requests, SRE-Access-Requests, wikimediafoundation.org: WikimediaFoundation.org analytics access for CherRaye Glenn - https://phabricator.wikimedia.org/T236209 (Nuria) Then she should be added to wmf LDAP group  after @Heather's approval  ping @Dzahn which I...
[21:25:31] <wikibugs>	 (PS19) Mathew.onipe: query_service: rename wdqs module to query_service [puppet] - https://gerrit.wikimedia.org/r/538572 (https://phabricator.wikimedia.org/T232297)
[21:25:33] <wikibugs>	 (PS26) Mathew.onipe: query_service: prepare query_service for reusbility [puppet] - https://gerrit.wikimedia.org/r/537138 (https://phabricator.wikimedia.org/T232297)
[21:25:35] <wikibugs>	 (PS23) Mathew.onipe: query_service: rename profile/wdqs to profile/query_service [puppet] - https://gerrit.wikimedia.org/r/538849 (https://phabricator.wikimedia.org/T232297)
[21:25:37] <wikibugs>	 (PS18) Mathew.onipe: query_service: separate categories from main blazegraph profile [puppet] - https://gerrit.wikimedia.org/r/539285 (https://phabricator.wikimedia.org/T232297)
[21:25:39] <wikibugs>	 (PS18) Mathew.onipe: query_service: properly adapt query_service profile [puppet] - https://gerrit.wikimedia.org/r/539513 (https://phabricator.wikimedia.org/T232297)
[21:25:41] <wikibugs>	 (PS18) Mathew.onipe: query_service: properly adapt hiera configs [puppet] - https://gerrit.wikimedia.org/r/539998 (https://phabricator.wikimedia.org/T232297)
[21:27:14] <wikibugs>	 (PS3) Dzahn: admins: add shell account for Lex Nasser [puppet] - https://gerrit.wikimedia.org/r/545388 (https://phabricator.wikimedia.org/T235688)
[21:29:28] <wikibugs>	 (CR) jerkins-bot: [V: -1] query_service: separate categories from main blazegraph profile [puppet] - https://gerrit.wikimedia.org/r/539285 (https://phabricator.wikimedia.org/T232297) (owner: Mathew.onipe)
[21:34:09] <wikibugs>	 Operations, ops-esams: rack/setup/install ps[12]-oe1[456]-esams - https://phabricator.wikimedia.org/T184066 (RobH) a:RobH
[21:34:17] <wikibugs>	 Operations, ops-esams: rack/setup/install ps[12]-oe1[456]-esams - https://phabricator.wikimedia.org/T184066 (RobH)
[21:34:20] <wikibugs>	 Operations, User-DannyS712: 503 Backend fetch failed - https://phabricator.wikimedia.org/T233271 (MusikAnimal) There were several bursts of 503s over the past few weeks, the last was six days ago. But overall, yes, things have improved. I do realize 503s are super generic, it was just the frequency that...
[21:35:49] <wikibugs>	 (CR) Papaul: [C: +1] admins: add shell account for Lex Nasser [puppet] - https://gerrit.wikimedia.org/r/545388 (https://phabricator.wikimedia.org/T235688) (owner: Dzahn)
[21:36:59] <wikibugs>	 (CR) Dzahn: [C: +2] admins: add shell account for Lex Nasser [puppet] - https://gerrit.wikimedia.org/r/545388 (https://phabricator.wikimedia.org/T235688) (owner: Dzahn)
[21:39:58] <wikibugs>	 (PS19) Mathew.onipe: query_service: separate categories from main blazegraph profile [puppet] - https://gerrit.wikimedia.org/r/539285 (https://phabricator.wikimedia.org/T232297)
[21:40:01] <wikibugs>	 (PS19) Mathew.onipe: query_service: properly adapt query_service profile [puppet] - https://gerrit.wikimedia.org/r/539513 (https://phabricator.wikimedia.org/T232297)
[21:40:03] <wikibugs>	 (PS19) Mathew.onipe: query_service: properly adapt hiera configs [puppet] - https://gerrit.wikimedia.org/r/539998 (https://phabricator.wikimedia.org/T232297)
[21:40:45] <wikibugs>	 Operations, Analytics, SRE-Access-Requests, Patch-For-Review: SSH access for Lex Nasser, analytics intern - https://phabricator.wikimedia.org/T235688 (Dzahn) @lexnasser Within max. 30 minutes this should work for you now. Please take a look at https://wikitech.wikimedia.org/wiki/Production_access...
[21:41:00] <wikibugs>	 (CR) Mathew.onipe: "PCC result is good" [puppet] - https://gerrit.wikimedia.org/r/537138 (https://phabricator.wikimedia.org/T232297) (owner: Mathew.onipe)
[21:41:09] <wikibugs>	 Operations, Analytics, SRE-Access-Requests, Patch-For-Review: SSH access for Lex Nasser, analytics intern - https://phabricator.wikimedia.org/T235688 (Dzahn) Open→Resolved If any unexpected issues please just reopen the ticket.
[21:41:33] <wikibugs>	 Operations, Analytics, SRE-Access-Requests, Patch-For-Review: SSH access for Lex Nasser, analytics intern - https://phabricator.wikimedia.org/T235688 (Nuria) +1 , also let's make sure to go over the Data guidelines before working with the data.
[21:45:21] <mutante>	 !log LDAP - added lexnasser to nda group (T235688)
[21:45:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:45:25] <stashbot>	 T235688: SSH access for Lex Nasser, analytics intern - https://phabricator.wikimedia.org/T235688
[21:46:29] <wikibugs>	 Operations, Analytics, SRE-Access-Requests, Patch-For-Review: SSH access for Lex Nasser, analytics intern - https://phabricator.wikimedia.org/T235688 (Dzahn) >>! In T235688#5587345, @Nuria wrote: > And also we need to add lex to nda group for access to turnilo and superset   Done!  @lexnasser You...
[21:48:04] <wikibugs>	 Operations, User-DannyS712: 503 Backend fetch failed - https://phabricator.wikimedia.org/T233271 (sbassett) Open→Resolved a:MusikAnimal >>! In T233271#5596898, @MusikAnimal wrote: > But overall, yes, things have improved. I do realize 503s are super generic, it was just the frequency that rai...
[21:49:15] <wikibugs>	 (PS24) Mathew.onipe: query_service: rename profile/wdqs to profile/query_service [puppet] - https://gerrit.wikimedia.org/r/538849 (https://phabricator.wikimedia.org/T232297)
[21:49:17] <wikibugs>	 (PS20) Mathew.onipe: query_service: separate categories from main blazegraph profile [puppet] - https://gerrit.wikimedia.org/r/539285 (https://phabricator.wikimedia.org/T232297)
[21:49:19] <wikibugs>	 (PS20) Mathew.onipe: query_service: properly adapt query_service profile [puppet] - https://gerrit.wikimedia.org/r/539513 (https://phabricator.wikimedia.org/T232297)
[21:49:21] <wikibugs>	 (PS20) Mathew.onipe: query_service: properly adapt hiera configs [puppet] - https://gerrit.wikimedia.org/r/539998 (https://phabricator.wikimedia.org/T232297)
[21:49:27] <wikibugs>	 Operations, Analytics, LDAP-Access-Requests, SRE-Access-Requests, wikimediafoundation.org: WikimediaFoundation.org analytics access for CherRaye Glenn - https://phabricator.wikimedia.org/T236209 (Dzahn) Actually that is @colewhite this week but we are on it.
[21:50:12] <wikibugs>	 Operations, ops-esams: rack/setup/install ps[12]-oe1[456]-esams - https://phabricator.wikimedia.org/T184066 (RobH)
[21:51:59] <wikibugs>	 Operations, ops-esams: rack/setup/install ps[12]-oe1[456]-esams - https://phabricator.wikimedia.org/T184066 (RobH)
[21:52:36] <thcipriani>	 jouncebot: now
[21:52:37] <jouncebot>	 No deployments scheduled for the next 1 hour(s) and 7 minute(s)
[21:52:41] <thcipriani>	 oh good.
[21:53:54] <wikibugs>	 Operations, ops-esams: rack/setup/install ps[12]-oe1[456]-esams - https://phabricator.wikimedia.org/T184066 (RobH)
[21:56:13] <wikibugs>	 Operations, SRE-Access-Requests, WMF-Legal: Requesting access to view EventLogging data for Co_WMDE - https://phabricator.wikimedia.org/T234429 (Dzahn) a:RStallman-legalteam→colewhite
[21:56:38] <wikibugs>	 (CR) Mathew.onipe: "> Patch Set 26:" [puppet] - https://gerrit.wikimedia.org/r/537138 (https://phabricator.wikimedia.org/T232297) (owner: Mathew.onipe)
[21:57:34] <thcipriani>	 !log stopping gerrit to run ref-update script
[21:57:36] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:57:46] <thcipriani>	 !log stopping gerrit to run ref-update script T236114
[21:57:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:57:50] <stashbot>	 T236114: check and fix some Gerrit revs - https://phabricator.wikimedia.org/T236114
[21:59:00] <wikibugs>	 Operations, Analytics, LDAP-Access-Requests, SRE-Access-Requests, wikimediafoundation.org: WikimediaFoundation.org analytics access for CherRaye Glenn - https://phabricator.wikimedia.org/T236209 (Heather) Approved. Thanks, everyone!
[21:59:11] <onimisionipe>	 thcipriani: you stopped gerrit! arggggh :)
[21:59:33] <robh>	 oh whew
[21:59:35] <robh>	 i was like wtfffff
[21:59:50] <robh>	 'how did i bork my git now, it was working a second ago'
[21:59:55] <thcipriani>	 :)
[21:59:59] <thcipriani>	 sorry folks
[22:00:10] <thcipriani>	 should be back
[22:00:13] <robh>	 no worries
[22:00:18] <robh>	 im just happy it wasnt me.
[22:00:31] <onimisionipe>	 no p ;)
[22:00:32] <wikibugs>	 (PS1) RobH: adding new pdus to esams mgmt [dns] - https://gerrit.wikimedia.org/r/545406
[22:01:45] <wikibugs>	 (CR) RobH: [C: +2] adding new pdus to esams mgmt [dns] - https://gerrit.wikimedia.org/r/545406 (owner: RobH)
[22:02:28] <wikibugs>	 (CR) Mathew.onipe: "PCC is Ok: https://puppet-compiler.wmflabs.org/compiler1002/19001/" [puppet] - https://gerrit.wikimedia.org/r/538849 (https://phabricator.wikimedia.org/T232297) (owner: Mathew.onipe)
[22:03:22] <robh>	 hrmm
[22:03:39] <robh>	 linking into tasks automatically via patchset doesnt seem to be happening (or is doing so slowly)
[22:03:52] <robh>	 ie: my new dns patch shows the bug in gerrit, but didnt update on the phab task
[22:04:33] <wikibugs>	 (PS1) Cwhite: admin: add Nikki to ldap_only_users [puppet] - https://gerrit.wikimedia.org/r/545407 (https://phabricator.wikimedia.org/T235136)
[22:05:54] <bd808>	 robh: hmmm... does wikibugs make those links? It might not have liked the restart of gerrit if so.
[22:06:00] * bd808 goes to figure that out
[22:06:55] <bd808>	 nope. that's done by https://wikitech.wikimedia.org/wiki/Gerrit_Notification_Bot which is a gerrit plugin apparently
[22:07:16] <mutante>	 robh: missing :
[22:07:25] <mutante>	 Bug: T...
[22:08:07] <wikibugs>	 Operations, SRE-Access-Requests, WMF-Legal: Requesting access to view EventLogging data for Co_WMDE - https://phabricator.wikimedia.org/T234429 (colewhite)
[22:08:16] <wikibugs>	 (CR) Dzahn: [C: +1] admin: add Nikki to ldap_only_users [puppet] - https://gerrit.wikimedia.org/r/545407 (https://phabricator.wikimedia.org/T235136) (owner: Cwhite)
[22:15:49] <wikibugs>	 (CR) Cwhite: [C: +2] admin: add Nikki to ldap_only_users [puppet] - https://gerrit.wikimedia.org/r/545407 (https://phabricator.wikimedia.org/T235136) (owner: Cwhite)
[22:21:29] <wikibugs>	 (PS2) Dzahn: DNS: Remove production and mgmt DNS for frav1001 [dns] - https://gerrit.wikimedia.org/r/544279 (https://phabricator.wikimedia.org/T222109) (owner: Papaul)
[22:21:37] <wikibugs>	 Operations, ops-codfw, DC-Ops, decommission: Decommission db2060.codfw.wmnet - https://phabricator.wikimedia.org/T231625 (Papaul) ` papaul@asw-d-codfw# show | compare  [edit interfaces interface-range vlan-private1-d-codfw] -    member ge-6/0/8; [edit interfaces interface-range disabled]      mem...
[22:21:42] <wikibugs>	 (PS1) Cwhite: admin: add cohi to researchers and analytics-users [puppet] - https://gerrit.wikimedia.org/r/545409 (https://phabricator.wikimedia.org/T234429)
[22:22:29] <wikibugs>	 Operations, SRE-Access-Requests, WMF-Legal, Patch-For-Review: Requesting access to view EventLogging data for Co_WMDE - https://phabricator.wikimedia.org/T234429 (colewhite)
[22:22:31] <wikibugs>	 Operations, ops-codfw, DC-Ops, decommission: Decommission db2060.codfw.wmnet - https://phabricator.wikimedia.org/T231625 (Papaul)
[22:23:26] <wikibugs>	 Operations, LDAP-Access-Requests, Patch-For-Review: LDAP membership for new employee Nikki Nikkhoui - https://phabricator.wikimedia.org/T235136 (colewhite) Hi Nikki!  I've deployed the necessary changes and added you to the wmf group.  Please let me know if you encounter any related issue.
[22:23:36] <wikibugs>	 Operations, LDAP-Access-Requests, Patch-For-Review: LDAP membership for new employee Nikki Nikkhoui - https://phabricator.wikimedia.org/T235136 (colewhite) Open→Resolved
[22:23:40] <wikibugs>	 (CR) Dzahn: "production DNS entries are not removed yet it looks" [dns] - https://gerrit.wikimedia.org/r/544279 (https://phabricator.wikimedia.org/T222109) (owner: Papaul)
[22:25:13] <wikibugs>	 (PS2) Cwhite: admin: add cohi to researchers and analytics-users [puppet] - https://gerrit.wikimedia.org/r/545409 (https://phabricator.wikimedia.org/T234429)
[22:25:36] <robh>	 ha
[22:25:38] <robh>	 damn
[22:27:13] <wikibugs>	 Operations, Analytics, LDAP-Access-Requests, SRE-Access-Requests, wikimediafoundation.org: WikimediaFoundation.org analytics access for CherRaye Glenn - https://phabricator.wikimedia.org/T236209 (colewhite) a:colewhite
[22:27:14] <robh>	 id have thought the ci would have failed it for that
[22:27:21] <robh>	 rather than allow and plugin fail
[22:33:29] <wikibugs>	 (PS1) Cwhite: admin: add keepit-ssh (CherRaye Glenn) to ldap_only_users [puppet] - https://gerrit.wikimedia.org/r/545410 (https://phabricator.wikimedia.org/T236209)
[22:34:12] <wikibugs>	 Operations, ops-esams: rack/setup/install ganeti300[123] - https://phabricator.wikimedia.org/T236216 (RobH) p:Triage→Normal
[22:34:20] <wikibugs>	 Operations, Analytics, LDAP-Access-Requests, SRE-Access-Requests, and 2 others: WikimediaFoundation.org analytics access for CherRaye Glenn - https://phabricator.wikimedia.org/T236209 (colewhite) p:Triage→Normal
[22:34:24] <wikibugs>	 Operations, ops-esams: rack/setup/install ganeti300[123] - https://phabricator.wikimedia.org/T236216 (RobH)
[22:36:41] <wikibugs>	 Operations, ops-esams, DNS, Traffic: rack/setup/install dns300[123] - https://phabricator.wikimedia.org/T236217 (RobH) p:Triage→Normal
[22:37:05] <wikibugs>	 Operations, ops-esams, DNS, Traffic: rack/setup/install dns300[123] - https://phabricator.wikimedia.org/T236217 (RobH)
[22:38:11] <wikibugs>	 (PS7) Jforrester: Variant configuration: Allow for YAML-based inheritance of configuration [mediawiki-config] - https://gerrit.wikimedia.org/r/538129 (https://phabricator.wikimedia.org/T223602)
[22:38:21] <bd808>	 robh: there is a commit message test that can be added to any gerrit repo, but there are not many repos that have opted-in to using it.
[22:38:26] <wikibugs>	 (PS20) Jforrester: Variant configuration: Pre-calculate config for each wiki and store it in config.git [mediawiki-config] - https://gerrit.wikimedia.org/r/507729 (https://phabricator.wikimedia.org/T223602)
[22:38:28] <wikibugs>	 (PS1) Jforrester: Variant configuration: Generate dblists from YAML [mediawiki-config] - https://gerrit.wikimedia.org/r/545411 (https://phabricator.wikimedia.org/T223602)
[22:39:34] <wikibugs>	 (CR) Cwhite: [C: +1] logstash: remove deprecated elasticsearch options [puppet] - https://gerrit.wikimedia.org/r/545236 (https://phabricator.wikimedia.org/T235891) (owner: Filippo Giunchedi)
[22:39:38] <wikibugs>	 (CR) jerkins-bot: [V: -1] Variant configuration: Pre-calculate config for each wiki and store it in config.git [mediawiki-config] - https://gerrit.wikimedia.org/r/507729 (https://phabricator.wikimedia.org/T223602) (owner: Jforrester)
[22:39:59] <wikibugs>	 (Abandoned) Jforrester: Variant configuration: Move some dblist configuration into YAML [mediawiki-config] - https://gerrit.wikimedia.org/r/539414 (owner: Jforrester)
[22:43:23] <wikibugs>	 Operations, Gerrit, Release-Engineering-Team-TODO, serviceops, and 2 others: Gerrit Hardware Upgrade (+ upgrade from jessie to stretch or buster) - https://phabricator.wikimedia.org/T222391 (thcipriani)
[22:47:04] <wikibugs>	 (PS21) Mathew.onipe: query_service: separate categories from main blazegraph profile [puppet] - https://gerrit.wikimedia.org/r/539285 (https://phabricator.wikimedia.org/T232297)
[22:47:06] <wikibugs>	 (PS21) Mathew.onipe: query_service: properly adapt query_service profile [puppet] - https://gerrit.wikimedia.org/r/539513 (https://phabricator.wikimedia.org/T232297)
[22:47:08] <wikibugs>	 (PS21) Mathew.onipe: query_service: properly adapt hiera configs [puppet] - https://gerrit.wikimedia.org/r/539998 (https://phabricator.wikimedia.org/T232297)
[22:49:50] <wikibugs>	 (CR) jerkins-bot: [V: -1] query_service: properly adapt query_service profile [puppet] - https://gerrit.wikimedia.org/r/539513 (https://phabricator.wikimedia.org/T232297) (owner: Mathew.onipe)
[22:50:30] <wikibugs>	 (CR) jerkins-bot: [V: -1] query_service: properly adapt hiera configs [puppet] - https://gerrit.wikimedia.org/r/539998 (https://phabricator.wikimedia.org/T232297) (owner: Mathew.onipe)
[22:58:05] <wikibugs>	 (PS18) Andrew Bogott: labtestwiki: move to a wmcs-hosted database on clouddb2001-dev [mediawiki-config] - https://gerrit.wikimedia.org/r/543664 (https://phabricator.wikimedia.org/T233236)
[23:00:04] <jouncebot>	 MaxSem, RoanKattouw, Niharika, and Urbanecm: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) Evening SWAT(Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20191022T2300).
[23:00:04] <jouncebot>	 andrewbogott and Dbarratt: A patch you scheduled for Evening SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[23:00:12] <davidwbarratt>	 here!
[23:00:17] <andrewbogott>	 me too!
[23:00:41] <wikibugs>	 (PS22) Mathew.onipe: query_service: separate categories from main blazegraph profile [puppet] - https://gerrit.wikimedia.org/r/539285 (https://phabricator.wikimedia.org/T232297)
[23:00:43] <wikibugs>	 (PS22) Mathew.onipe: query_service: properly adapt query_service profile [puppet] - https://gerrit.wikimedia.org/r/539513 (https://phabricator.wikimedia.org/T232297)
[23:00:45] <wikibugs>	 (PS22) Mathew.onipe: query_service: properly adapt hiera configs [puppet] - https://gerrit.wikimedia.org/r/539998 (https://phabricator.wikimedia.org/T232297)
[23:01:55] <andrewbogott>	 do we have a deployer?
[23:03:33] <wikibugs>	 (CR) jerkins-bot: [V: -1] query_service: properly adapt query_service profile [puppet] - https://gerrit.wikimedia.org/r/539513 (https://phabricator.wikimedia.org/T232297) (owner: Mathew.onipe)
[23:05:26] <wikibugs>	 (CR) Mathew.onipe: "PCC is Ok: https://puppet-compiler.wmflabs.org/compiler1001/19004/" [puppet] - https://gerrit.wikimedia.org/r/539285 (https://phabricator.wikimedia.org/T232297) (owner: Mathew.onipe)
[23:06:11] <wikibugs>	 (PS1) Paladox: Update scap targets [software/gerrit] (deploy/wmf/stable-2.15) - https://gerrit.wikimedia.org/r/545416
[23:06:23] <wikibugs>	 (PS2) Paladox: Update scap targets [software/gerrit] (deploy/wmf/stable-2.15) - https://gerrit.wikimedia.org/r/545416
[23:06:36] <davidwbarratt>	 ping 	MaxSem, RoanKattouw, Niharika, and Urbanecm
[23:08:16] <wikibugs>	 (PS23) Mathew.onipe: query_service: properly adapt query_service profile [puppet] - https://gerrit.wikimedia.org/r/539513 (https://phabricator.wikimedia.org/T232297)
[23:08:17] <wikibugs>	 (PS23) Mathew.onipe: query_service: properly adapt hiera configs [puppet] - https://gerrit.wikimedia.org/r/539998 (https://phabricator.wikimedia.org/T232297)
[23:14:14] <davidwbarratt>	 andrewbogott I guess not. :(
[23:14:30] <andrewbogott>	 MaxSem, RoanKattouw, Niharika, Urbanecm, I'm going to step away but please ping me here if one of you appears and I'll rush back to my keyboard.
[23:14:57] <wikibugs>	 (CR) Dzahn: [C: +1] DNS: Remove production and mgmt DNS for frav1001 [dns] - https://gerrit.wikimedia.org/r/544279 (https://phabricator.wikimedia.org/T222109) (owner: Papaul)
[23:15:02] <icinga-wm>	 PROBLEM - Host ps1-a6-eqiad is DOWN: PING CRITICAL - Packet loss = 100%
[23:16:32] <mooeypoo>	 Is anyone doing SWAT?
[23:17:27] <andrewbogott>	 mooeypoo: seems not
[23:17:39] <wikibugs>	 (CR) Dzahn: [C: +1] admin: add cohi to researchers and analytics-users [puppet] - https://gerrit.wikimedia.org/r/545409 (https://phabricator.wikimedia.org/T234429) (owner: Cwhite)
[23:19:32] <wikibugs>	 (CR) Dzahn: [C: +1] "looks good, but pending approval by Heather" [puppet] - https://gerrit.wikimedia.org/r/545410 (https://phabricator.wikimedia.org/T236209) (owner: Cwhite)
[23:20:16] <mooeypoo>	 pretty please @RoanKattouw / @MaxSem ...? either of you available for SWAT ?
[23:20:58] <MaxSem>	 Sorry, was in a meeting
[23:21:08] <wikibugs>	 (03CR) 10Dzahn: [C: 03+1] "approval is there, ready to go" [puppet] - 10https://gerrit.wikimedia.org/r/545410 (https://phabricator.wikimedia.org/T236209) (owner: 10Cwhite)
[23:21:09] <mooeypoo>	 (with... me.... :D )
[23:21:33] <MaxSem>	 andrewbogott & davidwbarratt, yt?
[23:21:39] <davidwbarratt>	 MaxSem yep!
[23:22:03] <wikibugs>	 10Operations, 10Mail, 10Wikimedia-Mailing-lists: Lengthy delays in emails being recieved from mailing lists - https://phabricator.wikimedia.org/T235983 (10colewhite) I've been monitoring this the past couple days.  Since yesterday we've gone from over 20k messages in the queue to less than 6k.  The backlog s...
[23:22:45] <andrewbogott>	 MaxSem: I'm here!
[23:23:20] <wikibugs>	 (03CR) 10Cwhite: [C: 03+2] admin: add cohi to researchers and analytics-users [puppet] - 10https://gerrit.wikimedia.org/r/545409 (https://phabricator.wikimedia.org/T234429) (owner: 10Cwhite)
[23:23:28] <wikibugs>	 (03PS3) 10Cwhite: admin: add cohi to researchers and analytics-users [puppet] - 10https://gerrit.wikimedia.org/r/545409 (https://phabricator.wikimedia.org/T234429)
[23:24:35] <wikibugs>	 (03PS7) 10Andrew Bogott: labtestwikitech: use the new codfw1-dev servers [mediawiki-config] - 10https://gerrit.wikimedia.org/r/543943 (https://phabricator.wikimedia.org/T229441)
[23:25:46] <wikibugs>	 (03CR) 10MaxSem: [C: 03+2] labtestwiki: move to a wmcs-hosted database on clouddb2001-dev [mediawiki-config] - 10https://gerrit.wikimedia.org/r/543664 (https://phabricator.wikimedia.org/T233236) (owner: 10Andrew Bogott)
[23:26:05] <MaxSem>	 Now I need to figure out how to deploy that change...
[23:26:30] <wikibugs>	 (03Merged) 10jenkins-bot: labtestwiki: move to a wmcs-hosted database on clouddb2001-dev [mediawiki-config] - 10https://gerrit.wikimedia.org/r/543664 (https://phabricator.wikimedia.org/T233236) (owner: 10Andrew Bogott)
[23:26:48] <andrewbogott>	 I'm not sure how to selectively deploy a rename
[23:27:04] <andrewbogott>	 maybe just do the after and then the before for cleanup
[23:28:05] <bd808>	 MaxSem: for andrewbogott's changes, the main thing is to make sure they don't break "real" wikis. So I think staging on mwdebugXXXX and testing there that say enwiki + mw.o work should be sufficient. We can deal with actually fully testing testlabswiki separately later.
[23:28:38] <andrewbogott>	 yeah, we definitely don't need to worry about breaking testlabswiki, I'm pretty much the only one who ever looks at it
[23:29:56] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] admin: add keepit-ssh (CherRaye Glenn) to ldap_only_users [puppet] - 10https://gerrit.wikimedia.org/r/545410 (https://phabricator.wikimedia.org/T236209) (owner: 10Cwhite)
[23:30:16] <wikibugs>	 (03PS2) 10Dzahn: admin: add keepit-ssh (CherRaye Glenn) to ldap_only_users [puppet] - 10https://gerrit.wikimedia.org/r/545410 (https://phabricator.wikimedia.org/T236209) (owner: 10Cwhite)
[23:30:19] <MaxSem>	 andrewbogott: pulled on mwdebug1002, please test
[23:31:06] <andrewbogott>	 ok!  Um… what url do I use to hit that host?
[23:31:31] <andrewbogott>	 bd808: ^ ?
[23:31:31] <MaxSem>	 Use the Wikimedia-Debug browser extension
[23:31:49] <bd808>	 MaxSem: I just tested mw.o reads and edits on mwdebug1002 and they look good.
[23:31:58] <andrewbogott>	 thank you :)
[23:32:52] <mutante>	 !log LDAP - added keepit-ssh to wmf group (T236209)
[23:32:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:32:57] <stashbot>	 T236209: WikimediaFoundation.org analytics access for CherRaye Glenn - https://phabricator.wikimedia.org/T236209
[23:33:05] <bd808>	 MaxSem: enwiki too. So if you didn't see any spurt of soft errors on the backend should be good to go
[23:33:47] <bd808>	 andrewbogott: https://wikitech.wikimedia.org/wiki/X-Wikimedia-Debug is the magic to be able to test things on the mwdebugXXXX hosts
[23:34:58] * andrewbogott installs the extension for future reference
[23:36:16] <wikibugs>	 10Operations, 10Analytics, 10LDAP-Access-Requests, 10SRE-Access-Requests, and 2 others: WikimediaFoundation.org analytics access for CherRaye Glenn - https://phabricator.wikimedia.org/T236209 (10Dzahn) 05Open→03Resolved done. she has been added to the "wmf" group
[23:37:04] <wikibugs>	 10Operations, 10Analytics, 10Analytics-Kanban, 10LDAP-Access-Requests, and 2 others: Analytics Access for Grant  (groups cn=wmf and analytics-privatedata-users) - https://phabricator.wikimedia.org/T235260 (10Dzahn)
[23:37:40] <andrewbogott>	 MaxSem: looks good to me too
[23:38:03] <wikibugs>	 10Operations, 10Analytics, 10Analytics-Kanban, 10LDAP-Access-Requests, and 2 others: Analytics Access for Grant  (groups cn=wmf and analytics-privatedata-users) - https://phabricator.wikimedia.org/T235260 (10Dzahn) a:05herron→03colewhite L3 has been signed. This is unblocked.
[23:38:45] <logmsgbot>	 !log maxsem@deploy1001 Synchronized dblists/labtestwiki.dblist: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/543664/ (duration: 01m 02s)
[23:38:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:40:30] <MaxSem>	 Okay, now we either explode or win
[23:41:37] <logmsgbot>	 !log maxsem@deploy1001 Synchronized wmf-config: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/543664/ (duration: 01m 01s)
[23:41:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:41:52] <wikibugs>	 10Operations, 10LDAP-Access-Requests, 10SRE-Access-Requests, 10Scoring-platform-team: Grant LDAP groups and deployment shell access to Kevin Bazira - https://phabricator.wikimedia.org/T234209 (10Dzahn)
[23:43:11] <logmsgbot>	 !log maxsem@deploy1001 Synchronized dblists/: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/543664/ (duration: 00m 59s)
[23:43:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:46:17] <wikibugs>	 (03CR) 10MaxSem: [C: 03+2] labtestwikitech: use the new codfw1-dev servers [mediawiki-config] - 10https://gerrit.wikimedia.org/r/543943 (https://phabricator.wikimedia.org/T229441) (owner: 10Andrew Bogott)
[23:46:58] <wikibugs>	 (03Merged) 10jenkins-bot: labtestwikitech: use the new codfw1-dev servers [mediawiki-config] - 10https://gerrit.wikimedia.org/r/543943 (https://phabricator.wikimedia.org/T229441) (owner: 10Andrew Bogott)
[23:48:24] <MaxSem>	 andrewbogott and davidwbarratt, your changes are staged on mwdebug1002
[23:48:34] <davidwbarratt>	 MaxSem thanks!
[23:49:49] <andrewbogott>	 In my case there's not much to test since the second patch only affects wikitech-style wikis.
[23:49:57] <andrewbogott>	 (which renders it fairly harmless as well)
[23:50:10] <davidwbarratt>	 davidwbarratt nothing I can really test, but it doesn't appear to have broken anything. :)
[23:50:23] <davidwbarratt>	 MaxSem ^
[23:53:02] <logmsgbot>	 !log maxsem@deploy1001 Synchronized wmf-config/wikitech.php: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/543943/ (duration: 01m 01s)
[23:53:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:53:28] <MaxSem>	 andrewbogott: please test ^
[23:55:04] <andrewbogott>	 MaxSem: lgtm.  Edited and logged out/in
[23:55:53] <andrewbogott>	 that's everything, right?
[23:57:06] <logmsgbot>	 !log maxsem@deploy1001 Synchronized php-1.35.0-wmf.3/includes/block/DatabaseBlock.php: https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/545373/ (duration: 00m 59s)
[23:57:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:57:24] <MaxSem>	 davidwbarratt: please test ^
[23:57:31] <davidwbarratt>	 kk
[23:57:40] * andrewbogott -> the kitchen
[23:57:46] <andrewbogott>	 Thank you MaxSem!
[23:58:10] <davidwbarratt>	 MaxSem nothing appears to be broken!
[23:58:25] <davidwbarratt>	 MaxSem thanks!
[23:58:51] <wikibugs>	 (03PS1) 10Cwhite: admin: add Kevin Bazira to several groups [puppet] - 10https://gerrit.wikimedia.org/r/545418 (https://phabricator.wikimedia.org/T234209)