[00:00:04] twentyafterfour: #bothumor My software never has bugs. It just develops random features. Rise for Phabricator update. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200402T0000).
[00:09:01] (PS13) Mstyles: kibana: move httpd proxy authentication to a separate profile [puppet] - https://gerrit.wikimedia.org/r/583414 (https://phabricator.wikimedia.org/T246961)
[00:09:03] Operations, ops-codfw, Traffic, decommission: decommission cp2001.codfw.wmnet - https://phabricator.wikimedia.org/T248815 (Papaul)
[00:09:24] Operations, ops-codfw, Traffic, decommission: decommission cp2002.codfw.wmnet - https://phabricator.wikimedia.org/T248818 (Papaul)
[00:09:57] Operations, ops-codfw, Traffic, decommission: decommission cp2004.codfw.wmnet - https://phabricator.wikimedia.org/T248824 (Papaul)
[00:10:20] PROBLEM - Work requests waiting in Zuul Gearman server on contint1001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [150.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[00:10:24] Operations, ops-codfw, Traffic, decommission: decommission cp2005.codfw.wmnet - https://phabricator.wikimedia.org/T248848 (Papaul)
[00:10:54] PROBLEM - mediawiki originals uploads -hourly- for eqiad on icinga1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe1005:9112 job=statsd_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=eqiad
[00:11:51] Operations, ops-codfw, DC-Ops, Traffic, decommission: decommission cp2006.codfw.wmnet - https://phabricator.wikimedia.org/T248856 (Papaul)
[00:12:04] PROBLEM - mediawiki originals uploads -hourly- for codfw on icinga1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe2005:9112 job=statsd_exporter site=codfw https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=codfw
[00:19:28] RECOVERY - Work requests waiting in Zuul Gearman server on contint1001 is OK: OK: Less than 100.00% above the threshold [90.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[00:20:39] (PS1) ArielGlenn: add test of get_prefetch_arg requiring multiple prefetch files [dumps] - https://gerrit.wikimedia.org/r/585355 (https://phabricator.wikimedia.org/T249131)
[00:23:56] (CR) Mstyles: "things seem good! https://puppet-compiler.wmflabs.org/compiler1003/21662/" [puppet] - https://gerrit.wikimedia.org/r/583414 (https://phabricator.wikimedia.org/T246961) (owner: Mstyles)
[01:21:54] PROBLEM - mediawiki originals uploads -hourly- for codfw on icinga1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe2005:9112 job=statsd_exporter site=codfw https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=codfw
[01:31:44] PROBLEM - mediawiki originals uploads -hourly- for eqiad on icinga1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe1005:9112 job=statsd_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=eqiad
[01:48:37] (PS2) Krinkle: [WIP] logging: Remove useMicrosecondTimestamps(false) calls [mediawiki-config] - https://gerrit.wikimedia.org/r/580096 (https://phabricator.wikimedia.org/T116550)
[02:23:18] PROBLEM - PHP opcache health on mw2164 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health
[02:25:06] RECOVERY - PHP opcache health on mw2164 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health
[02:41:16] PROBLEM - mediawiki originals uploads -hourly- for eqiad on icinga1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe1005:9112 job=statsd_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=eqiad
[02:42:28] PROBLEM - mediawiki originals uploads -hourly- for codfw on icinga1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe2005:9112 job=statsd_exporter site=codfw https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=codfw
[03:49:56] PROBLEM - PHP opcache health on mw2272 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health
[03:55:28] RECOVERY - PHP opcache health on mw2272 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health
[04:31:06] PROBLEM - mediawiki originals uploads -hourly- for codfw on icinga1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe2005:9112 job=statsd_exporter site=codfw https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=codfw
[04:31:40] PROBLEM - mediawiki originals uploads -hourly- for eqiad on icinga1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe1005:9112 job=statsd_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=eqiad
[04:49:19] (CR) Vgutierrez: [C: +2] admin: Change my SSH key [puppet] - https://gerrit.wikimedia.org/r/584187 (owner: Ladsgroup)
[04:49:42] (PS2) Vgutierrez: admin: Change my SSH key [puppet] - https://gerrit.wikimedia.org/r/584187 (owner: Ladsgroup)
[05:14:47] (PS1) Vgutierrez: tftp,bastionhost: Add missing ensure_service hiera key [puppet] - https://gerrit.wikimedia.org/r/585370 (https://phabricator.wikimedia.org/T224576)
[05:16:06] (CR) jerkins-bot: [V: -1] tftp,bastionhost: Add missing ensure_service hiera key [puppet] - https://gerrit.wikimedia.org/r/585370 (https://phabricator.wikimedia.org/T224576) (owner: Vgutierrez)
[05:16:47] (PS1) Samwilson: Enable password-reset-update on Wikipedias [mediawiki-config] - https://gerrit.wikimedia.org/r/585371 (https://phabricator.wikimedia.org/T245791)
[05:17:25] (PS2) Vgutierrez: tftp,bastionhost: Add missing ensure_service hiera key [puppet] - https://gerrit.wikimedia.org/r/585370 (https://phabricator.wikimedia.org/T224576)
[05:17:50] PROBLEM - Host analytics1045 is DOWN: PING CRITICAL - Packet loss = 100%
[05:18:48] elukey: ^^ are you around already?
[05:19:32] I am yes
[05:19:37] will check in a second
[05:19:44] it is a hadoop worker node so nothing on fire
[05:21:17] Operations, Performance-Team, Traffic, Wikimedia-Site-requests, and 2 others: Remove "Cache-control: no-cache" hack from wmf-config - https://phabricator.wikimedia.org/T247783 (Joe)
[05:22:22] (PS1) KartikMistry: apertium-oc-es: Fix FTBFS with apertium 3.6 [debs/contenttranslation/apertium-oc-es] - https://gerrit.wikimedia.org/r/585372 (https://phabricator.wikimedia.org/T247585)
[05:24:35] (CR) Vgutierrez: [C: +2] "pcc is happy: https://puppet-compiler.wmflabs.org/compiler1003/21664/" [puppet] - https://gerrit.wikimedia.org/r/585370 (https://phabricator.wikimedia.org/T224576) (owner: Vgutierrez)
[05:29:40] !log powercycle analytics1045 (host not responsive to ssh, weird chars showed in mgmt serial console)
[05:29:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:30:05] racadm getsel says nothing, weird
[05:31:48] elukey: that sounds a lot like the cp crashes we had on esams
[05:32:31] see https://phabricator.wikimedia.org/T238305
[05:34:27] and BTW, I think we didn't get any since we upgraded to buster
[05:37:20] Operations, Traffic: Servers freezing across the caching cluster (November 2019) - https://phabricator.wikimedia.org/T238305 (Vgutierrez) @faidon actually the cp hosts are running buster (T242093) since February 13th. I do believe we haven't seen more occurrences of this issue on the cache cluster since...
[05:38:36] vgutierrez: I have seen recently some troubles with old analytics workers close to refresh, not the newer ones.. it might be age :D
[05:39:40] sadly the path to buster this time is harder, we want to move to Apache BigTop first on stretch (but it involves a HDFS upgrade) then upgrade again to the new distro that supports buster (and another HDFS upgrade)
[05:41:42] (but at the end we'll have another apache project instead of cloudera, plus a jump from hadoop 2.6 to 2.10)
[05:44:13] (CR) Elukey: [C: +1] "Nice work! Keith let me know if you like the code, and in case feel free to merge and rollout when you have time (I'd do it but I am not f" [puppet] - https://gerrit.wikimedia.org/r/583414 (https://phabricator.wikimedia.org/T246961) (owner: Mstyles)
[05:44:34] (CR) Elukey: [C: +1] "Cole: same thing for you :)" [puppet] - https://gerrit.wikimedia.org/r/583414 (https://phabricator.wikimedia.org/T246961) (owner: Mstyles)
[05:46:10] RECOVERY - Host analytics1045 is UP: PING OK - Packet loss = 0%, RTA = 0.24 ms
[05:48:40] (CR) Hashar: "That worked:" [puppet] - https://gerrit.wikimedia.org/r/579602 (owner: Thcipriani)
[05:49:31] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1101:3318 for schema change', diff saved to https://phabricator.wikimedia.org/P10848 and previous config saved to /var/cache/conftool/dbconfig/20200402-054931-marostegui.json
[05:49:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:49:58] !log Deploy schema change on db1101:3318
[05:50:00] PROBLEM - Host analytics1045 is DOWN: PING CRITICAL - Packet loss = 100%
[05:50:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:50:05] (CR) Hashar: [V: +1 C: +1] Integration Cluster: update gitcache nightly [puppet] - https://gerrit.wikimedia.org/r/579602 (owner: Thcipriani)
[05:52:20] RECOVERY - Host analytics1045 is UP: PING OK - Packet loss = 0%, RTA = 0.21 ms
[05:55:30] PROBLEM - Host analytics1045 is DOWN: PING CRITICAL - Packet loss = 100%
[05:55:42] still me sorry, should resolve in a sec
[05:56:48] RECOVERY - Check systemd state on analytics1045 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:56:50] RECOVERY - Host analytics1045 is UP: PING OK - Packet loss = 0%, RTA = 1.38 ms
[05:57:42] but we also saw these random crashes on backup2001 which was running Buster/4.19, so it could also be that there were different crashes which simply appeared the same
[05:58:41] (CR) Muehlenhoff: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/577553 (https://phabricator.wikimedia.org/T245771) (owner: Jbond)
[05:59:07] (PS1) Elukey: profile::kerberos: fix typo in manage_principals.py [puppet] - https://gerrit.wikimedia.org/r/585373 (https://phabricator.wikimedia.org/T249103)
[06:00:22] (PS1) Marostegui: parsercache.pp: Prepare basedir for buster [puppet] - https://gerrit.wikimedia.org/r/585374
[06:03:28] (CR) Muehlenhoff: [C: +1] profile::kerberos: fix typo in manage_principals.py [puppet] - https://gerrit.wikimedia.org/r/585373 (https://phabricator.wikimedia.org/T249103) (owner: Elukey)
[06:03:54] (PS2) Marostegui: parsercache.pp: Prepare basedir for buster [puppet] - https://gerrit.wikimedia.org/r/585374
[06:04:53] (CR) Muehlenhoff: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/585374 (owner: Marostegui)
[06:04:56] (CR) Elukey: [C: +2] profile::kerberos: fix typo in manage_principals.py [puppet] - https://gerrit.wikimedia.org/r/585373 (https://phabricator.wikimedia.org/T249103) (owner: Elukey)
[06:05:21] (PS1) Ladsgroup: Change my SSH key again [puppet] - https://gerrit.wikimedia.org/r/585375
[06:06:28] RECOVERY - MegaRAID on analytics1045 is OK: OK: optimal, 12 logical, 13 physical, WriteBack policy https://wikitech.wikimedia.org/wiki/MegaCli%23Monitoring
[06:06:58] (CR) Marostegui: "NOOP as expected: https://puppet-compiler.wmflabs.org/compiler1002/21666/" [puppet] - https://gerrit.wikimedia.org/r/585374 (owner: Marostegui)
[06:07:18] (CR) Marostegui: [C: +2] parsercache.pp: Prepare basedir for buster [puppet] - https://gerrit.wikimedia.org/r/585374 (owner: Marostegui)
[06:08:15] XioNoX: o/ - netflow4001 looks ok, if you are ok we could merge and restart everywhere :)
[06:08:22] (whenever you have time)
[06:08:46] (CR) Vgutierrez: [C: -1] Change my SSH key again (1 comment) [puppet] - https://gerrit.wikimedia.org/r/585375 (owner: Ladsgroup)
[06:11:41] (PS2) Ladsgroup: admin: Change SSH key for Ladsgroup again [puppet] - https://gerrit.wikimedia.org/r/585375
[06:19:10] (PS1) Elukey: Add peer_ip_src dimension for netflow in Druid and Turnilo [puppet] - https://gerrit.wikimedia.org/r/585377 (https://phabricator.wikimedia.org/T246186)
[06:22:35] (CR) Elukey: [C: +2] Add peer_ip_src dimension for netflow in Druid and Turnilo [puppet] - https://gerrit.wikimedia.org/r/585377 (https://phabricator.wikimedia.org/T246186) (owner: Elukey)
[06:35:20] (CR) Vgutierrez: [C: +2] admin: Change SSH key for Ladsgroup again [puppet] - https://gerrit.wikimedia.org/r/585375 (owner: Ladsgroup)
[07:11:34] (CR) Dzahn: "thank you for fixing it. i did not think about POP bastionhosts having this as well" [puppet] - https://gerrit.wikimedia.org/r/585370 (https://phabricator.wikimedia.org/T224576) (owner: Vgutierrez)
[07:21:15] (PS1) Elukey: Use G1 GC and reduce Hive metastore heap max usage in Hadoop test [puppet] - https://gerrit.wikimedia.org/r/585424
[07:21:56] (CR) Elukey: [C: +2] Use G1 GC and reduce Hive metastore heap max usage in Hadoop test [puppet] - https://gerrit.wikimedia.org/r/585424 (owner: Elukey)
[07:25:02] !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db1101:3318 after schema change', diff saved to https://phabricator.wikimedia.org/P10849 and previous config saved to /var/cache/conftool/dbconfig/20200402-072500-marostegui.json
[07:25:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:27:31] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1092 for schema change', diff saved to https://phabricator.wikimedia.org/P10850 and previous config saved to /var/cache/conftool/dbconfig/20200402-072730-marostegui.json
[07:27:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:27:57] !log Deploy schema change on db1092
[07:28:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:33:11] Operations, Traffic: ATS TLS session cache efficiency reduced in TLSv1.3 - https://phabricator.wikimedia.org/T245502 (Vgutierrez) This has been mitigated by providing support for TLS Session Tickets (T245616) and reducing the number of issued tickets on new connections from 2 to 1 by submitting this patc...
[07:33:26] Operations, Traffic: ATS TLS session cache efficiency reduced in TLSv1.3 - https://phabricator.wikimedia.org/T245502 (Vgutierrez) Open→Resolved
[07:33:30] elukey: wfm!
[07:33:31] Operations, Traffic, Goal, Patch-For-Review, Performance-Team (Radar): Support TLSv1.3 - https://phabricator.wikimedia.org/T170567 (Vgutierrez)
[07:33:45] Operations, Traffic: Provide a simple and automated SSL Ticket key generation system for ATS - https://phabricator.wikimedia.org/T245616 (Vgutierrez) Open→Resolved
[07:33:47] Operations, Traffic: ATS TLS session cache efficiency reduced in TLSv1.3 - https://phabricator.wikimedia.org/T245502 (Vgutierrez)
[07:34:38] (CR) Dzahn: [C: +2] site: fix comment about public/private IPs of apt repo [puppet] - https://gerrit.wikimedia.org/r/585247 (owner: Dzahn)
[07:34:46] (PS3) Dzahn: site: fix comment about public/private IPs of apt repo [puppet] - https://gerrit.wikimedia.org/r/585247
[07:35:00] (PS1) Vgutierrez: ATS: Enable inbound TLSv1.3 in upload@esams [puppet] - https://gerrit.wikimedia.org/r/585426 (https://phabricator.wikimedia.org/T170567)
[07:39:20] (CR) Vgutierrez: "pcc seems happy: https://puppet-compiler.wmflabs.org/compiler1002/21667/" [puppet] - https://gerrit.wikimedia.org/r/585426 (https://phabricator.wikimedia.org/T170567) (owner: Vgutierrez)
[07:45:24] !log bounced ferm on ms-be1040
[07:45:26] RECOVERY - Check systemd state on ms-be1040 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:45:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:48:54] (PS1) Giuseppe Lavagetto: canary_api: use the correct certificate [puppet] - https://gerrit.wikimedia.org/r/585427
[07:52:22] Operations, LDAP-Access-Requests, Patch-For-Review: LDAP access to the wmf group for Pita - https://phabricator.wikimedia.org/T247722 (jcrespo) Thanks, that confirms josepita as your main identity, which which you will have access to logstash and other wmf/nda sites. I will now send an email to verif...
[07:53:51] (CR) Giuseppe Lavagetto: [C: +2] "https://puppet-compiler.wmflabs.org/compiler1001/21668/mw1276.eqiad.wmnet/" [puppet] - https://gerrit.wikimedia.org/r/585427 (owner: Giuseppe Lavagetto)
[07:57:16] Operations, SRE-Access-Requests, Patch-For-Review: Requesting access to mwmaint1002.eqiad.wmnet for holger - https://phabricator.wikimedia.org/T248922 (jcrespo)
[07:57:51] Operations, SRE-Access-Requests, Patch-For-Review: Requesting access to analytics-privatedata-users for aaron, dpifke, phedenskog - https://phabricator.wikimedia.org/T248797 (jcrespo)
[07:59:35] (PS4) Jcrespo: admin: Add aaron, dpifke, phedenskog to analytics-privatedata-users [puppet] - https://gerrit.wikimedia.org/r/584915 (https://phabricator.wikimedia.org/T248797)
[08:02:30] RECOVERY - Check whether ferm is active by checking the default input chain on ms-be1040 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[08:03:33] (CR) Jcrespo: [C: +2] admin: Add aaron, dpifke, phedenskog to analytics-privatedata-users [puppet] - https://gerrit.wikimedia.org/r/584915 (https://phabricator.wikimedia.org/T248797) (owner: Jcrespo)
[08:06:38] Operations, netops: netflow hosts spamming /var/log - https://phabricator.wikimedia.org/T249177 (ayounsi)
[08:06:40] Operations, netops: fastnetmon spamming /var/log on netflow hosts leading to disk saturation - https://phabricator.wikimedia.org/T240658 (ayounsi)
[08:07:19] Operations, SRE-Access-Requests, Patch-For-Review: Requesting access to analytics-privatedata-users for aaron, dpifke, phedenskog - https://phabricator.wikimedia.org/T248797 (jcrespo) SRE access has been deployed, in around 30 minutes it should apply to the relevant servers. @elukey I belive this re...
[08:08:58] Operations, netops: fastnetmon spamming /var/log on netflow hosts leading to disk saturation - https://phabricator.wikimedia.org/T240658 (ayounsi) Note that Fastnetmon 1.1.4 is in Debian 11. Back-porting it might be easy.
[08:10:03] (PS3) Jcrespo: admin: Add holger to restricted group to run maintenance scripts [puppet] - https://gerrit.wikimedia.org/r/584932 (https://phabricator.wikimedia.org/T248922)
[08:11:06] Operations, netops: netflow2001 kafkatee-webrequest restart loop - https://phabricator.wikimedia.org/T249176 (ayounsi) As long as we drop RPKI invalids, we can remove RPKI counter.
[08:12:33] (CR) Jcrespo: [C: +2] admin: Add holger to restricted group to run maintenance scripts [puppet] - https://gerrit.wikimedia.org/r/584932 (https://phabricator.wikimedia.org/T248922) (owner: Jcrespo)
[08:15:07] (CR) Ema: [C: +1] ATS: Enable inbound TLSv1.3 in upload@esams [puppet] - https://gerrit.wikimedia.org/r/585426 (https://phabricator.wikimedia.org/T170567) (owner: Vgutierrez)
[08:16:31] Operations, SRE-Access-Requests, Patch-For-Review: Requesting access to mwmaint1002.eqiad.wmnet for holger - https://phabricator.wikimedia.org/T248922 (jcrespo) @holger.knust your access has been deployed, in a few minutes it will take effect on the desired hosts (mwmaint1002, ...). **Please test ac...
[08:16:56] (CR) Vgutierrez: [C: +2] ATS: Enable inbound TLSv1.3 in upload@esams [puppet] - https://gerrit.wikimedia.org/r/585426 (https://phabricator.wikimedia.org/T170567) (owner: Vgutierrez)
[08:19:03] (PS2) Gehel: maps: isolate osm master from the codfw maps cluster [puppet] - https://gerrit.wikimedia.org/r/585153 (https://phabricator.wikimedia.org/T249086)
[08:21:49] !log Enable TLS Session tickets in esams - T245616
[08:21:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:21:55] T245616: Provide a simple and automated SSL Ticket key generation system for ATS - https://phabricator.wikimedia.org/T245616
[08:22:06] log Enable inbound TLSv1.3 in upload@esams - T170567
[08:22:06] T170567: Support TLSv1.3 - https://phabricator.wikimedia.org/T170567
[08:22:19] !log Enable inbound TLSv1.3 in upload@esams - T170567
[08:22:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:23:41] Operations, LDAP-Access-Requests, Patch-For-Review: LDAP access to the wmf group for Pita - https://phabricator.wikimedia.org/T247722 (jcrespo) I got confirmation by owner of account to delete/revoke the non-wikimedia associated LDAP account. Thanks for the prompt response!
[08:28:14] !log repooling wdqs1006 - catched up on lag
[08:28:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:30:56] (CR) Elukey: [C: +1] maps: isolate osm master from the codfw maps cluster [puppet] - https://gerrit.wikimedia.org/r/585153 (https://phabricator.wikimedia.org/T249086) (owner: Gehel)
[08:32:22] (CR) Ema: [C: +2] systemd: add support for network accounting [puppet] - https://gerrit.wikimedia.org/r/584553 (https://phabricator.wikimedia.org/T183146) (owner: Ema)
[08:36:07] (CR) Ema: [C: +2] cache: update service restart scripts comments [puppet] - https://gerrit.wikimedia.org/r/584901 (https://phabricator.wikimedia.org/T238625) (owner: Ema)
[08:36:25] (PS1) Giuseppe Lavagetto: envoyproxy::tls_terminator: update accesslog directives [puppet] - https://gerrit.wikimedia.org/r/585433
[08:37:00] (CR) Volans: "extra nit inline ;)" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/585373 (https://phabricator.wikimedia.org/T249103) (owner: Elukey)
[08:39:18] (PS2) Jcrespo: admin: Remove jpita account, only apply special privileges to Josepita [puppet] - https://gerrit.wikimedia.org/r/583720 (https://phabricator.wikimedia.org/T247722) (owner: Volans)
[08:39:37] (CR) Jcrespo: admin: Remove jpita account, only apply special privileges to Josepita (1 comment) [puppet] - https://gerrit.wikimedia.org/r/583720 (https://phabricator.wikimedia.org/T247722) (owner: Volans)
[08:39:50] (PS1) Ayounsi: Remove RPKIcounter [puppet] - https://gerrit.wikimedia.org/r/585434 (https://phabricator.wikimedia.org/T249176)
[08:41:17] Operations, SRE-Access-Requests: Requesting access to analytics-privatedata-users for aaron, dpifke, phedenskog - https://phabricator.wikimedia.org/T248797 (Gilles) Indeed: https://wikitech.wikimedia.org/wiki/Analytics/Systems/Kerberos#Create_a_principal_for_a_real_user Who should we assign that part to?
[08:42:02] (PS1) Jcrespo: admin: Complete remove all references to jpita [puppet] - https://gerrit.wikimedia.org/r/585437 (https://phabricator.wikimedia.org/T247722)
[08:44:17] (CR) Ayounsi: "https://puppet-compiler.wmflabs.org/compiler1001/21670/netflow1001.eqiad.wmnet/" [puppet] - https://gerrit.wikimedia.org/r/585434 (https://phabricator.wikimedia.org/T249176) (owner: Ayounsi)
[08:48:31] (CR) Muehlenhoff: [C: +1] admin: Complete remove all references to jpita [puppet] - https://gerrit.wikimedia.org/r/585437 (https://phabricator.wikimedia.org/T247722) (owner: Jcrespo)
[08:50:20] Operations, LDAP-Access-Requests, Patch-For-Review: LDAP access to the wmf group for Pita - https://phabricator.wikimedia.org/T247722 (jcrespo) Double checking this is how things will end up after the changes are applied: * Jpita privileges stripped, cannot be used to access logstash or other sites....
[08:50:21] !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db1092 after schema change', diff saved to https://phabricator.wikimedia.org/P10853 and previous config saved to /var/cache/conftool/dbconfig/20200402-085019-marostegui.json
[08:50:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:50:41] (CR) Elukey: [C: +2] profile::kerberos: fix typo in manage_principals.py (1 comment) [puppet] - https://gerrit.wikimedia.org/r/585373 (https://phabricator.wikimedia.org/T249103) (owner: Elukey)
[08:50:58] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1104 for schema change', diff saved to https://phabricator.wikimedia.org/P10854 and previous config saved to /var/cache/conftool/dbconfig/20200402-085057-marostegui.json
[08:51:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:51:15] Operations, SRE-Access-Requests: Requesting access to analytics-privatedata-users for aaron, dpifke, phedenskog - https://phabricator.wikimedia.org/T248797 (jcrespo) a:Nuria→elukey
[08:51:19] !log Deploy schema change db1104
[08:51:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:55:37] (CR) Aklapper: [C: +1] "As we already have `video/ogg` in there I don't see a reason against this" [puppet] - https://gerrit.wikimedia.org/r/569627 (https://phabricator.wikimedia.org/T244162) (owner: Zoranzoki21)
[09:00:19] Operations, serviceops, Patch-For-Review: CORS errors on commons on debug servers - https://phabricator.wikimedia.org/T249107 (ema) p:Triage→Medium
[09:00:51] Operations, SRE-Access-Requests, Developer-Advocacy (Apr-Jun 2020), Patch-For-Review: Add aklapper to analytics-privatedata-users - https://phabricator.wikimedia.org/T248905 (jcrespo) Ping @Nuria.
[09:01:33] (PS2) Ema: Whitelist X-Wikimedia-Debug header for CORS media requests [puppet] - https://gerrit.wikimedia.org/r/585252 (https://phabricator.wikimedia.org/T249107) (owner: Gergő Tisza)
[09:01:42] Operations, Anti-Harassment, SRE-Access-Requests: Requesting access to analytics-privatedata-users for tchanders, dmaza, dbarratt, wikigit - https://phabricator.wikimedia.org/T249059 (jcrespo) p:Triage→Medium
[09:04:51] (CR) ArielGlenn: [C: +2] add test of get_prefetch_arg requiring multiple prefetch files [dumps] - https://gerrit.wikimedia.org/r/585355 (https://phabricator.wikimedia.org/T249131) (owner: ArielGlenn)
[09:04:56] Operations, ops-eqiad, SRE-swift-storage: ms-be1023 crashed / Smart Storage Battery failure - https://phabricator.wikimedia.org/T249174 (fgiunchedi)
[09:05:38] Operations, Anti-Harassment, SRE-Access-Requests: Requesting access to analytics-privatedata-users for tchanders, dmaza, dbarratt, wikigit - https://phabricator.wikimedia.org/T249059 (jcrespo)
[09:05:52] (CR) Ema: [C: +2] "VTC test case updated too, other than that the patch looks good. Merging." [puppet] - https://gerrit.wikimedia.org/r/585252 (https://phabricator.wikimedia.org/T249107) (owner: Gergő Tisza)
[09:06:00] Operations, ops-eqiad, SRE-swift-storage: ms-be1023 crashed / Smart Storage Battery failure - https://phabricator.wikimedia.org/T249174 (fgiunchedi) Thanks @Volans for taking a look! Indeed seems like the battery failed, maybe we can try a reseat first @Cmjohnson @Jclark-ctr next time you get a chanc...
[09:10:30] Operations, Anti-Harassment, SRE-Access-Requests: Requesting access to analytics-privatedata-users for tchanders, dmaza, dbarratt, wikigit - https://phabricator.wikimedia.org/T249059 (jcrespo)
[09:11:41] (CR) Ema: [C: +2] ATS: increase check_trafficserver_log_fifo timeout [puppet] - https://gerrit.wikimedia.org/r/585256 (https://phabricator.wikimedia.org/T248067) (owner: Ema)
[09:14:22] (PS2) Giuseppe Lavagetto: envoyproxy::tls_terminator: update accesslog directives [puppet] - https://gerrit.wikimedia.org/r/585433
[09:14:24] (PS1) Giuseppe Lavagetto: mediawiki: turn on the access log on the canaries [puppet] - https://gerrit.wikimedia.org/r/585446
[09:14:51] (CR) Giuseppe Lavagetto: [V: +2 C: +2] "LGTM" [docker-images/production-images] - https://gerrit.wikimedia.org/r/580128 (https://phabricator.wikimedia.org/T215458) (owner: Hashar)
[09:15:02] ;]
[09:20:51] (PS3) Jcrespo: admin: Remove jpita account, only apply special privileges to Josepita [puppet] - https://gerrit.wikimedia.org/r/583720 (https://phabricator.wikimedia.org/T247722) (owner: Volans)
[09:23:21] Operations, observability, Performance-Team (Radar): Decide on `service-runner` aggregated prometheus metrics and use of `service` label - https://phabricator.wikimedia.org/T247820 (fgiunchedi) >>! In T247820#6014844, @Ottomata wrote: >> e.g. golang services), and we seem to be fine without it? > If...
[09:24:55] PROBLEM - rsyslog in codfw is failing to deliver messages on icinga1001 is CRITICAL: action=fwd_centrallog2001.codfw.wmnet:6514 https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=codfw+prometheus/ops
[09:25:44] (PS1) Giuseppe Lavagetto: Add buster support [docker-images/docker-pkg/deploy] - https://gerrit.wikimedia.org/r/585451
[09:25:52] (PS3) Zoranzoki21: Add .webm in files.viewable-mime-types of Phabricator [puppet] - https://gerrit.wikimedia.org/r/569627 (https://phabricator.wikimedia.org/T244162)
[09:26:05] RECOVERY - rsyslog in codfw is failing to deliver messages on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=codfw+prometheus/ops
[09:26:11] (PS4) Zoranzoki21: Add .webm in files.viewable-mime-types of Phabricator [puppet] - https://gerrit.wikimedia.org/r/569627 (https://phabricator.wikimedia.org/T215360)
[09:26:17] (CR) Giuseppe Lavagetto: [V: +2 C: +2] Add buster support [docker-images/docker-pkg/deploy] - https://gerrit.wikimedia.org/r/585451 (owner: Giuseppe Lavagetto)
[09:26:28] (CR) Zoranzoki21: "(changed bug number)" [puppet] - https://gerrit.wikimedia.org/r/569627 (https://phabricator.wikimedia.org/T215360) (owner: Zoranzoki21)
[09:27:04] Operations: Onboarding Janis Meybohm - https://phabricator.wikimedia.org/T249081 (MoritzMuehlenhoff)
[09:28:38] !log oblivian@deploy1001 Started deploy [docker-pkg/deploy@4f86d77]: (no justification provided)
[09:28:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:28:47] !log oblivian@deploy1001 Finished deploy [docker-pkg/deploy@4f86d77]: (no justification provided) (duration: 00m 09s)
[09:28:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:31:37] (PS1) Giuseppe Lavagetto: Brown paper bag fix of deneb's domain [docker-images/docker-pkg/deploy] - https://gerrit.wikimedia.org/r/585454
[09:32:42] !log oblivian@deploy1001 Started deploy [docker-pkg/deploy@9f2ba2c]: (no justification provided)
[09:32:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:33:00] !log oblivian@deploy1001 Finished deploy [docker-pkg/deploy@9f2ba2c]: (no justification provided) (duration: 00m 18s)
[09:33:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:33:12] (CR) Giuseppe Lavagetto: [V: +2 C: +2] Brown paper bag fix of deneb's domain [docker-images/docker-pkg/deploy] - https://gerrit.wikimedia.org/r/585454 (owner: Giuseppe Lavagetto)
[09:34:22] Operations: Onboarding Janis Meybohm - https://phabricator.wikimedia.org/T249081 (MoritzMuehlenhoff)
[09:39:48] (PS3) Gehel: maps: isolate osm master from the codfw maps cluster [puppet] - https://gerrit.wikimedia.org/r/585153 (https://phabricator.wikimedia.org/T249086)
[09:40:33] !log depool wdqs2004 for data reimport - T249086
[09:40:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:40:39] T249086: Reimport OSM data on maps servers - https://phabricator.wikimedia.org/T249086
[09:40:48] !log CORRECTION: depool maps2004 for data reimport - T249086
[09:41:13] (CR) Gehel: [C: +2] maps: isolate osm master from the codfw maps cluster [puppet] - https://gerrit.wikimedia.org/r/585153 (https://phabricator.wikimedia.org/T249086) (owner: Gehel)
[09:45:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:45:58] Operations, serviceops, Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO (2020-04 to 2020-06 (Q4)): Build and publish a python2 based container to build wheels - https://phabricator.wikimedia.org/T249110 (Joe) Open→Resolved a:Joe Successfully published i...
[09:46:08] Operations, Anti-Harassment, SRE-Access-Requests: Requesting access to analytics-privatedata-users for tchanders, dmaza, dbarratt, wikigit - https://phabricator.wikimedia.org/T249059 (jcrespo) a:aezell Team manager approval is required next, then we will need the ok from analytics.
[09:47:32] !log Remove haproxy@10.64.37.14 from labsdb hosts - T231280 T248944
[09:47:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:47:38] T248944: Decommission dbproxy1010.eqiad.wmnet - https://phabricator.wikimedia.org/T248944
[09:47:39] T231280: Remove grants for the old dbproxy hosts from the misc databases - https://phabricator.wikimedia.org/T231280
[09:51:27] Operations, serviceops, Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO (2020-04 to 2020-06 (Q4)): Build and publish a python2 based container to build wheels - https://phabricator.wikimedia.org/T249110 (hashar) `counterexample $ docker pull docker-registry.wikimed...
[09:53:07] (PS4) Ayounsi: Manage static flowspec rules via Homer [homer/public] - https://gerrit.wikimedia.org/r/585232
[09:53:41] (PS1) Jcrespo: admin: Add analytics private group to tchanders, dmaza, dbarratt, wikigit [puppet] - https://gerrit.wikimedia.org/r/585458 (https://phabricator.wikimedia.org/T249059)
[09:54:59] (CR) Ayounsi: [C: +2] Manage static flowspec rules via Homer (2 comments) [homer/public] - https://gerrit.wikimedia.org/r/585232 (owner: Ayounsi)
[09:55:19] (Merged) jenkins-bot: Manage static flowspec rules via Homer [homer/public] - https://gerrit.wikimedia.org/r/585232 (owner: Ayounsi)
[09:55:35] (CR) Jcrespo: [C: -1] "Waiting for manager and analytics approval." [puppet] - https://gerrit.wikimedia.org/r/585458 (https://phabricator.wikimedia.org/T249059) (owner: Jcrespo)
[10:00:20] (PS3) Giuseppe Lavagetto: envoyproxy::tls_terminator: update accesslog directives [puppet] - https://gerrit.wikimedia.org/r/585433
[10:00:22] (PS2) Giuseppe Lavagetto: mediawiki: turn on the access log on the canaries [puppet] - https://gerrit.wikimedia.org/r/585446
[10:00:24] (PS1) Giuseppe Lavagetto: docker_registry_ha: Add new image builder to the list [puppet] - https://gerrit.wikimedia.org/r/585461
[10:00:44] (PS2) Giuseppe Lavagetto: docker_registry_ha: Add new image builder to the list [puppet] - https://gerrit.wikimedia.org/r/585461
[10:03:30] (CR) Giuseppe Lavagetto: [C: +2] docker_registry_ha: Add new image builder to the list [puppet] - https://gerrit.wikimedia.org/r/585461 (owner: Giuseppe Lavagetto)
[10:04:03] (PS1) JMeybohm: admin: upgrade jayme to root shell user (ops) [puppet] - https://gerrit.wikimedia.org/r/585462 (https://phabricator.wikimedia.org/T249081)
[10:04:11] (PS1) Volans: netbox: increase changelog retention to 2 years [puppet] - https://gerrit.wikimedia.org/r/585463
[10:07:06] (CR) Volans: "The default is 90 days. The current size of the 'extras_objectchange' table is 5MB. I think it would be useful to keep those changelog fo" [puppet] - https://gerrit.wikimedia.org/r/585463 (owner: Volans)
[10:07:53] (PS2) Elukey: Enable TLS encryption to Kafka Jumbo for all pmacct instances [puppet] - https://gerrit.wikimedia.org/r/585255 (https://phabricator.wikimedia.org/T248980)
[10:10:07] (CR) Muehlenhoff: [C: +1] "Looks good" [puppet] - https://gerrit.wikimedia.org/r/585462 (https://phabricator.wikimedia.org/T249081) (owner: JMeybohm)
[10:11:21] (CR) Elukey: [C: +2] Enable TLS encryption to Kafka Jumbo for all pmacct instances [puppet] - https://gerrit.wikimedia.org/r/585255 (https://phabricator.wikimedia.org/T248980) (owner: Elukey)
[10:13:53] (CR) Ayounsi: [C: +1] "Can put a calendar reminded for in 1.9y to check the DB size and increase it to 5y if good. :)" [puppet] - https://gerrit.wikimedia.org/r/585463 (owner: Volans)
[10:15:06] Operations, serviceops, Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO (2020-04 to 2020-06 (Q4)): Build and publish a python2 based container to build wheels - https://phabricator.wikimedia.org/T249110 (hashar) There is some issue with the registry, but eventually...
[10:17:48] !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db1104 after schema change', diff saved to https://phabricator.wikimedia.org/P10856 and previous config saved to /var/cache/conftool/dbconfig/20200402-101747-marostegui.json
[10:17:50] !log set up TLS encryption for all pmacct instances on netflow* to Kafka Jumbo
[10:17:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:17:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:19:20] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1087 for schema change', diff saved to https://phabricator.wikimedia.org/P10857 and previous config saved to /var/cache/conftool/dbconfig/20200402-101920-marostegui.json
[10:19:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:19:32] !log Deploy schema change on db1087, this will generate lag on s8 on wiki replicas
[10:19:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:20:18] (CR) Filippo Giunchedi: [C: +2] smart: stop smartd on Buster + hpsa [puppet] - https://gerrit.wikimedia.org/r/581617 (https://phabricator.wikimedia.org/T246997) (owner: Filippo Giunchedi)
[10:23:54] RECOVERY - Check systemd state on db1078 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:27:38] Operations, Patch-For-Review, User-fgiunchedi: smartd not starting properly on gen9 + buster - https://phabricator.wikimedia.org/T246997 (fgiunchedi) Open→Resolved a:fgiunchedi `smartd.service` will be masked on Buster + hpsa, resolving
[10:29:18] (CR) Jbond: "lgtm on nit on the commit message" (1 comment) [software/censorship-monitoring] - https://gerrit.wikimedia.org/r/585258 (owner: Ssingh)
[10:32:36] (CR) Jcrespo: "I think there is some issues with the patch, talk to me on IRC." (4 comments) [debs/wmf-pt-kill] - https://gerrit.wikimedia.org/r/584814 (https://phabricator.wikimedia.org/T248843) (owner: Marostegui)
[10:35:23] (CR) Jbond: [C: +2] offboard-user: update script so that it can traverse subgroups [puppet] - https://gerrit.wikimedia.org/r/577553 (https://phabricator.wikimedia.org/T245771) (owner: Jbond)
[10:35:54] (PS4) Jbond: offboard-user: update script so that it can traverse subgroups [puppet] - https://gerrit.wikimedia.org/r/577553 (https://phabricator.wikimedia.org/T245771)
[10:37:16] (PS1) Filippo Giunchedi: prometheus: additional external_labels for Thanos [puppet] - https://gerrit.wikimedia.org/r/585468 (https://phabricator.wikimedia.org/T233956)
[10:38:05] (CR) Giuseppe Lavagetto: [C: +2] "https://puppet-compiler.wmflabs.org/compiler1003/21676/restbase1018.eqiad.wmnet/ this is a noop right now." [puppet] - https://gerrit.wikimedia.org/r/585433 (owner: Giuseppe Lavagetto)
[10:39:55] (CR) Jbond: [C: +1] admin: Complete remove all references to jpita [puppet] - https://gerrit.wikimedia.org/r/585437 (https://phabricator.wikimedia.org/T247722) (owner: Jcrespo)
[10:41:34] (CR) jerkins-bot: [V: -1] prometheus: additional external_labels for Thanos [puppet] - https://gerrit.wikimedia.org/r/585468 (https://phabricator.wikimedia.org/T233956) (owner: Filippo Giunchedi)
[10:41:42] (PS2) Filippo Giunchedi: prometheus: additional external_labels for Thanos [puppet] - https://gerrit.wikimedia.org/r/585468 (https://phabricator.wikimedia.org/T233956)
[10:41:44] (CR) Filippo Giunchedi: "PCC https://puppet-compiler.wmflabs.org/compiler1001/21677/" [puppet] - https://gerrit.wikimedia.org/r/585468 (https://phabricator.wikimedia.org/T233956) (owner: Filippo Giunchedi)
[10:43:06] Operations, Phabricator, Security-Team, Patch-For-Review, Security: Adjust onboarding/offboarding logic to accommodate changes to #security (now acl*security) - https://phabricator.wikimedia.org/T245771 (jbond) The script has been updated please feel free to test it further on mwmaint1002 and...
[10:43:27] Operations, Phabricator, Security-Team, Patch-For-Review, Security: Adjust onboarding/offboarding logic to accommodate changes to #security (now acl*security) - https://phabricator.wikimedia.org/T245771 (jbond) a:jbond→None
[10:43:31] (PS3) Filippo Giunchedi: prometheus: additional external_labels for Thanos [puppet] - https://gerrit.wikimedia.org/r/585468 (https://phabricator.wikimedia.org/T233956)
[10:46:07] (CR) Giuseppe Lavagetto: [C: +2] "https://puppet-compiler.wmflabs.org/compiler1003/21678/ does what we expect." [puppet] - https://gerrit.wikimedia.org/r/585446 (owner: Giuseppe Lavagetto)
[10:51:37] (PS1) Ayounsi: Fix bug in the new flowspec_flows [homer/public] - https://gerrit.wikimedia.org/r/585470
[10:52:13] (CR) Volans: [C: +1] "LGTM" [homer/public] - https://gerrit.wikimedia.org/r/585470 (owner: Ayounsi)
[10:52:25] (CR) Ayounsi: [C: +2] Fix bug in the new flowspec_flows [homer/public] - https://gerrit.wikimedia.org/r/585470 (owner: Ayounsi)
[10:52:44] (Merged) jenkins-bot: Fix bug in the new flowspec_flows [homer/public] - https://gerrit.wikimedia.org/r/585470 (owner: Ayounsi)
[10:55:47] !log dzahn@cumin1001 START - Cookbook sre.hosts.decommission
[10:55:50] (CR) Marostegui: wmf-pt-kill: Update package to PT 3.1.0 (4 comments) [debs/wmf-pt-kill] - https://gerrit.wikimedia.org/r/584814 (https://phabricator.wikimedia.org/T248843) (owner: Marostegui)
[10:55:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:56:27] !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
[10:56:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:56:34] Operations, serviceops, Patch-For-Review: upgrade planet.wikimedia.org backends to buster - https://phabricator.wikimedia.org/T247651 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by dzahn@cumin1001 for hosts: `planet1001.eqiad.wmnet` - planet1001.eqiad.wmnet (**PASS**) - Downtime...
[10:56:35] !log decom planet1001 (T248863)
[10:56:59] !log dzahn@cumin1001 START - Cookbook sre.hosts.decommission
[10:57:08] !log decom planet2001 (T248863)
[10:57:47] !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
[10:57:53] Operations, serviceops, Patch-For-Review: upgrade planet.wikimedia.org backends to buster - https://phabricator.wikimedia.org/T247651 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by dzahn@cumin1001 for hosts: `planet2001.codfw.wmnet` - planet2001.codfw.wmnet (**PASS**) - Downtime...
[10:58:54] (CR) Dzahn: [C: +2] decom planet1001 and planet2001 [puppet] - https://gerrit.wikimedia.org/r/585218 (https://phabricator.wikimedia.org/T247651) (owner: Dzahn)
[10:59:03] (PS3) Dzahn: decom planet1001 and planet2001 [puppet] - https://gerrit.wikimedia.org/r/585218 (https://phabricator.wikimedia.org/T247651)
[10:59:31] (PS1) Giuseppe Lavagetto: envoyproxy::tls_terminator: fix filter name for access log [puppet] - https://gerrit.wikimedia.org/r/585471
[10:59:40] (PS2) Marostegui: wmf-pt-kill: Update package to PT 3.1.0 [debs/wmf-pt-kill] - https://gerrit.wikimedia.org/r/584814 (https://phabricator.wikimedia.org/T248843)
[11:00:04] Amir1, Lucas_WMDE, awight, and Urbanecm: It is that lovely time of the day again! You are hereby commanded to deploy European Mid-day SWAT(Max 6 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200402T1100).
[11:00:04] cormacparle: A patch you scheduled for European Mid-day SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[11:00:33] ok for me to go ahead?
[11:00:49] cormacparle__: yes
[11:02:16] ok done!
[11:02:53] (CR) Giuseppe Lavagetto: [V: +2 C: +2] "This is a fix for a production issue, will self-verify." [puppet] - https://gerrit.wikimedia.org/r/585471 (owner: Giuseppe Lavagetto)
[11:03:16] (PS1) Dzahn: remove planet1001 and planet2001 [dns] - https://gerrit.wikimedia.org/r/585473 (https://phabricator.wikimedia.org/T247651)
[11:03:27] !log created table wbqc_constraints on testcommonswiki
[11:03:45] (CR) jerkins-bot: [V: -1] remove planet1001 and planet2001 [dns] - https://gerrit.wikimedia.org/r/585473 (https://phabricator.wikimedia.org/T247651) (owner: Dzahn)
[11:04:35] (PS2) Dzahn: remove planet1001 and planet2001 [dns] - https://gerrit.wikimedia.org/r/585473 (https://phabricator.wikimedia.org/T247651)
[11:11:32] (CR) Dzahn: [C: +2] remove planet1001 and planet2001 [dns] - https://gerrit.wikimedia.org/r/585473 (https://phabricator.wikimedia.org/T247651) (owner: Dzahn)
[11:19:00] (CR) Muehlenhoff: "I tested a package installation from apt1001.wikimedia.org on a stretch, buster and jessie system. I think the last part to test is a pack" [dns] - https://gerrit.wikimedia.org/r/575404 (https://phabricator.wikimedia.org/T224576) (owner: Dzahn)
[11:25:55] Operations, vm-requests: Site: EQIAD/CODFW 2 VM request for planet - https://phabricator.wikimedia.org/T248863 (Dzahn) Open→Resolved
[11:25:59] Operations, serviceops, Patch-For-Review: upgrade planet.wikimedia.org backends to buster - https://phabricator.wikimedia.org/T247651 (Dzahn)
[11:26:44] Operations, DBA, MediaWiki-General: Evaluate and decide the future of relational datastore at WMF after the upgrade of MariaDB 10.1 is finished - https://phabricator.wikimedia.org/T193224 (jcrespo)
[11:30:22] (PS1) Dzahn: add IPv6 records for planet2002.codfw.wmnet [dns] - https://gerrit.wikimedia.org/r/585476 (https://phabricator.wikimedia.org/T248863)
[11:33:48] (CR) Dzahn: [C: +2] add IPv6 records for planet2002.codfw.wmnet [dns] - https://gerrit.wikimedia.org/r/585476 (https://phabricator.wikimedia.org/T248863) (owner: Dzahn)
[11:33:53] (PS2) Dzahn: add IPv6 records for planet2002.codfw.wmnet [dns] - https://gerrit.wikimedia.org/r/585476 (https://phabricator.wikimedia.org/T248863)
[11:37:25] (CR) Dzahn: [C: +1] "> I think the last part to test is a package import (and other basic reprepro operations) on apt1001." [dns] - https://gerrit.wikimedia.org/r/575404 (https://phabricator.wikimedia.org/T224576) (owner: Dzahn)
[11:38:14] (PS3) Marostegui: wmf-pt-kill: Update package to PT 3.1.0 [debs/wmf-pt-kill] - https://gerrit.wikimedia.org/r/584814 (https://phabricator.wikimedia.org/T248843)
[11:40:20] !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db1087 after schema change', diff saved to https://phabricator.wikimedia.org/P10858 and previous config saved to /var/cache/conftool/dbconfig/20200402-114020-marostegui.json
[11:40:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:41:23] Operations, serviceops, Patch-For-Review: upgrade planet.wikimedia.org backends to buster - https://phabricator.wikimedia.org/T247651 (Dzahn) Open→Resolved done. planet1002 and planet2002 on buster have replaced 1001/2001 and the old stretch VMs have been decom'ed.
[11:41:25] Operations, Epic: Migrate all of production metal to Buster or later - https://phabricator.wikimedia.org/T247045 (Dzahn)
[11:48:25] (CR) Dzahn: [C: +1] "ah, this difference above is because i have this in .bash_profile:" [dns] - https://gerrit.wikimedia.org/r/575404 (https://phabricator.wikimedia.org/T224576) (owner: Dzahn)
[11:51:25] (PS2) Dzahn: hiera/apt.wikimedia.org: switch from install1002 to apt1001 [puppet] - https://gerrit.wikimedia.org/r/585245 (https://phabricator.wikimedia.org/T224576)
[11:51:26] (PS1) Dzahn: adjust my .bash_profile to set base dir for reprepro on apt* [puppet] - https://gerrit.wikimedia.org/r/585485 (https://phabricator.wikimedia.org/T224576)
[11:51:44] (CR) Jcrespo: [C: +1] "This looks much better :-D I added just a nitpick, but voting +1 anyway." (1 comment) [debs/wmf-pt-kill] - https://gerrit.wikimedia.org/r/584814 (https://phabricator.wikimedia.org/T248843) (owner: Marostegui)
[12:02:44] (CR) Dzahn: [C: +2] adjust my .bash_profile to set base dir for reprepro on apt* [puppet] - https://gerrit.wikimedia.org/r/585485 (https://phabricator.wikimedia.org/T224576) (owner: Dzahn)
[12:02:45] (PS2) Dzahn: adjust my .bash_profile to set base dir for reprepro on apt* [puppet] - https://gerrit.wikimedia.org/r/585485 (https://phabricator.wikimedia.org/T224576)
[12:09:58] (CR) Dzahn: [C: +1] "Yea, after https://gerrit.wikimedia.org/r/c/operations/puppet/+/585485 now it also works with sudo -E." [dns] - https://gerrit.wikimedia.org/r/575404 (https://phabricator.wikimedia.org/T224576) (owner: Dzahn)
[12:20:52] Operations, Patch-For-Review: Onboarding Janis Meybohm - https://phabricator.wikimedia.org/T249081 (Dzahn)
[12:21:34] Operations, Patch-For-Review: Onboarding Janis Meybohm - https://phabricator.wikimedia.org/T249081 (Dzahn) Added to the "maint-announce" Google shared inbox.
[12:21:47] (PS2) Ssingh: Allow remote connections to the Postgres database for OONI tests [software/censorship-monitoring] - https://gerrit.wikimedia.org/r/585258
[12:22:13] PROBLEM - Varnish frontend child restarted on cp3057 is CRITICAL: 2 ge 2 https://wikitech.wikimedia.org/wiki/Varnish https://grafana.wikimedia.org/dashboard/db/varnish-machine-stats?panelId=66&fullscreen&orgId=1&var-server=cp3057&var-datasource=esams+prometheus/ops
[12:22:38] (CR) Ssingh: ">" (1 comment) [software/censorship-monitoring] - https://gerrit.wikimedia.org/r/585258 (owner: Ssingh)
[12:22:45] PROBLEM - Varnish frontend child restarted on cp3061 is CRITICAL: 2 ge 2 https://wikitech.wikimedia.org/wiki/Varnish https://grafana.wikimedia.org/dashboard/db/varnish-machine-stats?panelId=66&fullscreen&orgId=1&var-server=cp3061&var-datasource=esams+prometheus/ops
[12:23:11] Operations, Patch-For-Review: Onboarding Janis Meybohm - https://phabricator.wikimedia.org/T249081 (Dzahn)
[12:23:29] ACKNOWLEDGEMENT - Varnish frontend child restarted on cp3057 is CRITICAL: 2 ge 2 Ema known https://wikitech.wikimedia.org/wiki/Varnish https://grafana.wikimedia.org/dashboard/db/varnish-machine-stats?panelId=66&fullscreen&orgId=1&var-server=cp3057&var-datasource=esams+prometheus/ops
[12:23:29] ACKNOWLEDGEMENT - Varnish frontend child restarted on cp3061 is CRITICAL: 2 ge 2 Ema known https://wikitech.wikimedia.org/wiki/Varnish https://grafana.wikimedia.org/dashboard/db/varnish-machine-stats?panelId=66&fullscreen&orgId=1&var-server=cp3061&var-datasource=esams+prometheus/ops
[12:24:29] (CR) Mark Bergsma: [C: +2] admin: upgrade jayme to root shell user (ops) [puppet] - https://gerrit.wikimedia.org/r/585462 (https://phabricator.wikimedia.org/T249081) (owner: JMeybohm)
[12:33:16] Operations, Gerrit, Release-Engineering-Team-TODO (2020-04 to 2020-06 (Q4)): gerrit1002 running out of space - https://phabricator.wikimedia.org/T243808 (Dzahn) Icinga alert is in state OK and also in downtime. And it's testing-only.
[12:35:34] Operations, Gerrit, Release-Engineering-Team-TODO (2020-04 to 2020-06 (Q4)): gerrit1002 running out of space - https://phabricator.wikimedia.org/T243808 (Dzahn) Open→Resolved /dev/vda1 63G 41G 20G 68% /
[12:35:37] Operations, Gerrit, vm-requests: Gerrit VM to test data migration - https://phabricator.wikimedia.org/T239151 (Dzahn)
[12:37:31] PROBLEM - PHP opcache health on mw2163 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health
[12:41:09] RECOVERY - PHP opcache health on mw2163 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health
[12:42:16] Operations, serviceops, Patch-For-Review: decom old appservers in eqiad - https://phabricator.wikimedia.org/T247780 (Dzahn)
[12:48:17] Operations, serviceops, Patch-For-Review: decom old appservers in eqiad - https://phabricator.wikimedia.org/T247780 (Dzahn) @cmjohnson @RobH 36 servers have been decom'ed. 30 in D5 and 6 in D4 But the original procurement ticket https://rt.wikimedia.org/Ticket/Display.html?id=8786 and installati...
[12:52:28] !log mw1297 - is pooled and serving traffic but status "staged" in netbox. set to "active"
[12:52:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:55:00] !log mw1390 - mw1399 - pooled and active but status "staged" in netbox, fixing to 'active'
[12:55:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:01:03] Operations, serviceops, Patch-For-Review: decom old appservers in eqiad - https://phabricator.wikimedia.org/T247780 (Dzahn) Open→Resolved Yes, it's thumbor1003 and thumbor1004, they are from the same procurement RT ticket. 36 of the 38 old servers from RT8786 have been decom'ed and the 2...
[13:03:54] (PS4) Marostegui: wmf-pt-kill: Update package to PT 3.1.0 [debs/wmf-pt-kill] - https://gerrit.wikimedia.org/r/584814 (https://phabricator.wikimedia.org/T248843)
[13:19:25] (CR) Marostegui: wmf-pt-kill: Update package to PT 3.1.0 (1 comment) [debs/wmf-pt-kill] - https://gerrit.wikimedia.org/r/584814 (https://phabricator.wikimedia.org/T248843) (owner: Marostegui)
[13:19:25] (CR) Marostegui: [C: +2] wmf-pt-kill: Update package to PT 3.1.0 [debs/wmf-pt-kill] - https://gerrit.wikimedia.org/r/584814 (https://phabricator.wikimedia.org/T248843) (owner: Marostegui)
[13:22:44] (CR) Giuseppe Lavagetto: [C: +1] profile::mediawiki::maintenance: Migrate pagetriage jobs to periodic_job [puppet] - https://gerrit.wikimedia.org/r/582933 (https://phabricator.wikimedia.org/T211250) (owner: RLazarus)
[13:24:10] (CR) Giuseppe Lavagetto: [C: -1] "LGTM but add removal of the now-unused template." (1 comment) [puppet] - https://gerrit.wikimedia.org/r/585250 (owner: RLazarus)
[13:31:57] (Abandoned) Giuseppe Lavagetto: mw1261: switch to envoy for TLS termination [puppet] - https://gerrit.wikimedia.org/r/580343 (owner: Giuseppe Lavagetto)
[13:32:45] !log OSM data reimport on maps2004 - T249086
[13:32:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:32:50] T249086: Reimport OSM data on maps servers - https://phabricator.wikimedia.org/T249086
[13:33:28] Operations, docker-pkg, serviceops: Investigate why the apt configuration of the wikimedia-buster docker image doesn't seem to prefer wikimedia packages - https://phabricator.wikimedia.org/T249218 (Joe)
[13:33:36] Operations, docker-pkg, serviceops: Investigate why the apt configuration of the wikimedia-buster docker image doesn't seem to prefer wikimedia packages - https://phabricator.wikimedia.org/T249218 (Joe) p:Triage→Medium
[13:34:20] Operations, ops-codfw, Traffic, decommission: decommission cp20[18,20,22,24-26].codfw.wmnet - https://phabricator.wikimedia.org/T249115 (Papaul) ` [edit interfaces interface-range disabled] member xe-7/0/3 { ... } + member xe-7/0/5; [edit interfaces] - xe-7/0/5 { - description cp2...
[13:37:20] Operations, ops-codfw, Traffic, decommission: decommission cp20[18,20,22,24-26].codfw.wmnet - https://phabricator.wikimedia.org/T249115 (Papaul)
[13:37:46] (PS1) Vgutierrez: ATS: Enable inbound TLSv1.3 in upload@codfw [puppet] - https://gerrit.wikimedia.org/r/585492 (https://phabricator.wikimedia.org/T170567)
[13:39:57] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1111 for schema change', diff saved to https://phabricator.wikimedia.org/P10861 and previous config saved to /var/cache/conftool/dbconfig/20200402-133956-marostegui.json
[13:40:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:40:34] !log Deploy schema change on db1111
[13:40:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:44:36] !log update puppet compiler facts
[13:44:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:46:03] Operations, serviceops-radar, Core Platform Team (Icebox): Onboarding Hugh Nowlan - https://phabricator.wikimedia.org/T242309 (jcrespo)
[13:46:24] (PS1) Gehel: maps: follow redirects during OSM downloads [puppet] - https://gerrit.wikimedia.org/r/585494
[13:46:46] (PS2) Gehel: maps: follow redirects during OSM downloads [puppet] - https://gerrit.wikimedia.org/r/585494
[13:47:29] (CR) MSantos: [C: +1] maps: follow redirects during OSM downloads [puppet] - https://gerrit.wikimedia.org/r/585494 (owner: Gehel)
[13:48:06] (PS1) Marostegui: wmf-pt-kill: Update original binary [debs/wmf-pt-kill] - https://gerrit.wikimedia.org/r/585495 (https://phabricator.wikimedia.org/T248843)
[13:48:42] Operations, Wikimedia-Logstash, Patch-For-Review: elk7: fields indexed without position data; cannot run PhraseQuery - https://phabricator.wikimedia.org/T248400 (jcrespo) p:Triage→Medium Feel free to update priority (and assign it to yourself), this is just a guess, just triaging to avoid unn...
[13:50:30] !log Compress wbqc_constraints on testcommonswiki and commonswiki (empty tables) - T248967
[13:50:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:50:35] T248967: Create WikibaseQualityConstraints table on commons - https://phabricator.wikimedia.org/T248967
[13:51:28] Operations, Puppet: Upgrade Puppet to 5.5.19 - https://phabricator.wikimedia.org/T248168 (jcrespo)
[13:51:55] (CR) Gehel: [C: +2] maps: follow redirects during OSM downloads [puppet] - https://gerrit.wikimedia.org/r/585494 (owner: Gehel)
[13:55:52] (CR) Vgutierrez: "pcc is happy: https://puppet-compiler.wmflabs.org/compiler1003/21680/" [puppet] - https://gerrit.wikimedia.org/r/585492 (https://phabricator.wikimedia.org/T170567) (owner: Vgutierrez)
[14:02:14] (PS1) Elukey: role::analytics_cluster::coordinator: set G1 for hive daemons [puppet] - https://gerrit.wikimedia.org/r/585498
[14:03:43] (CR) Elukey: [C: +2] role::analytics_cluster::coordinator: set G1 for hive daemons [puppet] - https://gerrit.wikimedia.org/r/585498 (owner: Elukey)
[14:04:15] (CR) Joal: "LGTM :)" [puppet] - https://gerrit.wikimedia.org/r/585498 (owner: Elukey)
[14:04:24] (CR) Joal: [C: +1] role::analytics_cluster::coordinator: set G1 for hive daemons [puppet] - https://gerrit.wikimedia.org/r/585498 (owner: Elukey)
[14:10:23] (PS1) JMeybohm: site: Fix Debian release name (*B*uster) in comment [puppet] - https://gerrit.wikimedia.org/r/585500
[14:11:50] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1111 after schema change', diff saved to https://phabricator.wikimedia.org/P10862 and previous config saved to /var/cache/conftool/dbconfig/20200402-141149-marostegui.json
[14:11:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:13:36] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1111 after schema change', diff saved to https://phabricator.wikimedia.org/P10863 and previous config saved to /var/cache/conftool/dbconfig/20200402-141335-marostegui.json
[14:13:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:14:23] Operations, Mail, MediaWiki-Email: Domain of sender address of Wikimedia mail notifications is set to mw1337.eqiad.wmn for emails from Sinhala Wikipedia - https://phabricator.wikimedia.org/T249014 (jcrespo) @Rehman Thanks for reporting. I can reproduce the error. First of all, don't be alarmed- this...
[14:17:48] (PS1) Jbond: profile::mail::jumpcloud: add new class to manage jumpcloud aliases [puppet] - https://gerrit.wikimedia.org/r/585501 (https://phabricator.wikimedia.org/T244792)
[14:18:02] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1111 after schema change', diff saved to https://phabricator.wikimedia.org/P10864 and previous config saved to /var/cache/conftool/dbconfig/20200402-141802-marostegui.json
[14:18:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:20:34] (PS1) Hnowlan: changeprop: Bump log level [deployment-charts] - https://gerrit.wikimedia.org/r/585502 (https://phabricator.wikimedia.org/T248677)
[14:21:25] (CR) jerkins-bot: [V: -1] profile::mail::jumpcloud: add new class to manage jumpcloud aliases [puppet] - https://gerrit.wikimedia.org/r/585501 (https://phabricator.wikimedia.org/T244792) (owner: Jbond)
[14:23:39] !log marostegui@cumin1001 dbctl commit (dc=all): 'Fully repool db1111 after schema change', diff saved to https://phabricator.wikimedia.org/P10865 and previous config saved to /var/cache/conftool/dbconfig/20200402-142338-marostegui.json
[14:23:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:24:23] (CR) Ema: [C: +1] ATS: Enable inbound TLSv1.3 in upload@codfw [puppet] - https://gerrit.wikimedia.org/r/585492 (https://phabricator.wikimedia.org/T170567) (owner: Vgutierrez)
[14:24:50] !log updating bluez on ganeti and cloudvirt
[14:24:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:26:42] Operations, Analytics, Analytics-Kanban, netops: Move netflow to TLS encryption/authentication via librdkafka - https://phabricator.wikimedia.org/T248980 (elukey) For the moment I am happy with TLS encryption only, since we'll probably move to kerberos authentication soon and it doesn't make much...
[14:26:52] Operations, Analytics, Analytics-Kanban, netops: Move netflow to TLS encryption/authentication via librdkafka - https://phabricator.wikimedia.org/T248980 (elukey)
[14:28:34] (CR) Vgutierrez: [C: +2] ATS: Enable inbound TLSv1.3 in upload@codfw [puppet] - https://gerrit.wikimedia.org/r/585492 (https://phabricator.wikimedia.org/T170567) (owner: Vgutierrez)
[14:28:44] (CR) Muehlenhoff: [C: +1] site: Fix Debian release name (*B*uster) in comment [puppet] - https://gerrit.wikimedia.org/r/585500 (owner: JMeybohm)
[14:30:06] (CR) Jcrespo: [C: +1] wmf-pt-kill: Update original binary [debs/wmf-pt-kill] - https://gerrit.wikimedia.org/r/585495 (https://phabricator.wikimedia.org/T248843) (owner: Marostegui)
[14:30:56] (PS2) Jbond: profile::mail::jumpcloud: add new class to manage jumpcloud aliases [puppet] - https://gerrit.wikimedia.org/r/585501 (https://phabricator.wikimedia.org/T244792)
[14:31:43] (CR) JMeybohm: [C: +2] site: Fix Debian release name (*B*uster) in comment [puppet] - https://gerrit.wikimedia.org/r/585500 (owner: JMeybohm)
[14:32:07] Operations, Core Platform Team, MediaWiki-API, serviceops, Patch-For-Review: CORS errors on commons on debug servers - https://phabricator.wikimedia.org/T249107 (Tgr)
[14:33:50] !log Enable TLS Session tickets in codfw - T245616
[14:33:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:33:56] T245616: Provide a simple and automated SSL Ticket key generation system for ATS - https://phabricator.wikimedia.org/T245616
[14:33:56] !log Enable inbound TLSv1.3 in upload@codfw - T170567
[14:34:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:34:01] T170567: Support TLSv1.3 - https://phabricator.wikimedia.org/T170567
[14:34:37] (CR) Jbond: [C: +1] "LGTM" [software/censorship-monitoring] - https://gerrit.wikimedia.org/r/585258 (owner: Ssingh)
[14:36:06] Operations, Gerrit, Release-Engineering-Team-TODO (2020-04 to 2020-06 (Q4)): gerrit1002 running out of space - https://phabricator.wikimedia.org/T243808 (jbond) Resolved→Open @Dzahn see my [[ https://phabricator.wikimedia.org/T243808#6018042 | comment above ]] I think we should investigate fi...
[14:36:08] Operations, Gerrit, vm-requests: Gerrit VM to test data migration - https://phabricator.wikimedia.org/T239151 (jbond)
[14:39:18] (CR) Marostegui: "recheck" [debs/wmf-pt-kill] - https://gerrit.wikimedia.org/r/585495 (https://phabricator.wikimedia.org/T248843) (owner: Marostegui)
[14:41:36] Operations: Onboarding Janis Meybohm - https://phabricator.wikimedia.org/T249081 (MoritzMuehlenhoff)
[14:42:39] (PS4) Andrew Bogott: neutron: enable l3_agent_only_dmz_cidr_hack in eqiad1 [puppet] - https://gerrit.wikimedia.org/r/585031 (https://phabricator.wikimedia.org/T247505)
[14:42:41] (PS6) Andrew Bogott: Openstack Neutron: add neutron l3 hacks for Rocky [puppet] - https://gerrit.wikimedia.org/r/585034 (https://phabricator.wikimedia.org/T248635)
[14:42:43] (PS1) Andrew Bogott: Remove hiera def for labmon1002 [puppet] - https://gerrit.wikimedia.org/r/585507 (https://phabricator.wikimedia.org/T249058)
[14:42:45] (PS1) Andrew Bogott: Remove hosts hiera for cloudmetrics1001 and 1002 [puppet] - https://gerrit.wikimedia.org/r/585508 (https://phabricator.wikimedia.org/T249058)
[14:43:54] (PS2) Andrew Bogott: Remove hiera def for labmon1002 [puppet] - https://gerrit.wikimedia.org/r/585507 (https://phabricator.wikimedia.org/T249058)
[14:43:56] (PS2) Andrew Bogott: Remove hosts hiera for cloudmetrics1001 and 1002 [puppet] - https://gerrit.wikimedia.org/r/585508 (https://phabricator.wikimedia.org/T249058)
[14:44:13] (PS1) Jcrespo: InitializeSettings: Update site name for siwiki, causes formatting issues [mediawiki-config] - https://gerrit.wikimedia.org/r/585509 (https://phabricator.wikimedia.org/T249014)
[14:45:12] (CR) Marostegui: "recheck" [debs/wmf-pt-kill] - https://gerrit.wikimedia.org/r/585495 (https://phabricator.wikimedia.org/T248843) (owner: Marostegui)
[14:45:20] Operations, Anti-Harassment, SRE-Access-Requests, Patch-For-Review: Requesting access to analytics-privatedata-users for tchanders, dmaza, dbarratt, wikigit - https://phabricator.wikimedia.org/T249059 (dbarratt)
[14:45:49] Operations, Anti-Harassment, SRE-Access-Requests, Patch-For-Review: Requesting access to analytics-privatedata-users for tchanders, dmaza, dbarratt, wikigit - https://phabricator.wikimedia.org/T249059 (dbarratt) >>! In T249059#6020558, @Tchanders wrote: > @dbarratt I think you have production acc...
[14:49:01] (CR) Marostegui: [V: +2 C: +2] wmf-pt-kill: Update original binary [debs/wmf-pt-kill] - https://gerrit.wikimedia.org/r/585495 (https://phabricator.wikimedia.org/T248843) (owner: Marostegui)
[14:49:07] (CR) CDanis: [C: +1] prometheus: additional external_labels for Thanos [puppet] - https://gerrit.wikimedia.org/r/585468 (https://phabricator.wikimedia.org/T233956) (owner: Filippo Giunchedi)
[14:50:10] (PS1) Volans: commit: do not commit_check on initial empty diff [software/homer] - https://gerrit.wikimedia.org/r/585510 (https://phabricator.wikimedia.org/T244363)
[14:50:13] (PS1) Volans: diff: allow to omit the actual diff [software/homer] - https://gerrit.wikimedia.org/r/585511
[14:50:18] (PS1) Volans: diff: use different exit code if there is a diff [software/homer] - https://gerrit.wikimedia.org/r/585512 (https://phabricator.wikimedia.org/T249224)
[14:53:47] (CR) jerkins-bot: [V: -1] diff: use different exit code if there is a diff [software/homer] - https://gerrit.wikimedia.org/r/585512 (https://phabricator.wikimedia.org/T249224) (owner: Volans)
[14:53:59] Operations, Anti-Harassment, SRE-Access-Requests, Patch-For-Review: Requesting access to analytics-privatedata-users for tchanders, dmaza, dbarratt, wikigit - https://phabricator.wikimedia.org/T249059 (jcrespo) Don't worry too much, I checked for everybody, based on the checklist, hence why it wa...
[14:55:41] Operations, Anti-Harassment, SRE-Access-Requests, Patch-For-Review: Requesting access to analytics-privatedata-users for tchanders, dmaza, dbarratt, wikigit - https://phabricator.wikimedia.org/T249059 (elukey) > Reason for access: As part of our work for CheckUser and IP masking projects, we need...
[14:56:52] !log push new test switch config for cloudvirt2001 - T248425
[14:56:54] (CR) Andrew Bogott: [C: +2] Remove hiera def for labmon1002 [puppet] - https://gerrit.wikimedia.org/r/585507 (https://phabricator.wikimedia.org/T249058) (owner: Andrew Bogott)
[14:56:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:56:57] T248425: Test using trunked interfaces for cloudvirts - https://phabricator.wikimedia.org/T248425
[14:57:17] (CR) Jcrespo: "If this works, we can later try to understand WHY this was happening." [mediawiki-config] - https://gerrit.wikimedia.org/r/585509 (https://phabricator.wikimedia.org/T249014) (owner: Jcrespo)
[14:57:32] (PS2) Volans: diff: use different exit code if there is a diff [software/homer] - https://gerrit.wikimedia.org/r/585512 (https://phabricator.wikimedia.org/T249224)
[15:00:41] Operations, MediaWiki-API, serviceops, Core Platform Team Workboards (External Code Reviews), Patch-For-Review: CORS errors on commons on debug servers - https://phabricator.wikimedia.org/T249107 (Anomie)
[15:00:59] (CR) Scardenasmolinar: [C: +1] "Awesome!" [mediawiki-config] - https://gerrit.wikimedia.org/r/585371 (https://phabricator.wikimedia.org/T245791) (owner: Samwilson)
[15:02:37] Operations: Onboarding Janis Meybohm - https://phabricator.wikimedia.org/T249081 (JMeybohm)
[15:06:45] (CR) Ssingh: [C: +2] Allow remote connections to the Postgres database for OONI tests [software/censorship-monitoring] - https://gerrit.wikimedia.org/r/585258 (owner: Ssingh)
[15:07:26] (CR) Jcrespo: [C: -1] "RTL is not happy with me. :-(" (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/585509 (https://phabricator.wikimedia.org/T249014) (owner: Jcrespo)
[15:08:29] (CR) Muehlenhoff: "The keys are still missing, we need to copy the PGP keys used to validate access to external repositories from install1002 to apt1001, "gp" [dns] - https://gerrit.wikimedia.org/r/575404 (https://phabricator.wikimedia.org/T224576) (owner: Dzahn)
[15:08:51] (PS1) Jbond: profile::tlsprox::envoy: update request_timeout parameter [puppet] - https://gerrit.wikimedia.org/r/585517
[15:11:35] (CR) jerkins-bot: [V: -1] profile::tlsprox::envoy: update request_timeout parameter [puppet] - https://gerrit.wikimedia.org/r/585517 (owner: Jbond)
[15:18:21] (PS2) Jcrespo: InitializeSettings: Update site name for siwiki, causes formatting issues [mediawiki-config] - https://gerrit.wikimedia.org/r/585509 (https://phabricator.wikimedia.org/T249014)
[15:22:25] (CR) Ppchelko: [C: +2] changeprop: Bump log level (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/585502 (https://phabricator.wikimedia.org/T248677) (owner: Hnowlan)
[15:22:51] (Merged) jenkins-bot: changeprop: Bump log level [deployment-charts] - https://gerrit.wikimedia.org/r/585502 (https://phabricator.wikimedia.org/T248677) (owner: Hnowlan)
[15:28:52] (PS1) Hoo man: Restrict labs access to Wikibase's wb_changes [puppet] - https://gerrit.wikimedia.org/r/585523 (https://phabricator.wikimedia.org/T249010)
[15:30:42] Operations, ops-eqiad, Analytics: (Need by: TBD) rack/setup/install kafka-jumbo100[789].eqiad.wmnet - https://phabricator.wikimedia.org/T244506 (Cmjohnson) These are failing during install. @elukey can you verify the raid configuration please Failed to partition the selected disk │...
[15:33:05] (PS2) Jbond: profile::tlsprox::envoy: update request_timeout parameter [puppet] - https://gerrit.wikimedia.org/r/585517
[15:36:39] (CR) Jbond: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/585517 (owner: Jbond)
[15:41:15] (CR) Cwhite: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/585468 (https://phabricator.wikimedia.org/T233956) (owner: Filippo Giunchedi)
[15:44:16] (PS1) Muehlenhoff: Add s-nail to send mails after package imports [puppet] - https://gerrit.wikimedia.org/r/585528 (https://phabricator.wikimedia.org/T224576)
[15:48:21] (CR) Muehlenhoff: "I built a dummy patched "hello" package and imported it, that went fine:" [dns] - https://gerrit.wikimedia.org/r/575404 (https://phabricator.wikimedia.org/T224576) (owner: Dzahn)
[15:55:52] (PS1) Ssingh: Update the changelog for v0.1.1 release [software/censorship-monitoring] - https://gerrit.wikimedia.org/r/585530
[15:58:01] Operations, Wikimedia-Mailing-lists: Creation of three Wikimedia CH mailing lists - https://phabricator.wikimedia.org/T248910 (jcrespo) a:jcrespo
[16:00:04] godog and _joe_: It is that lovely time of the day again! You are hereby commanded to deploy Puppet SWAT(Max 6 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200402T1600).
[16:00:04] No GERRIT patches in the queue for this window AFAICS.
[16:00:13] oh
[16:00:21] if anyone willing, I have a patch for contint1001
[16:00:27] to tweak Jenkins Content Security Policy rules
[16:01:30] Operations: Onboarding Janis Meybohm - https://phabricator.wikimedia.org/T249081 (MoritzMuehlenhoff)
[16:03:18] Operations, Wikimedia-Mailing-lists: Delete email addresses with privileged @domain names from mailing lists at offboarding - https://phabricator.wikimedia.org/T248384 (jcrespo) @akosiaris This seems unrelated to your recent proposal for mailman tooling, but could it be an included, secondary use case?
[16:07:01] Operations, MediaWiki-General, observability, serviceops, Patch-For-Review: MediaWiki Prometheus support - https://phabricator.wikimedia.org/T240685 (colewhite) a:colewhite
[16:07:30] (CR) Ssingh: "Merging without review as no code was changed; only the changelog was updated." [software/censorship-monitoring] - https://gerrit.wikimedia.org/r/585530 (owner: Ssingh)
[16:07:34] (CR) Ssingh: [C: +2] Update the changelog for v0.1.1 release [software/censorship-monitoring] - https://gerrit.wikimedia.org/r/585530 (owner: Ssingh)
[16:10:23] Operations, Wikimedia-Mailing-lists: Creation of three Wikimedia CH mailing lists - https://phabricator.wikimedia.org/T248910 (jcrespo) I intend to create those as requested tomorrow morning (CEST). If you happen to be around -for verification- that would be great :-D.
[16:10:31] Operations, Wikimedia-Mailing-lists: Creation of three Wikimedia CH mailing lists - https://phabricator.wikimedia.org/T248910 (jcrespo) p:Triage→Medium
[16:16:01] Operations, Wikimedia-Mailing-lists: Delete email addresses with privileged @domain names from mailing lists at offboarding - https://phabricator.wikimedia.org/T248384 (akosiaris) Yes the same API could be used as well so we could accommodate it.
[16:19:28] !log upgrade netflow4001's fastnetmon to 1.1.4 - T240658
[16:19:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:19:34] T240658: fastnetmon spamming /var/log on netflow hosts leading to disk saturation - https://phabricator.wikimedia.org/T240658
[16:19:58] (PS1) ArielGlenn: remove extraneous private/public wik checks [dumps] - https://gerrit.wikimedia.org/r/585534
[16:20:00] (PS1) ArielGlenn: remove capability to dump private tables [dumps] - https://gerrit.wikimedia.org/r/585535
[16:20:07] (CR) Jforrester: "Huh. This has been this way since Roan's initial commit of this repo into git in 015f5b713. But yeah, this seems reasonable." [mediawiki-config] - https://gerrit.wikimedia.org/r/585509 (https://phabricator.wikimedia.org/T249014) (owner: Jcrespo)
[16:20:19] (CR) jerkins-bot: [V: -1] remove capability to dump private tables [dumps] - https://gerrit.wikimedia.org/r/585535 (owner: ArielGlenn)
[16:24:54] (CR) Cwhite: [C: -1] "See inline. I'm fairly certain the lookups will fail as the are right now." (1 comment) [puppet] - https://gerrit.wikimedia.org/r/583414 (https://phabricator.wikimedia.org/T246961) (owner: Mstyles)
[16:26:41] (CR) Amire80: [C: +1] InitializeSettings: Update site name for siwiki, causes formatting issues [mediawiki-config] - https://gerrit.wikimedia.org/r/585509 (https://phabricator.wikimedia.org/T249014) (owner: Jcrespo)
[16:26:47] (PS1) Volans: netbox: silently skip devices without platform [software/homer] - https://gerrit.wikimedia.org/r/585536
[16:27:14] (PS3) Jforrester: [siwiki] Change wgSitename to drop the ',' as it causes formatting issues [mediawiki-config] - https://gerrit.wikimedia.org/r/585509 (https://phabricator.wikimedia.org/T249014) (owner: Jcrespo)
[16:28:12] jouncebot: now
[16:28:12] For the next 0 hour(s) and 31 minute(s): Puppet SWAT(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200402T1600)
[16:28:28] (CR) Jforrester: [C: +2] "PS3 adds a test so this doesn't regress." [mediawiki-config] - https://gerrit.wikimedia.org/r/585509 (https://phabricator.wikimedia.org/T249014) (owner: Jcrespo)
[16:29:26] (Merged) jenkins-bot: [siwiki] Change wgSitename to drop the ',' as it causes formatting issues [mediawiki-config] - https://gerrit.wikimedia.org/r/585509 (https://phabricator.wikimedia.org/T249014) (owner: Jcrespo)
[16:29:34] (CR) Volans: [C: +2] sre.dns.netbox: pull the specific SHA1 [cookbooks] - https://gerrit.wikimedia.org/r/583676 (https://phabricator.wikimedia.org/T233183) (owner: Volans)
[16:30:41] !log joal@deploy1001 Started deploy [analytics/refinery@5b254c8]: Regular analytics weekly train [analytics/refinery@5b254c8]
[16:30:43] (CR) jerkins-bot: [V: -1] netbox: silently skip devices without platform [software/homer] - https://gerrit.wikimedia.org/r/585536 (owner: Volans)
[16:30:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:31:36] (Merged) jenkins-bot: sre.dns.netbox: pull the specific SHA1 [cookbooks] - https://gerrit.wikimedia.org/r/583676 (https://phabricator.wikimedia.org/T233183) (owner: Volans)
[16:32:24] (PS2) Volans: netbox: silently skip devices without platform [software/homer] - https://gerrit.wikimedia.org/r/585536
[16:32:58] !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: T249014 [siwiki] Change wgSitename to drop the ',' (duration: 01m 07s)
[16:33:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:33:03] T249014: Domain of sender address of Wikimedia mail notifications is set to mw1337.eqiad.wmn for emails from Sinhala Wikipedia - https://phabricator.wikimedia.org/T249014
[16:33:07] !log jforrester@deploy1001 sync-file aborted: T249014 [siwiki] Change wgSitename to drop the ',' (duration: 00m 00s)
[16:33:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:34:16] !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 01m 05s)
[16:34:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:34:54] !log volans@cumin1001 START - Cookbook sre.dns.netbox
[16:34:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:35:28] (PS2) ArielGlenn: remove capability to dump private tables [dumps] - https://gerrit.wikimedia.org/r/585535
[16:36:47] Operations, netops: fastnetmon spamming /var/log on netflow hosts leading to disk saturation - https://phabricator.wikimedia.org/T240658 (ayounsi) Thanks to Moritz I backported (locally only for now) FNM 1.1.4 and installed it on netflow4001. `Unpacking fastnetmon (1.1.4-1~deb10u1) over (1.1.3+dfsg-8.1)...
[16:37:33] !log volans@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
[16:37:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:41:00] PROBLEM - Rate of JVM GC Old generation-s runs - elastic1052-production-search-psi-eqiad on elastic1052 is CRITICAL: 105.8 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-search-psi-eqiad&var-instance=elastic1052&panelId=37
[16:44:31] !log joal@deploy1001 Finished deploy [analytics/refinery@5b254c8]: Regular analytics weekly train [analytics/refinery@5b254c8] (duration: 13m 50s)
[16:44:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:47:21] (PS14) Mstyles: kibana: move httpd proxy authentication to a separate profile [puppet] - https://gerrit.wikimedia.org/r/583414 (https://phabricator.wikimedia.org/T246961)
[16:49:43] !log jforrester@deploy1001 Synchronized php-1.35.0-wmf.26/includes/actions/Action.php: T249162 Partially revert 'WikiPage/Article split. Rely on Article inside Action' (duration: 01m 07s)
[16:49:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:49:49] T249162: High rate of timeouts on jsonTruncated channel upon group1 1.35.0-wmf.26 promotion - https://phabricator.wikimedia.org/T249162
[16:53:06] !log joal@deploy1001 Started deploy [analytics/refinery@5b254c8] (thin): Regular analytics weekly train THIN [analytics/refinery@5b254c8]
[16:53:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:53:14] !log joal@deploy1001 Finished deploy [analytics/refinery@5b254c8] (thin): Regular analytics weekly train THIN [analytics/refinery@5b254c8] (duration: 00m 08s)
[16:53:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:53:34] (CR) Arturo Borrero Gonzalez: [C: +1] "This patch LGTM." [puppet] - https://gerrit.wikimedia.org/r/570348 (https://phabricator.wikimedia.org/T244222) (owner: Jbond)
[17:00:04] halfak and accraze: Dear deployers, time to do the Services – Graphoid / Citoid / ORES deploy. Dont look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200402T1700).
[17:00:51] marxarelli: T249162 is provisionally fixed in prod; should be good to re-roll to group1.
[17:00:51] T249162: High rate of timeouts on jsonTruncated channel upon group1 1.35.0-wmf.26 promotion - https://phabricator.wikimedia.org/T249162
[17:06:56] (PS1) Volans: mgmt: use netbox-generated data for ulsfo [dns] - https://gerrit.wikimedia.org/r/585545 (https://phabricator.wikimedia.org/T233183)
[17:45:21] (PS2) Volans: mgmt: use netbox-generated data for ulsfo [dns] - https://gerrit.wikimedia.org/r/585545 (https://phabricator.wikimedia.org/T233183)
[17:45:52] (CR) Elukey: [C: +1] kibana: move httpd proxy authentication to a separate profile (1 comment) [puppet] - https://gerrit.wikimedia.org/r/583414 (https://phabricator.wikimedia.org/T246961) (owner: Mstyles)
[17:45:57] !log bsitzmann@deploy1001 Started deploy [mobileapps/deploy@7650fbe]: Update mobileapps to 61977bd7
[17:46:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:46:46] shdubsh: o/ totally miss the includes in the code change, soorrrryyy
[17:46:59] I promise I'll get more coffee next time before review
[17:47:05] I only checked types :D
[17:48:18] (CR) Elukey: "@Mstyles can you run the puppet compiler and add a link?" [puppet] - https://gerrit.wikimedia.org/r/583414 (https://phabricator.wikimedia.org/T246961) (owner: Mstyles)
[17:48:57] elukey: don't worry about it, that's what code reviews are for :)
[17:49:18] !log bsitzmann@deploy1001 Finished deploy [mobileapps/deploy@7650fbe]: Update mobileapps to 61977bd7 (duration: 03m 21s)
[17:49:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:49:33] (CR) Volans: "And here's the patch for the first includes of dynamically generated data for ulsfo, ready for when we want to flip the switch!" [dns] - https://gerrit.wikimedia.org/r/585545 (https://phabricator.wikimedia.org/T233183) (owner: Volans)
[17:50:09] shdubsh: I see that maryum already uploaded a new version, that looks way better.
[17:50:18] if pcc agrees we should be close to merge
[17:50:35] ack!
[17:50:53] elukey: great!
[17:51:41] maryum: one thing to notice is that this will also affect logstash for labs, so let's remember to check that when this gets merged
[17:51:56] elukey: noted
[17:54:22] RECOVERY - Rate of JVM GC Old generation-s runs - elastic1052-production-search-psi-eqiad on elastic1052 is OK: (C)100 gt (W)80 gt 79.32 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-search-psi-eqiad&var-instance=elastic1052&panelId=37
[17:56:54] jouncebot: next
[17:56:54] In 0 hour(s) and 3 minute(s): Morning SWAT(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200402T1800)
[17:57:01] jouncebot: refresh
[17:57:02] I refreshed my knowledge about deployments.
[18:00:04] RoanKattouw, Niharika, and Urbanecm: May I have your attention please! Morning SWAT(Max 6 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200402T1800)
[18:00:04] MatmaRex: A patch you scheduled for Morning SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[18:00:28] I can SWAT today!
[18:00:35] hi MatmaRex
[18:00:52] hi. thank you
[18:02:12] (CR) Mstyles: "https://puppet-compiler.wmflabs.org/compiler1003/21683/" [puppet] - https://gerrit.wikimedia.org/r/583414 (https://phabricator.wikimedia.org/T246961) (owner: Mstyles)
[18:02:56] MatmaRex: +2'ed both, will ping you once they'll be ready to test
[18:07:55] longma: We should wait for the SWAT to be over, though. ;-)
[18:10:42] yeah good idea
[18:16:29] MatmaRex: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/VisualEditor/+/585554 is ready for you to test at mwdebug1001
[18:17:14] thanks. looking
[18:19:34] Urbanecm: looks good
[18:19:40] thanks, syncing to prod
[18:22:08] !log urbanecm@deploy1001 Synchronized php-1.35.0-wmf.26/extensions/VisualEditor/modules/ve-mw: SWAT: 94ded03: Fix issues with treating section "numbers" as integers (T248795; T248968; T249112) (duration: 01m 10s)
[18:22:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:22:17] T248795: Section 0 (lead) uses first line of wikitext as section title in NWE summary - https://phabricator.wikimedia.org/T248795
[18:22:17] T249112: Section headings no longer in edit summary - https://phabricator.wikimedia.org/T249112
[18:22:17] T248968: Editor jumps on top of the page when using VisualEditor - https://phabricator.wikimedia.org/T248968
[18:23:28] MatmaRex: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/MobileFrontend/+/585555 is available for you to test at mwdebug1001
[18:24:58] looking
[18:25:56] thanks
[18:26:03] Urbanecm: also looks good
[18:26:09] thanks, syncing to prod!
[18:27:51] !log urbanecm@deploy1001 Synchronized php-1.35.0-wmf.26/extensions/MobileFrontend/: SWAT: 4e2a092: EditorGateway: Fix handling of null sectionId (T249169) (duration: 01m 09s)
[18:27:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:27:57] T249169: Can't edit page after switching editors on mobile phone - https://phabricator.wikimedia.org/T249169
[18:28:17] MatmaRex: all done!
[18:28:27] James_F: longma: I'm done
[18:28:41] thanks Urbanecm
[18:28:48] happy to help!
[18:28:55] thanks Urbanecm
[18:31:47] Operations, Phatality, observability: Deploying "Phatality" plugin for Kibana invokes oom-killer on logstash::collector nodes - https://phabricator.wikimedia.org/T237706 (Krinkle)
[18:32:03] James_F: ready?
[18:33:39] Always!
[18:33:41] Also, yes.
[18:33:54] :)
[18:35:13] (PS1) Jeena Huneidi: group1 wikis to 1.35.0-wmf.26 refs T247773 [mediawiki-config] - https://gerrit.wikimedia.org/r/585573
[18:35:15] (CR) Jeena Huneidi: [C: +2] group1 wikis to 1.35.0-wmf.26 refs T247773 [mediawiki-config] - https://gerrit.wikimedia.org/r/585573 (owner: Jeena Huneidi)
[18:36:26] (Merged) jenkins-bot: group1 wikis to 1.35.0-wmf.26 refs T247773 [mediawiki-config] - https://gerrit.wikimedia.org/r/585573 (owner: Jeena Huneidi)
[18:36:46] oh I thought it would log.
[18:37:01] !log rolling group1 to 1.35.0-wmf.26
[18:37:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:38:12] !log jhuneidi@deploy1001 rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.26 refs T247773
[18:38:15] longma: Looks OK to me so far.
[18:38:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:38:18] T247773: 1.35.0-wmf.26 deployment blockers - https://phabricator.wikimedia.org/T247773
[18:38:54] scap is still running
[18:39:18] longma: Did you do a full scap?
[18:39:18] !log jhuneidi@deploy1001 Synchronized php: group1 wikis to 1.35.0-wmf.26 refs T247773 (duration: 01m 05s)
[18:39:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:39:29] no it's done now
[18:39:33] Oh, right, yeah
[18:41:03] Hmm.
[18:41:14] Special:ExportTranlsations errors?
[18:41:33] Shall we wait a bit longer?
[18:41:37] But all from group0.
[18:41:41] oh
[18:41:54] And it was a spike previously.
[18:41:58] Never mind.
[18:42:23] Overall error rate looks fine.
[18:42:37] cool
[18:42:47] To group2 at the turn of the hour?
[18:43:00] Yes
[18:43:35] WFM.
[18:51:20] (PS3) Andrew Bogott: Remove hosts hiera for cloudmetrics1001 and 1002 [puppet] - https://gerrit.wikimedia.org/r/585508 (https://phabricator.wikimedia.org/T249058)
[18:51:57] (CR) jerkins-bot: [V: -1] Remove hosts hiera for cloudmetrics1001 and 1002 [puppet] - https://gerrit.wikimedia.org/r/585508 (https://phabricator.wikimedia.org/T249058) (owner: Andrew Bogott)
[18:55:58] Operations, Patch-For-Review, cloud-services-team (Kanban): Remove old OpenStack config and manifests - https://phabricator.wikimedia.org/T249058 (Andrew)
[18:56:30] Operations, Patch-For-Review, cloud-services-team (Kanban): Remove old OpenStack config and manifests - https://phabricator.wikimedia.org/T249058 (Andrew)
[18:58:40] (PS1) Andrew Bogott: Remove hosts hiera for cloudmetrics1001 and 1002 [puppet] - https://gerrit.wikimedia.org/r/585576 (https://phabricator.wikimedia.org/T249058)
[19:00:05] dduvall and longma: (Dis)respected human, time to deploy Mediawiki train - American Version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200402T1900). Please do the needful.
[19:00:25] longma: Ready from my end.
[19:00:28] okay
[19:00:51] !log promoting all to 1.35.0-wmf.26
[19:00:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:01:32] (PS1) Jeena Huneidi: all wikis to 1.35.0-wmf.26 refs T247773 [mediawiki-config] - https://gerrit.wikimedia.org/r/585577
[19:01:34] (CR) Jeena Huneidi: [C: +2] all wikis to 1.35.0-wmf.26 refs T247773 [mediawiki-config] - https://gerrit.wikimedia.org/r/585577 (owner: Jeena Huneidi)
[19:01:42] (PS4) Andrew Bogott: Remove hosts hiera for cloudmetrics1001 and 1002 [puppet] - https://gerrit.wikimedia.org/r/585508 (https://phabricator.wikimedia.org/T249058)
[19:01:44] (PS2) Andrew Bogott: Remove hosts hiera for labstore1004 and labstore1005 [puppet] - https://gerrit.wikimedia.org/r/585576 (https://phabricator.wikimedia.org/T249058)
[19:02:42] (Merged) jenkins-bot: all wikis to 1.35.0-wmf.26 refs T247773 [mediawiki-config] - https://gerrit.wikimedia.org/r/585577 (owner: Jeena Huneidi)
[19:03:53] (CR) Andrew Bogott: [C: +2] Remove hosts hiera for cloudmetrics1001 and 1002 [puppet] - https://gerrit.wikimedia.org/r/585508 (https://phabricator.wikimedia.org/T249058) (owner: Andrew Bogott)
[19:05:03] !log jhuneidi@deploy1001 rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.26 refs T247773
[19:05:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:05:09] T247773: 1.35.0-wmf.26 deployment blockers - https://phabricator.wikimedia.org/T247773
[19:08:10] PROBLEM - Logstash Elasticsearch indexing errors on icinga1001 is CRITICAL: 11.82 ge 8 https://wikitech.wikimedia.org/wiki/Logstash%23Indexing_errors https://logstash.wikimedia.org/goto/1cee1f1b5d4e6c5e06edb3353a2a4b83 https://grafana.wikimedia.org/dashboard/db/logstash
[19:09:44] longma: I was about to say that all looks fine, but apparently the logstash ES is having issues.
[19:09:58] RECOVERY - Logstash Elasticsearch indexing errors on icinga1001 is OK: (C)8 ge (W)1 ge 0.2708 https://wikitech.wikimedia.org/wiki/Logstash%23Indexing_errors https://logstash.wikimedia.org/goto/1cee1f1b5d4e6c5e06edb3353a2a4b83 https://grafana.wikimedia.org/dashboard/db/logstash
[19:10:08] well there is the recovery :P
[19:10:14] Yeah.
[19:10:23] Error rate looks OK.
[19:10:33] Declare done?
[19:10:37] yeah I don't see any spikes
[19:10:41] Operations, MediaWiki-Debug-Logger, Traffic, Developer Productivity: noc.wikimedia.org with X-Wikimedia-Debug routes to mwdebug but host is not served there - https://phabricator.wikimedia.org/T245552 (Krinkle)
[19:10:47] Operations, ops-eqiad, DC-Ops: (Need by: 2020-03-01) rack/setup/install htmldumper1001.eqiad.wmnet. - https://phabricator.wikimedia.org/T245567 (Dzahn) >>! In T245567#6012810, @Cmjohnson wrote: > @Dzahn I reimaged and am now able to login @Cmjohnson Yes, i can login now. Thank you! :)
[19:11:10] Wrapping up train here. See you all again next week for another episode of "Will the train run smoothly?"
[19:11:26] * James_F grins.
[19:11:31] longma: kudos and thanks for running train on short notice :)
[19:11:47] Thanks to James_F as well!
[19:12:44] Operations, MediaWiki-Debug-Logger, Traffic, Developer Productivity: noc.wikimedia.org with X-Wikimedia-Debug routes to mwdebug but host is not served there - https://phabricator.wikimedia.org/T245552 (Krinkle) Ah okay, so we're stuck between a rock and a hard place. * Before: We don't route XWD...
[19:14:42] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp1081 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[19:14:44] Happy to help.
[19:15:18] noc.wikimedia.org , shouldn't we just phase it out ? ;)
[19:15:38] Operations, MediaWiki-Debug-Logger, Traffic, Developer Productivity: noc.wikimedia.org with X-Wikimedia-Debug routes to mwdebug but host is not served there - https://phabricator.wikimedia.org/T245552 (Krinkle)
[19:16:16] hashar: maybe. most of it replaced by config-master.wikimeda.org
[19:16:23] Operations, SRE-Access-Requests, Developer-Advocacy (Apr-Jun 2020), Patch-For-Review: Add aklapper to analytics-privatedata-users - https://phabricator.wikimedia.org/T248905 (Nuria) Approved on my end.
[19:16:57] it's a collection of links and https://noc.wikimedia.org/conf/ though
[19:17:03] good night :)
[19:18:43] Operations, ops-eqiad, DC-Ops: (Need by: 2020-03-01) rack/setup/install htmldumper1001.eqiad.wmnet. - https://phabricator.wikimedia.org/T245567 (Dzahn) @ArielGlenn htmldumper1001 is now usable. Any idea what kind of role we want on it?
[19:21:52] Operations, Gerrit, Release-Engineering-Team-TODO (2020-04 to 2020-06 (Q4)): gerrit1002 running out of space - https://phabricator.wikimedia.org/T243808 (Dzahn) Alright. This is a one-time installation though to test the Gerrit upgrade to 2.16 and then remove it again. But that doesn't mean there ca...
[19:22:02] mutante: I think we can get contint2001 upgraded to buster eventually
[19:22:02] ;)
[19:22:21] probably not today, but some time next week would be possible
[19:22:21] hashar: :) great!
[19:22:27] sounds good
[19:23:08] there will be a bunch of data to rsync around though :-\
[19:23:13] happy to do the reimage. but not today because i am also in EUrope
[19:23:23] i can do that too ..with puppet
[19:23:28] oh
[19:23:44] !log ppchelko@deploy1001 Started deploy [restbase/deploy@7923c1f]: Update CSP headers for mobileapps T248431
[19:23:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:23:50] T248431: Mobile-HTML: CSP header updates not appearing on recently re-rendered pages - https://phabricator.wikimedia.org/T248431
[19:24:25] tell me the directories to rsync, ideally on the ticket and i puppetize it to install rsyncd and open firewall etc
[19:24:47] between 1001 and 2001 and back after reinstall or something
[19:24:57] yeah
[19:25:29] Pchelolo: hasn't the CSP change already been deployed? I already see the new value in prod.
[19:26:13] bearND: it's deploying right now
[19:26:22] you've just been lucky
[19:27:05] I was already seeing `app://*.wikipedia.org` in `connect-src`.
[19:27:36] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp1081 is OK: HTTP OK: HTTP/1.0 200 OK - 22324 bytes in 7.366 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[19:29:58] !log hashar@deploy1001 Started deploy [docker-pkg/deploy@9f2ba2c]: (no justification provided)
[19:30:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:30:10] !log hashar@deploy1001 Finished deploy [docker-pkg/deploy@9f2ba2c]: (no justification provided) (duration: 00m 12s)
[19:30:12] !log docker-pkg update on contint hosts
[19:30:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:30:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:32:36] PROBLEM - restbase endpoints health on restbase2019 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[19:33:41] Operations, Release-Engineering-Team-TODO, Continuous-Integration-Infrastructure (phase-out-jessie), Patch-For-Review, Release-Engineering-Team (CI & Testing services): Migrate contint* hosts to Buster - https://phabricator.wikimedia.org/T224591 (hashar)
[19:34:04] Operations, LDAP-Access-Requests, serviceops, Patch-For-Review: Grant Access to Logstash to Peter(peter.ovchyn@speedandfunction.com) - https://phabricator.wikimedia.org/T249037 (KFrancis) @jcrespo I checked with legal counsel... Could you please provide more information on what's in Logstash. If...
[19:34:28] RECOVERY - restbase endpoints health on restbase2019 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/restbase
[19:35:27] !log jforrester@deploy1001 Synchronized php-1.35.0-wmf.26/includes/MovePage.php: T248789 MovePage: Use correct Title when creating the null revision (duration: 00m 59s)
[19:35:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:35:33] T248789: Page titles don't update after move unless manually purged - https://phabricator.wikimedia.org/T248789
[19:35:38] Operations, Traffic: Servers freezing across the caching cluster (November 2019) - https://phabricator.wikimedia.org/T238305 (faidon) Ah! That's awesome to hear. May I suggest to resolve this (and the associated "upgrade firmware"?) task then, and reopen if we have another one of these?
[19:38:57] !log ppchelko@deploy1001 Finished deploy [restbase/deploy@7923c1f]: Update CSP headers for mobileapps T248431 (duration: 15m 13s)
[19:39:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:39:02] T248431: Mobile-HTML: CSP header updates not appearing on recently re-rendered pages - https://phabricator.wikimedia.org/T248431
[19:39:37] Operations, Release-Engineering-Team-TODO, Continuous-Integration-Infrastructure (phase-out-jessie), Patch-For-Review, Release-Engineering-Team (CI & Testing services): Migrate contint* hosts to Buster - https://phabricator.wikimedia.org/T224591 (hashar) @Dzahn For the data migrations we will...
[19:42:03] Operations, Release-Engineering-Team-TODO, Continuous-Integration-Infrastructure (phase-out-jessie), Patch-For-Review, Release-Engineering-Team (CI & Testing services): Migrate contint* hosts to Buster - https://phabricator.wikimedia.org/T224591 (hashar)
[19:43:30] Operations, LDAP-Access-Requests, observability, serviceops, Patch-For-Review: Grant Access to Logstash to Peter(peter.ovchyn@speedandfunction.com) - https://phabricator.wikimedia.org/T249037 (Dzahn)
[19:43:46] Operations, Release-Engineering-Team-TODO, Continuous-Integration-Infrastructure (phase-out-jessie), Patch-For-Review, Release-Engineering-Team (CI & Testing services): Migrate contint* hosts to Buster - https://phabricator.wikimedia.org/T224591 (hashar)
[19:44:19] Operations, Release-Engineering-Team-TODO, Continuous-Integration-Infrastructure (phase-out-jessie), Patch-For-Review, Release-Engineering-Team (CI & Testing services): Migrate contint* hosts to Buster - https://phabricator.wikimedia.org/T224591 (Dzahn) a:Dzahn
[19:44:30] Operations, Release-Engineering-Team-TODO, Continuous-Integration-Infrastructure (phase-out-jessie), Patch-For-Review, Release-Engineering-Team (CI & Testing services): Migrate contint* hosts to Buster - https://phabricator.wikimedia.org/T224591 (Dzahn) Stalled→Open
[19:44:35] Operations: Track remaining jessie systems in production - https://phabricator.wikimedia.org/T224549 (Dzahn)
[19:45:24] mutante: that is roughly 300G of data to move around. But I can probably shrink that volume
[19:46:01] Operations, Release-Engineering-Team-TODO, Continuous-Integration-Infrastructure (phase-out-jessie), Patch-For-Review, Release-Engineering-Team (CI & Testing services): Migrate contint* hosts to Buster - https://phabricator.wikimedia.org/T224591 (Dzahn)
[19:46:03] Operations, Epic: Migrate all of production metal to Buster or later - https://phabricator.wikimedia.org/T247045 (Dzahn)
[19:53:44] !log jforrester@deploy1001 Synchronized php-1.35.0-wmf.26/extensions/Translate/specials/SpecialExportTranslations.php: T249258: Revert 'Special:ExportTranslations: Disallow exporting huge groups' (duration: 00m 59s)
[19:53:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:53:50] T249258: Call to undefined method WikiPageMessageGroup::getKeys() - https://phabricator.wikimedia.org/T249258
[20:00:20] (CR) RLazarus: [C: +2] profile::mediawiki::maintenance: Migrate pagetriage jobs to periodic_job [puppet] - https://gerrit.wikimedia.org/r/582933 (https://phabricator.wikimedia.org/T211250) (owner: RLazarus)
[20:03:16] (CR) Hashar: "I need 18.06.2 on the CI instances for now." [puppet] - https://gerrit.wikimedia.org/r/566383 (https://phabricator.wikimedia.org/T224591) (owner: Dzahn)
[20:05:07] Operations, ops-eqiad, SRE-swift-storage: ms-be1023 crashed / Smart Storage Battery failure - https://phabricator.wikimedia.org/T249174 (Jclark-ctr) @RobH @wiki_willy Can we order a few extra bbu?
[20:06:23] Operations, LDAP-Access-Requests, observability, serviceops, Patch-For-Review: Grant Access to Logstash to Peter(peter.ovchyn@speedandfunction.com) - https://phabricator.wikimedia.org/T249037 (AMooney) @KFrancis to my understanding. Logstash does contain PII.
[20:14:48] Operations, LDAP-Access-Requests, observability, serviceops, Patch-For-Review: Grant Access to Logstash to Peter(peter.ovchyn@speedandfunction.com) - https://phabricator.wikimedia.org/T249037 (bd808) >>! In T249037#6024232, @KFrancis wrote: > @jcrespo I checked with legal counsel... Could yo...
[20:19:29] (PS4) RLazarus: maintenance: Migrate translationnotifications jobs to periodic_job [puppet] - https://gerrit.wikimedia.org/r/585250 (https://phabricator.wikimedia.org/T211250)
[20:19:53] (CR) jerkins-bot: [V: -1] maintenance: Migrate translationnotifications jobs to periodic_job [puppet] - https://gerrit.wikimedia.org/r/585250 (https://phabricator.wikimedia.org/T211250) (owner: RLazarus)
[20:21:56] (PS5) RLazarus: maintenance: Migrate translationnotifications jobs to periodic_job [puppet] - https://gerrit.wikimedia.org/r/585250 (https://phabricator.wikimedia.org/T211250)
[20:30:06] Operations, LDAP-Access-Requests, observability, serviceops, Patch-For-Review: Grant Access to Logstash to Peter(peter.ovchyn@speedandfunction.com) - https://phabricator.wikimedia.org/T249037 (Nuria) >This would be the same or similar NDA needed for access to the production databases as a sof...
[20:33:12] (PS3) Andrew Bogott: Remove hosts hiera for labstore1004 and labstore1005 [puppet] - https://gerrit.wikimedia.org/r/585576 (https://phabricator.wikimedia.org/T249058)
[20:33:14] (PS1) Andrew Bogott: Openstack client packages: define for queens/jessie [puppet] - https://gerrit.wikimedia.org/r/585601 (https://phabricator.wikimedia.org/T249058)
[20:37:29] (CR) Andrew Bogott: "Scary but, I think, ultimately minor changes from PCC:" [puppet] - https://gerrit.wikimedia.org/r/585576 (https://phabricator.wikimedia.org/T249058) (owner: Andrew Bogott)
[20:38:15] (CR) RLazarus: "> LGTM but add removal of the now-unused template." [puppet] - https://gerrit.wikimedia.org/r/585250 (https://phabricator.wikimedia.org/T211250) (owner: RLazarus)
[20:39:42] Operations, Patch-For-Review, cloud-services-team (Kanban): Remove old OpenStack config and manifests - https://phabricator.wikimedia.org/T249058 (Andrew)
[20:41:46] (PS2) Andrew Bogott: Openstack client packages: define for queens/jessie [puppet] - https://gerrit.wikimedia.org/r/585601 (https://phabricator.wikimedia.org/T249058)
[20:41:48] (PS4) Andrew Bogott: Remove hosts hiera for labstore1004 and labstore1005 [puppet] - https://gerrit.wikimedia.org/r/585576 (https://phabricator.wikimedia.org/T249058)
[20:41:50] (PS1) Andrew Bogott: Remove hiera host settings for cloudstore1008 and 1009 [puppet] - https://gerrit.wikimedia.org/r/585602 (https://phabricator.wikimedia.org/T249058)
[20:43:06] (CR) Brennen Bearnes: [C: -1] "So I'm testing this locally and I think it works except:" (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/584733 (https://phabricator.wikimedia.org/T246921) (owner: Jeena Huneidi)
[20:44:19] (CR) Andrew Bogott: [C: +2] "confirmed no-op by pcc:" [puppet] - https://gerrit.wikimedia.org/r/585602 (https://phabricator.wikimedia.org/T249058) (owner: Andrew Bogott)
[20:46:11] Operations, Patch-For-Review, cloud-services-team (Kanban): Remove old OpenStack config and manifests - https://phabricator.wikimedia.org/T249058 (Andrew)
[20:48:42] (PS1) Andrew Bogott: wmcs::nfs::secondary: use latest OpenStack client packages [puppet] - https://gerrit.wikimedia.org/r/585605 (https://phabricator.wikimedia.org/T249058)
[20:51:04] (CR) Andrew Bogott: "pcc: https://puppet-compiler.wmflabs.org/compiler1003/21691/" [puppet] - https://gerrit.wikimedia.org/r/585605 (https://phabricator.wikimedia.org/T249058) (owner: Andrew Bogott)
[20:57:09] (CR) Andrew Bogott: "This might be a bad idea -- it might constitute a downgrade on Jessie. I need to check." [puppet] - https://gerrit.wikimedia.org/r/585576 (https://phabricator.wikimedia.org/T249058) (owner: Andrew Bogott)
[20:59:54] Operations, LDAP-Access-Requests: LDAP access to the wmf group for Alex Paskulin - https://phabricator.wikimedia.org/T249272 (apaskulin)
[21:19:57] Operations, Anti-Harassment, SRE-Access-Requests, Patch-For-Review: Requesting access to analytics-privatedata-users for tchanders, dmaza, dbarratt, wikigit - https://phabricator.wikimedia.org/T249059 (aezell) I approve this access for these staff that report to me.
[21:34:18] (CR) Herron: "Will give this another close look tomorrow and barring issues will shoot to deploy on Monday" [puppet] - https://gerrit.wikimedia.org/r/583414 (https://phabricator.wikimedia.org/T246961) (owner: Mstyles)
[21:41:45] Operations, Traffic, User-DannyS712: 503 error on enwikinews - https://phabricator.wikimedia.org/T249280 (DannyS712)
[21:42:03] Operations, Traffic, User-DannyS712: 503 error on enwikinews - https://phabricator.wikimedia.org/T249280 (DannyS712)
[21:44:06] Operations, Wikimedia-Mailing-lists: Request for new list - https://phabricator.wikimedia.org/T249281 (Maryana)
[22:19:30] Operations, ops-eqiad, DC-Ops: druid1008 missing asset tag in netbox - https://phabricator.wikimedia.org/T249286 (RobH)
[22:20:27] Operations, ops-eqsin: apply asset tags to s[12]-60[34]-eqsin - https://phabricator.wikimedia.org/T244900 (RobH)
[22:23:06] Operations, ops-ulsfo: update rack location of decom wmf5801 - https://phabricator.wikimedia.org/T249287 (RobH) p:Triage→Low
[22:23:14] Operations, ops-ulsfo: update rack location of decom wmf5801 - https://phabricator.wikimedia.org/T249287 (RobH)
[22:25:21] Operations, ops-ulsfo: update rack location of decom wmf5801 - https://phabricator.wikimedia.org/T249287 (RobH)
[22:25:24] Operations, netbox, Patch-For-Review: Netbox report check for no position set in rack - https://phabricator.wikimedia.org/T239244 (RobH)
[22:34:48] PROBLEM - Debian mirror in sync with upstream on sodium is CRITICAL: /srv/mirrors/debian is over 14 hours old. https://wikitech.wikimedia.org/wiki/Mirrors
[23:00:04] RoanKattouw, Niharika, and Urbanecm: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) Evening SWAT(Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200402T2300).
[23:00:04] DannyS712: A patch you scheduled for Evening SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[23:00:29] (PS4) DannyS712: Don't try to grant `oathauth-enable` to `*` (part 2) [mediawiki-config] - https://gerrit.wikimedia.org/r/582615 (https://phabricator.wikimedia.org/T248282)
[23:00:45] I can do the SWAT
[23:02:44] (CR) Catrope: [C: +2] Don't try to grant `oathauth-enable` to `*` (part 2) [mediawiki-config] - https://gerrit.wikimedia.org/r/582615 (https://phabricator.wikimedia.org/T248282) (owner: DannyS712)
[23:03:36] (Merged) jenkins-bot: Don't try to grant `oathauth-enable` to `*` (part 2) [mediawiki-config] - https://gerrit.wikimedia.org/r/582615 (https://phabricator.wikimedia.org/T248282) (owner: DannyS712)
[23:09:01] !log catrope@deploy1001 Synchronized wmf-config/CommonSettings.php: Don't try to grant 'oathauth-enable' to '*' (part 2) (T248282) (duration: 00m 58s)
[23:09:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:09:08] T248282: Never try to grant `oathauth-enable` to `*` - https://phabricator.wikimedia.org/T248282
[23:50:28] RoanKattouw can we fit https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/FlaggedRevs/+/585618/ into the current SWAT window?
[23:50:30] See T249277
[23:50:30] T249277: Can't reject changes: improper error message about blank diff - https://phabricator.wikimedia.org/T249277
[23:52:22] Sure
[23:53:38] !log Started Wikibase rebuildItemsPerSite on mwmaint1002 for wikidatawiki. Can be killed at any time, if necessary.
[23:53:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:57:58] Given a +2 by @catrope; standing by to test that the fix works on the beta cluster