[00:10:07] Operations, Release-Engineering-Team-TODO, Continuous-Integration-Infrastructure (phase-out-jessie), Patch-For-Review, Release-Engineering-Team (CI & Testing services): Migrate contint* hosts to Buster - https://phabricator.wikimedia.org/T224591 (Krinkle) @hashar I noticed that the `zuul` com...
[00:11:57] (CR) Krinkle: [C: +2] dblists: Remove "do not modify" note from all.dblist [mediawiki-config] - https://gerrit.wikimedia.org/r/593968 (owner: Urbanecm)
[00:12:58] (Merged) jenkins-bot: dblists: Remove "do not modify" note from all.dblist [mediawiki-config] - https://gerrit.wikimedia.org/r/593968 (owner: Urbanecm)
[02:00:07] RECOVERY - Check systemd state on boron is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:05:37] PROBLEM - Check systemd state on boron is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[05:03:14] Operations, ops-codfw, DBA, DC-Ops: (Need By: 31st May) rack/setup/install db213[6-9] and db2140 - https://phabricator.wikimedia.org/T251639 (Marostegui) a:jcrespo→Papaul
[05:03:28] Operations, ops-codfw, DBA, DC-Ops: (Need By: 31st May) rack/setup/install db213[6-9] and db2140 - https://phabricator.wikimedia.org/T251639 (Marostegui) >>! In T251639#6101118, @RobH wrote: > @jcrespo or @Marostegui: > > The racking details from the ordering task only list 4 hosts, but we ended...
[05:12:51] (PS1) Marostegui: mariadb: Add db213[6-9] and db2140 as spares [puppet] - https://gerrit.wikimedia.org/r/593974 (https://phabricator.wikimedia.org/T251639)
[05:16:40] Operations, ops-eqiad, DBA, DC-Ops: (Need By: 31st May) rack/setup/install db114[1-9] - https://phabricator.wikimedia.org/T251614 (Marostegui)
[05:40:45] (PS4) KartikMistry: Adjust ContentTranslation MT threshold for Chinese Wikipedia to 70% [mediawiki-config] - https://gerrit.wikimedia.org/r/592479 (https://phabricator.wikimedia.org/T246383) (owner: VulpesVulpes825)
[05:41:55] (PS5) KartikMistry: Adjust ContentTranslation MT threshold for Chinese Wikipedia to 70% [mediawiki-config] - https://gerrit.wikimedia.org/r/592479 (https://phabricator.wikimedia.org/T246383) (owner: VulpesVulpes825)
[06:04:33] (PS1) QEDK: Enable cross-project search on frwikibooks [mediawiki-config] - https://gerrit.wikimedia.org/r/593977 (https://phabricator.wikimedia.org/T251683)
[06:30:46] (PS6) DannyS712: Remove "Create a book" link on enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/561403 (https://phabricator.wikimedia.org/T241683)
[06:30:55] (CR) DannyS712: Remove "Create a book" link on enwiki (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/561403 (https://phabricator.wikimedia.org/T241683) (owner: DannyS712)
[06:31:08] (PS7) DannyS712: Remove "Create a book" link on enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/561403 (https://phabricator.wikimedia.org/T241683)
[06:32:22] (CR) DannyS712: Enable cross-project search on frwikibooks (2 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/593977 (https://phabricator.wikimedia.org/T251683) (owner: QEDK)
[06:34:53] (PS1) Elukey: Move import_wikidata_entities_dumps timers one hour later [puppet] - https://gerrit.wikimedia.org/r/593979
[06:41:00] !log upload prometheus-druid-exporter 0.8-1 to stretch-wikimedia
[06:41:02] Logged the message at
https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:44:01] (CR) Elukey: [C: +2] Move import_wikidata_entities_dumps timers one hour later [puppet] - https://gerrit.wikimedia.org/r/593979 (owner: Elukey)
[06:47:13] (CR) Elukey: [C: +2] prometheus::druid_exporter: upgrade to 0.8 [puppet] - https://gerrit.wikimedia.org/r/593885 (owner: Elukey)
[06:50:28] (PS1) Elukey: prometheus::druid_exporter: add metrics_config_file to defaults [puppet] - https://gerrit.wikimedia.org/r/593983
[06:50:59] (CR) Elukey: [C: +2] prometheus::druid_exporter: add metrics_config_file to defaults [puppet] - https://gerrit.wikimedia.org/r/593983 (owner: Elukey)
[06:51:19] PROBLEM - Check systemd state on druid1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:51:34] this is me --^
[06:53:07] RECOVERY - Check systemd state on druid1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:07:53] !log execute ifdown eno1; ifup eno1 on analytics1052 - interface neg speed flapping
[07:07:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:09:34] XioNoX: hello! --^ happened twice, I checked on asw2-a-eqiad and on the host but I don't see any clear issue of NIC/port problems, when you have time let me know if you have any idea what could be
[07:10:05] doing ifdown/ifup seems to solve the issue (up to its next occurrence)
[07:10:11] elukey: what's the issue?
[07:11:17] XioNoX: the interface negotiated speed flaps between 100Mbs to 1000Mbps
[07:11:23] (CR) DannyS712: [C: +1] "LGTM" [mediawiki-config] - https://gerrit.wikimedia.org/r/593902 (https://phabricator.wikimedia.org/T248391) (owner: QEDK)
[07:11:42] now it doesn't anymore after the ifdown/ifup
[07:12:28] but it happened twice on the same host, so I am wondering if there is something failing (either the host's nic or the switch's port )
[07:13:12] elukey: I'd say swap the patch cable
[07:13:30] if not good, try a different port on the host and/or on the switch
[07:14:11] (CR) Elukey: "Chris: the cr was set in review mode so "Submit" wasn't available. Since you +2ed I'll merge and try to install the os :)" [puppet] - https://gerrit.wikimedia.org/r/593613 (https://phabricator.wikimedia.org/T249062) (owner: Cmjohnson)
[07:15:54] XioNoX: super thanks, I'll open a task
[07:19:53] (PS1) Elukey: Assign role insetup to cloudelastic100[5,6] [puppet] - https://gerrit.wikimedia.org/r/594093 (https://phabricator.wikimedia.org/T249062)
[07:19:56] Operations, ops-codfw, DC-Ops: db2082 mgmt iface flapping - https://phabricator.wikimedia.org/T251724 (Marostegui)
[07:19:58] ACKNOWLEDGEMENT - Check systemd state on boron is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
Muehlenhoff Host will be taken down later the day https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:19:58] ACKNOWLEDGEMENT - Check the last execution of package_builder_Clean_up_build_directory on boron is CRITICAL: CRITICAL: Status of the systemd unit package_builder_Clean_up_build_directory Muehlenhoff Host will be taken down later the day https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[07:23:24] !log removed lexnasser from cn=nda
[07:23:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:24:42] !log removed Kerberos principal for lexnasser and jmorgan
[07:24:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:24:50] !log Install 10.1.43-2 on s5 (db110) and s6 (db1131) masters in preparations for tomorrow's restart - T251154
[07:24:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:24:53] T251154: Upgrade and restart s5 and s6 primary DB master: Tue 5th May - https://phabricator.wikimedia.org/T251154
[07:25:43] Operations, DBA: Upgrade and restart s5 and s6 primary DB master: Tue 5th May - https://phabricator.wikimedia.org/T251154 (Marostegui) 10.1.43-2 has been installed on both masters (without mysql_upgrade) and they are ready for tomorrow's restart.
[07:26:21] (PS1) Muehlenhoff: Track removal of kerberos principal for lexnasser [puppet] - https://gerrit.wikimedia.org/r/594094
[07:26:44] !log removed jmorgan from cn=wmf
[07:26:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:27:19] (CR) Elukey: [C: +1] ":(" [puppet] - https://gerrit.wikimedia.org/r/594094 (owner: Muehlenhoff)
[07:27:48] (CR) Kormat: [C: +1] "Looks good from here."
[puppet] - https://gerrit.wikimedia.org/r/593974 (https://phabricator.wikimedia.org/T251639) (owner: Marostegui)
[07:28:32] (CR) Elukey: [C: +2] Assign role insetup to cloudelastic100[5,6] [puppet] - https://gerrit.wikimedia.org/r/594093 (https://phabricator.wikimedia.org/T249062) (owner: Elukey)
[07:30:30] (PS2) Marostegui: mariadb: Add db213[6-9] and db2140 as spares [puppet] - https://gerrit.wikimedia.org/r/593974 (https://phabricator.wikimedia.org/T251639)
[07:30:37] Operations, ops-eqiad: Netbox report PuppetDB PhysicalHosts critical - https://phabricator.wikimedia.org/T251725 (ayounsi)
[07:30:48] (CR) Muehlenhoff: [C: +2] Track removal of kerberos principal for lexnasser [puppet] - https://gerrit.wikimedia.org/r/594094 (owner: Muehlenhoff)
[07:30:59] !log Drop unused flagged* tables from mediawikiwiki - T248298
[07:31:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:31:04] T248298: Drop flagged revs tables on mediawikiwiki - https://phabricator.wikimedia.org/T248298
[07:32:05] (CR) Marostegui: [C: +2] mariadb: Add db213[6-9] and db2140 as spares [puppet] - https://gerrit.wikimedia.org/r/593974 (https://phabricator.wikimedia.org/T251639) (owner: Marostegui)
[07:33:16] (PS2) QEDK: Enable cross-project search on frwikibooks [mediawiki-config] - https://gerrit.wikimedia.org/r/593977 (https://phabricator.wikimedia.org/T251683)
[07:33:36] Operations, ops-codfw, DBA, DC-Ops, Patch-For-Review: (Need By: 31st May) rack/setup/install db213[6-9] and db2140 - https://phabricator.wikimedia.org/T251639 (Marostegui)
[07:34:38] Operations, ops-codfw, DBA, DC-Ops, Patch-For-Review: (Need By: 31st May) rack/setup/install db213[6-9] and db2140 - https://phabricator.wikimedia.org/T251639 (Marostegui) @Papaul the initial puppet changes are done. From puppet side the only pending thing is; to add them to the DCHP file (if...
[07:35:50] Operations, ops-eqiad, Patch-For-Review: (Need by: TDB) rack/setup/install cloudelastic100[56] - https://phabricator.wikimedia.org/T249062 (ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts: ` cloudelastic1005.wikimedia.org ` The log can be found in...
[07:35:54] Operations, ops-eqiad, Patch-For-Review: (Need by: TDB) rack/setup/install cloudelastic100[56] - https://phabricator.wikimedia.org/T249062 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['cloudelastic1005.wikimedia.org'] ` Of which those **FAILED**: ` ['cloudelastic1005.wikimedia.org'] `
[07:36:20] Operations, ops-eqiad, Patch-For-Review: (Need by: TDB) rack/setup/install cloudelastic100[56] - https://phabricator.wikimedia.org/T249062 (ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts: ` cloudelastic1005.wikimedia.org ` The log can be found in...
[07:36:39] Operations, serviceops: en.planet.wikimedia.org - Certificate *.wikipedia.org valid until 2020-06-20 - https://phabricator.wikimedia.org/T251726 (ayounsi)
[07:37:40] Operations, serviceops: en.planet.wikimedia.org - Certificate *.wikipedia.org valid until 2020-06-20 - https://phabricator.wikimedia.org/T251726 (ayounsi) Same for https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=phab.wmfusercontent.org&service=HTTPS-wmfusercontent `phab.wmfusercontent...
[07:40:31] Operations, ops-eqiad, Patch-For-Review: (Need by: TDB) rack/setup/install cloudelastic100[56] - https://phabricator.wikimedia.org/T249062 (elukey)
[07:41:51] Operations, Wikimedia-Planet, serviceops: en.planet.wikimedia.org - Certificate *.wikipedia.org valid until 2020-06-20 - https://phabricator.wikimedia.org/T251726 (Peachey88)
[07:41:54] Operations, Maps: Maps - OSM synchronization lag - eqiad - https://phabricator.wikimedia.org/T251727 (ayounsi)
[07:42:13] ACKNOWLEDGEMENT - Maps - OSM synchronization lag - eqiad on icinga1001 is CRITICAL: 8.7e+06 ge 2.592e+05 Ayounsi https://phabricator.wikimedia.org/T251727 https://wikitech.wikimedia.org/wiki/Maps/Runbook https://grafana.wikimedia.org/dashboard/db/maps-performances?panelId=11&fullscreen&orgId=1
[07:48:55] (CR) DCausse: increment extra plugin to 6.5.4-wmf-9 (2 comments) [software/elasticsearch/plugins] - https://gerrit.wikimedia.org/r/593833 (owner: Mstyles)
[07:49:15] PROBLEM - Check systemd state on netbox1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[07:49:24] Operations: snapshot of s3 in eqiad critical - https://phabricator.wikimedia.org/T251728 (ayounsi)
[07:49:48] ACKNOWLEDGEMENT - snapshot of s3 in eqiad on db1115 is CRITICAL: snapshot for s3 at eqiad taken more than 3 days ago: Most recent backup 2020-04-30 06:33:38 Ayounsi https://phabricator.wikimedia.org/T251728 https://wikitech.wikimedia.org/wiki/MariaDB/Backups%23Alerting
[07:51:49] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1101:3317 and db1101:3318 for reimage', diff saved to https://phabricator.wikimedia.org/P11113 and previous config saved to /var/cache/conftool/dbconfig/20200504-075148-marostegui.json
[07:51:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:51:56] (CR) Muehlenhoff: [C: +1] "LGTM" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/593327 (https://phabricator.wikimedia.org/T251466) (owner: Cwhite)
[07:53:41] Operations, Wikimedia-Planet, serviceops: en.planet.wikimedia.org - Certificate *.wikipedia.org valid until 2020-06-20 - https://phabricator.wikimedia.org/T251726 (Vgutierrez) `*.wmfusercontent.org` and `*.planet.wikimedia.org` are SANs of the unified cert. Currently we're using the LE unified cert o...
[07:55:27] Operations, Wikimedia-Planet, serviceops: en.planet.wikimedia.org - Certificate *.wikipedia.org valid until 2020-06-20 - https://phabricator.wikimedia.org/T251726 (Dzahn) This isn't specific to planet or about who owns planet, this is the general *.wikipedia.org cert.
[07:56:48] Operations, Wikimedia-Planet, serviceops: Certificate *.wikipedia.org valid until 2020-06-20 - https://phabricator.wikimedia.org/T251726 (Dzahn)
[07:58:09] (PS1) Marostegui: install_server: Fix regex on non-format srv [puppet] - https://gerrit.wikimedia.org/r/594095
[08:03:04] (PS1) Jcrespo: transfer.py: Set timeout to 5 minutes [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/594096 (https://phabricator.wikimedia.org/T138562)
[08:07:48] ACKNOWLEDGEMENT - Check systemd state on netbox1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. Ayounsi https://phabricator.wikimedia.org/T251729 https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:11:09] PROBLEM - puppet last run on mw1407 is CRITICAL: CRITICAL: Puppet last ran 2 days ago https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:11:31] (PS2) Jcrespo: transfer.py: Set timeout to 5 minutes [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/594096 (https://phabricator.wikimedia.org/T138562)
[08:11:34] RECOVERY - snapshot of s3 in eqiad on db1115 is OK: Last snapshot for s3 at eqiad (db1095.eqiad.wmnet:3313) taken on 2020-05-04 06:00:47 (863 GB) https://wikitech.wikimedia.org/wiki/MariaDB/Backups%23Alerting
[08:13:29] (CR) Jcrespo: [C: +2] transfer.py: Set timeout to 5 minutes [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/594096 (https://phabricator.wikimedia.org/T138562) (owner: Jcrespo)
[08:14:43] RECOVERY - Check systemd state on netbox1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:14:50] (PS1) Elukey: Add jetty open connections count to Druid exporter's metrics list [puppet] - https://gerrit.wikimedia.org/r/594097
[08:16:57] RECOVERY - puppet last run on mw1407 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[08:17:23] !log Deploy schema change on s5 codfw - T251188
[08:17:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:17:27] T251188: ipb_address_unique has an extra column in the code but not in production - https://phabricator.wikimedia.org/T251188
[08:20:42] !log add 50G to prometheus-ops on prometheus100[34]
[08:20:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:21:19] (PS1) Jcrespo: mariadb-backups: Update transfer.py to HEAD [puppet] - https://gerrit.wikimedia.org/r/594099 (https://phabricator.wikimedia.org/T138562)
[08:24:36] (CR) jerkins-bot: [V: -1] mariadb-backups: Update transfer.py to HEAD [puppet] - https://gerrit.wikimedia.org/r/594099 (https://phabricator.wikimedia.org/T138562) (owner: Jcrespo)
[08:25:27] Operations, Maps: Maps - OSM synchronization lag - eqiad - https://phabricator.wikimedia.org/T251727 (Gehel) Open→Resolved a:Gehel Yep, expired downtime, see T249086. I've added a note to the runbook.
[08:32:26] (CR) Arturo Borrero Gonzalez: [C: +1] "This LGTM. Will wait for Bryan/Brooke to approve before merging."
[puppet] - https://gerrit.wikimedia.org/r/593969 (owner: Reedy)
[08:33:55] (PS2) Jcrespo: mariadb-backups: Update transfer.py to HEAD [puppet] - https://gerrit.wikimedia.org/r/594099 (https://phabricator.wikimedia.org/T138562)
[08:36:16] (PS1) Dzahn: icinga: replace check_ssl_http with check_ssl_http_letsencrypt [puppet] - https://gerrit.wikimedia.org/r/594103 (https://phabricator.wikimedia.org/T251726)
[08:36:31] Operations, Traffic, serviceops, Patch-For-Review: Certificate *.wikipedia.org valid until 2020-06-20 - https://phabricator.wikimedia.org/T251726 (Dzahn)
[08:37:25] Operations, ops-eqiad: (Need by: TDB) rack/setup/install cloudelastic100[56] - https://phabricator.wikimedia.org/T249062 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['cloudelastic1005.wikimedia.org'] ` Of which those **FAILED**: ` ['cloudelastic1005.wikimedia.org'] `
[08:37:45] (CR) Dzahn: [C: +2] httpbb: add tests for miscweb sites [puppet] - https://gerrit.wikimedia.org/r/592883 (https://phabricator.wikimedia.org/T247650) (owner: Dzahn)
[08:40:25] (CR) Jcrespo: [C: +2] mariadb-backups: Update transfer.py to HEAD [puppet] - https://gerrit.wikimedia.org/r/594099 (https://phabricator.wikimedia.org/T138562) (owner: Jcrespo)
[08:42:56] Operations, Traffic: wikiworkshop.org has Facebook button, external statcounter etc - https://phabricator.wikimedia.org/T251732 (Dzahn)
[08:43:14] Operations, Traffic: wikiworkshop.org has Facebook button, external statcounter, https to http redirect - https://phabricator.wikimedia.org/T251732 (Dzahn)
[08:43:50] Operations, Privacy Engineering, Traffic: wikiworkshop.org has Facebook button, external statcounter, https to http redirect - https://phabricator.wikimedia.org/T251732 (Dzahn)
[08:45:00] Operations, Privacy Engineering, Traffic, Patch-For-Review: wikiworkshop.org has Facebook button, external statcounter, https to http redirect -
https://phabricator.wikimedia.org/T251732 (Dzahn) curl -I https://wikiworkshop.org/2020 location: http://wikiworkshop.org/2020/
[08:49:48] (CR) Jbond: [C: +1] "lgtm" [software/spicerack] - https://gerrit.wikimedia.org/r/593543 (owner: Volans)
[08:50:54] !log configure BGP peering with AS132203
[08:50:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:51:01] (CR) Jbond: [C: +1] "LGTM" [software/spicerack] - https://gerrit.wikimedia.org/r/593544 (owner: Volans)
[08:51:09] Operations, MediaWiki-extensions-CodeReview, Patch-For-Review: Set up static-codereview.wikimedia.org to host static HTML dump of CodeReview - https://phabricator.wikimedia.org/T243056 (Dzahn) >>! In T243056#6103056, @Legoktm wrote: > This dump is static/an archive and will never increase. Alright,...
[08:51:18] (CR) Volans: [C: +2] netbox: add support for RW and RO tokens [software/spicerack] - https://gerrit.wikimedia.org/r/593543 (owner: Volans)
[08:51:22] (CR) Ema: [C: +2] ATS: add SystemTap probe for cacheable responses [puppet] - https://gerrit.wikimedia.org/r/593735 (https://phabricator.wikimedia.org/T251537) (owner: Ema)
[08:51:38] (CR) Volans: [C: +2] netbox: expose the pynetbox API object [software/spicerack] - https://gerrit.wikimedia.org/r/593544 (owner: Volans)
[08:52:04] Operations, Privacy Engineering, Research, Traffic, Patch-For-Review: wikiworkshop.org has Facebook button, external statcounter, https to http redirect - https://phabricator.wikimedia.org/T251732 (Peachey88)
[08:52:33] (CR) Ema: [V: +2 C: +2] Makefile: fix 'atskafka' target, add 'test' [software/atskafka] - https://gerrit.wikimedia.org/r/593888 (owner: Ema)
[08:54:29] Operations, ops-eqiad: (Need by: TDB) rack/setup/install cloudelastic100[56] - https://phabricator.wikimedia.org/T249062 (ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts: `
cloudelastic1006.wikimedia.org ` The log can be found in `/var/log/wmf-auto-re...
[08:55:55] (CR) Elukey: [C: +2] Add jetty open connections count to Druid exporter's metrics list [puppet] - https://gerrit.wikimedia.org/r/594097 (owner: Elukey)
[08:56:13] Operations, MediaWiki-extensions-CodeReview, Patch-For-Review: Set up static-codereview.wikimedia.org to host static HTML dump of CodeReview - https://phabricator.wikimedia.org/T243056 (MoritzMuehlenhoff) >>! In T243056#6103700, @Dzahn wrote: >>>! In T243056#6103056, @Legoktm wrote: >> This dump is s...
[08:57:20] (Merged) jenkins-bot: netbox: add support for RW and RO tokens [software/spicerack] - https://gerrit.wikimedia.org/r/593543 (owner: Volans)
[08:57:43] (PS3) Vgutierrez: Release 8.0.7-1wm2 [debs/trafficserver] - https://gerrit.wikimedia.org/r/593749
[08:57:45] (Merged) jenkins-bot: netbox: expose the pynetbox API object [software/spicerack] - https://gerrit.wikimedia.org/r/593544 (owner: Volans)
[08:57:48] (PS3) Ema: Add pprof [software/atskafka] - https://gerrit.wikimedia.org/r/593892
[08:57:50] (PS2) Ema: Add license and copyright notices [software/atskafka] - https://gerrit.wikimedia.org/r/593894
[08:59:37] (Abandoned) Ema: Add basic prometeus integration [software/atskafka] - https://gerrit.wikimedia.org/r/593891 (owner: Ema)
[09:04:16] (CR) Muehlenhoff: [C: +1] "Looks good" [puppet] - https://gerrit.wikimedia.org/r/593542 (https://phabricator.wikimedia.org/T233020) (owner: JMeybohm)
[09:07:02] (PS4) Vgutierrez: Release 8.0.7-1wm2 [debs/trafficserver] - https://gerrit.wikimedia.org/r/593749
[09:10:48] Operations, ops-eqiad: (Need by: TDB) rack/setup/install cloudelastic100[56] - https://phabricator.wikimedia.org/T249062 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['cloudelastic1006.wikimedia.org'] ` Of which those **FAILED**: ` ['cloudelastic1006.wikimedia.org'] `
[09:11:06] Operations, ops-eqiad: (Need
by: TDB) rack/setup/install cloudelastic100[56] - https://phabricator.wikimedia.org/T249062 (ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts: ` cloudelastic1005.wikimedia.org ` The log can be found in `/var/log/wmf-auto-re...
[09:11:16] (CR) Vgutierrez: [C: +1] json: avoid panic on malformed fields [software/atskafka] - https://gerrit.wikimedia.org/r/593889 (owner: Ema)
[09:11:35] (CR) Ema: [C: +1] Release 8.0.7-1wm2 [debs/trafficserver] - https://gerrit.wikimedia.org/r/593749 (owner: Vgutierrez)
[09:13:05] (PS1) Paladox: Phabricator: support undef as a value to ipv6 [puppet] - https://gerrit.wikimedia.org/r/594107
[09:13:07] Operations: snapshot of s3 in eqiad critical - https://phabricator.wikimedia.org/T251728 (jcrespo) Open→Resolved > has been alerting for 1 days On Sunday. This is now resolved.
[09:13:42] (PS2) Ema: json: avoid 'index out of range' panic on malformed fields [software/atskafka] - https://gerrit.wikimedia.org/r/593889
[09:13:55] (CR) Kormat: [C: +1] "Nice catch, looks good."
[puppet] - https://gerrit.wikimedia.org/r/594095 (owner: Marostegui)
[09:14:31] (PS2) Paladox: Phabricator: support undef as a value to ipv6 [puppet] - https://gerrit.wikimedia.org/r/594107
[09:14:43] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3052 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[09:14:45] (CR) Ema: [C: +2] json: avoid 'index out of range' panic on malformed fields [software/atskafka] - https://gerrit.wikimedia.org/r/593889 (owner: Ema)
[09:15:46] (CR) Marostegui: [C: +2] install_server: Fix regex on non-format srv [puppet] - https://gerrit.wikimedia.org/r/594095 (owner: Marostegui)
[09:16:36] (PS2) Arturo Borrero Gonzalez: aptrepo: kubeadm-k8s: create versioned components [puppet] - https://gerrit.wikimedia.org/r/593499 (https://phabricator.wikimedia.org/T250866)
[09:16:38] (PS2) Ema: Add tests for atskafka.go [software/atskafka] - https://gerrit.wikimedia.org/r/593890
[09:16:40] (PS4) Ema: Add pprof [software/atskafka] - https://gerrit.wikimedia.org/r/593892
[09:16:42] (PS3) Ema: Add license and copyright notices [software/atskafka] - https://gerrit.wikimedia.org/r/593894
[09:18:58] Operations, Privacy Engineering, Research, Traffic, and 2 others: wikiworkshop.org has Facebook button, external statcounter, https to http redirect - https://phabricator.wikimedia.org/T251732 (Aklapper)
[09:22:55] !log reimaging db1101 to buster T250666
[09:22:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:22:59] T250666: Upgrade WMF database-and-backup-related hosts to buster - https://phabricator.wikimedia.org/T250666
[09:23:43] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3052 is OK: HTTP OK: HTTP/1.0 200 OK - 22728 bytes in 0.262 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[09:26:01] (CR) Vgutierrez: [C: +2] Release 8.0.7-1wm2 [debs/trafficserver] - https://gerrit.wikimedia.org/r/593749 (owner: Vgutierrez)
[09:29:15] (CR) Dzahn: "I think we might as well just set it to ::1 in cloud Hiera and achieve the same result. but also not a strong opinion against it." [puppet] - https://gerrit.wikimedia.org/r/594107 (owner: Paladox)
[09:30:14] (PS1) Kormat: install_server: Allow reimage of db1101 [puppet] - https://gerrit.wikimedia.org/r/594113 (https://phabricator.wikimedia.org/T250666)
[09:30:59] (CR) Jbond: apereo_cas: add more timeout values (1 comment) [puppet] - https://gerrit.wikimedia.org/r/587515 (owner: Jbond)
[09:31:35] (CR) Marostegui: [C: +1] install_server: Allow reimage of db1101 [puppet] - https://gerrit.wikimedia.org/r/594113 (https://phabricator.wikimedia.org/T250666) (owner: Kormat)
[09:32:14] Operations, ops-eqiad: (Need by: TDB) rack/setup/install cloudelastic100[56] - https://phabricator.wikimedia.org/T249062 (elukey) @Cmjohnson, @jclark-ctr I am unable to DHCP 1005/1006, I don't see any DHCP REQUEST landing to either install1003/2003. I checked 1005's system config and I noticed that the N...
[09:32:20] (CR) Kormat: [C: +2] install_server: Allow reimage of db1101 [puppet] - https://gerrit.wikimedia.org/r/594113 (https://phabricator.wikimedia.org/T250666) (owner: Kormat)
[09:34:51] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3052 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[09:35:01] (PS1) Kormat: install_server: switch db1101 to buster [puppet] - https://gerrit.wikimedia.org/r/594115 (https://phabricator.wikimedia.org/T250666)
[09:35:22] (PS1) Ema: Filter logs by regular expression [software/atskafka] - https://gerrit.wikimedia.org/r/594116 (https://phabricator.wikimedia.org/T237993)
[09:35:50] Operations, MediaWiki-extensions-CodeReview, Patch-For-Review: Set up static-codereview.wikimedia.org to host static HTML dump of CodeReview - https://phabricator.wikimedia.org/T243056 (Dzahn) We can add a second virtual disk and mount that but growing the existing disk is not advisable.
[09:37:09] (CR) Muehlenhoff: [C: +2] package-builder: Use GBP_PBUILDER_ variables [puppet] - https://gerrit.wikimedia.org/r/593542 (https://phabricator.wikimedia.org/T233020) (owner: JMeybohm)
[09:37:29] (CR) Marostegui: [C: +1] install_server: switch db1101 to buster [puppet] - https://gerrit.wikimedia.org/r/594115 (https://phabricator.wikimedia.org/T250666) (owner: Kormat)
[09:38:01] (CR) Kormat: [C: +2] install_server: switch db1101 to buster [puppet] - https://gerrit.wikimedia.org/r/594115 (https://phabricator.wikimedia.org/T250666) (owner: Kormat)
[09:40:09] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3052 is OK: HTTP OK: HTTP/1.0 200 OK - 22731 bytes in 0.273 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[09:43:41] (CR) Muehlenhoff: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/593499 (https://phabricator.wikimedia.org/T250866) (owner: Arturo Borrero Gonzalez)
[09:45:13] Operations, Analytics, Traffic, Patch-For-Review: Create replacement for Varnishkafka - https://phabricator.wikimedia.org/T237993 (ema) >>! In T237993#6074940, @elukey wrote: > - the HTTP status `000` seems to be used for clients that have some trouble doing a HTTP request to ats-tls, without ev...
[09:46:13] !log upload trafficserver 8.0.7-1wm2 to apt.wm.o (buster)
[09:46:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:49:06] (PS1) Ayounsi: Assign IPs for SingTel peering [dns] - https://gerrit.wikimedia.org/r/594118 (https://phabricator.wikimedia.org/T251224)
[09:50:27] (PS2) Ayounsi: Assign IPs for SingTel peering [dns] - https://gerrit.wikimedia.org/r/594118 (https://phabricator.wikimedia.org/T251224)
[09:52:00] (CR) Ayounsi: [C: +2] Assign IPs for SingTel peering [dns] - https://gerrit.wikimedia.org/r/594118 (https://phabricator.wikimedia.org/T251224) (owner: Ayounsi)
[10:01:12] (PS7) Ema: vcl: 10M cutoff for the 'exp' admission policy [puppet] - https://gerrit.wikimedia.org/r/589342 (https://phabricator.wikimedia.org/T249809)
[10:04:14] (PS8) Ema: vcl: 10M cutoff for the 'exp' admission policy [puppet] - https://gerrit.wikimedia.org/r/589342 (https://phabricator.wikimedia.org/T249809)
[10:07:21] (PS9) Ema: vcl: 10M cutoff for the 'exp' admission policy [puppet] - https://gerrit.wikimedia.org/r/589342 (https://phabricator.wikimedia.org/T249809)
[10:10:22] (CR) Ema: "updated pcc: https://puppet-compiler.wmflabs.org/compiler1002/22265/" [puppet] - https://gerrit.wikimedia.org/r/589342 (https://phabricator.wikimedia.org/T249809) (owner: Ema)
[10:12:15] Operations, ops-eqiad: (Need by: TDB) rack/setup/install cloudelastic100[56] - https://phabricator.wikimedia.org/T249062 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['cloudelastic1005.wikimedia.org'] ` Of which those **FAILED**: ` ['cloudelastic1005.wikimedia.org'] `
[10:17:16] (CR) Vgutierrez: [C: +1] vcl: 10M cutoff for the 'exp' admission policy [puppet] - https://gerrit.wikimedia.org/r/589342 (https://phabricator.wikimedia.org/T249809) (owner: Ema)
[10:23:26] (PS1) Paladox: Phabricator and Gerrit: Only install profile::waf::apache2::administrative on production [puppet] -
10https://gerrit.wikimedia.org/r/594124 [10:23:57] (03CR) 10jerkins-bot: [V: 04-1] Phabricator and Gerrit: Only install profile::waf::apache2::administrative on production [puppet] - 10https://gerrit.wikimedia.org/r/594124 (owner: 10Paladox) [10:24:10] (03PS2) 10Paladox: Phabricator and Gerrit: Only install profile::waf::apache2::administrative on production [puppet] - 10https://gerrit.wikimedia.org/r/594124 [10:24:52] (03CR) 10Ema: [C: 03+2] vcl: 10M cutoff for the 'exp' admission policy [puppet] - 10https://gerrit.wikimedia.org/r/589342 (https://phabricator.wikimedia.org/T249809) (owner: 10Ema) [10:25:03] (03PS3) 10Paladox: Phab & Gerrit: Only install *::apache2::administrative on production [puppet] - 10https://gerrit.wikimedia.org/r/594124 [10:26:36] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] aptrepo: kubeadm-k8s: create versioned components [puppet] - 10https://gerrit.wikimedia.org/r/593499 (https://phabricator.wikimedia.org/T250866) (owner: 10Arturo Borrero Gonzalez) [10:29:38] 10Operations, 10Security-Team: apache modsec rules deployment with scap - https://phabricator.wikimedia.org/T224887 (10Dzahn) `include ::profile::waf::apache2::administrative` breaks the puppet roles for both Phabricator and Gerrit in cloud because it adds scap deployment from a repo that is not available to... [10:30:04] jan_drewniak: It is that lovely time of the day again! You are hereby commanded to deploy Wikimedia Portals Update. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200504T1030). 
[10:30:32] !log running `aborrero@apt1001:~ $ sudo -i reprepro --delete clearvanished` to cleanup buster-wikimedia|thirdparty/kubeadm-k8s (T250866) [10:30:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:30:36] T250866: Stage packages for upstream kubeadm v1.16.9 to use in Toolforge - https://phabricator.wikimedia.org/T250866 [10:31:56] (03PS4) 10Dzahn: Phab & Gerrit: Only install *::apache2::administrative on production [puppet] - 10https://gerrit.wikimedia.org/r/594124 (https://phabricator.wikimedia.org/T224887) (owner: 10Paladox) [10:32:44] (03PS1) 10Ema: vcl: pass fe_mem_gb to vcl_config [puppet] - 10https://gerrit.wikimedia.org/r/594126 (https://phabricator.wikimedia.org/T249809) [10:33:46] !log kormat@cumin1001 START - Cookbook sre.hosts.downtime [10:33:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:33:50] (03PS5) 10Dzahn: Phab & Gerrit: Only install *::apache2::administrative on production [puppet] - 10https://gerrit.wikimedia.org/r/594124 (https://phabricator.wikimedia.org/T224887) (owner: 10Paladox) [10:36:06] (03PS1) 10Jdrewniak: Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/594128 (https://phabricator.wikimedia.org/T128546) [10:36:13] (03CR) 10Dzahn: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1001/22267/" [puppet] - 10https://gerrit.wikimedia.org/r/594124 (https://phabricator.wikimedia.org/T224887) (owner: 10Paladox) [10:36:15] !log kormat@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [10:36:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:36:40] (03PS2) 10Ema: Filter logs by regular expression [software/atskafka] - 10https://gerrit.wikimedia.org/r/594116 (https://phabricator.wikimedia.org/T237993) [10:37:40] (03CR) 10Jdrewniak: [C: 03+2] Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/594128 (https://phabricator.wikimedia.org/T128546) (owner: 10Jdrewniak) 
[10:37:44] 10Operations, 10netops, 10Sustainability (Incident Prevention): D1<->D8 VC link failure - https://phabricator.wikimedia.org/T251663 (10ayounsi) [10:37:54] 10Operations, 10Analytics: systemd::syslog conf should use :programname equals instead of startswith - https://phabricator.wikimedia.org/T251606 (10elukey) Maybe we could add a flag to use `programname` selectively and apply to analytics timers? Then if nothing break the rest of the timers could be migrated by... [10:38:02] (03CR) 10Muehlenhoff: apereo_cas: add more timeout values (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/587515 (owner: 10Jbond) [10:38:33] (03PS1) 10Arturo Borrero Gonzalez: aptrepo: kubeadm-k8s: fix docker.io component [puppet] - 10https://gerrit.wikimedia.org/r/594129 (https://phabricator.wikimedia.org/T250866) [10:38:38] (03Merged) 10jenkins-bot: Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/594128 (https://phabricator.wikimedia.org/T128546) (owner: 10Jdrewniak) [10:39:38] (03CR) 10jerkins-bot: [V: 04-1] aptrepo: kubeadm-k8s: fix docker.io component [puppet] - 10https://gerrit.wikimedia.org/r/594129 (https://phabricator.wikimedia.org/T250866) (owner: 10Arturo Borrero Gonzalez) [10:39:50] !log rolling upgrade of ATS to version 8.0.7-1wm3 [10:39:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:41:16] (03PS2) 10Arturo Borrero Gonzalez: aptrepo: kubeadm-k8s: fix docker.io component [puppet] - 10https://gerrit.wikimedia.org/r/594129 (https://phabricator.wikimedia.org/T250866) [10:42:05] (03PS1) 10WMDE-Fisch: Enable talk page resolution suggestion on beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/594131 (https://phabricator.wikimedia.org/T251744) [10:43:07] !log jdrewniak@deploy1001 Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:594128| Bumping portals to master (563985)]] (duration: 01m 29s) [10:43:08] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:44:12] !log jdrewniak@deploy1001 Synchronized portals: Wikimedia Portals Update: [[gerrit:594128| Bumping portals to master (563985)]] (duration: 01m 05s) [10:44:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:45:51] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] aptrepo: kubeadm-k8s: fix docker.io component [puppet] - 10https://gerrit.wikimedia.org/r/594129 (https://phabricator.wikimedia.org/T250866) (owner: 10Arturo Borrero Gonzalez) [10:46:43] (03CR) 10Awight: [C: 03+1] "We should take the opportunity to tweak the config variable name..." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/594131 (https://phabricator.wikimedia.org/T251744) (owner: 10WMDE-Fisch) [10:49:25] !log update packages in buster-wikimedia | thirdparty/kubead-k8s-1-15 and thirdparty/kubeadm-k8s-1-16 (T250866) [10:49:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:49:29] T250866: Stage packages for upstream kubeadm v1.16.9 to use in Toolforge - https://phabricator.wikimedia.org/T250866 [11:00:04] Amir1, Lucas_WMDE, awight, and Urbanecm: May I have your attention please! European Mid-day SWAT(Max 6 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200504T1100) [11:00:04] qedk, Amir1, and CFisch_WMDE: A patch you scheduled for European Mid-day SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [11:00:13] o/ [11:00:17] o/ [11:00:30] heya! [11:00:34] I added a seventh SWAT entry, because the issue was kind of urgent [11:00:49] I hope that's OK (or that someone is willing to swap) [11:00:54] maybe the satanic memory increase doesn’t have to happen today [11:00:56] o/ [11:01:08] yeah, I can go last [11:01:18] tgr: you go first please [11:01:24] thanks! 
[11:01:33] mine is also not urgent [11:01:49] yeah sure tgr :) [11:02:38] CFisch_WMDE: yours is labs only, it doesn't need to be at SWAT, you just need to rebase it on master [11:02:41] I merge it [11:02:46] +1 [11:03:03] (03CR) 10Ladsgroup: [C: 03+2] "labs patches don't need to go SWAT :)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/594131 (https://phabricator.wikimedia.org/T251744) (owner: 10WMDE-Fisch) [11:03:46] tgr: I've cherry-picked your patch as https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/594134, it conflicts with wmf.28 [11:03:52] 10Operations, 10Wikimedia-Mailing-lists: Wikiml-l mail archives are empty after August 2019 (moderation enabled but nobody moderates, hence no emails get delivered) - https://phabricator.wikimedia.org/T251554 (10Adithyak1997) Actually, there are 3 list admins present. All 3 of them have gone inactive. So, the... [11:04:02] (03Merged) 10jenkins-bot: Enable talk page resolution suggestion on beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/594131 (https://phabricator.wikimedia.org/T251744) (owner: 10WMDE-Fisch) [11:05:15] thanks [11:05:23] Urbanecm: Can you review the config patches? [11:05:26] 10Operations, 10Wikimedia-Mailing-lists: Wikiml-l mail archives are empty after August 2019 (moderation enabled but nobody moderates, hence no emails get delivered) - https://phabricator.wikimedia.org/T251554 (10Dzahn) No problem, we can reset the password after new admins have been found. Let us know the new... [11:05:46] Amir1: sure! [11:05:51] I can do the ones from qedk while it's merging [11:05:57] tgr: could you manually upload a cherry-pick for .28 as well please? [11:06:41] is .28 affected? [11:07:03] let me check [11:07:06] yes, cswiki (group2) is at .28 [11:07:29] ok, will do that [11:08:13] someone else do the config patches then, please [11:08:20] on it :) [11:08:30] cool! 
[11:08:59] (03PS7) 10Urbanecm: Update jvwiki logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593172 (https://phabricator.wikimedia.org/T251050) (owner: 10QEDK) [11:09:15] The CFisch_WMDE change is rebased on deploy1001 [11:09:21] (03CR) 10Urbanecm: [C: 03+1] "LGTM" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593172 (https://phabricator.wikimedia.org/T251050) (owner: 10QEDK) [11:09:29] \o/ thx Amir1 [11:09:39] (03CR) 10Urbanecm: [C: 03+1] "PS7 is just optipng -o7" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593172 (https://phabricator.wikimedia.org/T251050) (owner: 10QEDK) [11:10:04] CFisch_WMDE: It gets deployed automatically to beta, I think it would take at most an hour (unless when beta is broken) [11:10:33] (03CR) 10Urbanecm: [C: 03+1] "LGTM" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593977 (https://phabricator.wikimedia.org/T251683) (owner: 10QEDK) [11:11:00] Amir1: all of the config patches have my +1. Can deploy them as well, if you want [11:11:17] Urbanecm: sure please deploy [11:11:24] okay [11:11:35] (03PS4) 10Urbanecm: Enable VisualEditor for more namespaces on vecwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/592427 (https://phabricator.wikimedia.org/T250419) (owner: 10QEDK) [11:11:55] (03CR) 10Urbanecm: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/592427 (https://phabricator.wikimedia.org/T250419) (owner: 10QEDK) [11:11:57] (03CR) 10Gilles: "Looks great, thank you for your contribution! 
Please add a test case with an affected image that gets fixed by this patch, to ensure that " [software/thumbor-plugins] - 10https://gerrit.wikimedia.org/r/593358 (https://phabricator.wikimedia.org/T236240) (owner: 10AntiCompositeNumber) [11:12:50] (03Merged) 10jenkins-bot: Enable VisualEditor for more namespaces on vecwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/592427 (https://phabricator.wikimedia.org/T250419) (owner: 10QEDK) [11:12:52] (03Abandoned) 10Muehlenhoff: Create a role for the initial installation [puppet] - 10https://gerrit.wikimedia.org/r/477283 (owner: 10Muehlenhoff) [11:13:00] (03PS8) 10Urbanecm: Update jvwiki logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593172 (https://phabricator.wikimedia.org/T251050) (owner: 10QEDK) [11:13:06] (03CR) 10Urbanecm: [C: 03+2] Update jvwiki logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593172 (https://phabricator.wikimedia.org/T251050) (owner: 10QEDK) [11:13:28] https://phabricator.wikimedia.org/T251457 is still blocking the wmf.30 train, I think [11:13:55] (03Merged) 10jenkins-bot: Update jvwiki logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593172 (https://phabricator.wikimedia.org/T251050) (owner: 10QEDK) [11:14:12] qedk: https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/592427 is at mwdebug1001, could you check, please? :-) [11:14:22] Urbanecm: on it [11:15:33] (03PS1) 10Paladox: phabricator: Only use RemoteIPInternalProxy if trusted_proxies is set [puppet] - 10https://gerrit.wikimedia.org/r/594136 [11:16:26] (03PS2) 10Paladox: phabricator: Only use RemoteIPInternalProxy if trusted_proxies is set [puppet] - 10https://gerrit.wikimedia.org/r/594136 [11:16:50] Urbanecm: good to go! [11:16:53] thanks! 
[11:17:20] syncing [11:18:22] (03CR) 10Urbanecm: [C: 03+2] Correct typo in Greek Wikiversity logo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593893 (https://phabricator.wikimedia.org/T248391) (owner: 10Diomidis Spinellis) [11:18:24] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: cc94ea7: Enable VisualEditor for more namespaces on vecwiki (T250419) (duration: 01m 07s) [11:18:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:18:28] T250419: Changing active namespaces of VisualEditor on VEC Wikipedia - https://phabricator.wikimedia.org/T250419 [11:18:30] qedk: first one synced [11:19:08] syncing logos [11:19:14] Urbanecm: checked. [11:19:17] jvwiki logos [11:19:19] which logos? [11:19:21] (03Merged) 10jenkins-bot: Correct typo in Greek Wikiversity logo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593893 (https://phabricator.wikimedia.org/T248391) (owner: 10Diomidis Spinellis) [11:19:22] gotcha [11:20:11] !log urbanecm@deploy1001 Synchronized static/images/project-logos/: SWAT: 3b8c618: Update jvwiki logos (T251050) (duration: 01m 05s) [11:20:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:20:15] T251050: Change Javanese Wikipedia logo - https://phabricator.wikimedia.org/T251050 [11:20:55] !log Purge https://en.wikipedia.org/static/images/project-logos/jvwiki*.png (T251050) [11:20:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:21:01] qedk: jvwiki logos should be done [11:21:12] lmk if they're not [11:21:14] Urbanecm: good to go! 
[11:21:29] (03PS1) 10Paladox: phabricator: Add cache::nodes hiera value to devtools/common.yaml [puppet] - 10https://gerrit.wikimedia.org/r/594138 [11:21:32] (I've actually directly synced them, as there's not much stuff to go wrong for static files) [11:22:01] Urbanecm: famous last words [11:22:25] hope not :) [11:22:36] xD [11:22:46] (03PS2) 10Paladox: phabricator: Add cache::nodes hiera value to devtools/common.yaml [puppet] - 10https://gerrit.wikimedia.org/r/594138 [11:23:22] (03CR) 10Gilles: [C: 03+2] engine.vips: fall back to tinyrgb when libpng complains [software/thumbor-plugins] - 10https://gerrit.wikimedia.org/r/593920 (https://phabricator.wikimedia.org/T219569) (owner: 10AntiCompositeNumber) [11:23:24] elwikiversity going out too [11:23:38] 10Operations, 10Wikimedia-SVG-rendering, 10Upstream: Update librsvg to ≥2.42.3 - https://phabricator.wikimedia.org/T193352 (10Esanders) Thanks, sounds like this isn't an SVGO problem then, but a librsvg problem. Those fixes are workarounds, but the correct fix here is to upgrade librsvg. 
[11:24:19] https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/593902 with elwikiversity logos [11:24:20] !log urbanecm@deploy1001 Synchronized static/images/project-logos/: SWAT: 64556ba: Correct typo in Greek Wikiversity logo (T248391) (duration: 01m 06s) [11:24:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:24:24] T248391: Update the el.wikiversity.org main logo to the correct one - https://phabricator.wikimedia.org/T248391 [11:24:36] (03CR) 10Gilles: [V: 03+2 C: 03+2] engine.vips: fall back to tinyrgb when libpng complains [software/thumbor-plugins] - 10https://gerrit.wikimedia.org/r/593920 (https://phabricator.wikimedia.org/T219569) (owner: 10AntiCompositeNumber) [11:25:04] !log Purge https://en.wikipedia.org/static/images/project-logos/elwikiversity*.png (T251050) [11:25:06] qedk: done [11:25:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:25:23] Urbanecm: elwikiversity logos OK! [11:25:29] great! [11:25:37] (03CR) 10Urbanecm: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593977 (https://phabricator.wikimedia.org/T251683) (owner: 10QEDK) [11:25:40] (03PS1) 10Arturo Borrero Gonzalez: toolforge: kubeadmrepo: temporary unbreak depdency loop [puppet] - 10https://gerrit.wikimedia.org/r/594139 [11:25:46] (03PS3) 10Urbanecm: Enable cross-project search on frwikibooks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593977 (https://phabricator.wikimedia.org/T251683) (owner: 10QEDK) [11:25:49] (03CR) 10Urbanecm: Enable cross-project search on frwikibooks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593977 (https://phabricator.wikimedia.org/T251683) (owner: 10QEDK) [11:25:54] (03CR) 10Urbanecm: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593977 (https://phabricator.wikimedia.org/T251683) (owner: 10QEDK) [11:26:38] Urbanecm: can you merge the other commit in the relation chain with elwikiversity logos [11:26:50] qedk: could 
you add it to the calendar please? [11:26:53] (03Merged) 10jenkins-bot: Enable cross-project search on frwikibooks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593977 (https://phabricator.wikimedia.org/T251683) (owner: 10QEDK) [11:27:08] It's not a config change, but okay will do [11:27:33] qedk: https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/593977 is at mwdebug1001 [11:27:48] Urbanecm: checking [11:27:54] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] toolforge: kubeadmrepo: temporary unbreak depdency loop [puppet] - 10https://gerrit.wikimedia.org/r/594139 (owner: 10Arturo Borrero Gonzalez) [11:28:36] Urbanecm: good to go! [11:29:18] syncing [11:30:06] (03PS3) 10Paladox: phabricator: Add cache::nodes hiera value to devtools/common.yaml [puppet] - 10https://gerrit.wikimedia.org/r/594138 [11:30:29] (03CR) 10Dzahn: [C: 03+2] "cloud-only" [puppet] - 10https://gerrit.wikimedia.org/r/594138 (owner: 10Paladox) [11:30:42] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: 4d00236: Enable cross-project search on frwikibooks (T251683) (duration: 01m 05s) [11:30:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:30:45] T251683: Enable cross-wiki (interwiki) search for French Wikibooks - https://phabricator.wikimedia.org/T251683 [11:30:53] !log rebooting ps1-a7-codfw.mgmt.eqiad.wmnet. 
[11:30:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:31:01] (03PS2) 10Urbanecm: Update Phab task for elwikiversity logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593902 (https://phabricator.wikimedia.org/T248391) (owner: 10QEDK) [11:31:10] (03CR) 10Urbanecm: [C: 03+2] "noop" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593902 (https://phabricator.wikimedia.org/T248391) (owner: 10QEDK) [11:32:04] (03Merged) 10jenkins-bot: Update Phab task for elwikiversity logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593902 (https://phabricator.wikimedia.org/T248391) (owner: 10QEDK) [11:32:21] qedk: done [11:32:45] tgr: do you want to deploy the GE patch yourself, or do you want me to do it? [11:32:57] Urbanecm: thanks, everything looks good! [11:33:03] great! [11:33:34] Urbanecm: I can deploy it, and you can check on cswiki, if that's OK [11:33:51] sure, ping me once ready [11:38:06] !log rebooting ps1-a7-codfw.mgmt.eqiad.wmnet. [11:38:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:38:17] Urbanecm: it's on mwdebug1002 [11:38:23] on igt [11:38:25] *it [11:39:50] tgr: seems to work! [11:40:07] thx! 
[11:43:24] !log tgr@deploy1001 Synchronized php-1.35.0-wmf.28/extensions/GrowthExperiments/modules/helppanel/ext.growthExperiments.HelpPanel.cta.js: SWAT: [[gerrit:594137|Help panel: Check if guidance feature flag is set before loading mobile peek (T251589)]] (duration: 01m 10s) [11:43:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:43:28] T251589: Newcomer tasks: mobile peek erroneously in production - https://phabricator.wikimedia.org/T251589 [11:45:39] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1099:3311 T206103 to remove tmp_2 index', diff saved to https://phabricator.wikimedia.org/P11118 and previous config saved to /var/cache/conftool/dbconfig/20200504-114539-marostegui.json [11:45:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:45:43] T206103: recentchanges table indexes: tmp1, tmp2 and tmp3 - https://phabricator.wikimedia.org/T206103 [11:46:12] !log Remove index tmp_2 from recentchanges on db1099:3311 T206103 [11:46:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:46:41] !log tgr@deploy1001 Synchronized php-1.35.0-wmf.30/extensions/GrowthExperiments/modules/helppanel/ext.growthExperiments.HelpPanel.cta.js: SWAT: [[gerrit:594134|Help panel: Check if guidance feature flag is set before loading mobile peek (T251589)]] (duration: 01m 06s) [11:46:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:47:27] !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db1099:3311 T206103 after removing tmp_2 index', diff saved to https://phabricator.wikimedia.org/P11119 and previous config saved to /var/cache/conftool/dbconfig/20200504-114727-marostegui.json [11:47:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:47:36] ok, I'm done. sorry again for the last-minute addition. Amir1 should I deploy the 666 patch as well? 
[11:48:12] tgr: sure [11:48:15] Thank you [11:48:34] (03PS3) 10Gergő Tisza: Increase wmgMemoryLimit from 660MB to 666MB [mediawiki-config] - 10https://gerrit.wikimedia.org/r/592761 (owner: 10Ladsgroup) [11:49:52] (03CR) 10Gergő Tisza: [C: 03+2] Increase wmgMemoryLimit from 660MB to 666MB [mediawiki-config] - 10https://gerrit.wikimedia.org/r/592761 (owner: 10Ladsgroup) [11:50:58] (03Merged) 10jenkins-bot: Increase wmgMemoryLimit from 660MB to 666MB [mediawiki-config] - 10https://gerrit.wikimedia.org/r/592761 (owner: 10Ladsgroup) [11:50:58] nothing to test there, I assume [11:51:25] Yup [11:53:03] no need to double-sync anymore, right? [11:53:19] Afaik no [11:53:56] !log tgr@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:592761|Increase wmgMemoryLimit from 660MB to 666MB]] (duration: 01m 06s) [11:53:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:54:56] all done then [11:56:40] \o/ [11:56:53] Do you want to close the swat? [12:02:12] 10Operations, 10ops-eqiad, 10DC-Ops: (Need By: TDB) install additional SSDs into prometheus100[34] - https://phabricator.wikimedia.org/T251621 (10fgiunchedi) AFAIK adding the SSDs can be done without taking the hosts out of service, if that's not the case please let me know! Once installed I'll take care of... [12:02:23] 10Operations, 10ops-codfw, 10DC-Ops: (Need By: TDB) install additional SSDs into prometheus200[34] - https://phabricator.wikimedia.org/T251622 (10fgiunchedi) AFAIK adding the SSDs can be done without taking the hosts out of service, if that's not the case please let me know! Once installed I'll take care of... 
[12:02:31] 10Operations, 10ops-codfw, 10DC-Ops: (Need By: ASAP) install additional SSDs into prometheus200[34] - https://phabricator.wikimedia.org/T251622 (10fgiunchedi) [12:02:39] 10Operations, 10ops-eqiad, 10DC-Ops: (Need By: ASAP) install additional SSDs into prometheus100[34] - https://phabricator.wikimedia.org/T251621 (10fgiunchedi) [12:03:16] (03PS1) 10JMeybohm: package-builder: Use GBP_PBUILDER_ variables in HOOKS [puppet] - 10https://gerrit.wikimedia.org/r/594142 (https://phabricator.wikimedia.org/T233020) [12:04:40] PROBLEM - Check systemd state on an-launcher1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [12:07:51] 10Operations, 10Analytics: systemd::syslog conf should use :programname equals instead of startswith - https://phabricator.wikimedia.org/T251606 (10fgiunchedi) IIRC the startswith is there to cater for multi-instance systemd units (e.g. prometheus, elasticsearch, etc) so they all log to the same file. Having a... [12:09:12] !log EU SWAT is done [12:09:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:09:24] (03PS2) 10Ema: vcl: pass fe_mem_gb to vcl_config [puppet] - 10https://gerrit.wikimedia.org/r/594126 (https://phabricator.wikimedia.org/T249809) [12:09:26] (03PS1) 10Ema: vcl: test 'exp' admission policy on two nodes [puppet] - 10https://gerrit.wikimedia.org/r/594144 (https://phabricator.wikimedia.org/T249809) [12:10:00] !log Temporary enable slow query log on db1099:3311 - T206103 [12:10:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:10:03] T206103: recentchanges table indexes: tmp1, tmp2 and tmp3 - https://phabricator.wikimedia.org/T206103 [12:13:36] (03CR) 10Filippo Giunchedi: "LGTM overall, see inline, thanks for working on this!" 
(031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/593750 (https://phabricator.wikimedia.org/T167035) (owner: 10Cwhite) [12:14:52] (03PS1) 10Arturo Borrero Gonzalez: toolforge: factorize kubeadm repo config to a different module [puppet] - 10https://gerrit.wikimedia.org/r/594145 (https://phabricator.wikimedia.org/T251297) [12:16:58] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3050 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [12:20:17] Amir1: thanks! didn't know that's a thing. Have we been doing it for long? [12:20:44] yeah, I think a year ish [12:21:45] The search is not great: https://sal.toolforge.org/production?p=0&q=SWAT+done&d= [12:22:55] works fine with quotes [12:23:16] aah [12:23:18] thanks [12:23:27] (03CR) 10Volans: [C: 03+1] "LGTM, thanks for the patch, let's try it." [puppet] - 10https://gerrit.wikimedia.org/r/594142 (https://phabricator.wikimedia.org/T233020) (owner: 10JMeybohm) [12:23:53] (03PS1) 10Giuseppe Lavagetto: Add the ability to consume from kafka [software/purged] - 10https://gerrit.wikimedia.org/r/594147 (https://phabricator.wikimedia.org/T133821) [12:23:55] (03PS1) 10Giuseppe Lavagetto: Add integration tests using docker-compose [software/purged] - 10https://gerrit.wikimedia.org/r/594148 (https://phabricator.wikimedia.org/T133821) [12:24:13] https://sal.toolforge.org/production?p=0&q=%22SWAT+done%22&d= and https://sal.toolforge.org/production?p=0&q=%22SWAT+is+done%22&d= [12:26:40] (03CR) 10JMeybohm: [C: 03+2] package-builder: Use GBP_PBUILDER_ variables in HOOKS [puppet] - 10https://gerrit.wikimedia.org/r/594142 (https://phabricator.wikimedia.org/T233020) (owner: 10JMeybohm) [12:27:50] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3050 is OK: HTTP OK: HTTP/1.0 200 OK - 22732 bytes in 0.274 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [12:29:02] 
(03PS2) 10Arturo Borrero Gonzalez: toolforge: factorize kubeadm repo config to a different module [puppet] - 10https://gerrit.wikimedia.org/r/594145 (https://phabricator.wikimedia.org/T251297) [12:32:06] (03PS3) 10Arturo Borrero Gonzalez: toolforge: factorize kubeadm repo config to a different module [puppet] - 10https://gerrit.wikimedia.org/r/594145 (https://phabricator.wikimedia.org/T251297) [12:32:08] (03PS1) 10Kormat: Revert "install_server: Allow reimage of db1101" [puppet] - 10https://gerrit.wikimedia.org/r/594153 (https://phabricator.wikimedia.org/T250666) [12:32:42] (03CR) 10Kormat: [C: 03+2] Revert "install_server: Allow reimage of db1101" [puppet] - 10https://gerrit.wikimedia.org/r/594153 (https://phabricator.wikimedia.org/T250666) (owner: 10Kormat) [12:33:01] (03PS1) 10Paladox: phabricator: Set default for phabricator_domain to phab.wmflabs.org [puppet] - 10https://gerrit.wikimedia.org/r/594154 [12:33:31] (03PS2) 10Paladox: phabricator: Set default for phabricator_domain to phab.wmflabs.org [puppet] - 10https://gerrit.wikimedia.org/r/594154 [12:34:54] (03CR) 10Vgutierrez: [C: 03+1] vcl: pass fe_mem_gb to vcl_config [puppet] - 10https://gerrit.wikimedia.org/r/594126 (https://phabricator.wikimedia.org/T249809) (owner: 10Ema) [12:41:28] (03CR) 10Joal: [C: 03+1] "LGTM :)" [puppet] - 10https://gerrit.wikimedia.org/r/593979 (owner: 10Elukey) [12:42:16] (03PS2) 10Giuseppe Lavagetto: Add integration tests using docker-compose [software/purged] - 10https://gerrit.wikimedia.org/r/594148 (https://phabricator.wikimedia.org/T133821) [12:47:00] !log kormat@cumin1001 dbctl commit (dc=all): 'Repool db1101:3317 and db1101:3318 after reimaging T250666', diff saved to https://phabricator.wikimedia.org/P11120 and previous config saved to /var/cache/conftool/dbconfig/20200504-124659-kormat.json [12:47:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:47:03] T250666: Upgrade WMF database-and-backup-related hosts to buster - 
https://phabricator.wikimedia.org/T250666 [12:47:39] (03PS1) 10Filippo Giunchedi: icinga: ignore SOFT state by default on /alerts [puppet] - 10https://gerrit.wikimedia.org/r/594156 [12:48:53] (03PS3) 10Paladox: phabricator: Set phabricator_domain & phabricator_altdomain for devtools [puppet] - 10https://gerrit.wikimedia.org/r/594154 [12:49:03] (03PS4) 10Arturo Borrero Gonzalez: toolforge: kubeadm: use apt::package_from_repository [puppet] - 10https://gerrit.wikimedia.org/r/594145 (https://phabricator.wikimedia.org/T251297) [12:49:25] (03PS1) 10Paladox: phabricator: Install the php zip extension [puppet] - 10https://gerrit.wikimedia.org/r/594157 [12:50:17] PROBLEM - rsyslog in codfw is failing to deliver messages on icinga1001 is CRITICAL: action=fwd_centrallog2001.codfw.wmnet:6514 https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=codfw+prometheus/ops [12:50:18] (03PS2) 10Paladox: phabricator: Install the php zip extension [puppet] - 10https://gerrit.wikimedia.org/r/594157 [12:50:25] PROBLEM - rsyslog in eqiad is failing to deliver messages on icinga1001 is CRITICAL: action=fwd_centrallog2001.codfw.wmnet:6514 https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=eqiad+prometheus/ops [12:51:19] RECOVERY - rsyslog in codfw is failing to deliver messages on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=codfw+prometheus/ops [12:51:29] RECOVERY - rsyslog in eqiad is failing to deliver messages on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Rsyslog https://grafana.wikimedia.org/d/000000596/rsyslog?var-datasource=eqiad+prometheus/ops [12:53:05] (03CR) 10Muehlenhoff: "Looks good, one comment inline" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/594145 (https://phabricator.wikimedia.org/T251297) (owner: 10Arturo Borrero Gonzalez) [12:56:48] jouncebot: now [12:56:48] No deployments scheduled for the next 4 hour(s) and 3 minute(s) [12:56:51] jouncebot: nex [12:56:52] jouncebot: next [12:56:52] In 4 hour(s) and 3 minute(s): Wikidata Query Service weekly deploy (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200504T1700) [12:56:56] (03PS4) 10Addshore: Stop setting legacy wmgWikibase(Repo/Client)Repositories for test wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/586368 (https://phabricator.wikimedia.org/T248664) [12:57:00] (03CR) 10Alexandros Kosiaris: [C: 03+1] "Not used in prod, fine from our side" [puppet] - 10https://gerrit.wikimedia.org/r/593499 (https://phabricator.wikimedia.org/T250866) (owner: 10Arturo Borrero Gonzalez) [12:58:31] I'm going to go for this mw-config cleanup ^^ as there is nothing else going on and it is only for test wikis [12:58:43] (03CR) 10Ayounsi: [C: 03+1] "The idea and URL lgtm!" 
[puppet] - 10https://gerrit.wikimedia.org/r/594156 (owner: 10Filippo Giunchedi) [12:58:46] (03CR) 10Addshore: [C: 03+2] Stop setting legacy wmgWikibase(Repo/Client)Repositories for test wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/586368 (https://phabricator.wikimedia.org/T248664) (owner: 10Addshore) [12:59:43] (03Merged) 10jenkins-bot: Stop setting legacy wmgWikibase(Repo/Client)Repositories for test wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/586368 (https://phabricator.wikimedia.org/T248664) (owner: 10Addshore) [13:00:07] RECOVERY - Check systemd state on an-launcher1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [13:02:06] !log addshore@deploy1001 Synchronized wmf-config/InitialiseSettings.php: T248664 Stop setting legacy wmgWikibase(Repo/Client)Repositories for TEST wikis (duration: 01m 06s) [13:02:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:02:10] T248664: entitysources: Directly create entitySources config for WMF "test" wikis - https://phabricator.wikimedia.org/T248664 [13:05:20] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] toolforge: kubeadm: use apt::package_from_repository (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/594145 (https://phabricator.wikimedia.org/T251297) (owner: 10Arturo Borrero Gonzalez) [13:05:58] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] "PCC https://puppet-compiler.wmflabs.org/compiler1003/22274/" [puppet] - 10https://gerrit.wikimedia.org/r/594145 (https://phabricator.wikimedia.org/T251297) (owner: 10Arturo Borrero Gonzalez) [13:09:16] (03PS1) 10Paladox: phabricator: Drop phd.pid-directory as it's now uneeded [puppet] - 10https://gerrit.wikimedia.org/r/594162 [13:10:34] (03PS2) 10Paladox: phabricator: Drop phd.pid-directory as it's now uneeded [puppet] - 10https://gerrit.wikimedia.org/r/594162 [13:11:03] 10Operations, 10Analytics, 10Traffic, 10Patch-For-Review: Create replacement 
for Varnishkafka - https://phabricator.wikimedia.org/T237993 (10Ottomata) > I propose renaming it to prometheus-rdkafka-exporter and using it every Sounds great! We could then use it in node instead of https://github.com/wikimed... [13:11:47] (03PS3) 10Paladox: phabricator: Drop phd.pid-directory as it's now uneeded [puppet] - 10https://gerrit.wikimedia.org/r/594162 [13:11:53] (03CR) 10Paladox: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/594162 (owner: 10Paladox) [13:14:24] (03PS4) 10Paladox: phabricator: Drop phd.pid-directory as it's now uneeded [puppet] - 10https://gerrit.wikimedia.org/r/594162 [13:14:30] (03CR) 10Paladox: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/594162 (owner: 10Paladox) [13:15:53] (03CR) 10Ema: Add the ability to consume from kafka (0312 comments) [software/purged] - 10https://gerrit.wikimedia.org/r/594147 (https://phabricator.wikimedia.org/T133821) (owner: 10Giuseppe Lavagetto) [13:16:29] (03PS5) 10Paladox: phabricator: Drop phd.pid-directory as it's now uneeded [puppet] - 10https://gerrit.wikimedia.org/r/594162 [13:16:34] (03CR) 10Paladox: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/594162 (owner: 10Paladox) [13:17:46] (03PS6) 10Paladox: phabricator: Drop phd.pid-directory as it's now uneeded [puppet] - 10https://gerrit.wikimedia.org/r/594162 [13:17:51] (03CR) 10Paladox: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/594162 (owner: 10Paladox) [13:18:56] (03CR) 10Filippo Giunchedi: "> Patch Set 1: Code-Review+1" [puppet] - 10https://gerrit.wikimedia.org/r/594156 (owner: 10Filippo Giunchedi) [13:20:33] (03CR) 10Ema: [C: 03+2] vcl: pass fe_mem_gb to vcl_config [puppet] - 10https://gerrit.wikimedia.org/r/594126 (https://phabricator.wikimedia.org/T249809) (owner: 10Ema) [13:20:59] (03CR) 10Filippo Giunchedi: [C: 03+2] "I'm going to boldly merge this for now, and we can followup with filtering out disabled notifications later" [puppet] - 
10https://gerrit.wikimedia.org/r/594156 (owner: 10Filippo Giunchedi) [13:27:44] !log kormat@cumin1001 dbctl commit (dc=all): 'Repool db1101:3317 and db1101:3318 some more after reimaging T250666', diff saved to https://phabricator.wikimedia.org/P11121 and previous config saved to /var/cache/conftool/dbconfig/20200504-132744-kormat.json [13:27:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:27:48] T250666: Upgrade WMF database-and-backup-related hosts to buster - https://phabricator.wikimedia.org/T250666 [13:34:31] !log reimaging es2025 to buster T250666 [13:34:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:34:35] T250666: Upgrade WMF database-and-backup-related hosts to buster - https://phabricator.wikimedia.org/T250666 [13:36:08] (03PS1) 10Mforns: reportupdater::jobs.pp: Add delay to published_cx2_translations_mysql [puppet] - 10https://gerrit.wikimedia.org/r/594169 [13:36:38] (03PS1) 10Kormat: install_server: Allow reimage of es2025 [puppet] - 10https://gerrit.wikimedia.org/r/594170 (https://phabricator.wikimedia.org/T250666) [13:37:39] (03CR) 10Marostegui: [C: 03+1] install_server: Allow reimage of es2025 [puppet] - 10https://gerrit.wikimedia.org/r/594170 (https://phabricator.wikimedia.org/T250666) (owner: 10Kormat) [13:37:53] (03CR) 10Kormat: [C: 03+2] install_server: Allow reimage of es2025 [puppet] - 10https://gerrit.wikimedia.org/r/594170 (https://phabricator.wikimedia.org/T250666) (owner: 10Kormat) [13:42:35] 10Operations, 10netops, 10observability, 10User-fgiunchedi: Upgrade LibreNMS to 1.63 - https://phabricator.wikimedia.org/T251222 (10fgiunchedi) [13:46:23] 10Operations, 10netops: Flowspec controller PoC - https://phabricator.wikimedia.org/T251767 (10ayounsi) 05Open→03Resolved p:05Triage→03Low [13:49:29] 10Operations, 10DBA: Make enabling reimaging for db hosts more humane - https://phabricator.wikimedia.org/T251392 (10Kormat) 05Open→03Resolved a:03Kormat Closing this, and opened 
T251768 to cover fixing the partman recipe. [13:50:40] !log kormat@cumin1001 dbctl commit (dc=all): 'Depool es2025 for reimaging T250666', diff saved to https://phabricator.wikimedia.org/P11122 and previous config saved to /var/cache/conftool/dbconfig/20200504-135039-kormat.json [13:50:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:50:44] T250666: Upgrade WMF database-and-backup-related hosts to buster - https://phabricator.wikimedia.org/T250666 [13:54:53] 10Operations, 10Analytics, 10Traffic, 10Patch-For-Review: Create replacement for Varnishkafka - https://phabricator.wikimedia.org/T237993 (10ema) >>! In T237993#6104570, @Ottomata wrote: >> I propose renaming it to prometheus-rdkafka-exporter and using it every > Sounds great! We could then use it in nod... [13:55:08] (03PS1) 10Jbond: cookbook sre.hosts.rotate-pdu-password: use request.Session and response.raise_for_status [cookbooks] - 10https://gerrit.wikimedia.org/r/594173 (https://phabricator.wikimedia.org/T246890) [14:00:58] (03PS1) 10Volans: CHANGELOG: add changelogs for release v0.0.33 [software/spicerack] - 10https://gerrit.wikimedia.org/r/594177 [14:03:37] 10Operations, 10Analytics, 10Traffic, 10Patch-For-Review: Create replacement for Varnishkafka - https://phabricator.wikimedia.org/T237993 (10Ottomata) > We need to essentially add custom metrics to the data structure dumped to disk as JSON, Wouldn't a prometheus-rdkafka-exporter expose the metrics via HTT... [14:08:03] PROBLEM - Check systemd state on an-launcher1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [14:08:46] (03PS1) 10Kormat: install_server: switch es2025 to buster [puppet] - 10https://gerrit.wikimedia.org/r/594190 (https://phabricator.wikimedia.org/T250666) [14:10:45] (03CR) 10Volans: [C: 03+2] CHANGELOG: add changelogs for release v0.0.33 [software/spicerack] - 10https://gerrit.wikimedia.org/r/594177 (owner: 10Volans) [14:10:58] (03CR) 10Marostegui: [C: 03+1] install_server: switch es2025 to buster [puppet] - 10https://gerrit.wikimedia.org/r/594190 (https://phabricator.wikimedia.org/T250666) (owner: 10Kormat) [14:11:07] (03CR) 10Kormat: [C: 03+2] install_server: switch es2025 to buster [puppet] - 10https://gerrit.wikimedia.org/r/594190 (https://phabricator.wikimedia.org/T250666) (owner: 10Kormat) [14:12:11] 10Operations, 10ops-codfw, 10DC-Ops: db2082 mgmt iface flapping - https://phabricator.wikimedia.org/T251724 (10Papaul) p:05Triage→03Medium [14:14:31] (03PS1) 10Elukey: kerberos: return subprocess' exit code [puppet] - 10https://gerrit.wikimedia.org/r/594191 [14:15:48] !log add static nat for fran1001 - T251763 [14:15:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:17:06] (03CR) 10Elukey: [C: 03+2] kerberos: return subprocess' exit code [puppet] - 10https://gerrit.wikimedia.org/r/594191 (owner: 10Elukey) [14:17:34] (03CR) 10Joal: [C: 03+1] "LGTM - Let's merge!" 
[puppet] - 10https://gerrit.wikimedia.org/r/594191 (owner: 10Elukey) [14:18:04] (03Merged) 10jenkins-bot: CHANGELOG: add changelogs for release v0.0.33 [software/spicerack] - 10https://gerrit.wikimedia.org/r/594177 (owner: 10Volans) [14:18:30] (03CR) 10Giuseppe Lavagetto: Add the ability to consume from kafka (0312 comments) [software/purged] - 10https://gerrit.wikimedia.org/r/594147 (https://phabricator.wikimedia.org/T133821) (owner: 10Giuseppe Lavagetto) [14:19:19] !log kormat@cumin1001 dbctl commit (dc=all): 'Repool db1101:3317 fully and db1101:3318 to 75% after reimaging T250666', diff saved to https://phabricator.wikimedia.org/P11123 and previous config saved to /var/cache/conftool/dbconfig/20200504-141919-kormat.json [14:19:20] (03CR) 10Giuseppe Lavagetto: [C: 04-1] Add the ability to consume from kafka (031 comment) [software/purged] - 10https://gerrit.wikimedia.org/r/594147 (https://phabricator.wikimedia.org/T133821) (owner: 10Giuseppe Lavagetto) [14:19:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:19:22] T250666: Upgrade WMF database-and-backup-related hosts to buster - https://phabricator.wikimedia.org/T250666 [14:20:21] (03PS1) 10Volans: Upstream release v0.0.33 [software/spicerack] (debian) - 10https://gerrit.wikimedia.org/r/594195 [14:21:17] (03PS2) 10Giuseppe Lavagetto: Add the ability to consume from kafka [software/purged] - 10https://gerrit.wikimedia.org/r/594147 (https://phabricator.wikimedia.org/T133821) [14:21:19] (03PS3) 10Giuseppe Lavagetto: Add integration tests using docker-compose [software/purged] - 10https://gerrit.wikimedia.org/r/594148 (https://phabricator.wikimedia.org/T133821) [14:32:12] (03PS1) 10Jbond: cookbooks sre.hosts.rotate-pdu-password: update to raise exceptions [cookbooks] - 10https://gerrit.wikimedia.org/r/594197 [14:32:43] (03PS2) 10Jbond: cookbooks sre.hosts.rotate-pdu-password: update to raise exceptions [cookbooks] - 10https://gerrit.wikimedia.org/r/594197 
(https://phabricator.wikimedia.org/T246890) [14:39:47] (03PS1) 10Jbond: (WIP) cookbooks sre.hosts.rotate-pdu-password: reset SNMP [cookbooks] - 10https://gerrit.wikimedia.org/r/594199 (https://phabricator.wikimedia.org/T246890) [14:41:33] 10Operations, 10CommRel-Specialists-Support (Apr-Jun-2020): CommRel support for FY2019-2020 Q4 DC switchover - https://phabricator.wikimedia.org/T244808 (10Elitre) a:03Trizek-WMF [14:41:34] (03CR) 10jerkins-bot: [V: 04-1] (WIP) cookbooks sre.hosts.rotate-pdu-password: reset SNMP [cookbooks] - 10https://gerrit.wikimedia.org/r/594199 (https://phabricator.wikimedia.org/T246890) (owner: 10Jbond) [14:45:03] (03PS3) 10Jbond: cookbooks sre.hosts.rotate-pdu-password: update to raise exceptions [cookbooks] - 10https://gerrit.wikimedia.org/r/594197 (https://phabricator.wikimedia.org/T246890) [14:47:44] (03PS1) 10Joal: Add hadoop yarn queue to analytics sqoop scripts [puppet] - 10https://gerrit.wikimedia.org/r/594202 [14:47:52] elukey: --^ please :) [14:48:10] sure [14:50:18] !log joal@deploy1001 Started deploy [analytics/refinery@3396279]: Analytics hotfix deploy (sqoop) [3396279] [14:50:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:51:35] (03PS1) 10Ottomata: [WIP] Initial debian commit [debs/anaconda] (debian) - 10https://gerrit.wikimedia.org/r/594204 (https://phabricator.wikimedia.org/T251006) [14:53:23] (03CR) 10BryanDavis: [C: 03+1] Add cf-request-id as a cf new header. 
[puppet] - 10https://gerrit.wikimedia.org/r/593969 (owner: 10Reedy) [14:57:12] !log ppchelko@deploy1001 Started deploy [restbase/deploy@74db57e]: Enable greek community wiki, fix analytics endpoints [14:57:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:00:03] (03PS2) 10Jbond: (WIP) cookbooks sre.hosts.rotate-pdu-password: reset SNMP [cookbooks] - 10https://gerrit.wikimedia.org/r/594199 (https://phabricator.wikimedia.org/T246890) [15:01:20] (03CR) 10Volans: [C: 03+2] Upstream release v0.0.33 [software/spicerack] (debian) - 10https://gerrit.wikimedia.org/r/594195 (owner: 10Volans) [15:02:13] (03CR) 10jerkins-bot: [V: 04-1] (WIP) cookbooks sre.hosts.rotate-pdu-password: reset SNMP [cookbooks] - 10https://gerrit.wikimedia.org/r/594199 (https://phabricator.wikimedia.org/T246890) (owner: 10Jbond) [15:02:50] !log kormat@cumin1001 START - Cookbook sre.hosts.downtime [15:02:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:05:20] !log kormat@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [15:05:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:05:26] !log joal@deploy1001 Finished deploy [analytics/refinery@3396279]: Analytics hotfix deploy (sqoop) [3396279] (duration: 15m 07s) [15:05:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:05:44] !log joal@deploy1001 Started deploy [analytics/refinery@3396279] (thin): Analytics hotfix deploy (sqoop) THIN [3396279] [15:05:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:05:54] !log joal@deploy1001 Finished deploy [analytics/refinery@3396279] (thin): Analytics hotfix deploy (sqoop) THIN [3396279] (duration: 00m 10s) [15:05:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:06:32] elukey: refinery is ready for sqoop when you want, if we can have the patch I wrote that'd be awesome :) [15:07:17] (03PS3) 10Jbond: (WIP) 
cookbooks sre.hosts.rotate-pdu-password: reset SNMP [cookbooks] - 10https://gerrit.wikimedia.org/r/594199 (https://phabricator.wikimedia.org/T246890) [15:07:19] (03CR) 10Elukey: [C: 03+2] Add hadoop yarn queue to analytics sqoop scripts [puppet] - 10https://gerrit.wikimedia.org/r/594202 (owner: 10Joal) [15:08:40] (03Merged) 10jenkins-bot: Upstream release v0.0.33 [software/spicerack] (debian) - 10https://gerrit.wikimedia.org/r/594195 (owner: 10Volans) [15:09:21] (03CR) 10jerkins-bot: [V: 04-1] (WIP) cookbooks sre.hosts.rotate-pdu-password: reset SNMP [cookbooks] - 10https://gerrit.wikimedia.org/r/594199 (https://phabricator.wikimedia.org/T246890) (owner: 10Jbond) [15:11:16] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_restbase_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [15:11:48] !log ppchelko@deploy1001 Finished deploy [restbase/deploy@74db57e]: Enable greek community wiki, fix analytics endpoints (duration: 14m 36s) [15:11:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:12:30] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [15:12:43] !log kormat@cumin1001 dbctl commit (dc=all): 'Repool db1101:3318 fully after reimaging T250666', diff saved to https://phabricator.wikimedia.org/P11125 and previous config saved to /var/cache/conftool/dbconfig/20200504-151243-kormat.json [15:12:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:12:45] T250666: Upgrade WMF database-and-backup-related hosts to buster - https://phabricator.wikimedia.org/T250666 [15:19:34] 10Operations, 10ops-eqiad: (Need by: TDB) rack/setup/install cloudelastic100[56] - https://phabricator.wikimedia.org/T249062 (10Cmjohnson) @elukey Thanks, looks like the dac cable is in the wrong nic port. This will require an on-site visit. [15:22:35] jouncebot: next [15:22:35] In 1 hour(s) and 37 minute(s): Wikidata Query Service weekly deploy (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200504T1700) [15:22:40] I'm deploying UBNs. [15:23:03] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] Add cf-request-id as a cf new header. [puppet] - 10https://gerrit.wikimedia.org/r/593969 (owner: 10Reedy) [15:23:13] !log deploy1001: deleted old .hhvm.hhbc files moved from tin (/home/*/home-tin/.hhvm.hhbc) https://phabricator.wikimedia.org/P11126 [15:23:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:23:25] (03PS1) 10Muehlenhoff: Ship /etc/sysusers.d base directory for systemd-sysusers [puppet] - 10https://gerrit.wikimedia.org/r/594211 [15:23:27] (03PS1) 10Muehlenhoff: Also specify system user range for systemd-sysusers [puppet] - 10https://gerrit.wikimedia.org/r/594212 [15:25:30] 10Operations, 10ops-eqiad, 10DBA, 10DC-Ops: db1140 (backup source) crashed - https://phabricator.wikimedia.org/T250602 (10Cmjohnson) a:05Cmjohnson→03Jclark-ctr @Jclark-ctr can you start the process with HPE please. 
[15:26:23] (03CR) 10jerkins-bot: [V: 04-1] Also specify system user range for systemd-sysusers [puppet] - 10https://gerrit.wikimedia.org/r/594212 (owner: 10Muehlenhoff) [15:26:51] !log deploy1001: deleted old .hhvm.hhbc files (/home/*/.hhvm.hhbc) https://phabricator.wikimedia.org/P11127 [15:26:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:31:36] (03PS4) 10Jbond: (WIP) cookbooks sre.hosts.rotate-pdu-password: reset SNMP [cookbooks] - 10https://gerrit.wikimedia.org/r/594199 (https://phabricator.wikimedia.org/T246890) [15:33:18] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3062 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [15:33:25] (03CR) 10jerkins-bot: [V: 04-1] (WIP) cookbooks sre.hosts.rotate-pdu-password: reset SNMP [cookbooks] - 10https://gerrit.wikimedia.org/r/594199 (https://phabricator.wikimedia.org/T246890) (owner: 10Jbond) [15:34:42] !log uploaded spicerack_0.0.33-1_amd64.deb to apt.wikimedia.org stretch-wikimedia [15:34:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:35:52] (03PS1) 10Jforrester: Revert "dblists: Remove "do not modify" note from all.dblist" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/594214 [15:35:57] 10Operations, 10Privacy Engineering, 10Research, 10Traffic, and 2 others: wikiworkshop.org has Facebook button, external statcounter, https to http redirect - https://phabricator.wikimedia.org/T251732 (10JFishback_WMF) p:05Triage→03High [15:37:28] (03PS4) 10Jbond: cookbooks sre.hosts.rotate-pdu-password: update to raise exceptions [cookbooks] - 10https://gerrit.wikimedia.org/r/594197 (https://phabricator.wikimedia.org/T246890) [15:39:12] (03PS1) 10Jforrester: buildDBLists: Remove circular dependency on all.dblist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/594216 [15:39:53] (03CR) 10jerkins-bot: [V: 04-1] cookbooks 
sre.hosts.rotate-pdu-password: update to raise exceptions [cookbooks] - 10https://gerrit.wikimedia.org/r/594197 (https://phabricator.wikimedia.org/T246890) (owner: 10Jbond) [15:40:22] * James_F twiddles thumbs waiting for merge. Still! [15:40:38] hrm - .30 i/l/r/l/LBFactoryMulti:177 Unknown cluster 'cluster14' - odd. [15:40:58] ES issue? [15:41:18] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3062 is OK: HTTP OK: HTTP/1.0 200 OK - 22715 bytes in 3.506 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [15:41:53] There's no cluster14 in db-eqiad... [15:42:02] That'd not help. [15:42:52] 10Operations, 10ops-codfw, 10DC-Ops: db2082 mgmt iface flapping - https://phabricator.wikimedia.org/T251724 (10Papaul) @Marostegui cable looks good [15:43:00] !log jforrester@deploy1001 Synchronized php-1.35.0-wmf.30/resources/src/mediawiki.diff.styles/diff.less: T250393 Follow-up I07dd6f7: Fix font size in diff (duration: 01m 05s) [15:43:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:43:03] T250393: Use monospace font (or editfont preference) for diffs - https://phabricator.wikimedia.org/T250393 [15:45:05] 10Operations, 10ops-codfw, 10DC-Ops: db2082 mgmt iface flapping - https://phabricator.wikimedia.org/T251724 (10Marostegui) Could it be the switch? Does it show the link flapping? [15:45:54] !log jforrester@deploy1001 Synchronized php-1.35.0-wmf.30/includes/libs/rdbms/database/DatabaseMysqlBase.php: T251457 rdbms: don't treat lock() as a write operation (duration: 01m 04s) [15:45:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:45:57] T251457: LoadBalancer: Transaction spent [n] second(s) in writes, exceeding the limit of [n] - https://phabricator.wikimedia.org/T251457 [15:46:25] liw: OK, train should now be good to roll. 
[15:47:48] !log kormat@cumin1001 dbctl commit (dc=all): 'Repool es2025 after reimaging T250666', diff saved to https://phabricator.wikimedia.org/P11128 and previous config saved to /var/cache/conftool/dbconfig/20200504-154747-kormat.json [15:47:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:47:50] T250666: Upgrade WMF database-and-backup-related hosts to buster - https://phabricator.wikimedia.org/T250666 [15:49:14] 10Operations, 10ops-codfw, 10DC-Ops: db2082 mgmt iface flapping - https://phabricator.wikimedia.org/T251724 (10Marostegui) a:05Papaul→03Marostegui @papaul from what I can see, ping works fine, it is SSH what doesn't connect. IPMI works locally and not remotely, so maybe it just needs a reboot or a passwo... [15:50:54] James_F, thank you [15:52:36] !log root@cumin1001 START - Cookbook sre.hosts.ipmi-password-reset [15:52:36] !log root@cumin1001 END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99) [15:52:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:52:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:53:01] 10Operations, 10DBA, 10User-notice: Upgrade and restart s4 (commonswiki) primary database master: Tue 12th May - https://phabricator.wikimedia.org/T251502 (10Trizek-WMF) [15:53:08] !log root@cumin1001 START - Cookbook sre.hosts.ipmi-password-reset [15:53:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:53:17] !log root@cumin1001 Updating IPMI password on 1 hosts - root@cumin1001 [15:53:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:53:40] !log root@cumin1001 END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0) [15:53:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:54:32] 10Operations, 10ops-codfw, 10DC-Ops: db2082 mgmt iface flapping - https://phabricator.wikimedia.org/T251724 (10Marostegui) 05Open→03Resolved Resetting 
the card + re-syncing the password fixed it Thanks! [15:55:19] (03PS1) 10Lars Wirzenius: group1 wikis to 1.35.0-wmf.30 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/594219 [15:55:22] (03CR) 10Lars Wirzenius: [C: 03+2] group1 wikis to 1.35.0-wmf.30 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/594219 (owner: 10Lars Wirzenius) [15:55:42] 10Operations, 10ops-eqiad, 10serviceops: (Need by: TBD) rack/setup/install kubernetes10[07-14].eqiad.wmnet - https://phabricator.wikimedia.org/T241850 (10Cmjohnson) [15:56:15] (03Merged) 10jenkins-bot: group1 wikis to 1.35.0-wmf.30 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/594219 (owner: 10Lars Wirzenius) [15:57:05] (03PS5) 10Jbond: (WIP) cookbooks sre.hosts.rotate-pdu-password: reset SNMP [cookbooks] - 10https://gerrit.wikimedia.org/r/594199 (https://phabricator.wikimedia.org/T246890) [15:57:18] 10Operations, 10ops-codfw, 10DC-Ops: (Need By: TDB) rack/setup/install thanos-fe200[123] - https://phabricator.wikimedia.org/T251635 (10Papaul) [15:58:14] !log liw@deploy1001 rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.30 [15:58:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:59:09] (03CR) 10jerkins-bot: [V: 04-1] (WIP) cookbooks sre.hosts.rotate-pdu-password: reset SNMP [cookbooks] - 10https://gerrit.wikimedia.org/r/594199 (https://phabricator.wikimedia.org/T246890) (owner: 10Jbond) [15:59:20] !log liw@deploy1001 Synchronized php: group1 wikis to 1.35.0-wmf.30 (duration: 01m 05s) [15:59:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:05:14] group1 seems to have survived five minutes and survived a day last week, going to group2 [16:05:17] (03PS5) 10Jbond: cookbooks sre.hosts.rotate-pdu-password: update to raise exceptions [cookbooks] - 10https://gerrit.wikimedia.org/r/594197 (https://phabricator.wikimedia.org/T246890) [16:05:29] (03PS1) 10Lars Wirzenius: group2 wikis to 1.35.0-wmf.30 [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/594220 [16:05:31] (03CR) 10Lars Wirzenius: [C: 03+2] group2 wikis to 1.35.0-wmf.30 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/594220 (owner: 10Lars Wirzenius) [16:05:38] (03PS6) 10Jbond: (WIP) cookbooks sre.hosts.rotate-pdu-password: reset SNMP [cookbooks] - 10https://gerrit.wikimedia.org/r/594199 (https://phabricator.wikimedia.org/T246890) [16:06:27] (03Merged) 10jenkins-bot: group2 wikis to 1.35.0-wmf.30 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/594220 (owner: 10Lars Wirzenius) [16:07:25] (03CR) 10jerkins-bot: [V: 04-1] cookbooks sre.hosts.rotate-pdu-password: update to raise exceptions [cookbooks] - 10https://gerrit.wikimedia.org/r/594197 (https://phabricator.wikimedia.org/T246890) (owner: 10Jbond) [16:07:46] (03CR) 10jerkins-bot: [V: 04-1] (WIP) cookbooks sre.hosts.rotate-pdu-password: reset SNMP [cookbooks] - 10https://gerrit.wikimedia.org/r/594199 (https://phabricator.wikimedia.org/T246890) (owner: 10Jbond) [16:08:14] !log liw@deploy1001 rebuilt and synchronized wikiversions files: group2 wikis to 1.35.0-wmf.30 [16:08:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:08:20] 10Operations, 10Traffic, 10Patch-For-Review: ATS: Add the ability to check if origin server responses can be cached and their lifetime to the Lua plugin - https://phabricator.wikimedia.org/T251537 (10ema) [16:08:29] 10Operations, 10Traffic, 10Patch-For-Review: ATS: Add the ability to check if origin server responses can be cached and their lifetime to the Lua plugin - https://phabricator.wikimedia.org/T251537 (10ema) p:05Triage→03Medium [16:09:43] (03PS1) 10Arturo Borrero Gonzalez: toolforge: k8s: client: fix typo in package declaration [puppet] - 10https://gerrit.wikimedia.org/r/594222 [16:09:49] (03PS6) 10Jbond: cookbooks sre.hosts.rotate-pdu-password: update to raise exceptions [cookbooks] - 10https://gerrit.wikimedia.org/r/594197 (https://phabricator.wikimedia.org/T246890) [16:14:54] 10Operations, 
10ops-codfw, 10DC-Ops: (Need By: TDB) rack/setup/install thanos-fe200[123] - https://phabricator.wikimedia.org/T251635 (10Papaul) [16:14:59] (03PS1) 10Cmjohnson: Adding production dns ipv4 only kubernetes1007-1014 [dns] - 10https://gerrit.wikimedia.org/r/594224 (https://phabricator.wikimedia.org/T241850) [16:15:05] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] toolforge: k8s: client: fix typo in package declaration [puppet] - 10https://gerrit.wikimedia.org/r/594222 (owner: 10Arturo Borrero Gonzalez) [16:15:50] (03CR) 10Cmjohnson: [C: 03+2] Adding production dns ipv4 only kubernetes1007-1014 [dns] - 10https://gerrit.wikimedia.org/r/594224 (https://phabricator.wikimedia.org/T241850) (owner: 10Cmjohnson) [16:16:32] (03CR) 10Ema: Add the ability to consume from kafka (031 comment) [software/purged] - 10https://gerrit.wikimedia.org/r/594147 (https://phabricator.wikimedia.org/T133821) (owner: 10Giuseppe Lavagetto) [16:16:50] 10Operations, 10ops-eqiad, 10serviceops, 10Patch-For-Review: (Need by: TBD) rack/setup/install kubernetes10[07-14].eqiad.wmnet - https://phabricator.wikimedia.org/T241850 (10Cmjohnson) [16:19:57] (03PS2) 10Mstyles: increment extra plugin to 6.5.4-wmf-9 [software/elasticsearch/plugins] - 10https://gerrit.wikimedia.org/r/593833 [16:19:59] (03PS7) 10Jbond: (WIP) cookbooks sre.hosts.rotate-pdu-password: reset SNMP [cookbooks] - 10https://gerrit.wikimedia.org/r/594199 (https://phabricator.wikimedia.org/T246890) [16:21:16] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3058 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [16:22:08] (03CR) 10Mstyles: "when I ran the prepare commit script, ./debian/rules prepare_commit, it failed because it said it couldn't find the zip file in the maven " [software/elasticsearch/plugins] - 10https://gerrit.wikimedia.org/r/593833 (owner: 10Mstyles) [16:23:26] (03PS8) 10Jbond: (WIP) cookbooks sre.hosts.rotate-pdu-password: 
reset SNMP [cookbooks] - 10https://gerrit.wikimedia.org/r/594199 (https://phabricator.wikimedia.org/T246890) [16:25:41] (03CR) 10jerkins-bot: [V: 04-1] (WIP) cookbooks sre.hosts.rotate-pdu-password: reset SNMP [cookbooks] - 10https://gerrit.wikimedia.org/r/594199 (https://phabricator.wikimedia.org/T246890) (owner: 10Jbond) [16:26:31] 10Operations, 10ops-codfw, 10DC-Ops: (Need By: TDB) rack/setup/install thanos-fe200[123] - https://phabricator.wikimedia.org/T251635 (10Papaul) [16:26:35] (03PS1) 10Elukey: cdh::hadoop: remove extra sudo in check_hdfs_active_namenode.py [puppet] - 10https://gerrit.wikimedia.org/r/594226 [16:26:46] (03CR) 10DCausse: "> Patch Set 2:" (032 comments) [software/elasticsearch/plugins] - 10https://gerrit.wikimedia.org/r/593833 (owner: 10Mstyles) [16:27:05] (03CR) 10jerkins-bot: [V: 04-1] cdh::hadoop: remove extra sudo in check_hdfs_active_namenode.py [puppet] - 10https://gerrit.wikimedia.org/r/594226 (owner: 10Elukey) [16:27:12] ufffff [16:27:54] PROBLEM - Check systemd state on deneb is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [16:28:28] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3058 is OK: HTTP OK: HTTP/1.0 200 OK - 22728 bytes in 0.271 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [16:30:44] (03PS1) 10Arturo Borrero Gonzalez: toolforge: k8s: client: specify buster as the distro for kubectl [puppet] - 10https://gerrit.wikimedia.org/r/594229 [16:30:48] (03PS2) 10Elukey: cdh::hadoop: remove extra sudo in check_hdfs_active_namenode.py [puppet] - 10https://gerrit.wikimedia.org/r/594226 [16:33:11] liw, thcipriani: not sure about this `PageConfigFactory:96 Not an available content version.`. maybe covered under T205936? 
[16:33:11] T205936: Unable to view some pages due to fatal RevisionAccessException: "Failed to load data blob from tt" - https://phabricator.wikimedia.org/T205936 [16:34:08] 10Operations, 10ops-codfw, 10DC-Ops: (Need By: TBD) rack/setup/install cloudceph200[123]-dev - https://phabricator.wikimedia.org/T250846 (10Papaul) [16:34:27] brennen, urf, I don't feel qualified to even guess :( [16:34:30] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] toolforge: k8s: client: specify buster as the distro for kubectl [puppet] - 10https://gerrit.wikimedia.org/r/594229 (owner: 10Arturo Borrero Gonzalez) [16:37:01] (03CR) 10Elukey: [C: 03+2] cdh::hadoop: remove extra sudo in check_hdfs_active_namenode.py [puppet] - 10https://gerrit.wikimedia.org/r/594226 (owner: 10Elukey) [16:40:19] (03CR) 10Hashar: "Cas would you mind +2ing this change and the following one? :)" [debs/pynetbox] (debian) - 10https://gerrit.wikimedia.org/r/553741 (owner: 10Hashar) [16:42:30] (03PS1) 10Jgreen: add analytics.frdev.wikimedia.org [dns] - 10https://gerrit.wikimedia.org/r/594235 (https://phabricator.wikimedia.org/T245755) [16:42:37] brennen: hrm, not sure about that one either, not new, but maybe worse? 
not happening at a high rate, but a higher-than-recent rate, certainly [16:42:58] (03PS9) 10Jbond: (WIP) cookbooks sre.hosts.rotate-pdu-password: reset SNMP [cookbooks] - 10https://gerrit.wikimedia.org/r/594199 (https://phabricator.wikimedia.org/T246890) [16:43:26] (03PS1) 10Cmjohnson: Adding new kubernetes servers kubernetes100[7-9]|101[0-4] to site.pp [puppet] - 10https://gerrit.wikimedia.org/r/594236 (https://phabricator.wikimedia.org/T241850) [16:43:42] thcipriani, liw: RevisionAccessExceptions generally, if i'm doing this right, look like they're about where they have been: https://logstash.wikimedia.org/goto/3173966f34d7cf4f20e17571d6e4fb21 [16:44:22] !log joal@deploy1001 Started deploy [analytics/refinery@2252f9a]: Analytics hotfix deploy 2 (sqoop) [2252f9a] [16:44:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:44:29] (03CR) 10jerkins-bot: [V: 04-1] Adding new kubernetes servers kubernetes100[7-9]|101[0-4] to site.pp [puppet] - 10https://gerrit.wikimedia.org/r/594236 (https://phabricator.wikimedia.org/T241850) (owner: 10Cmjohnson) [16:44:33] i guess my suspicion is that the level we see for .30 is essentially noise? [16:44:59] (03CR) 10jerkins-bot: [V: 04-1] (WIP) cookbooks sre.hosts.rotate-pdu-password: reset SNMP [cookbooks] - 10https://gerrit.wikimedia.org/r/594199 (https://phabricator.wikimedia.org/T246890) (owner: 10Jbond) [16:45:50] brennen: that seems plausible based on that dashboard. [16:47:13] particularly since all these errors are cataloged as bugs, in the backlog, and not happening at high rates [16:48:06] (03CR) 10Dwisehaupt: [C: 03+1] "Looks good." 
[dns] - 10https://gerrit.wikimedia.org/r/594235 (https://phabricator.wikimedia.org/T245755) (owner: 10Jgreen) [16:48:19] So, um, https://en.wikipedia.org/w/index.php?diff=954815861&oldid=938143566&diffmode=source loads the new monospaced diff, and then the font changes [16:49:54] DannyS712: i don't think i'm seeing that, but I have very little context for the problem... [16:51:33] disappears with safemode=1, so I'm guessing some sitewide js/css change (because people didn't like the monospaced diff?) [16:51:35] (03CR) 10Alexandros Kosiaris: [C: 03+1] Make configuration of envoy a ConfigMap [deployment-charts] - 10https://gerrit.wikimedia.org/r/582777 (https://phabricator.wikimedia.org/T244843) (owner: 10Giuseppe Lavagetto) [16:51:45] (03PS3) 10AntiCompositeNumber: engine.ghostscript: use -sstdout=%stderr with gs [software/thumbor-plugins] - 10https://gerrit.wikimedia.org/r/593358 (https://phabricator.wikimedia.org/T236240) [16:52:16] (03CR) 10Hashar: [C: 03+1] "Indeed, lets upgrade Docker on the CI instances now that the source of slow down has been indentified ( T236675 )." 
[puppet] - 10https://gerrit.wikimedia.org/r/593806 (https://phabricator.wikimedia.org/T236675) (owner: 10Jforrester) [16:52:25] (03CR) 10AntiCompositeNumber: "> Patch Set 2:" [software/thumbor-plugins] - 10https://gerrit.wikimedia.org/r/593358 (https://phabricator.wikimedia.org/T236240) (owner: 10AntiCompositeNumber) [16:53:11] DannyS712: I’m probably missing context too, but for me the diff remains monospaced [16:53:36] it remains monospaced, but the font changes [16:53:50] (03CR) 10DCausse: increment extra plugin to 6.5.4-wmf-9 (031 comment) [software/elasticsearch/plugins] - 10https://gerrit.wikimedia.org/r/593833 (owner: 10Mstyles) [16:54:03] It starts with the same font as https://www.mediawiki.org/w/index.php?title=MediaWiki_1.35/Roadmap&diff=prev&oldid=3823699 but then changes to something else [16:54:38] (03CR) 10Jgreen: [C: 03+2] add analytics.frdev.wikimedia.org [dns] - 10https://gerrit.wikimedia.org/r/594235 (https://phabricator.wikimedia.org/T245755) (owner: 10Jgreen) [16:55:02] (03CR) 10RLazarus: [C: 03+2] contint: On stretch, use the docker we have [puppet] - 10https://gerrit.wikimedia.org/r/593806 (https://phabricator.wikimedia.org/T236675) (owner: 10Jforrester) [16:55:08] DannyS712: I get no change. [17:00:04] gehel and onimisionipe: #bothumor I ♥ Unicode. All rise for Wikidata Query Service weekly deploy deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200504T1700). 
[17:01:07] !log joal@deploy1001 Finished deploy [analytics/refinery@2252f9a]: Analytics hotfix deploy 2 (sqoop) [2252f9a] (duration: 16m 45s) [17:01:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:01:09] jouncebot: maryum should take care of that deployment if there is anything to deploy [17:01:59] we should probably update https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200504T1700 so that it pings you instead [17:02:21] !log joal@deploy1001 Started deploy [analytics/refinery@2252f9a] (thin): Analytics hotfix deploy 2 THIN (sqoop) [2252f9a] [17:02:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:02:31] !log joal@deploy1001 Finished deploy [analytics/refinery@2252f9a] (thin): Analytics hotfix deploy 2 THIN (sqoop) [2252f9a] (duration: 00m 09s) [17:02:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:02:48] (03PS1) 10Cmjohnson: Add netboot.cfg and dhcpd file for kubernetes100[7-9]|1010-14 [puppet] - 10https://gerrit.wikimedia.org/r/594237 (https://phabricator.wikimedia.org/T241850) [17:03:44] 10Operations, 10Core Platform Team, 10MediaWiki-General, 10serviceops, 10Sustainability (Incident Prevention): Revisit timeouts, concurrency limits in remote HTTP calls from MediaWiki - https://phabricator.wikimedia.org/T245170 (10AMooney) p:05High→03Unbreak! [17:06:07] 10Operations, 10DBA, 10DC-Ops, 10Sustainability (Incident Prevention): PXE Boot defaults to automatically reimaging (normally destroying os and all filesystemdata) on all servers - https://phabricator.wikimedia.org/T251416 (10jcrespo) [17:06:36] (03PS1) 10Volans: spicerack: add both RO and RW Netbox tokens [puppet] - 10https://gerrit.wikimedia.org/r/594238 [17:06:39] (03PS1) 10RLazarus: mediawiki: Clean up $use_gutter now that it's true everywhere. 
[puppet] - 10https://gerrit.wikimedia.org/r/594239 (https://phabricator.wikimedia.org/T244852) [17:11:05] (03CR) 10CRusnov: [C: 03+1] "> Patch Set 1:" [debs/pynetbox] (debian) - 10https://gerrit.wikimedia.org/r/553741 (owner: 10Hashar) [17:11:36] (03CR) 10CRusnov: [C: 03+2] Revert local hack to sources [debs/pynetbox] (debian) - 10https://gerrit.wikimedia.org/r/553741 (owner: 10Hashar) [17:11:58] (03CR) 10CRusnov: [C: 03+2] Configuration for gbp buildpackage [debs/pynetbox] (debian) - 10https://gerrit.wikimedia.org/r/553735 (owner: 10Hashar) [17:13:03] (03PS2) 10RLazarus: mcrouter_wancache: Clean up $use_gutter now that it's true everywhere. [puppet] - 10https://gerrit.wikimedia.org/r/594239 (https://phabricator.wikimedia.org/T244852) [17:14:31] (03CR) 10Cmjohnson: [C: 03+2] Add netboot.cfg and dhcpd file for kubernetes100[7-9]|1010-14 [puppet] - 10https://gerrit.wikimedia.org/r/594237 (https://phabricator.wikimedia.org/T241850) (owner: 10Cmjohnson) [17:14:53] 10Operations, 10LDAP-Access-Requests, 10SRE-Access-Requests, 10Discovery-Search (Current work): SRE Onboarding - Ryan Kemper, Search Platform team - https://phabricator.wikimedia.org/T251572 (10RKemper) [17:15:26] 10Operations, 10ops-eqiad, 10DBA, 10DC-Ops: (Need By: TBD) rack/setup/install backup1002 + array - https://phabricator.wikimedia.org/T250816 (10jcrespo) Thanks, will take it from here, I should be able to handle this on my own unless unexpected issues arise. [17:15:28] 10Operations, 10LDAP-Access-Requests, 10SRE-Access-Requests, 10Discovery-Search (Current work): SRE Onboarding - Ryan Kemper, Search Platform team - https://phabricator.wikimedia.org/T251572 (10RKemper) Phabricator 2FA enabled [17:16:30] (03CR) 10Thcipriani: [C: 03+1] "Nice! This seems really useful." 
[puppet] - 10https://gerrit.wikimedia.org/r/593936 (https://phabricator.wikimedia.org/T242882) (owner: 10Brennen Bearnes) [17:16:37] (03Abandoned) 10Cmjohnson: Adding new kubernetes servers kubernetes100[7-9]|101[0-4] to site.pp [puppet] - 10https://gerrit.wikimedia.org/r/594236 (https://phabricator.wikimedia.org/T241850) (owner: 10Cmjohnson) [17:17:57] (03CR) 10CRusnov: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/594238 (owner: 10Volans) [17:22:43] 10Operations, 10ops-codfw, 10DC-Ops: host rename: labtestservices2003.wikimedia.org -> cloudservices2003-dev.wikimedia.org - https://phabricator.wikimedia.org/T251576 (10Papaul) 05Open→03Resolved Complete ` [edit interfaces ge-1/0/13] - description labtestservices2003; + description cloudservice2003... [17:25:36] (03CR) 10Bartosz Dziewoński: "Good to go now" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/592630 (https://phabricator.wikimedia.org/T249376) (owner: 10Esanders) [17:26:57] (03PS3) 10Bartosz Dziewoński: Load DiscussionTools on en.wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/592630 (https://phabricator.wikimedia.org/T249376) (owner: 10Esanders) [17:27:05] (03CR) 10Volans: [C: 03+2] spicerack: add both RO and RW Netbox tokens [puppet] - 10https://gerrit.wikimedia.org/r/594238 (owner: 10Volans) [17:27:17] 10Operations, 10LDAP-Access-Requests, 10SRE-Access-Requests, 10Discovery-Search (Current work): SRE Onboarding - Ryan Kemper, Search Platform team - https://phabricator.wikimedia.org/T251572 (10herron) [17:27:47] 10Operations, 10LDAP-Access-Requests, 10SRE-Access-Requests, 10Discovery-Search (Current work): SRE Onboarding - Ryan Kemper, Search Platform team - https://phabricator.wikimedia.org/T251572 (10herron) [17:28:19] (03CR) 10Jforrester: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/594238 (owner: 10Volans) [17:29:05] James_F: want to be extra sure? 
:-P ^^^ [17:30:18] (03PS4) 10Hashar: Merge tag 'debian/1.8.17-1_exp1' into debian/buster-wikimedia [debs/doxygen] (debian/buster-wikimedia) - 10https://gerrit.wikimedia.org/r/589416 (https://phabricator.wikimedia.org/T242155) [17:30:20] volans: I want to see that CI is still working whilst I re-build it, rather than have SRE scream at me. ;-) [17:30:49] (03PS5) 10Hashar: Merge tag 'debian/1.8.17-1' into debian/buster-wikimedia [debs/doxygen] (debian/buster-wikimedia) - 10https://gerrit.wikimedia.org/r/589416 (https://phabricator.wikimedia.org/T242155) [17:30:58] ehehe I imagined but was a too easy joke to pass on it ;) [17:32:39] (03CR) 10Jforrester: "> Patch Set 2:" [puppet] - 10https://gerrit.wikimedia.org/r/593936 (https://phabricator.wikimedia.org/T242882) (owner: 10Brennen Bearnes) [17:33:01] (03CR) 10Jforrester: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/593936 (https://phabricator.wikimedia.org/T242882) (owner: 10Brennen Bearnes) [17:36:26] !log upgraded spicerack on cumin[12]001 to 0.0.33-1 [17:36:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:39:14] (03PS1) 10Volans: sre.hosts.decommission: use Netbox RW token [cookbooks] - 10https://gerrit.wikimedia.org/r/594251 [17:41:40] (03CR) 10Krinkle: [C: 03+1] Update path to CirrusSearch maintenance scripts [puppet] - 10https://gerrit.wikimedia.org/r/591335 (https://phabricator.wikimedia.org/T250806) (owner: 10Reedy) [17:44:19] 10Operations, 10Discovery-Search, 10SDC General, 10Structured Data Engineering, and 2 others: Create CQS puppet configs by applying query_service module - https://phabricator.wikimedia.org/T237089 (10Gehel) a:03EBernhardson [17:46:51] (03CR) 10Volans: [C: 03+2] "Merging to avoid any broken run of the decom cookbook, LMK if you have any comment post-merge too." 
[cookbooks] - 10https://gerrit.wikimedia.org/r/594251 (owner: 10Volans) [17:47:02] 10Operations, 10ops-codfw, 10DC-Ops: host rename: labtestservices2003.wikimedia.org -> cloudservices2003-dev.wikimedia.org - https://phabricator.wikimedia.org/T251576 (10Andrew) thank you! [17:48:56] (03Merged) 10jenkins-bot: sre.hosts.decommission: use Netbox RW token [cookbooks] - 10https://gerrit.wikimedia.org/r/594251 (owner: 10Volans) [17:53:26] brennen, I think the wmf.30 is not looking bad right now, and it's getting to the end of my work day; are you OK taking over? [17:54:22] liw: yep, all good. [17:54:38] brennen, reassignin task to you [17:56:27] 10Operations, 10Mail, 10Wikimedia-Mailing-lists: Duplicate "moderator request(s) waiting" emails sent to list admins - https://phabricator.wikimedia.org/T250032 (10MarcoAurelio) Hello. I am having the same issue with `metawiki-admins` since April 2020. Mailman keeps sending me reminders about subscription re... [17:57:44] !log configure singtel interface on cr1-eqsin [17:57:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:00:04] RoanKattouw, Niharika, and Urbanecm: Time to snap out of that daydream and deploy Morning SWAT(Max 6 patches). Get on with it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200504T1800). [18:00:04] MatmaRex and kaldari: A patch you scheduled for Morning SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [18:00:10] hi [18:00:45] despite the appearances, my two patches are independent and can be done in any order [18:00:56] I can SWAT today! [18:01:48] (03CR) 10Hashar: "Upstream has uploaded to unstable with tag debian/1.8.17-1 . 
I repurposed my change using:" [debs/doxygen] (debian/buster-wikimedia) - 10https://gerrit.wikimedia.org/r/589416 (https://phabricator.wikimedia.org/T242155) (owner: 10Hashar) [18:02:22] (03CR) 10Urbanecm: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/592630 (https://phabricator.wikimedia.org/T249376) (owner: 10Esanders) [18:02:28] RECOVERY - Check systemd state on an-launcher1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [18:03:22] (03Merged) 10jenkins-bot: Load DiscussionTools on en.wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/592630 (https://phabricator.wikimedia.org/T249376) (owner: 10Esanders) [18:04:04] MatmaRex: https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/592630 is at mwdebug1001, could you check, please? [18:04:20] PROBLEM - mailman_qrunner on fermium is CRITICAL: PROCS CRITICAL: 0 processes with UID = 38 (list), regex args /mailman/bin/qrunner https://wikitech.wikimedia.org/wiki/Mailman [18:04:24] PROBLEM - Router interfaces on cr1-eqsin is CRITICAL: CRITICAL: host 103.102.166.129, interfaces up: 85, down: 3, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [18:04:37] looking [18:05:26] Urbanecm: seems good [18:05:34] PROBLEM - mailman_ctl on fermium is CRITICAL: PROCS CRITICAL: 0 processes with UID = 38 (list), regex args /mailman/bin/mailmanctl https://wikitech.wikimedia.org/wiki/Mailman [18:05:34] thanks MatmaRex, syncing [18:07:04] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: 18c1efb: Load DiscussionTools on en.wiki (T249376) (duration: 00m 58s) [18:07:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:07:06] T249376: Deploy Replying v1.0 via query string parameter to en.wiki - https://phabricator.wikimedia.org/T249376 [18:07:54] PROBLEM - Check systemd state on an-launcher1001 is 
CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [18:08:01] MatmaRex: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/DiscussionTools/+/594240 is ready at mwdebug1001 [18:08:50] (03PS2) 10Urbanecm: Adding upload_by_url user right to all registered users on Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593357 (https://phabricator.wikimedia.org/T251474) (owner: 10Kaldari) [18:09:59] Urbanecm: also seems good [18:10:03] syncing [18:10:10] (03CR) 10Mstyles: "> Patch Set 2:" (032 comments) [software/elasticsearch/plugins] - 10https://gerrit.wikimedia.org/r/593833 (owner: 10Mstyles) [18:11:39] !log urbanecm@deploy1001 Synchronized php-1.35.0-wmf.30/extensions/DiscussionTools/includes/DiscussionToolsHooks.php: SWAT: b85fc16: Enable on all ExtraSignaturesNamespaces (T249036) (duration: 01m 00s) [18:11:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:11:43] T249036: Enable the Reply tool in the Project namespace (Wikipedia:) and other "wgExtraSignatureNamespaces" - https://phabricator.wikimedia.org/T249036 [18:12:03] done MatmaRex :) [18:12:13] thanks! [18:12:16] happy to help! 
[18:12:25] (03CR) 10Urbanecm: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593357 (https://phabricator.wikimedia.org/T251474) (owner: 10Kaldari) [18:13:17] (03Merged) 10jenkins-bot: Adding upload_by_url user right to all registered users on Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593357 (https://phabricator.wikimedia.org/T251474) (owner: 10Kaldari) [18:14:35] (03CR) 10RLazarus: [C: 03+2] hieradata: Remove obsolete deployment-prep overrides [puppet] - 10https://gerrit.wikimedia.org/r/590530 (owner: 10Krinkle) [18:15:02] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: c04fbdd: Adding upload_by_url user right to all registered users on Commons (T251474) (duration: 00m 57s) [18:15:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:15:07] T251474: Add upload_by_url user right to all registered users on Commons - https://phabricator.wikimedia.org/T251474 [18:15:14] RECOVERY - mailman_qrunner on fermium is OK: PROCS OK: 8 processes with UID = 38 (list), regex args /mailman/bin/qrunner https://wikitech.wikimedia.org/wiki/Mailman [18:16:16] !log Morning SWAT done [18:16:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:16:28] RECOVERY - mailman_ctl on fermium is OK: PROCS OK: 1 process with UID = 38 (list), regex args /mailman/bin/mailmanctl https://wikitech.wikimedia.org/wiki/Mailman [18:16:58] mailman was me, forgot to downtime [18:19:55] !log dpifke@deploy1001 Started deploy [performance/navtiming@239d359]: Deploy navtiming with new/updated Prometheus metrics - T249822, T238086 [18:19:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:19:59] T238086: Edge cache response time per server should be monitored - https://phabricator.wikimedia.org/T238086 [18:19:59] T249822: Collect First Input Delay with Prometheus - https://phabricator.wikimedia.org/T249822 [18:20:00] !log dpifke@deploy1001 Finished 
deploy [performance/navtiming@239d359]: Deploy navtiming with new/updated Prometheus metrics - T249822, T238086 (duration: 00m 05s) [18:20:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:20:16] (03CR) 10Hashar: "Thank you :]" [debs/pynetbox] (debian) - 10https://gerrit.wikimedia.org/r/553741 (owner: 10Hashar) [18:23:10] (03PS10) 10Jbond: (WIP) cookbooks sre.hosts.rotate-pdu-password: reset SNMP [cookbooks] - 10https://gerrit.wikimedia.org/r/594199 (https://phabricator.wikimedia.org/T246890) [18:24:23] (03CR) 10Mstyles: increment extra plugin to 6.5.4-wmf-9 (031 comment) [software/elasticsearch/plugins] - 10https://gerrit.wikimedia.org/r/593833 (owner: 10Mstyles) [18:25:14] (03CR) 10jerkins-bot: [V: 04-1] (WIP) cookbooks sre.hosts.rotate-pdu-password: reset SNMP [cookbooks] - 10https://gerrit.wikimedia.org/r/594199 (https://phabricator.wikimedia.org/T246890) (owner: 10Jbond) [18:25:16] (03PS3) 10Mstyles: increment extra plugin to 6.5.4-wmf-9 [software/elasticsearch/plugins] - 10https://gerrit.wikimedia.org/r/593833 [18:25:41] (03PS3) 10Krinkle: hieradata: Remove obsolete deployment-prep overrides [puppet] - 10https://gerrit.wikimedia.org/r/590530 [18:25:56] (03PS14) 10Krinkle: hieradata: Include cache-text in Beta Cluster 'cache_hosts' [puppet] - 10https://gerrit.wikimedia.org/r/530773 (https://phabricator.wikimedia.org/T158837) [18:27:02] (03CR) 10RLazarus: [C: 03+2] hieradata: Remove obsolete deployment-prep overrides [puppet] - 10https://gerrit.wikimedia.org/r/590530 (owner: 10Krinkle) [18:27:20] (03CR) 10RLazarus: [C: 03+2] hieradata: Include cache-text in Beta Cluster 'cache_hosts' [puppet] - 10https://gerrit.wikimedia.org/r/530773 (https://phabricator.wikimedia.org/T158837) (owner: 10Krinkle) [18:29:36] :w [18:29:44] (03PS3) 10Brennen Bearnes: logspam-watch: add time & sortable columns, improve formatting [puppet] - 10https://gerrit.wikimedia.org/r/593936 (https://phabricator.wikimedia.org/T242882) [18:31:21] (03PS11) 
10Jbond: (WIP) cookbooks sre.hosts.rotate-pdu-password: reset SNMP [cookbooks] - 10https://gerrit.wikimedia.org/r/594199 (https://phabricator.wikimedia.org/T246890) [18:33:31] (03CR) 10jerkins-bot: [V: 04-1] (WIP) cookbooks sre.hosts.rotate-pdu-password: reset SNMP [cookbooks] - 10https://gerrit.wikimedia.org/r/594199 (https://phabricator.wikimedia.org/T246890) (owner: 10Jbond) [18:34:02] (03PS4) 10Jbond: cookbooks sre.hosts.rotate-pdu-password: small refactor [cookbooks] - 10https://gerrit.wikimedia.org/r/593476 (https://phabricator.wikimedia.org/T246890) [18:34:15] (03CR) 10Jbond: "ready for review" [cookbooks] - 10https://gerrit.wikimedia.org/r/593476 (https://phabricator.wikimedia.org/T246890) (owner: 10Jbond) [18:34:16] 10Operations, 10ops-eqiad: Degraded RAID on kafka-jumbo1001 - https://phabricator.wikimedia.org/T251586 (10wiki_willy) Looks like the warranty on kafka-jumbo1001 is going to end in a few weeks. @Jclark-ctr or @Cmjohnson - can one of you guys troubleshoot and submit the RMA for this part before then? Thanks,... 
[18:34:24] (03PS2) 10Jbond: cookbook sre.hosts.rotate-pdu-password: use request.Session and response.raise_for_status [cookbooks] - 10https://gerrit.wikimedia.org/r/594173 (https://phabricator.wikimedia.org/T246890) [18:34:35] (03CR) 10Jbond: "ready for review" [cookbooks] - 10https://gerrit.wikimedia.org/r/594173 (https://phabricator.wikimedia.org/T246890) (owner: 10Jbond) [18:34:44] (03PS7) 10Jbond: cookbooks sre.hosts.rotate-pdu-password: update to raise exceptions [cookbooks] - 10https://gerrit.wikimedia.org/r/594197 (https://phabricator.wikimedia.org/T246890) [18:34:53] (03CR) 10Jbond: "ready for review" [cookbooks] - 10https://gerrit.wikimedia.org/r/594197 (https://phabricator.wikimedia.org/T246890) (owner: 10Jbond) [18:35:10] (03PS12) 10Jbond: (WIP) cookbooks sre.hosts.rotate-pdu-password: reset SNMP [cookbooks] - 10https://gerrit.wikimedia.org/r/594199 (https://phabricator.wikimedia.org/T246890) [18:35:59] (03CR) 10Jforrester: [C: 03+1] "Love it." [puppet] - 10https://gerrit.wikimedia.org/r/593936 (https://phabricator.wikimedia.org/T242882) (owner: 10Brennen Bearnes) [18:36:03] (03CR) 10Brennen Bearnes: "> Patch Set 2:" [puppet] - 10https://gerrit.wikimedia.org/r/593936 (https://phabricator.wikimedia.org/T242882) (owner: 10Brennen Bearnes) [18:37:14] (03PS13) 10Jbond: (WIP) cookbooks sre.hosts.rotate-pdu-password: reset SNMP [cookbooks] - 10https://gerrit.wikimedia.org/r/594199 (https://phabricator.wikimedia.org/T246890) [18:38:01] (03CR) 10CRusnov: [C: 03+1] "Looks good!" [cookbooks] - 10https://gerrit.wikimedia.org/r/594251 (owner: 10Volans) [18:48:12] 10Operations, 10Mail, 10Wikimedia-Mailing-lists: Duplicate "moderator request(s) waiting" emails sent to list admins - https://phabricator.wikimedia.org/T250032 (10herron) Could you please send me the full received headers from a dupe notification? If you prefer to keep that off task email works, kherron (a... 
[19:01:57] (03PS1) 10Andrew Bogott: Openstack/Buster: removed some python2 packages that we don't actually need. [puppet] - 10https://gerrit.wikimedia.org/r/594269 (https://phabricator.wikimedia.org/T251294) [19:01:59] (03PS1) 10Andrew Bogott: OpenStack on Buster: remove requirement for python-mysql.connector [puppet] - 10https://gerrit.wikimedia.org/r/594270 (https://phabricator.wikimedia.org/T251294) [19:02:36] RECOVERY - Check systemd state on an-launcher1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [19:03:23] 10Operations, 10CommRel-Specialists-Support (Jul-Sep-2020): CommRel support for FY2019-2020 Q4 DC switchover - https://phabricator.wikimedia.org/T244808 (10RLazarus) We've ruled out a switchover in Q4. We'll continue to do all the non-user-impacting prep work we can, so we might be ready to go early in Q1, if... [19:03:40] (03CR) 10Andrew Bogott: [C: 03+2] Openstack/Buster: removed some python2 packages that we don't actually need. 
[puppet] - 10https://gerrit.wikimedia.org/r/594269 (https://phabricator.wikimedia.org/T251294) (owner: 10Andrew Bogott) [19:03:46] 10Operations, 10CommRel-Specialists-Support (Jul-Sep-2020): CommRel support for FY2020-2021 Q1 DC switchover - https://phabricator.wikimedia.org/T244808 (10RLazarus) [19:03:52] (03CR) 10Andrew Bogott: [C: 03+2] OpenStack on Buster: remove requirement for python-mysql.connector [puppet] - 10https://gerrit.wikimedia.org/r/594270 (https://phabricator.wikimedia.org/T251294) (owner: 10Andrew Bogott) [19:04:14] (03PS1) 10Nuria: Count automated traffic as bots in turnilo's homescreen [puppet] - 10https://gerrit.wikimedia.org/r/594272 (https://phabricator.wikimedia.org/T238357) [19:04:19] 10Operations, 10Goal: FY2020-2021 Q1 DC switchover and switchback - https://phabricator.wikimedia.org/T243314 (10RLazarus) [19:04:52] 10Operations: FY2020-2021 Q1 eqiad -> codfw switchover - https://phabricator.wikimedia.org/T243316 (10RLazarus) [19:05:07] 10Operations: FY2020-2021 Q1 codfw -> eqiad switchback - https://phabricator.wikimedia.org/T243318 (10RLazarus) [19:05:55] (03PS1) 10Volans: changelog: specify breaking change [software/spicerack] - 10https://gerrit.wikimedia.org/r/594273 [19:05:57] (03PS1) 10Volans: doc: set min version of sphinx_rtd_theme to 0.1.9 [software/spicerack] - 10https://gerrit.wikimedia.org/r/594274 [19:05:59] (03PS1) 10Volans: doc: fix documentation generation for Sphinx 3 [software/spicerack] - 10https://gerrit.wikimedia.org/r/594275 [19:06:10] (03CR) 10Hashar: [V: 03+1 C: 03+1] "I ran it locally, that seems to do the magic ;)" [debs/doxygen] (debian/buster-wikimedia) - 10https://gerrit.wikimedia.org/r/589416 (https://phabricator.wikimedia.org/T242155) (owner: 10Hashar) [19:08:04] PROBLEM - Check systemd state on an-launcher1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [19:14:57] 10Operations, 10Wikimedia-Mailing-lists: Mailing-list sending notifications for inexistent spam messages - https://phabricator.wikimedia.org/T251816 (10Teles) [19:14:59] (03PS1) 10Andrew Bogott: Openstack/Buster: remove component for python-oath [puppet] - 10https://gerrit.wikimedia.org/r/594276 [19:15:01] (03PS1) 10Andrew Bogott: Openstack/Buster: remove explicit import of python-ldap from Buster [puppet] - 10https://gerrit.wikimedia.org/r/594277 (https://phabricator.wikimedia.org/T251294) [19:15:36] (03PS1) 10Andrew Bogott: Openstack/Buster: remove include of python-mwclient [puppet] - 10https://gerrit.wikimedia.org/r/594278 (https://phabricator.wikimedia.org/T251294) [19:15:50] (03CR) 10Andrew Bogott: [C: 03+2] Openstack/Buster: remove component for python-oath [puppet] - 10https://gerrit.wikimedia.org/r/594276 (owner: 10Andrew Bogott) [19:16:03] (03CR) 10Andrew Bogott: [C: 03+2] Openstack/Buster: remove explicit import of python-ldap from Buster [puppet] - 10https://gerrit.wikimedia.org/r/594277 (https://phabricator.wikimedia.org/T251294) (owner: 10Andrew Bogott) [19:16:18] (03CR) 10Andrew Bogott: [C: 03+2] Openstack/Buster: remove include of python-mwclient [puppet] - 10https://gerrit.wikimedia.org/r/594278 (https://phabricator.wikimedia.org/T251294) (owner: 10Andrew Bogott) [19:18:41] 10Operations, 10Wikimedia-Mailing-lists: Mailing-list sending notifications for inexistent spam messages - https://phabricator.wikimedia.org/T251816 (10Teles) {F31802449} This is the email I receive. 
[19:19:39] (03PS1) 10Andrew Bogott: Python/Buster: removed install of python-mysql.connector [puppet] - 10https://gerrit.wikimedia.org/r/594281 (https://phabricator.wikimedia.org/T251294) [19:20:58] (03CR) 10Andrew Bogott: [C: 03+2] Python/Buster: removed install of python-mysql.connector [puppet] - 10https://gerrit.wikimedia.org/r/594281 (https://phabricator.wikimedia.org/T251294) (owner: 10Andrew Bogott) [19:23:52] (03Abandoned) 10Cmjohnson: Add netboot.cfg and dhcpd file for kubernetes100[7-9]|1010-14 [puppet] - 10https://gerrit.wikimedia.org/r/594237 (https://phabricator.wikimedia.org/T241850) (owner: 10Cmjohnson) [19:30:04] 10Operations, 10Mail, 10Wikimedia-Mailing-lists: Duplicate "moderator request(s) waiting" emails sent to list admins - https://phabricator.wikimedia.org/T250032 (10bd808) Raw message from the latest of 22 identical content messages to cloud-bounces@lists.wikimedia.org. Possibly interesting that the messages... [19:33:28] 10Operations, 10Wikimedia-Mailing-lists: Mailing-list sending notifications for inexistent spam messages - https://phabricator.wikimedia.org/T251816 (10Quiddity) I've gotten 4 duplicate reminder-emails for "moderator request(s) waiting" within the last ~3 hours. (In PST@: 10:06, 10:32, 11:52, 12:18) I dealt w... [19:39:25] 10Operations, 10Wikimedia-Mailing-lists: Mailing-list sending notifications for inexistent spam messages - https://phabricator.wikimedia.org/T251816 (10Teles) {F31802489} Two messages within less than an hour. [19:41:14] 10Operations, 10Mail, 10Wikimedia-Mailing-lists: Duplicate "moderator request(s) waiting" emails sent to list admins - https://phabricator.wikimedia.org/T250032 (10Quiddity) See also {T251816} (Partial duplicate, but it also covers the new aspect today that "mailman is still sending reminders even after the... 
[19:48:57] 10Operations, 10Wikimedia-Mailing-lists: Simple-Admin-l Add Mail List Admins - https://phabricator.wikimedia.org/T251821 (10Enfcer) [19:52:13] 10Operations, 10ops-eqiad, 10DC-Ops: hw troubleshooting: Memory correctable errors -EDAC- for elastic1029.eqiad.wmnet - https://phabricator.wikimedia.org/T233578 (10Cmjohnson) @gehel This server is 2 years out of warranty and the memory can be reseated but doubtful it will correct the issues. Are there plan... [19:55:24] (03PS1) 10Andrew Bogott: Glance/Rocky/Buster: explicitly great 'glance' user and group [puppet] - 10https://gerrit.wikimedia.org/r/594287 (https://phabricator.wikimedia.org/T251294) [19:55:38] 10Operations, 10ops-eqiad: Degraded RAID on kafka-jumbo1001 - https://phabricator.wikimedia.org/T251586 (10wiki_willy) a:03Cmjohnson [19:55:45] (03CR) 10jerkins-bot: [V: 04-1] Glance/Rocky/Buster: explicitly great 'glance' user and group [puppet] - 10https://gerrit.wikimedia.org/r/594287 (https://phabricator.wikimedia.org/T251294) (owner: 10Andrew Bogott) [19:56:16] (03PS2) 10Andrew Bogott: Glance/Rocky/Buster: explicitly create 'glance' user and group [puppet] - 10https://gerrit.wikimedia.org/r/594287 (https://phabricator.wikimedia.org/T251294) [19:56:36] (03CR) 10jerkins-bot: [V: 04-1] Glance/Rocky/Buster: explicitly create 'glance' user and group [puppet] - 10https://gerrit.wikimedia.org/r/594287 (https://phabricator.wikimedia.org/T251294) (owner: 10Andrew Bogott) [19:58:20] 10Operations, 10Mail, 10Wikimedia-Mailing-lists: Duplicate "moderator request(s) waiting" emails sent to list admins - https://phabricator.wikimedia.org/T250032 (10bd808) >>! In T250032#6106394, @bd808 wrote: > I haven't checked the headers on all of them to see if things just got backed up somewhere due to... 
[19:58:57] (03PS3) 10Andrew Bogott: Glance/Rocky/Buster: explicitly create 'glance' user and group [puppet] - 10https://gerrit.wikimedia.org/r/594287 (https://phabricator.wikimedia.org/T251294) [19:59:30] (03CR) 10jerkins-bot: [V: 04-1] Glance/Rocky/Buster: explicitly create 'glance' user and group [puppet] - 10https://gerrit.wikimedia.org/r/594287 (https://phabricator.wikimedia.org/T251294) (owner: 10Andrew Bogott) [20:00:04] halfak and accraze: That opportune time is upon us again. Time for a Services – Graphoid / Citoid / ORES deploy. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200504T2000). [20:01:29] (03PS4) 10Andrew Bogott: Glance/Rocky/Buster: explicitly create 'glance' user and group [puppet] - 10https://gerrit.wikimedia.org/r/594287 (https://phabricator.wikimedia.org/T251294) [20:02:38] RECOVERY - Check systemd state on an-launcher1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [20:03:02] (03CR) 10Andrew Bogott: [C: 03+2] Glance/Rocky/Buster: explicitly create 'glance' user and group [puppet] - 10https://gerrit.wikimedia.org/r/594287 (https://phabricator.wikimedia.org/T251294) (owner: 10Andrew Bogott) [20:03:26] 10Operations, 10ops-eqiad: Netbox report PuppetDB PhysicalHosts critical - https://phabricator.wikimedia.org/T251725 (10wiki_willy) Error for elastic1029.eqiad.wmnet tied into T233578 and error for restbase1029 error tied in with T241784 [20:04:17] 10Operations, 10ops-eqiad, 10DC-Ops: hw troubleshooting: Memory correctable errors -EDAC- for elastic1029.eqiad.wmnet - https://phabricator.wikimedia.org/T233578 (10wiki_willy) [20:04:20] 10Operations, 10ops-eqiad, 10Core Platform Team Workboards (Clinic Duty Team): (Need by: TBD) rack/setup/install restbase1028, restbase1029, restbase1030 - https://phabricator.wikimedia.org/T241784 (10wiki_willy) [20:05:46] 10Operations, 10ops-eqiad: Netbox report PuppetDB PhysicalHosts critical - 
https://phabricator.wikimedia.org/T251725 (10wiki_willy) a:03Cmjohnson [20:06:39] (03PS1) 10Andrew Bogott: Glance/Buster/Rocky: remove explicit uid/gid for glance service user [puppet] - 10https://gerrit.wikimedia.org/r/594290 (https://phabricator.wikimedia.org/T251294) [20:08:02] (03CR) 10Andrew Bogott: [C: 03+2] Glance/Buster/Rocky: remove explicit uid/gid for glance service user [puppet] - 10https://gerrit.wikimedia.org/r/594290 (https://phabricator.wikimedia.org/T251294) (owner: 10Andrew Bogott) [20:08:08] PROBLEM - Check systemd state on an-launcher1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [20:21:10] 10Operations, 10ops-eqiad, 10DC-Ops: Netbox report accounting icinga alert - https://phabricator.wikimedia.org/T250053 (10Jclark-ctr) corrected msw-a2-eqiad on [20:27:34] 10Operations, 10Wikimedia-Mailing-lists: Simple-Admin-l Add Mail List Admins - https://phabricator.wikimedia.org/T251821 (10Enfcer) [20:34:34] (03PS1) 10Cwhite: profile,gerrit: add enable_monitoring flag for gerrit-test [puppet] - 10https://gerrit.wikimedia.org/r/594293 (https://phabricator.wikimedia.org/T239151) [20:36:00] 10Operations, 10ops-eqiad, 10Patch-For-Review, 10cloud-services-team (Hardware): Degraded RAID on cloudvirt1024 - https://phabricator.wikimedia.org/T241884 (10Jclark-ctr) called Dell after no response from email. Dell is sending out new backplane and new raid card. [20:38:02] 10Operations, 10Wikimedia-Mailing-lists: Simple-Admin-l Add Mail List Admins - https://phabricator.wikimedia.org/T251821 (10Operator873) I agree with enfcer's statement above and volunteer to serve as a mailing list admin/POC/etc for the project mentioned as needed. 
[20:43:44] (03CR) 10Krinkle: [C: 03+1] Replace AuthManagerStatsdHandler with namespaced class [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593652 (owner: 10Reedy) [20:58:03] (03PS2) 10Ottomata: [WIP] Initial debian commit [debs/anaconda] (debian) - 10https://gerrit.wikimedia.org/r/594204 (https://phabricator.wikimedia.org/T251006) [20:58:17] (03CR) 10Reedy: "Should be good now" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/590437 (owner: 10Ppchelko) [20:58:33] (03PS3) 10Ottomata: [WIP] Initial debian commit [debs/anaconda] (debian) - 10https://gerrit.wikimedia.org/r/594204 (https://phabricator.wikimedia.org/T251006) [21:00:04] Reedy and sbassett: #bothumor My software never has bugs. It just develops random features. Rise for Weekly Security deployment window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200504T2100). [21:02:38] RECOVERY - Check systemd state on an-launcher1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [21:02:59] (03CR) 10ArielGlenn: [C: 03+1] "These look fine to me, but I wonder if you want to rename 'SaneitizeJobs' to 'SanitizeJobs' while you're at it." [puppet] - 10https://gerrit.wikimedia.org/r/591335 (https://phabricator.wikimedia.org/T250806) (owner: 10Reedy) [21:04:25] (03CR) 10Reedy: "It's not Sanitisation, it's Saneitisation - ie making things sane. Apparently :D" [puppet] - 10https://gerrit.wikimedia.org/r/591335 (https://phabricator.wikimedia.org/T250806) (owner: 10Reedy) [21:08:06] PROBLEM - Check systemd state on an-launcher1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [21:14:13] (03CR) 10ArielGlenn: "> It's not Sanitisation, it's Saneitisation - ie making things sane." 
[puppet] - 10https://gerrit.wikimedia.org/r/591335 (https://phabricator.wikimedia.org/T250806) (owner: 10Reedy) [21:32:03] sbassett: could you do https://phabricator.wikimedia.org/T250887#6102375, please? 🙂 [21:32:37] 10Operations, 10LDAP-Access-Requests: Add Eamedina to `wmf` LDAP group - https://phabricator.wikimedia.org/T251358 (10eamedina) Hello, happy to provide more context. Following the LDAP-Access-Requests project link above: > Username: (The user name used on Wikitech.) Eamedina > Shell access: Yes/No (Whether... [21:32:39] Urbanecm: yeah, I think I can commit and deploy that now. [21:32:56] thank you sbassett! [21:33:14] Are you around to help test a bit? Not sure if that's really needed as long as logs don't blow up. [21:33:24] sure [21:34:24] Ok, give me a minute to deploy. [21:38:30] 10Operations, 10Wikimedia-Mailing-lists: Simple-Admin-l Add Mail List Admins - https://phabricator.wikimedia.org/T251821 (10Quiddity) a:03Quiddity [21:41:58] !log sbassett@deploy1001 Synchronized private/PrivateSettings.php: Deploy partial mitigation for T250887 (duration: 00m 57s) [21:42:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:42:50] (03PS1) 10Cwhite: mtail: update varnishrls compatibility with rc35 [puppet] - 10https://gerrit.wikimedia.org/r/594316 (https://phabricator.wikimedia.org/T251466) [21:44:50] 10Operations, 10Mail, 10Wikimedia-Mailing-lists: Duplicate "moderator request(s) waiting" emails sent to list admins - https://phabricator.wikimedia.org/T250032 (10herron) >>! In T250032#6106545, @bd808 wrote: > Received: from lists1001.wikimedia.org ([2620:0:861:1:208:80:154:31]:51355 helo=lists.wikimedia.o... 
[21:45:35] !log sbassett@deploy1001 Synchronized private/PrivateSettings.php: Revert partial mitigation for T250887 (duration: 00m 57s) [21:45:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:48:36] 10Operations, 10Wikimedia-Mailing-lists: Simple-Admins-l Add Mail List Admins - https://phabricator.wikimedia.org/T251821 (10Quiddity) [21:50:42] 10Operations, 10Wikimedia-Mailing-lists: Simple-Admins-l Add Mail List Admins - https://phabricator.wikimedia.org/T251821 (10Quiddity) @Enfcer Please do provide the specific email addresses. You can email them to me at nwilson@wikimedia.org if preferred. [21:51:06] (03PS1) 10Cwhite: mtail: update mtail testing to python3 [puppet] - 10https://gerrit.wikimedia.org/r/594320 (https://phabricator.wikimedia.org/T251466) [21:54:33] 10Operations, 10Wikimedia-Mailing-lists: Simple-Admins-l Add Mail List Admins - https://phabricator.wikimedia.org/T251821 (10Enfcer) @Quiddity Email with those addresses sent. Thanks [22:00:05] gehel and maryum: It is that lovely time of the day again! You are hereby commanded to deploy Wikidata Query Service weekly deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200504T2200). [22:00:29] hashar: right on time! [22:02:33] 10Operations, 10Wikimedia-Mailing-lists: Simple-Admins-l Add Mail List Admins - https://phabricator.wikimedia.org/T251821 (10Operator873) mine is operator873@gmail.com [22:02:48] RECOVERY - Check systemd state on an-launcher1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [22:06:50] !log sbassett@deploy1001 Synchronized private/PrivateSettings.php: Partial mitigation for T250887 (duration: 00m 57s) [22:06:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:08:14] PROBLEM - Check systemd state on an-launcher1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [22:09:24] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=atlas_exporter site={codfw,eqiad} https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [22:13:06] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [22:14:39] 10Operations, 10Prod-Kubernetes, 10Release Pipeline, 10Release-Engineering-Team-TODO, and 2 others: TEC3:O6:O:6.1:Q3: Deployment Pipeline Documentation - https://phabricator.wikimedia.org/T213090 (10thcipriani) 05Open→03Resolved a:03jeena @LarsWirzenius and @Jdforrester-WMF worked on the initial d... [22:14:44] 10Operations, 10Release Pipeline, 10Release-Engineering-Team-TODO, 10Epic, and 2 others: Migrate production services to kubernetes using the pipeline - https://phabricator.wikimedia.org/T198901 (10thcipriani) [22:14:48] 10Operations, 10Wikimedia-Mailing-lists: Simple-Admins-l Add Mail List Admins - https://phabricator.wikimedia.org/T251821 (10Quiddity) 05Open→03Resolved [22:21:29] (03CR) 10Urbanecm: "I get this may not be the right fix, but could somebody please explain what is wrong?" 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/594214 (owner: 10Jforrester) [22:21:40] 10Operations, 10Release Pipeline, 10Release-Engineering-Team-TODO, 10Epic, and 2 others: Migrate production services to kubernetes using the pipeline - https://phabricator.wikimedia.org/T198901 (10thcipriani) [22:22:20] (03PS4) 10Brennen Bearnes: logspam-watch: add time & sortable columns, improve formatting [puppet] - 10https://gerrit.wikimedia.org/r/593936 (https://phabricator.wikimedia.org/T242882) [22:24:32] 10Operations, 10Release Pipeline, 10Release-Engineering-Team-TODO, 10Epic, and 2 others: Migrate production services to kubernetes using the pipeline - https://phabricator.wikimedia.org/T198901 (10thcipriani) [22:30:00] (03PS2) 10Jforrester: buildDBLists: Remove circular dependency on all.dblist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/594216 (https://phabricator.wikimedia.org/T251715) [22:36:34] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [22:38:24] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [22:42:36] 10Operations: Migrate Cumin hosts to Buster - https://phabricator.wikimedia.org/T245114 (10Jdforrester-WMF) [22:42:51] !log sbassett@deploy1001 Synchronized private/PrivateSettings.php: T251835: Restore dc752af1e94684faacbe9662789815c6edbbdf46 (duration: 00m 57s) [22:42:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:43:12] (03CR) 10Jforrester: "> Patch Set 1:" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/594214 (owner: 10Jforrester) [23:00:04] RoanKattouw, Niharika, and Urbanecm: That opportune time is upon us again. Time for a Evening SWAT(Max 6 patches) deploy. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200504T2300). [23:00:04] No GERRIT patches in the queue for this window AFAICS. [23:02:54] RECOVERY - Check systemd state on an-launcher1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [23:05:02] 10Operations, 10Icinga, 10observability: Icinga notifications didn't get applied after a puppet run - https://phabricator.wikimedia.org/T251407 (10colewhite) p:05Triage→03Medium [23:06:57] 10Operations, 10Wikimedia-Mailing-lists: Mailing-list sending notifications for inexistent spam messages - https://phabricator.wikimedia.org/T251816 (10colewhite) p:05Triage→03High a:03colewhite [23:07:50] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=atlas_exporter site={codfw,eqiad} https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [23:08:26] PROBLEM - Check systemd state on an-launcher1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [23:09:10] 10Operations, 10Wikimedia-Mailing-lists: Mailing-list sending notifications for inexistent spam messages - https://phabricator.wikimedia.org/T251816 (10colewhite) The linked task sounds very similar to this one. The response to the other task may have resolved it. Give it a bit of time and confirm? [23:09:40] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [23:09:50] 10Operations, 10Wikimedia-Mailing-lists: Mailing-list sending notifications for inexistent spam messages - https://phabricator.wikimedia.org/T251816 (10colewhite) [23:09:53] 10Operations, 10Mail, 10Wikimedia-Mailing-lists: Duplicate "moderator request(s) waiting" emails sent to list admins - https://phabricator.wikimedia.org/T250032 (10colewhite) [23:11:46] 10Operations, 10Wikimedia-Mailing-lists: Wikiml-l mail archives are empty after August 2019 (moderation enabled but nobody moderates, hence no emails get delivered) - https://phabricator.wikimedia.org/T251554 (10colewhite) p:05Triage→03Medium [23:12:55] 10Operations, 10InternetArchiveBot, 10Traffic: Support TLSv1.3 - https://phabricator.wikimedia.org/T251414 (10colewhite) p:05Triage→03Medium [23:13:18] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=atlas_exporter site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [23:13:36] 10Operations, 10serviceops, 10Kubernetes, 10Patch-For-Review: Migrate to helm v3 - https://phabricator.wikimedia.org/T251305 (10colewhite) p:05Triage→03Medium [23:14:46] 10Operations, 10Analytics, 10observability: systemd::syslog conf should use :programname equals instead of startswith - https://phabricator.wikimedia.org/T251606 
(10colewhite) p:05Triage→03Medium [23:15:50] 10Operations, 10Discovery-Search: Also use java::security on elasticsearch/relforge - https://phabricator.wikimedia.org/T251540 (10colewhite) p:05Triage→03Medium [23:16:24] (03PS3) 10Ppchelko: EventBus: Switch to namespaced class names [mediawiki-config] - 10https://gerrit.wikimedia.org/r/590437 [23:16:26] 10Operations, 10Traffic, 10serviceops, 10Patch-For-Review: Certificate *.wikipedia.org valid until 2020-06-20 - https://phabricator.wikimedia.org/T251726 (10colewhite) p:05Triage→03Medium [23:16:55] (03PS4) 10Ppchelko: EventBus: Switch to namespaced class names [mediawiki-config] - 10https://gerrit.wikimedia.org/r/590437 [23:23:53] !log mstyles@deploy1001 Started deploy [wdqs/wdqs@6518a8d]: v.0.3.26 [23:23:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:25:21] jouncebot: now [23:25:21] For the next 0 hour(s) and 34 minute(s): Evening SWAT(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200504T2300) [23:25:48] (03PS2) 10Reedy: Replace AuthManagerStatsdHandler with namespaced class [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593652 [23:26:45] (03CR) 10Reedy: [C: 03+2] Replace AuthManagerStatsdHandler with namespaced class [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593652 (owner: 10Reedy) [23:27:35] (03Merged) 10jenkins-bot: Replace AuthManagerStatsdHandler with namespaced class [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593652 (owner: 10Reedy) [23:29:06] !log reedy@deploy1001 Synchronized wmf-config/logging.php: Replace AuthManagerStatsdHandler with WikimediaEventsAuthManagerStatsdHandler::class (duration: 00m 57s) [23:29:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:31:19] (03PS5) 10Reedy: EventBus: Switch to namespaced class names [mediawiki-config] - 10https://gerrit.wikimedia.org/r/590437 (owner: 10Ppchelko) [23:31:23] (03CR) 10Reedy: [C: 03+2] EventBus: Switch to namespaced class 
names [mediawiki-config] - 10https://gerrit.wikimedia.org/r/590437 (owner: 10Ppchelko) [23:32:05] (03Merged) 10jenkins-bot: EventBus: Switch to namespaced class names [mediawiki-config] - 10https://gerrit.wikimedia.org/r/590437 (owner: 10Ppchelko) [23:32:20] (03CR) 10Ppchelko: "> Patch Set 5: Code-Review+2" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/590437 (owner: 10Ppchelko) [23:33:22] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [23:33:57] !log reedy@deploy1001 Synchronized rpc/RunSingleJob.php: Use namespaced EventBus classes (duration: 00m 58s) [23:33:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:35:52] !log reedy@deploy1001 Synchronized wmf-config/logging.php: Use namespaced EventBus classes (duration: 00m 56s) [23:35:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:37:26] !log reedy@deploy1001 Synchronized wmf-config/CommonSettings.php: Use namespaced EventBus classes (duration: 00m 57s) [23:37:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:38:32] !log mstyles@deploy1001 Finished deploy [wdqs/wdqs@6518a8d]: v.0.3.26 (duration: 14m 39s) [23:38:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:40:38] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=atlas_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [23:42:28] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [23:42:58] PROBLEM - Check systemd state on wdqs1006 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [23:43:03] (03CR) 10Bstorm: "> All seem to work now. I'm not sure if we should try to make stop/restart wait for the pods to terminate or not. That seems to be the mai" [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/586162 (https://phabricator.wikimedia.org/T197930) (owner: 10BryanDavis) [23:43:47] (03PS5) 10Reedy: Replace stringified class names with ::class [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593654 (https://phabricator.wikimedia.org/T251841) [23:43:57] (03PS6) 10Reedy: Replace stringified class names with ::class [mediawiki-config] - 10https://gerrit.wikimedia.org/r/593654 (https://phabricator.wikimedia.org/T251841) [23:46:06] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=atlas_exporter site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [23:47:56] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [23:51:38] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=atlas_exporter site={codfw,eqiad} https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [23:53:28] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets