[00:03:38] Operations, ops-codfw, ops-eqiad: Document PDU models - https://phabricator.wikimedia.org/T227632 (ayounsi)
[00:23:43] Operations, SRE-Access-Requests: Requesting access to machines [stat1004, stat1005 (now stat1007), and stat1006] and groups for Mayakpwiki - https://phabricator.wikimedia.org/T227633 (Mayakp.wiki)
[01:04:49] (CR) Krinkle: [C: +1] "I don't think this needs an isset() indeed. But I would recommend splitting the patch so that IS is deployed first. That should be good en" [mediawiki-config] - https://gerrit.wikimedia.org/r/518239 (https://phabricator.wikimedia.org/T225212) (owner: Lucas Werkmeister (WMDE))
[01:18:11] PROBLEM - Work requests waiting in Zuul Gearman server on contint1001 is CRITICAL: CRITICAL: 57.14% of data above the critical threshold [140.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[01:34:19] RECOVERY - Work requests waiting in Zuul Gearman server on contint1001 is OK: OK: Less than 30.00% above the threshold [90.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[01:41:01] (PS19) CRusnov: Add LibreNMS parity check report [software/netbox-reports] - https://gerrit.wikimedia.org/r/510256 (https://phabricator.wikimedia.org/T221507)
[01:42:58] (CR) CRusnov: "PS19 fixes a majority of device parities (PDUs are now checked)." [software/netbox-reports] - https://gerrit.wikimedia.org/r/510256 (https://phabricator.wikimedia.org/T221507) (owner: CRusnov)
[01:50:52] (PS20) CRusnov: Add LibreNMS parity check report [software/netbox-reports] - https://gerrit.wikimedia.org/r/510256 (https://phabricator.wikimedia.org/T221507)
[02:02:05] PROBLEM - Work requests waiting in Zuul Gearman server on contint1001 is CRITICAL: CRITICAL: 46.67% of data above the critical threshold [140.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[02:25:27] RECOVERY - Work requests waiting in Zuul Gearman server on contint1001 is OK: OK: Less than 30.00% above the threshold [90.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1
[02:43:19] PROBLEM - Postgres Replication Lag on maps1002 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 107397392 and 6 seconds
[02:46:17] PROBLEM - Postgres Replication Lag on maps1002 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 172036304 and 9 seconds
[02:48:51] PROBLEM - Postgres Replication Lag on maps1003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 109318128 and 7 seconds
[02:50:41] PROBLEM - Postgres Replication Lag on maps1002 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 125319544 and 9 seconds
[02:52:24] what's up maps
[02:53:39] RECOVERY - Postgres Replication Lag on maps1002 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 139488 and 7 seconds
[02:54:45] RECOVERY - Postgres Replication Lag on maps1003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 282384 and 74 seconds
[02:54:50] mmhmm
[02:59:22] (CR) Aaron Schulz: "Is there a link to a documented procedure for the task format and tagged teams?" [puppet] - https://gerrit.wikimedia.org/r/519941 (owner: Aaron Schulz)
[05:01:54] (PS1) Marostegui: db-eqiad.php: Depool db1079 [mediawiki-config] - https://gerrit.wikimedia.org/r/521806
[05:03:00] (CR) Marostegui: [C: +2] db-eqiad.php: Depool db1079 [mediawiki-config] - https://gerrit.wikimedia.org/r/521806 (owner: Marostegui)
[05:03:51] (Merged) jenkins-bot: db-eqiad.php: Depool db1079 [mediawiki-config] - https://gerrit.wikimedia.org/r/521806 (owner: Marostegui)
[05:04:07] (CR) jenkins-bot: db-eqiad.php: Depool db1079 [mediawiki-config] - https://gerrit.wikimedia.org/r/521806 (owner: Marostegui)
[05:05:13] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1079 for upgrade (duration: 00m 59s)
[05:05:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:05:52] !log Upgrade db1079
[05:05:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:13:43] (PS1) Marostegui: db-eqiad.php: Slowly repool db1079 [mediawiki-config] - https://gerrit.wikimedia.org/r/521808
[05:15:12] (CR) Marostegui: [C: +2] db-eqiad.php: Slowly repool db1079 [mediawiki-config] - https://gerrit.wikimedia.org/r/521808 (owner: Marostegui)
[05:16:01] (Merged) jenkins-bot: db-eqiad.php: Slowly repool db1079 [mediawiki-config] - https://gerrit.wikimedia.org/r/521808 (owner: Marostegui)
[05:16:29] (CR) jenkins-bot: db-eqiad.php: Slowly repool db1079 [mediawiki-config] - https://gerrit.wikimedia.org/r/521808 (owner: Marostegui)
[05:17:15] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Slowly repool db1079 after upgrade (duration: 00m 58s)
[05:17:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:28:27] (PS1) Marostegui: db-eqiad.php: More weight to db1079 [mediawiki-config] - https://gerrit.wikimedia.org/r/521810
[05:29:42] (CR) Marostegui: [C: +2] db-eqiad.php: More weight to db1079 [mediawiki-config] - https://gerrit.wikimedia.org/r/521810 (owner: Marostegui)
[05:30:42] (Merged) jenkins-bot: db-eqiad.php: More weight to db1079 [mediawiki-config] - https://gerrit.wikimedia.org/r/521810 (owner: Marostegui)
[05:31:37] (CR) jenkins-bot: db-eqiad.php: More weight to db1079 [mediawiki-config] - https://gerrit.wikimedia.org/r/521810 (owner: Marostegui)
[05:32:09] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: More traffic to db1079 after upgrade (duration: 00m 57s)
[05:32:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:36:08] (CR) Vgutierrez: [C: +2] ncredir: Move last resource return to a location block [puppet] - https://gerrit.wikimedia.org/r/521473 (https://phabricator.wikimedia.org/T133548) (owner: Vgutierrez)
[05:36:15] (PS3) Vgutierrez: ncredir: Move last resource return to a location block [puppet] - https://gerrit.wikimedia.org/r/521473 (https://phabricator.wikimedia.org/T133548)
[05:37:10] Warning Alert for device cr2-esams.wikimedia.org - Memory over 85%
[05:37:28] (PS1) Marostegui: db-eqiad.php: Fully repool db1079 [mediawiki-config] - https://gerrit.wikimedia.org/r/521812
[05:39:37] (CR) Marostegui: [C: +2] db-eqiad.php: Fully repool db1079 [mediawiki-config] - https://gerrit.wikimedia.org/r/521812 (owner: Marostegui)
[05:40:39] (Merged) jenkins-bot: db-eqiad.php: Fully repool db1079 [mediawiki-config] - https://gerrit.wikimedia.org/r/521812 (owner: Marostegui)
[05:40:55] (CR) jenkins-bot: db-eqiad.php: Fully repool db1079 [mediawiki-config] - https://gerrit.wikimedia.org/r/521812 (owner: Marostegui)
[05:41:50] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Fully repool db1079 after upgrade (duration: 00m 57s)
[05:41:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:08:18] (PS1) Vgutierrez: ncredir: Provide /_status endpoint even when a redirection rule matches [puppet] - https://gerrit.wikimedia.org/r/521814 (https://phabricator.wikimedia.org/T133548)
[06:17:10] W̶a̶r̶n̶i̶n̶g̶ Device cr2-esams.wikimedia.org recovered from Memory over 85%
[06:20:36] Operations, Analytics, netops, LDAP: LDAP ldap-ro.eqiad.wikimedia.org not reachable from Analytics VLAN - https://phabricator.wikimedia.org/T227611 (elukey) From puppet I can see that the change for ldap-ro was reverted: ` elukey@notebook1003:~$ sudo grep ldap /var/log/puppet.log Jul 9 17:46:07...
[06:27:47] Operations, Analytics, Patch-For-Review, User-Elukey: Import AMD rocm packages in wikimedia-buster - https://phabricator.wikimedia.org/T224723 (elukey)
[06:28:00] Operations, Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Import AMD rocm packages in wikimedia-buster - https://phabricator.wikimedia.org/T224723 (elukey)
[06:28:43] PROBLEM - puppet last run on etcd1003 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle.
[06:33:33] (PS1) Vgutierrez: Add DNS entries for ncredir[12]002 [dns] - https://gerrit.wikimedia.org/r/521817 (https://phabricator.wikimedia.org/T133548)
[06:33:52] Operations, MediaWiki-extensions-CentralAuth, TimedMediaHandler, Traffic, and 3 others: Consistent HTTP 503 Error on some urls for some logged-in users (CentralAuth Set-Cookie storm) - https://phabricator.wikimedia.org/T226840 (TheDJ) Maybe an update to the class documentation to make it easier t...
[06:34:11] (CR) jerkins-bot: [V: -1] Add DNS entries for ncredir[12]002 [dns] - https://gerrit.wikimedia.org/r/521817 (https://phabricator.wikimedia.org/T133548) (owner: Vgutierrez)
[06:34:20] uh
[06:36:46] layer8 issue :)
[06:36:47] (PS2) Vgutierrez: Add DNS entries for ncredir[12]002 [dns] - https://gerrit.wikimedia.org/r/521817 (https://phabricator.wikimedia.org/T133548)
[06:47:04] (CR) Volans: "> Patch Set 8:" [software/cumin] - https://gerrit.wikimedia.org/r/514840 (https://phabricator.wikimedia.org/T205900) (owner: CRusnov)
[06:47:57] (PS1) Urbanecm: Remove autopromote to patroller on testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/521819
[06:48:45] (PS2) Urbanecm: Remove autopromote to patroller on testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/521819 (https://phabricator.wikimedia.org/T168718)
[06:53:17] (CR) Muehlenhoff: "@Eric: Yes, we default to Stretch currently." [puppet] - https://gerrit.wikimedia.org/r/521586 (owner: Muehlenhoff)
[06:55:59] RECOVERY - puppet last run on etcd1003 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures
[06:57:42] (PS2) Urbanecm: Optimalize unoptimalized logos [mediawiki-config] - https://gerrit.wikimedia.org/r/521632 (https://phabricator.wikimedia.org/T227635)
[06:58:26] (PS3) Urbanecm: Remove fawikiquote HD logo [mediawiki-config] - https://gerrit.wikimedia.org/r/521527 (https://phabricator.wikimedia.org/T211413)
[06:58:58] (PS17) Urbanecm: Fix several incorrect logo sizes [mediawiki-config] - https://gerrit.wikimedia.org/r/521316 (https://phabricator.wikimedia.org/T211413)
[06:59:30] (PS28) Urbanecm: Test if 2x logo version is 2 times bigger than 1x logo version [mediawiki-config] - https://gerrit.wikimedia.org/r/521181 (https://phabricator.wikimedia.org/T211413)
[07:02:17] Operations, Phabricator, Release-Engineering-Team-TODO, serviceops, and 2 others: Reimage both phab1001 and phab2001 to stretch - https://phabricator.wikimedia.org/T190568 (MoritzMuehlenhoff) >>! In T190568#5319370, @Dzahn wrote: > Next we need to make a decision whether we keep phab1003 as the p...
[07:12:16] Operations, Analytics, netops, LDAP: LDAP ldap-ro.eqiad.wikimedia.org not reachable from Analytics VLAN - https://phabricator.wikimedia.org/T227611 (MoritzMuehlenhoff) There are two issues here: 1. We'll need to fix the ACLs so that the analytics VLAN can access the ldap-ro replicas, there's a w...
[07:15:23] (PS1) Elukey: Add prometheus node exporter for AMD ROCm's GPU stats [puppet] - https://gerrit.wikimedia.org/r/521826 (https://phabricator.wikimedia.org/T220784)
[07:18:12] Operations, Analytics, Patch-For-Review, User-Elukey: Investigate if a Prometheus exporter for the AMD GPU(s) can be easily created - https://phabricator.wikimedia.org/T220784 (elukey) Stalled→Open
[07:18:18] Operations, Analytics, Research-management, Patch-For-Review, User-Elukey: Remove computational bottlenecks in stats machine via adding a GPU that can be used to train ML models - https://phabricator.wikimedia.org/T148843 (elukey)
[07:20:57] Operations, Analytics, Patch-For-Review, User-Elukey: Investigate if a Prometheus exporter for the AMD GPU(s) can be easily created - https://phabricator.wikimedia.org/T220784 (elukey) I filed a code review to create the initial version of the node exporter, with the following metrics: * usage pe...
[07:21:13] Operations, Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Investigate if a Prometheus exporter for the AMD GPU(s) can be easily created - https://phabricator.wikimedia.org/T220784 (elukey)
[07:32:22] (CR) Filippo Giunchedi: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/521586 (owner: Muehlenhoff)
[07:32:25] Operations, Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Investigate if a Prometheus exporter for the AMD GPU(s) can be easily created - https://phabricator.wikimedia.org/T220784 (MoritzMuehlenhoff) Parsing "radeontop -d" might also be an interesting data source.
[07:36:02] Operations, Analytics, netops, LDAP: LDAP ldap-ro.eqiad.wikimedia.org not reachable from Analytics VLAN - https://phabricator.wikimedia.org/T227611 (elukey) About 1. ` elukey@re0.cr1-eqiad# show | compare [edit firewall family inet filter analytics-in4 term ldap from destination-address]...
[07:42:24] (CR) Filippo Giunchedi: "LGTM overall, see comment about metric naming" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/521826 (https://phabricator.wikimedia.org/T220784) (owner: Elukey)
[07:46:46] (CR) Volans: "I did a quick pass on the Python only and I've left a couple of optional comments inline. I'll leave the prometheus part of the Python to " (3 comments) [puppet] - https://gerrit.wikimedia.org/r/521826 (https://phabricator.wikimedia.org/T220784) (owner: Elukey)
[07:47:29] Operations, SRE-Access-Requests: Apply updated YubiKey SSH keys for aaron - https://phabricator.wikimedia.org/T227638 (aaron)
[07:47:45] (PS3) Aaron Schulz: Update my obsolete YubiKey-stored SSH keys [puppet] - https://gerrit.wikimedia.org/r/519941 (https://phabricator.wikimedia.org/T227638)
[07:53:49] (CR) Elukey: "Thanks for the reviews, will amend my code!" (5 comments) [puppet] - https://gerrit.wikimedia.org/r/521826 (https://phabricator.wikimedia.org/T220784) (owner: Elukey)
[07:59:04] (CR) Vgutierrez: [C: +2] Add DNS entries for ncredir[12]002 [dns] - https://gerrit.wikimedia.org/r/521817 (https://phabricator.wikimedia.org/T133548) (owner: Vgutierrez)
[08:04:19] (PS2) Muehlenhoff: Switch restbase1017 to Stretch [puppet] - https://gerrit.wikimedia.org/r/521586
[08:06:03] (CR) Filippo Giunchedi: set up debian packaging (6 comments) [debs/prometheus-varnishkafka-exporter] - https://gerrit.wikimedia.org/r/521580 (https://phabricator.wikimedia.org/T196066) (owner: Cwhite)
[08:06:05] (CR) Muehlenhoff: [C: +2] Switch restbase1017 to Stretch [puppet] - https://gerrit.wikimedia.org/r/521586 (owner: Muehlenhoff)
[08:06:23] !log vgutierrez@cumin1001 START - Cookbook sre.ganeti.makevm
[08:06:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:06:34] !log vgutierrez@cumin1001 START - Cookbook sre.ganeti.makevm
[08:06:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:08:07] (CR) Filippo Giunchedi: [C: +1] "LGTM!" [puppet] - https://gerrit.wikimedia.org/r/520296 (https://phabricator.wikimedia.org/T209182) (owner: CRusnov)
[08:10:38] Operations, Patch-For-Review: Decommission servermon - https://phabricator.wikimedia.org/T198939 (akosiaris) There was some discussions during the SRE offsite regarding this. @faidon and @Volans have the details, but the gist of it is that servermon still provides 1 functionality that puppetboard does no...
[08:12:26] (PS3) Muehlenhoff: Add orespoolcounter[12]00[34] to site.pp [puppet] - https://gerrit.wikimedia.org/r/521498 (https://phabricator.wikimedia.org/T227640)
[08:12:56] (CR) Jcrespo: prometheus-mysqld-exporter: Automate targets based on zarcillo db (9 comments) [puppet] - https://gerrit.wikimedia.org/r/519203 (https://phabricator.wikimedia.org/T143896) (owner: Jcrespo)
[08:13:12] (PS1) Fsero: registry, swift: some images are not replicated. [puppet] - https://gerrit.wikimedia.org/r/521828 (https://phabricator.wikimedia.org/T227570)
[08:14:16] (PS4) Muehlenhoff: Add orespoolcounter[12]00[34] to site.pp [puppet] - https://gerrit.wikimedia.org/r/521498 (https://phabricator.wikimedia.org/T227640)
[08:14:41] (CR) Jcrespo: "> Patch Set 14:" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/519203 (https://phabricator.wikimedia.org/T143896) (owner: Jcrespo)
[08:14:53] (PS2) Vgutierrez: ncredir: Provide /_status endpoint even when a redirection rule matches [puppet] - https://gerrit.wikimedia.org/r/521814 (https://phabricator.wikimedia.org/T133548)
[08:16:02] !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
[08:16:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:16:06] !log vgutierrez@cumin1001 END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
[08:16:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:16:53] (CR) Muehlenhoff: [C: +2] Add orespoolcounter[12]00[34] to site.pp [puppet] - https://gerrit.wikimedia.org/r/521498 (https://phabricator.wikimedia.org/T227640) (owner: Muehlenhoff)
[08:25:22] Operations, Analytics, Analytics-Kanban, Discovery, and 2 others: Make hadoop cluster able to push to swift - https://phabricator.wikimedia.org/T219544 (fgiunchedi) Sounds good to me, note that there are rate limits in place for write operations (`modules/swift/templates/proxy-server.conf.erb`) i...
[08:28:51] Operations, ops-codfw, ops-eqiad: Document PDU models - https://phabricator.wikimedia.org/T227632 (fgiunchedi) Not sure if relevant at the netbox level, but we have at least two different types/models of PDUs, namely those that expose the `sentry3` SNMP MIB vs `sentry4` (newer, PDUs in ulsfo and the...
[08:33:57] (PS2) Alexandros Kosiaris: Add Niklas to deploy-service group [puppet] - https://gerrit.wikimedia.org/r/521539 (owner: KartikMistry)
[08:34:03] (CR) Alexandros Kosiaris: [V: +2 C: +2] Add Niklas to deploy-service group [puppet] - https://gerrit.wikimedia.org/r/521539 (owner: KartikMistry)
[08:40:56] (CR) Ema: [C: +1] ncredir: Provide /_status endpoint even when a redirection rule matches [puppet] - https://gerrit.wikimedia.org/r/521814 (https://phabricator.wikimedia.org/T133548) (owner: Vgutierrez)
[08:41:34] (CR) Vgutierrez: [C: +2] ncredir: Provide /_status endpoint even when a redirection rule matches [puppet] - https://gerrit.wikimedia.org/r/521814 (https://phabricator.wikimedia.org/T133548) (owner: Vgutierrez)
[08:41:44] (PS3) Vgutierrez: ncredir: Provide /_status endpoint even when a redirection rule matches [puppet] - https://gerrit.wikimedia.org/r/521814 (https://phabricator.wikimedia.org/T133548)
[08:44:32] (CR) Filippo Giunchedi: "LGTM as far as Prometheus part goes (untested though)" [puppet] - https://gerrit.wikimedia.org/r/519203 (https://phabricator.wikimedia.org/T143896) (owner: Jcrespo)
[08:44:36] (CR) Filippo Giunchedi: [C: +1] prometheus-mysqld-exporter: Automate targets based on zarcillo db [puppet] - https://gerrit.wikimedia.org/r/519203 (https://phabricator.wikimedia.org/T143896) (owner: Jcrespo)
[08:45:03] (PS2) Elukey: Add prometheus node exporter for AMD ROCm's GPU stats [puppet] - https://gerrit.wikimedia.org/r/521826 (https://phabricator.wikimedia.org/T220784)
[08:46:59] (CR) Elukey: "https://puppet-compiler.wmflabs.org/compiler1001/17287/stat1005.eqiad.wmnet/" [puppet] - https://gerrit.wikimedia.org/r/521826 (https://phabricator.wikimedia.org/T220784) (owner: Elukey)
[08:48:05] (CR) Volans: "Some general comments inline. I'm also wondering if we should keep the device type free or allow for a default one maybe in the config. So" (5 comments) [software/cumin] - https://gerrit.wikimedia.org/r/514840 (https://phabricator.wikimedia.org/T205900) (owner: CRusnov)
[08:49:21] (PS1) Vgutierrez: install_server: Handle installation of ncredir[12]002 [puppet] - https://gerrit.wikimedia.org/r/521829 (https://phabricator.wikimedia.org/T133548)
[08:49:23] (PS1) Vgutierrez: hieradata: Grant access to ncredir[12]002 to non-canonical-redirect certs [puppet] - https://gerrit.wikimedia.org/r/521830 (https://phabricator.wikimedia.org/T133548)
[08:49:25] (PS1) Vgutierrez: site: Set ncredir role for ncredir[12]002 instances [puppet] - https://gerrit.wikimedia.org/r/521831 (https://phabricator.wikimedia.org/T133548)
[08:55:18] (PS1) Elukey: Use the ldap-ro endpoint for Hue and Jupyter [puppet] - https://gerrit.wikimedia.org/r/521832 (https://phabricator.wikimedia.org/T227611)
[09:11:00] (PS1) Vgutierrez: Release 0.19 [software/acme-chief] - https://gerrit.wikimedia.org/r/521834 (https://phabricator.wikimedia.org/T225945)
[09:12:15] (PS1) Muehlenhoff: Switch ORES pool counters for codfw to 2003/2004 [puppet] - https://gerrit.wikimedia.org/r/521835 (https://phabricator.wikimedia.org/T227640)
[09:16:14] (PS2) Elukey: profile::swap: use the ldap-ro endpoint [puppet] - https://gerrit.wikimedia.org/r/521832 (https://phabricator.wikimedia.org/T227611)
[09:19:46] !log disabled puppet on hosts using acme_chief::cert for reboots of acmechief hosts
[09:19:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:21:53] !log jmm@cumin2001 START - Cookbook sre.hosts.downtime
[09:21:56] !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[09:21:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:22:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:22:54] !log rebooting acmechief2001 to pick up MDS-enabled qemu
[09:22:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:25:30] !log rearmed keyholder on acmechief2001
[09:25:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:26:20] moritzm: nice, we already forget about that one :)
[09:26:24] s/already/always/
[09:26:45] (PS1) Fsero: k8s: putting a deprecation notice on scap-helm [puppet] - https://gerrit.wikimedia.org/r/521836 (https://phabricator.wikimedia.org/T212130)
[09:27:17] (CR) Alexandros Kosiaris: [C: +1] k8s: putting a deprecation notice on scap-helm [puppet] - https://gerrit.wikimedia.org/r/521836 (https://phabricator.wikimedia.org/T212130) (owner: Fsero)
[09:27:48] vgutierrez: 2001 looks all fine to me, I'll proceed with 1001
[09:27:59] nice
[09:28:06] Operations, Analytics, Analytics-Kanban, Cleanup, Patch-For-Review: Archive zookeeper puppet submodule - https://phabricator.wikimedia.org/T227164 (elukey)
[09:28:09] (CR) Fsero: [C: +2] "Thanks Alex!" [puppet] - https://gerrit.wikimedia.org/r/521836 (https://phabricator.wikimedia.org/T212130) (owner: Fsero)
[09:28:21] (CR) Ema: [C: +1] install_server: Handle installation of ncredir[12]002 [puppet] - https://gerrit.wikimedia.org/r/521829 (https://phabricator.wikimedia.org/T133548) (owner: Vgutierrez)
[09:29:05] !log rebooting acmechief1001 to pick up MDS-enabled qemu
[09:29:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:29:11] Operations, Analytics, Analytics-Kanban, Cleanup, Patch-For-Review: Archive zookeeper puppet submodule - https://phabricator.wikimedia.org/T227164 (elukey) >>! In T227164#5302590, @elukey wrote: > There are some pull requests to close in https://github.com/wikimedia/puppet-zookeeper/pulls and...
[09:33:48] (CR) Vgutierrez: [C: +2] install_server: Handle installation of ncredir[12]002 [puppet] - https://gerrit.wikimedia.org/r/521829 (https://phabricator.wikimedia.org/T133548) (owner: Vgutierrez)
[09:33:57] (PS2) Vgutierrez: install_server: Handle installation of ncredir[12]002 [puppet] - https://gerrit.wikimedia.org/r/521829 (https://phabricator.wikimedia.org/T133548)
[09:35:20] fsero: mind if I merge your change?
[09:35:29] please<#
[09:35:31] <3
[09:35:40] merging.. :)
[09:36:01] i know you have a battle to win to marostegui
[09:36:06] go go go
[09:36:15] (PS1) Jcrespo: prometheus: Add fake prometheus labs password [labs/private] - https://gerrit.wikimedia.org/r/521839 (https://phabricator.wikimedia.org/T143896)
[09:36:37] fsero: merged
[09:36:39] !log rearmed keyholder on acmechief1001
[09:36:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:37:09] (CR) Ema: hieradata: Grant access to ncredir[12]002 to non-canonical-redirect certs (2 comments) [puppet] - https://gerrit.wikimedia.org/r/521830 (https://phabricator.wikimedia.org/T133548) (owner: Vgutierrez)
[09:37:19] moritzm: acmechief1001 looking good :D
[09:37:29] (CR) Marostegui: [C: +1] prometheus: Add fake prometheus labs password [labs/private] - https://gerrit.wikimedia.org/r/521839 (https://phabricator.wikimedia.org/T143896) (owner: Jcrespo)
[09:37:39] !log docker-registry: running manual only once swift-container-sync on ms-be2019
[09:37:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:37:46] PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is CRITICAL: CRITICAL - failed 43 probes of 437 (alerts on 35) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts
[09:38:37] vgutierrez: yep, I'll reenable puppet shortly
[09:38:48] !log doing the same on ms-be1030
[09:38:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:39:39] (CR) Ema: [C: +1] site: Set ncredir role for ncredir[12]002 instances [puppet] - https://gerrit.wikimedia.org/r/521831 (https://phabricator.wikimedia.org/T133548) (owner: Vgutierrez)
[09:40:05] (CR) Jcrespo: [V: +2 C: +2] prometheus: Add fake prometheus labs password [labs/private] - https://gerrit.wikimedia.org/r/521839 (https://phabricator.wikimedia.org/T143896) (owner: Jcrespo)
[09:40:28] (PS2) Vgutierrez: hieradata: Grant access to ncredir[12]002 to non-canonical-redirect certs [puppet] - https://gerrit.wikimedia.org/r/521830 (https://phabricator.wikimedia.org/T133548)
[09:40:30] (PS2) Vgutierrez: site: Set ncredir role for ncredir[12]002 instances [puppet] - https://gerrit.wikimedia.org/r/521831 (https://phabricator.wikimedia.org/T133548)
[09:40:32] fsero: lazy logging? ;P
[09:41:04] i prefer to call it contextual logging :D
[09:41:17] (PS5) Jbond: puppet: refactor remove puppetdb_major_version [puppet] - https://gerrit.wikimedia.org/r/521514 (https://phabricator.wikimedia.org/T227587)
[09:41:41] (PS6) Jbond: puppet: refactor remove puppetdb_major_version [puppet] - https://gerrit.wikimedia.org/r/521514 (https://phabricator.wikimedia.org/T227587)
[09:42:07] (CR) Filippo Giunchedi: [C: +1] "> Patch Set 4: Code-Review+1" [puppet] - https://gerrit.wikimedia.org/r/521514 (https://phabricator.wikimedia.org/T227587) (owner: Jbond)
[09:43:14] RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is OK: OK - failed 21 probes of 437 (alerts on 35) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts
[09:44:58] fsero: :)
[09:45:57] (CR) Vgutierrez: "Fixed, thanks!" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/521830 (https://phabricator.wikimedia.org/T133548) (owner: Vgutierrez)
[09:48:20] (CR) Ema: [C: +1] hieradata: Grant access to ncredir[12]002 to non-canonical-redirect certs [puppet] - https://gerrit.wikimedia.org/r/521830 (https://phabricator.wikimedia.org/T133548) (owner: Vgutierrez)
[09:50:48] (CR) ArielGlenn: ">" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/517670 (https://phabricator.wikimedia.org/T221917) (owner: ArielGlenn)
[09:51:55] (CR) Reedy: [C: -1] Optimalize unoptimalized logos (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/521632 (https://phabricator.wikimedia.org/T227635) (owner: Urbanecm)
[09:54:15] (PS3) Urbanecm: Optimalize logos [mediawiki-config] - https://gerrit.wikimedia.org/r/521632 (https://phabricator.wikimedia.org/T227635)
[09:54:33] (PS4) Urbanecm: Optimise logos [mediawiki-config] - https://gerrit.wikimedia.org/r/521632 (https://phabricator.wikimedia.org/T227635)
[09:54:44] !log disabling puppet on prometheus* hosts for upcoming deploy
[09:54:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:55:01] (CR) Urbanecm: "> Patch Set 2: Code-Review-1" (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/521632 (https://phabricator.wikimedia.org/T227635) (owner: Urbanecm)
[09:55:37] (PS17) Jcrespo: prometheus-mysqld-exporter: Automate targets based on zarcillo db [puppet] - https://gerrit.wikimedia.org/r/519203 (https://phabricator.wikimedia.org/T143896)
[09:57:52] !log re-enabled puppet on hosts using acme_chief::cert for reboots of acmechief hosts (actually did that 20 minutes ago, but missed to log earlier)
[09:57:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:57:58] (CR) Volans: "I've checked mostly the Python part, looks good overall, a general comment and some nits inline" (4 comments) [puppet] - https://gerrit.wikimedia.org/r/520337 (owner: Ayounsi)
[09:58:21] (CR) Jcrespo: [C: +2] prometheus-mysqld-exporter: Automate targets based on zarcillo db [puppet] - https://gerrit.wikimedia.org/r/519203 (https://phabricator.wikimedia.org/T143896) (owner: Jcrespo)
[10:04:52] PROBLEM - puppet last run on prometheus2003 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle.
[10:05:29] ^ this is being handled
[10:06:19] (PS1) Jcrespo: mysql-prometheus-exporter: Fix typo on puppet requirement [puppet] - https://gerrit.wikimedia.org/r/521845 (https://phabricator.wikimedia.org/T143896)
[10:06:51] (CR) Marostegui: [C: +1] mysql-prometheus-exporter: Fix typo on puppet requirement [puppet] - https://gerrit.wikimedia.org/r/521845 (https://phabricator.wikimedia.org/T143896) (owner: Jcrespo)
[10:07:06] (CR) Jcrespo: [C: +2] mysql-prometheus-exporter: Fix typo on puppet requirement [puppet] - https://gerrit.wikimedia.org/r/521845 (https://phabricator.wikimedia.org/T143896) (owner: Jcrespo)
[10:07:08] (CR) jerkins-bot: [V: -1] mysql-prometheus-exporter: Fix typo on puppet requirement [puppet] - https://gerrit.wikimedia.org/r/521845 (https://phabricator.wikimedia.org/T143896) (owner: Jcrespo)
[10:09:26] (CR) MarcoAurelio: [C: +1] Remove autopromote to patroller on testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/521819 (https://phabricator.wikimedia.org/T168718) (owner: Urbanecm)
[10:10:17] (CR) Vgutierrez: [C: +2] hieradata: Grant access to ncredir[12]002 to non-canonical-redirect certs [puppet] - https://gerrit.wikimedia.org/r/521830 (https://phabricator.wikimedia.org/T133548) (owner: Vgutierrez)
[10:10:47] (PS3) Vgutierrez: hieradata: Grant access to ncredir[12]002 to non-canonical-redirect certs [puppet] - https://gerrit.wikimedia.org/r/521830 (https://phabricator.wikimedia.org/T133548)
[10:11:34] PROBLEM - puppet last run on bast5001 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle.
[10:12:22] (PS2) Jcrespo: mysql-prometheus-exporter: Fix typo on puppet requirement [puppet] - https://gerrit.wikimedia.org/r/521845 (https://phabricator.wikimedia.org/T143896)
[10:13:37] (PS17) ArielGlenn: refactor wikidata entity dumps into wikibase + wikidata specific bits [puppet] - https://gerrit.wikimedia.org/r/517670 (https://phabricator.wikimedia.org/T221917)
[10:14:29] (CR) Jcrespo: [C: +2] mysql-prometheus-exporter: Fix typo on puppet requirement [puppet] - https://gerrit.wikimedia.org/r/521845 (https://phabricator.wikimedia.org/T143896) (owner: Jcrespo)
[10:15:14] (CR) MarcoAurelio: [C: +1] Disable local uploads on wuuwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/521497 (https://phabricator.wikimedia.org/T226764) (owner: Urbanecm)
[10:15:25] (CR) MarcoAurelio: [C: +1] Enable StopForumSpam on beta [mediawiki-config] - https://gerrit.wikimedia.org/r/521233 (https://phabricator.wikimedia.org/T181217) (owner: Reedy)
[10:15:58] (CR) MarcoAurelio: [C: +1] "(only if it works, otherwise there's no point - cfr. open Phab tasks about this extension)" [mediawiki-config] - https://gerrit.wikimedia.org/r/521233 (https://phabricator.wikimedia.org/T181217) (owner: Reedy)
[10:18:37] (CR) Vgutierrez: [C: +2] site: Set ncredir role for ncredir[12]002 instances [puppet] - https://gerrit.wikimedia.org/r/521831 (https://phabricator.wikimedia.org/T133548) (owner: Vgutierrez)
[10:18:47] (PS3) Vgutierrez: site: Set ncredir role for ncredir[12]002 instances [puppet] - https://gerrit.wikimedia.org/r/521831 (https://phabricator.wikimedia.org/T133548)
[10:19:37] (PS3) Jcrespo: mysql-prometheus-exporter: Fix typo on puppet requirement [puppet] - https://gerrit.wikimedia.org/r/521845 (https://phabricator.wikimedia.org/T143896)
[10:19:55] Operations, Analytics, netops, LDAP, Patch-For-Review: LDAP ldap-ro.eqiad.wikimedia.org not reachable from Analytics VLAN - https://phabricator.wikimedia.org/T227611 (elukey) I am a little bit lost with LDAP config, since we use: 1) ldap-labs.eqiad.wikimedia.org in Jupyterhub's config withou...
[10:22:05] (PS4) Vgutierrez: site: Set ncredir role for ncredir[12]002 instances [puppet] - https://gerrit.wikimedia.org/r/521831 (https://phabricator.wikimedia.org/T133548)
[10:22:30] (PS3) Elukey: Add prometheus node exporter for AMD ROCm's GPU stats [puppet] - https://gerrit.wikimedia.org/r/521826 (https://phabricator.wikimedia.org/T220784)
[10:24:13] (PS4) Elukey: Add prometheus node exporter for AMD ROCm's GPU stats [puppet] - https://gerrit.wikimedia.org/r/521826 (https://phabricator.wikimedia.org/T220784)
[10:28:53] PROBLEM - puppet last run on bast3002 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle.
[10:31:36] (CR) Reedy: "Yeah... We kinda need to setup a cronjob to pull the file first, and then run the import script, otherwise it's kinda pointless as is" [mediawiki-config] - https://gerrit.wikimedia.org/r/521233 (https://phabricator.wikimedia.org/T181217) (owner: Reedy)
[10:35:45] (PS1) Jcrespo: mysqld-prometheus-exporter: Require python3 pymysql and yaml pkgs [puppet] - https://gerrit.wikimedia.org/r/521847 (https://phabricator.wikimedia.org/T143896)
[10:37:27] (PS2) Jcrespo: mysqld-prometheus-exporter: Require python3 pymysql and yaml pkgs [puppet] - https://gerrit.wikimedia.org/r/521847 (https://phabricator.wikimedia.org/T143896)
[10:38:02] (CR) Faidon Liambotis: Add rpkicounter (4 comments) [puppet] - https://gerrit.wikimedia.org/r/520337 (owner: Ayounsi)
[10:38:23] (CR) Jcrespo: [C: +2] mysqld-prometheus-exporter: Require python3 pymysql and yaml pkgs [puppet] - https://gerrit.wikimedia.org/r/521847 (https://phabricator.wikimedia.org/T143896) (owner: Jcrespo)
[10:40:19] PROBLEM - puppet last run on bast4002 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle.
[10:49:16] (03PS4) 10Faidon Liambotis: Add rpkicounter [puppet] - 10https://gerrit.wikimedia.org/r/520337 (owner: 10Ayounsi) [10:50:49] (03PS1) 10Jcrespo: Revert "mysqld-prometheus-exporter: Require python3 pymysql and yaml pkgs" [puppet] - 10https://gerrit.wikimedia.org/r/521848 [10:51:40] (03CR) 10Jcrespo: [C: 03+2] Revert "mysqld-prometheus-exporter: Require python3 pymysql and yaml pkgs" [puppet] - 10https://gerrit.wikimedia.org/r/521848 (owner: 10Jcrespo) [10:52:10] 08Warning Alert for device cr2-esams.wikimedia.org - Memory over 85% [10:52:20] (03PS1) 10Jcrespo: Revert "mysql-prometheus-exporter: Fix typo on puppet requirement" [puppet] - 10https://gerrit.wikimedia.org/r/521849 [10:52:46] (03CR) 10Jcrespo: [V: 03+2 C: 03+2] Revert "mysql-prometheus-exporter: Fix typo on puppet requirement" [puppet] - 10https://gerrit.wikimedia.org/r/521849 (owner: 10Jcrespo) [10:52:58] (03PS2) 10Jcrespo: Revert "mysql-prometheus-exporter: Fix typo on puppet requirement" [puppet] - 10https://gerrit.wikimedia.org/r/521849 [10:53:09] (03CR) 10Jcrespo: [V: 03+2 C: 03+2] Revert "mysql-prometheus-exporter: Fix typo on puppet requirement" [puppet] - 10https://gerrit.wikimedia.org/r/521849 (owner: 10Jcrespo) [10:54:18] (03PS1) 10Jcrespo: Revert "prometheus-mysqld-exporter: Automate targets based on zarcillo db" [puppet] - 10https://gerrit.wikimedia.org/r/521850 [10:54:30] (03PS2) 10Jcrespo: Revert "prometheus-mysqld-exporter: Automate targets based on zarcillo db" [puppet] - 10https://gerrit.wikimedia.org/r/521850 [10:54:44] (03PS1) 10Ema: cache: fix text VTC tests [puppet] - 10https://gerrit.wikimedia.org/r/521851 [10:54:46] (03CR) 10Jcrespo: [V: 03+2 C: 03+2] Revert "prometheus-mysqld-exporter: Automate targets based on zarcillo db" [puppet] - 10https://gerrit.wikimedia.org/r/521850 (owner: 10Jcrespo) [10:55:25] (03PS1) 10Jcrespo: Revert "Revert "prometheus-mysqld-exporter: Automate targets based on zarcillo db"" [puppet] - 10https://gerrit.wikimedia.org/r/521852 [10:56:42] 
10Operations, 10Traffic, 10Goal, 10HTTPS, 10Patch-For-Review: Create a secure redirect service for large count of non-canonical / junk domains - https://phabricator.wikimedia.org/T133548 (10Vgutierrez) [10:58:16] 10Operations, 10LDAP: Migrate web services using LDAP authentication towards the readonly LDAP replicas - https://phabricator.wikimedia.org/T227650 (10MoritzMuehlenhoff) [10:58:30] (03CR) 10Volans: [C: 03+1] "LGTM, although I'm not very familiar with the kafkatee and prometheus puppetization, so better if someone could have a look at it too." (034 comments) [puppet] - 10https://gerrit.wikimedia.org/r/520337 (owner: 10Ayounsi) [11:00:04] Amir1, Lucas_WMDE, and Urbanecm: #bothumor My software never has bugs. It just develops random features. Rise for European Mid-day SWAT(Max 6 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190710T1100). [11:00:04] No GERRIT patches in the queue for this window AFAICS. [11:00:19] 10Operations, 10ops-eqiad, 10Cassandra, 10Core Platform Team (Security, stability, performance and scalability (TEC1)), and 4 others: Fix restbase1017's physical rack - https://phabricator.wikimedia.org/T222960 (10MoritzMuehlenhoff) >>! In T222960#5319707, @Eevans wrote: > Stretch would be preferable. I'v... 
[11:01:10] * Urbanecm has few things to deploy [11:01:36] (03CR) 10Urbanecm: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521527 (https://phabricator.wikimedia.org/T211413) (owner: 10Urbanecm) [11:01:39] (03CR) 10Urbanecm: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521632 (https://phabricator.wikimedia.org/T227635) (owner: 10Urbanecm) [11:01:44] (03CR) 10Urbanecm: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521316 (https://phabricator.wikimedia.org/T211413) (owner: 10Urbanecm) [11:01:52] why is gerrit SO slow at git fetch grr [11:02:29] (03PS2) 10Ema: cache: refresh text VTC tests [puppet] - 10https://gerrit.wikimedia.org/r/521851 [11:02:31] (03Merged) 10jenkins-bot: Remove fawikiquote HD logo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521527 (https://phabricator.wikimedia.org/T211413) (owner: 10Urbanecm) [11:02:35] (03Merged) 10jenkins-bot: Optimise logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521632 (https://phabricator.wikimedia.org/T227635) (owner: 10Urbanecm) [11:02:39] RECOVERY - puppet last run on prometheus2003 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [11:02:49] (03Merged) 10jenkins-bot: Fix several incorrect logo sizes [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521316 (https://phabricator.wikimedia.org/T211413) (owner: 10Urbanecm) [11:02:58] (03CR) 10Urbanecm: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521576 (https://phabricator.wikimedia.org/T227606) (owner: 10Aklapper) [11:03:08] (03CR) 10jenkins-bot: Remove fawikiquote HD logo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521527 (https://phabricator.wikimedia.org/T211413) (owner: 10Urbanecm) [11:03:14] (03CR) 10jenkins-bot: Optimise logos [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521632 (https://phabricator.wikimedia.org/T227635) (owner: 10Urbanecm) [11:03:19] (03CR) 10jenkins-bot: Fix several 
incorrect logo sizes [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521316 (https://phabricator.wikimedia.org/T211413) (owner: 10Urbanecm) [11:03:24] (03CR) 10Urbanecm: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521819 (https://phabricator.wikimedia.org/T168718) (owner: 10Urbanecm) [11:03:53] (03Merged) 10jenkins-bot: Fix non-working "raw text" links on noc.wikimedia.org web pages [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521576 (https://phabricator.wikimedia.org/T227606) (owner: 10Aklapper) [11:03:55] (03CR) 10Ema: [C: 03+2] cache: refresh text VTC tests [puppet] - 10https://gerrit.wikimedia.org/r/521851 (owner: 10Ema) [11:04:10] (03CR) 10jenkins-bot: Fix non-working "raw text" links on noc.wikimedia.org web pages [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521576 (https://phabricator.wikimedia.org/T227606) (owner: 10Aklapper) [11:05:22] (03PS5) 10Ema: vcl: remove WP Zero code [puppet] - 10https://gerrit.wikimedia.org/r/521488 (https://phabricator.wikimedia.org/T213769) [11:05:41] (03PS3) 10Urbanecm: Remove autopromote to patroller on testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521819 (https://phabricator.wikimedia.org/T168718) [11:05:45] RECOVERY - puppet last run on bast3002 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [11:05:48] (03CR) 10Urbanecm: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521819 (https://phabricator.wikimedia.org/T168718) (owner: 10Urbanecm) [11:06:06] (03CR) 10Jcrespo: [C: 04-1] "So the script itself worked, now I will analyze the generated config is correct. 
But it generated the files, and it didn't write them on s" [puppet] - 10https://gerrit.wikimedia.org/r/521852 (owner: 10Jcrespo) [11:06:17] !log urbanecm@deploy1001 Synchronized docroot/noc/conf/highlight.php: SWAT: [[:gerrit:521576|Fix non-working "raw text" links on noc.wikimedia.org web pages]] (T227606) (duration: 01m 02s) [11:06:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:06:24] T227606: Bug at noc.wikimedia.org - "raw text" links are broken - https://phabricator.wikimedia.org/T227606 [11:06:43] (03Merged) 10jenkins-bot: Remove autopromote to patroller on testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521819 (https://phabricator.wikimedia.org/T168718) (owner: 10Urbanecm) [11:06:59] (03CR) 10jenkins-bot: Remove autopromote to patroller on testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521819 (https://phabricator.wikimedia.org/T168718) (owner: 10Urbanecm) [11:07:43] !log urbanecm@deploy1001 sync-file aborted: SWAT: Several logo changes (T227635 T211413) (duration: 00m 20s) [11:07:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:07:49] T211413: Test if 2x/1x logo version is 2 times bigger than 1x logo version - https://phabricator.wikimedia.org/T211413 [11:07:49] T227635: Optimise logos - https://phabricator.wikimedia.org/T227635 [11:08:51] RECOVERY - puppet last run on bast5001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [11:09:03] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: Remove fawikiquote HD logo (T211413) (duration: 00m 57s) [11:09:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:10:07] !log urbanecm@deploy1001 Synchronized static/images/project-logos/: SWAT: Several logo changes (T227635 T211413) (duration: 01m 00s) [11:10:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:11:37] !log urbanecm@deploy1001 Synchronized 
wmf-config/InitialiseSettings.php: SWAT: Remove autopromote to patroller on testwiki (T168718) (duration: 00m 58s) [11:11:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:11:42] T168718: Testwiki user rights cleanup request - https://phabricator.wikimedia.org/T168718 [11:11:57] !log EU SWAT done [11:12:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:22:16] RECOVERY - puppet last run on bast4002 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [11:28:06] (03PS2) 10Muehlenhoff: Remove obsolete comments [puppet] - 10https://gerrit.wikimedia.org/r/521458 [11:30:07] (03CR) 10Muehlenhoff: [C: 03+2] Remove obsolete comments [puppet] - 10https://gerrit.wikimedia.org/r/521458 (owner: 10Muehlenhoff) [11:46:38] (03CR) 10BBlack: [C: 03+1] varnish: remove WP Zero puppetization [puppet] - 10https://gerrit.wikimedia.org/r/521510 (https://phabricator.wikimedia.org/T213769) (owner: 10Ema) [11:46:45] (03CR) 10BBlack: [C: 03+1] vcl: do not set WP Zero X-Carrier headers [puppet] - 10https://gerrit.wikimedia.org/r/521511 (https://phabricator.wikimedia.org/T213769) (owner: 10Ema) [11:46:50] (03CR) 10BBlack: [C: 03+1] vcl: remove WP Zero code [puppet] - 10https://gerrit.wikimedia.org/r/521488 (https://phabricator.wikimedia.org/T213769) (owner: 10Ema) [11:47:10] 08̶W̶a̶r̶n̶i̶n̶g Device cr2-esams.wikimedia.org recovered from Memory over 85% [11:51:32] !log Purged 24 urls for T227635 [11:51:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:51:38] T227635: Optimise logos - https://phabricator.wikimedia.org/T227635 [11:53:37] !log Purged 14 urls for T211413 [11:53:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:53:41] T211413: Test if 2x/1x logo version is 2 times bigger than 1x logo version - https://phabricator.wikimedia.org/T211413 [11:54:30] (03PS29) 10Urbanecm: Test if 2x logo version is 2 times bigger than 1x logo version 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/521181 (https://phabricator.wikimedia.org/T211413) [11:54:55] (03CR) 10Urbanecm: "FTR, 1x logo will be handled separately, given the probable need of other underlying changes." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521181 (https://phabricator.wikimedia.org/T211413) (owner: 10Urbanecm) [11:57:20] (03PS1) 10Muehlenhoff: Remove Chase from secteam-users [puppet] - 10https://gerrit.wikimedia.org/r/521858 [12:27:10] 08Warning Alert for device cr2-esams.wikimedia.org - Memory over 85% [12:28:25] 10Operations: Reduce memory allocation for ldap-eqiad-replica instances - https://phabricator.wikimedia.org/T227657 (10MoritzMuehlenhoff) [12:31:55] (03CR) 10Jbond: [C: 03+1] "lgtm" [puppet] - 10https://gerrit.wikimedia.org/r/521858 (owner: 10Muehlenhoff) [12:34:48] (03PS2) 10Muehlenhoff: Remove Chase from secteam-users [puppet] - 10https://gerrit.wikimedia.org/r/521858 [12:39:03] (03CR) 10Muehlenhoff: [C: 03+2] Remove Chase from secteam-users [puppet] - 10https://gerrit.wikimedia.org/r/521858 (owner: 10Muehlenhoff) [12:41:35] gehel: when you get a chance, what do you think of https://phabricator.wikimedia.org/T184942#5306028 and following messages ? [12:42:26] godog: looking [12:47:48] (03PS1) 10Urbanecm: Change bawikibooks logo to correct one according to community [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521865 (https://phabricator.wikimedia.org/T227418) [12:48:03] 10Operations, 10User-fgiunchedi: CPU scaling governor audit - https://phabricator.wikimedia.org/T225713 (10fgiunchedi) For some reasons I don't seem to be able to set `oemhp_powerreg` on ms-be2022, I'll try rebooting ` hpiLO-> show /system1/oemhp_power1 status=0 status... [12:49:14] gehel: thank you! 
[12:49:17] (03PS2) 10Urbanecm: Change bawikibooks logo to correct one according to community [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521865 (https://phabricator.wikimedia.org/T227418) [12:49:27] !log reboot ms-be2022 - T225713 [12:49:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:49:40] T225713: CPU scaling governor audit - https://phabricator.wikimedia.org/T225713 [12:50:12] godog: btw, do you know why we have no data on the graph after July 5? [12:50:30] gehel: on which graph? [12:50:38] https://grafana.wikimedia.org/d/kcAMMw4Wk/maps-performances-filippo-t184942?orgId=1&panelId=18&fullscreen&edit [12:51:24] mmhh no, I'm looking into it though [12:52:07] godog: I don't entirely understand how histogram_quantile() works, or how it would handle the sum() [12:52:37] I suspect the difference is that we don't actually measure the same thing, but I really don't know how those metrics are collected [12:55:30] (03PS6) 10Ema: vcl: remove WP Zero code [puppet] - 10https://gerrit.wikimedia.org/r/521488 (https://phabricator.wikimedia.org/T213769) [12:56:11] (03CR) 10Ema: [C: 03+2] vcl: remove WP Zero code [puppet] - 10https://gerrit.wikimedia.org/r/521488 (https://phabricator.wikimedia.org/T213769) (owner: 10Ema) [12:56:54] (03CR) 10Elukey: [C: 03+1] Add rpkicounter [puppet] - 10https://gerrit.wikimedia.org/r/520337 (owner: 10Ayounsi) [12:57:32] gehel: ah yeah I got it why the metrics stop on the 5th, that's when we retired the last varnish from cache upload [12:57:54] ok different problem in this case, thinking about what the answer is here [12:59:08] gehel: to answer your question, in theory we should measure the same thing because the source is the same (namely reading from varnish's ring buffer as requests come in) [12:59:42] (03CR) 10MarcoAurelio: [C: 03+1] "> Yeah... 
We kinda need to setup a cronjob to pull the file first," [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521233 (https://phabricator.wikimedia.org/T181217) (owner: 10Reedy) [13:00:03] gehel: btw the same graphs on https://grafana.wikimedia.org/d/000000305/maps-performances?orgId=1 also stop for the same reason as the prometheus ones do [13:04:46] (03PS2) 10Ema: varnish: remove WP Zero puppetization [puppet] - 10https://gerrit.wikimedia.org/r/521510 (https://phabricator.wikimedia.org/T213769) [13:05:09] godog: how can we recalculate histograms after the fact ? Does Prometheus keep some kind of decaying reservoir of all requests ? [13:06:56] (03CR) 10Ema: [C: 03+2] varnish: remove WP Zero puppetization [puppet] - 10https://gerrit.wikimedia.org/r/521510 (https://phabricator.wikimedia.org/T213769) (owner: 10Ema) [13:07:01] gehel: no decay, there are buckets for requests that took <= bucket value, and those are incremented as requests come in on each hosts [13:07:36] then those are summed and percentiles calculated, within an error of course [13:07:52] https://prometheus.io/docs/concepts/metric_types/#histogram and https://prometheus.io/docs/practices/histograms/ explain this better than I do [13:10:21] (03PS2) 10Ema: vcl: do not set WP Zero X-Carrier headers [puppet] - 10https://gerrit.wikimedia.org/r/521511 (https://phabricator.wikimedia.org/T213769) [13:14:23] PROBLEM - puppet last run on cp1079 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/logrotate.d/zerofetch] [13:14:36] (03CR) 10Ema: [C: 03+2] vcl: do not set WP Zero X-Carrier headers [puppet] - 10https://gerrit.wikimedia.org/r/521511 (https://phabricator.wikimedia.org/T213769) (owner: 10Ema) [13:15:14] godog: what is doing the bucketing? 
[13:15:58] mtail in this particular case, IOW ourselves by configuring mtail [13:16:03] looks like the buckets we have are 10ms, 50ms, 100ms, 500ms, 1s, 5s, +Inf [13:16:40] so everything that is +5s ends up in the +inf bucket? [13:17:18] yes for that metric [13:17:20] so p99 is probably quite skewed by our choice of bucket [13:17:35] since maps does have a p99 > 5s [13:18:23] I want to program a cron job to run only on the beta cluster. Do I add it to mediawiki/ on puppet as with all other cron jobs or do I need to follow additional steps? [13:18:26] * gehel isn't really sure how the math works [13:19:21] RECOVERY - puppet last run on cp1079 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:21:37] gehel: quite possible yeah, those are generic buckets for all varnish backends but we could add more if warranted [13:22:10] It does not make sense to change them for maps, unless we are actively trying to improve our p99. [13:22:24] which we should, but that's a whole other story :( [13:22:50] !log reset ilo on ms-be2022 - bios can't talk to it on boot [13:22:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:23:33] yeah unfortunately afaik it is all or nothing, as in all varnish backend metrics need to have the same buckets [13:23:46] anyways, I'm filing a task now to investigate the same metrics for ATS [13:23:51] (03PS1) 10Hashar: Jenkins job validation (DO NOT SUBMIT) [software/gerrit] (wmf/stable-2.15) - 10https://gerrit.wikimedia.org/r/521868 [13:23:55] gehel: thanks for taking a look! [13:23:57] godog: thanks! 
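[editor's note] The bucket discussion above (per-host cumulative buckets that are summed, then a percentile estimated "within an error") can be illustrated with a minimal Python sketch. It is only an approximation of how Prometheus's histogram_quantile() is documented to behave (linear interpolation inside a bucket, and a result capped at the highest finite bound when the quantile lands in +Inf); it is not the actual Prometheus or mtail code. The bucket bounds are the ones quoted in the chat; the function names and the per-host counts are hypothetical and made up purely for illustration.

```python
import math

# Bucket upper bounds in seconds, as quoted in the chat:
# 10ms, 50ms, 100ms, 500ms, 1s, 5s, +Inf.
BOUNDS = [0.01, 0.05, 0.1, 0.5, 1.0, 5.0, math.inf]


def sum_hosts(per_host_counts):
    """Add cumulative bucket counts from several hosts, bucket by bucket."""
    return [sum(column) for column in zip(*per_host_counts)]


def histogram_quantile(q, bounds, cumulative):
    """Estimate the q-quantile (0 <= q <= 1) from cumulative bucket counts."""
    total = cumulative[-1]
    if total == 0:
        return math.nan
    rank = q * total
    # Find the first bucket whose cumulative count reaches the target rank.
    index = next(i for i, count in enumerate(cumulative) if count >= rank)
    if math.isinf(bounds[index]):
        # The quantile lies above the highest finite bound, so the best
        # available answer is that bound itself (5s with these buckets).
        return bounds[index - 1]
    lower = bounds[index - 1] if index > 0 else 0.0
    previous = cumulative[index - 1] if index > 0 else 0
    in_bucket = cumulative[index] - previous
    if in_bucket == 0:
        return lower
    # Assume observations are spread evenly inside the bucket.
    return lower + (bounds[index] - lower) * (rank - previous) / in_bucket


# Hypothetical cumulative counts for two hosts (invented numbers).
host_a = [10, 40, 70, 90, 95, 98, 100]
host_b = [20, 50, 80, 95, 97, 98, 100]
combined = sum_hosts([host_a, host_b])

p50 = histogram_quantile(0.50, BOUNDS, combined)
# 2% of these made-up requests took longer than 5s, so p99 falls in the
# +Inf bucket and the estimate is capped at 5.0.
p99 = histogram_quantile(0.99, BOUNDS, combined)
print(p50, p99)
```

With these invented counts the p99 estimate cannot exceed 5s no matter how slow the slowest requests really were, which is the skew gehel describes for maps; adding finite buckets above 5s would be the way to recover resolution there.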
[13:24:31] (03CR) 10jerkins-bot: [V: 04-1] Jenkins job validation (DO NOT SUBMIT) [software/gerrit] (wmf/stable-2.15) - 10https://gerrit.wikimedia.org/r/521868 (owner: 10Hashar) [13:26:01] (03CR) 10Hashar: "recheck" [software/gerrit] (wmf/stable-2.15) - 10https://gerrit.wikimedia.org/r/521868 (owner: 10Hashar) [13:26:57] (03CR) 10jerkins-bot: [V: 04-1] Jenkins job validation (DO NOT SUBMIT) [software/gerrit] (wmf/stable-2.15) - 10https://gerrit.wikimedia.org/r/521868 (owner: 10Hashar) [13:33:24] (03CR) 10Hashar: "recheck" [software/gerrit] (wmf/stable-2.15) - 10https://gerrit.wikimedia.org/r/521868 (owner: 10Hashar) [13:44:44] (03Abandoned) 10Hashar: Jenkins job validation (DO NOT SUBMIT) [software/gerrit] (wmf/stable-2.15) - 10https://gerrit.wikimedia.org/r/521868 (owner: 10Hashar) [13:45:37] (03PS1) 10Paladox: Merge branch 'stable-2.16' into wmf/stable-2.16 [software/gerrit] (wmf/stable-2.16) - 10https://gerrit.wikimedia.org/r/521872 [13:47:58] !log cp hosts: cleanup WP zero leftovers T213769 [13:48:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:48:04] T213769: Zero VCL removal - https://phabricator.wikimedia.org/T213769 [13:52:10] 08Warning Alert for device cr2-esams.wikimedia.org - Memory over 85% [13:53:50] XioNoX: ^ [13:56:08] (03PS7) 10CRusnov: netbox: Add parameters and settings for storing things in Swift [puppet] - 10https://gerrit.wikimedia.org/r/520296 (https://phabricator.wikimedia.org/T209182) [13:56:30] ema gehel opened https://phabricator.wikimedia.org/T227668 [13:57:39] (03CR) 10CRusnov: [C: 03+2] netbox: Add parameters and settings for storing things in Swift [puppet] - 10https://gerrit.wikimedia.org/r/520296 (https://phabricator.wikimedia.org/T209182) (owner: 10CRusnov) [13:57:46] (03CR) 10MarcoAurelio: [C: 03+1] "Although I think we need to resolve T227454 first right?" 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/521233 (https://phabricator.wikimedia.org/T181217) (owner: 10Reedy) [14:00:11] thanks godog [14:03:58] !log copy puppetdb-termini 4.4.0-1~wmf2 from stretch-wikimedia to jessie-wikimedia [14:04:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:10:34] (03CR) 10Paladox: [C: 03+2] "Builds successfully!" [software/gerrit] (wmf/stable-2.16) - 10https://gerrit.wikimedia.org/r/521872 (owner: 10Paladox) [14:17:43] (03CR) 10Ottomata: "Agree with your comment in T227611, it'd be nice if we had a more global hiera var to use for this, rather than creating one of our own." [puppet] - 10https://gerrit.wikimedia.org/r/521832 (https://phabricator.wikimedia.org/T227611) (owner: 10Elukey) [14:18:20] (03Merged) 10jenkins-bot: Merge branch 'stable-2.16' into wmf/stable-2.16 [software/gerrit] (wmf/stable-2.16) - 10https://gerrit.wikimedia.org/r/521872 (owner: 10Paladox) [14:20:57] (03PS2) 10Jbond: puppet: update puppet-temini package name on buster [puppet] - 10https://gerrit.wikimedia.org/r/521536 (https://phabricator.wikimedia.org/T227587) [14:25:46] (03CR) 10Jbond: [C: 03+2] puppet: refactor remove puppetdb_major_version [puppet] - 10https://gerrit.wikimedia.org/r/521514 (https://phabricator.wikimedia.org/T227587) (owner: 10Jbond) [14:26:18] (03PS7) 10Jbond: puppet: refactor remove puppetdb_major_version [puppet] - 10https://gerrit.wikimedia.org/r/521514 (https://phabricator.wikimedia.org/T227587) [14:28:54] (03PS3) 10Jbond: puppet: update puppet-termini package name on buster [puppet] - 10https://gerrit.wikimedia.org/r/521536 (https://phabricator.wikimedia.org/T227587) [14:37:10] jouncebot, next [14:37:10] In 1 hour(s) and 22 minute(s): Morning SWAT (Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190710T1600) [14:37:53] (03PS1) 10Herron: kafka-main100[1-5]: add forward/reverse ipv4 dns entries [dns] - 10https://gerrit.wikimedia.org/r/521882 
(https://phabricator.wikimedia.org/T226274) [14:38:23] (03CR) 10Jbond: [C: 03+2] wmflib: add new dirtree function. [puppet] - 10https://gerrit.wikimedia.org/r/521295 (owner: 10Jbond) [14:38:26] (03CR) 10Muehlenhoff: [C: 03+1] puppet: update puppet-termini package name on buster [puppet] - 10https://gerrit.wikimedia.org/r/521536 (https://phabricator.wikimedia.org/T227587) (owner: 10Jbond) [14:38:36] (03PS4) 10Jbond: wmflib: add new dirtree function. [puppet] - 10https://gerrit.wikimedia.org/r/521295 [14:42:21] (03CR) 10Jbond: [C: 03+2] puppet: update puppet-termini package name on buster (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/521536 (https://phabricator.wikimedia.org/T227587) (owner: 10Jbond) [14:42:23] !log reimage ms-be2022 - T227667 [14:42:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:42:28] T227667: ms-be2022 misbehaving / error on boot - https://phabricator.wikimedia.org/T227667 [14:42:29] (03PS4) 10Jbond: puppet: update puppet-termini package name on buster [puppet] - 10https://gerrit.wikimedia.org/r/521536 (https://phabricator.wikimedia.org/T227587) [14:45:46] (03CR) 10Urbanecm: [C: 03+1] "LGTM" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/508088 (https://phabricator.wikimedia.org/T222017) (owner: 10Ammarpad) [14:51:18] !log upload varnish 5.1.3-1wm11 to stretch-wikimedia T227672 [14:51:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:51:24] T227672: Upgrade Varnish to 5.1.3-1wm11 - https://phabricator.wikimedia.org/T227672 [15:06:04] (03CR) 10Cwhite: set up debian packaging (032 comments) [debs/prometheus-varnishkafka-exporter] - 10https://gerrit.wikimedia.org/r/521580 (https://phabricator.wikimedia.org/T196066) (owner: 10Cwhite) [15:07:38] Krinkle: does this look better? 
https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/521528 [15:08:20] !log restart wb2-phab wikibugs job [15:08:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:09:22] 10Operations, 10Traffic: Upgrade Varnish to 5.1.3-1wm11 - https://phabricator.wikimedia.org/T227672 (10ema) [15:14:26] 10Operations, 10ops-codfw, 10User-fgiunchedi: ms-be2022 misbehaving / error on boot - https://phabricator.wikimedia.org/T227667 (10fgiunchedi) a:03Papaul I think only power drain is left, not urgent because the host is back up now, when you get a chance! thanks [15:16:31] 10Operations, 10User-fgiunchedi: CPU scaling governor audit - https://phabricator.wikimedia.org/T225713 (10fgiunchedi) Ok now all codfw row D for ms-be hosts is running with `powersave`, will leave it like that for a little while, no adverse effects observed so far. If the trend continue I'll do all ms-be host... [15:21:46] 10Operations, 10serviceops: Confd died on bast3002 - https://phabricator.wikimedia.org/T227592 (10Aklapper) [15:23:20] !log cp-ulsfo: upgrade varnish to 5.1.3-1wm11 T227672 [15:23:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:23:25] T227672: Upgrade Varnish to 5.1.3-1wm11 - https://phabricator.wikimedia.org/T227672 [15:25:16] (03PS1) 10Jforrester: Drop zero.wikimedia.org (and .wikipedia.org) [dns] - 10https://gerrit.wikimedia.org/r/521886 (https://phabricator.wikimedia.org/T187716) [15:27:11] ottomata: seems fine yeah [15:27:47] James_F: :fire: [15:28:09] thanks! 
[15:35:01] (03PS1) 10Andrew Bogott: pam/sssd: enable mkhomedir if needed [puppet] - 10https://gerrit.wikimedia.org/r/521888 (https://phabricator.wikimedia.org/T227475) [15:35:42] (03Abandoned) 10Andrew Bogott: sssd: manage /etc/pam.d/common-session to ensure homedir creation [puppet] - 10https://gerrit.wikimedia.org/r/521793 (https://phabricator.wikimedia.org/T227475) (owner: 10Andrew Bogott) [15:36:11] (03PS1) 10Krinkle: beta: Remove use of wgOverrideHostname in CommonSettings-labs.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521889 [15:39:54] ahoy, anyone around to help debug phab problems? there's a person on #wikimedia-dev reporting they can't create tasks (they get an exception) [15:40:04] (it works for me) [15:48:34] (03PS4) 10Jforrester: Disable ZeroBanner on all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483193 [15:48:36] (03PS5) 10Jforrester: Drop the Wikipedia Zero debug log channel [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482099 (https://phabricator.wikimedia.org/T212865) [15:48:38] (03PS5) 10Jforrester: robots.php: Drop the special treatment for Wikipedia Zero [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482100 (https://phabricator.wikimedia.org/T212865) [15:48:40] (03PS5) 10Jforrester: Drop the ability to use ZeroBanner and ZeroPortal from production [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482102 (https://phabricator.wikimedia.org/T212865) [15:48:42] (03PS5) 10Jforrester: Stop configuring ZeroBanner and ZeroPortal, unused [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482103 (https://phabricator.wikimedia.org/T212865) [15:48:44] (03PS5) 10Jforrester: Stop loading i18n for ZeroBanner and ZeroPortal, unused [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482104 (https://phabricator.wikimedia.org/T212865) [15:48:46] (03PS1) 10Jforrester: Mark zerowiki as deleted [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521893 (https://phabricator.wikimedia.org/T187716) [15:48:48] 
(03PS1) 10Jforrester: Drop references to zerowiki configuration [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521894 (https://phabricator.wikimedia.org/T187716) [15:49:32] (03CR) 10jerkins-bot: [V: 04-1] Disable ZeroBanner on all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483193 (owner: 10Jforrester) [15:49:42] (03Abandoned) 10Jforrester: zerowiki: Stop whitelisting ZeroPortal to logged out users, no longer available [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482101 (https://phabricator.wikimedia.org/T212865) (owner: 10Jforrester) [15:49:49] (03CR) 10jerkins-bot: [V: 04-1] Drop the Wikipedia Zero debug log channel [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482099 (https://phabricator.wikimedia.org/T212865) (owner: 10Jforrester) [15:50:18] (03CR) 10jerkins-bot: [V: 04-1] robots.php: Drop the special treatment for Wikipedia Zero [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482100 (https://phabricator.wikimedia.org/T212865) (owner: 10Jforrester) [15:50:48] (03CR) 10jerkins-bot: [V: 04-1] Drop the ability to use ZeroBanner and ZeroPortal from production [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482102 (https://phabricator.wikimedia.org/T212865) (owner: 10Jforrester) [15:51:06] (03CR) 10jerkins-bot: [V: 04-1] Stop configuring ZeroBanner and ZeroPortal, unused [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482103 (https://phabricator.wikimedia.org/T212865) (owner: 10Jforrester) [15:51:48] (03CR) 10jerkins-bot: [V: 04-1] Stop loading i18n for ZeroBanner and ZeroPortal, unused [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482104 (https://phabricator.wikimedia.org/T212865) (owner: 10Jforrester) [15:51:59] Krinkle: That's the plan, yes. 
[15:52:06] (03CR) 10jerkins-bot: [V: 04-1] Mark zerowiki as deleted [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521893 (https://phabricator.wikimedia.org/T187716) (owner: 10Jforrester) [15:52:26] (03CR) 10jerkins-bot: [V: 04-1] Drop references to zerowiki configuration [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521894 (https://phabricator.wikimedia.org/T187716) (owner: 10Jforrester) [15:54:08] PROBLEM - HHVM rendering on mw1290 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [15:54:36] (03PS2) 10Cwhite: set up debian packaging [debs/prometheus-varnishkafka-exporter] - 10https://gerrit.wikimedia.org/r/521580 (https://phabricator.wikimedia.org/T196066) [15:54:38] (03CR) 10Cwhite: set up debian packaging (034 comments) [debs/prometheus-varnishkafka-exporter] - 10https://gerrit.wikimedia.org/r/521580 (https://phabricator.wikimedia.org/T196066) (owner: 10Cwhite) [15:54:40] PROBLEM - Nginx local proxy to apache on mw1290 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [15:54:44] PROBLEM - Apache HTTP on mw1290 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [15:54:46] (03PS2) 10Jforrester: Mark zerowiki as deleted and drop all configuration(!) 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/521893 (https://phabricator.wikimedia.org/T187716) [15:54:48] (03PS5) 10Jforrester: Disable ZeroBanner on all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483193 [15:54:50] (03PS6) 10Jforrester: Drop the Wikipedia Zero debug log channel [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482099 (https://phabricator.wikimedia.org/T212865) [15:54:52] (03PS6) 10Jforrester: robots.php: Drop the special treatment for Wikipedia Zero [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482100 (https://phabricator.wikimedia.org/T212865) [15:54:56] (03PS6) 10Jforrester: Drop the ability to use ZeroBanner and ZeroPortal from production [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482102 (https://phabricator.wikimedia.org/T212865) [15:54:58] (03PS6) 10Jforrester: Stop configuring ZeroBanner and ZeroPortal, unused [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482103 (https://phabricator.wikimedia.org/T212865) [15:55:00] (03PS6) 10Jforrester: Stop loading i18n for ZeroBanner and ZeroPortal, unused [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482104 (https://phabricator.wikimedia.org/T212865) [15:55:05] (03CR) 10jerkins-bot: [V: 04-1] Mark zerowiki as deleted and drop all configuration(!) 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/521893 (https://phabricator.wikimedia.org/T187716) (owner: 10Jforrester) [15:55:16] (03CR) 10jerkins-bot: [V: 04-1] Disable ZeroBanner on all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483193 (owner: 10Jforrester) [15:55:41] (03CR) 10jerkins-bot: [V: 04-1] Drop the Wikipedia Zero debug log channel [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482099 (https://phabricator.wikimedia.org/T212865) (owner: 10Jforrester) [15:55:52] (03CR) 10Krinkle: [C: 03+2] beta: Remove use of wgOverrideHostname in CommonSettings-labs.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521889 (owner: 10Krinkle) [15:56:01] (03Abandoned) 10Jforrester: Drop references to zerowiki configuration [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521894 (https://phabricator.wikimedia.org/T187716) (owner: 10Jforrester) [15:56:03] (03CR) 10jerkins-bot: [V: 04-1] robots.php: Drop the special treatment for Wikipedia Zero [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482100 (https://phabricator.wikimedia.org/T212865) (owner: 10Jforrester) [15:56:05] (03CR) 10jerkins-bot: [V: 04-1] Drop the ability to use ZeroBanner and ZeroPortal from production [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482102 (https://phabricator.wikimedia.org/T212865) (owner: 10Jforrester) [15:56:08] (03PS3) 10Jforrester: Mark zerowiki as deleted and drop all configuration(!) 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/521893 (https://phabricator.wikimedia.org/T187716) [15:56:10] (03PS6) 10Jforrester: Disable ZeroBanner on all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483193 [15:56:14] (03PS7) 10Jforrester: Drop the Wikipedia Zero debug log channel [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482099 (https://phabricator.wikimedia.org/T212865) [15:56:16] (03PS7) 10Jforrester: robots.php: Drop the special treatment for Wikipedia Zero [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482100 (https://phabricator.wikimedia.org/T212865) [15:56:18] (03PS7) 10Jforrester: Drop the ability to use ZeroBanner and ZeroPortal from production [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482102 (https://phabricator.wikimedia.org/T212865) [15:56:20] (03PS7) 10Jforrester: Stop configuring ZeroBanner and ZeroPortal, unused [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482103 (https://phabricator.wikimedia.org/T212865) [15:56:22] (03PS7) 10Jforrester: Stop loading i18n for ZeroBanner and ZeroPortal, unused [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482104 (https://phabricator.wikimedia.org/T212865) [15:56:24] (03CR) 10jerkins-bot: [V: 04-1] Stop configuring ZeroBanner and ZeroPortal, unused [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482103 (https://phabricator.wikimedia.org/T212865) (owner: 10Jforrester) [15:56:26] (03CR) 10jerkins-bot: [V: 04-1] Stop loading i18n for ZeroBanner and ZeroPortal, unused [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482104 (https://phabricator.wikimedia.org/T212865) (owner: 10Jforrester) [15:57:02] (03Merged) 10jenkins-bot: beta: Remove use of wgOverrideHostname in CommonSettings-labs.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521889 (owner: 10Krinkle) [15:57:47] (03PS1) 10Muehlenhoff: Add DNS entries for ldap-codfw-replica* [dns] - 10https://gerrit.wikimedia.org/r/521895 [15:58:31] (03CR) 10jerkins-bot: [V: 04-1] Add DNS 
entries for ldap-codfw-replica* [dns] - 10https://gerrit.wikimedia.org/r/521895 (owner: 10Muehlenhoff) [16:00:05] MaxSem, RoanKattouw, and Niharika: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) Morning SWAT (Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190710T1600). [16:00:05] Urbanecm: A patch you scheduled for Morning SWAT (Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [16:00:19] I'll SWAT my own patches [16:00:51] (03PS2) 10Muehlenhoff: Add DNS entries for ldap-codfw-replica* [dns] - 10https://gerrit.wikimedia.org/r/521895 [16:00:58] (03CR) 10Urbanecm: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521497 (https://phabricator.wikimedia.org/T226764) (owner: 10Urbanecm) [16:01:03] (03CR) 10Urbanecm: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521865 (https://phabricator.wikimedia.org/T227418) (owner: 10Urbanecm) [16:02:04] (03Merged) 10jenkins-bot: Disable local uploads on wuuwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521497 (https://phabricator.wikimedia.org/T226764) (owner: 10Urbanecm) [16:02:12] PROBLEM - Work requests waiting in Zuul Gearman server on contint1001 is CRITICAL: CRITICAL: 71.43% of data above the critical threshold [140.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1 [16:02:13] (03Merged) 10jenkins-bot: Change bawikibooks logo to correct one according to community [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521865 (https://phabricator.wikimedia.org/T227418) (owner: 10Urbanecm) [16:03:17] Krinkle, noticed https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/521889 while fetching stuff to deploy on deploy1001 [16:04:39] Urbanecm: Thanks, I hadn't pulled it down yet. 
[16:04:44] It's beta-only, can be ignored. [16:04:52] ok, thanks [16:04:53] !log urbanecm@deploy1001 Synchronized dblists/commonsuploads.dblist: SWAT: [[:gerrit:521497|Disable local uploads on wuuwiki]] (T226764) (duration: 00m 58s) [16:04:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:04:58] T226764: Figure out where can be local uploads disabled for wikis that were in commonsuploads.dblist, but the config didn't apply to them - https://phabricator.wikimedia.org/T226764 [16:06:47] !log urbanecm@deploy1001 Synchronized static/images/project-logos/: SWAT: [[:gerrit:521865|Change bawikibooks logo to correct one according to community]] (1/2, T227418) (duration: 01m 16s) [16:06:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:06:53] T227418: Several projects has an entry in wgLogoHD, but no entry in wgLogo - https://phabricator.wikimedia.org/T227418 [16:07:15] (03CR) 10Andrew Bogott: [C: 03+2] pam/sssd: enable mkhomedir if needed [puppet] - 10https://gerrit.wikimedia.org/r/521888 (https://phabricator.wikimedia.org/T227475) (owner: 10Andrew Bogott) [16:07:32] !log Purged two urls for T227418 [16:07:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:09:01] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[:gerrit:521865|Change bawikibooks logo to correct one according to community wish]] (2/2, T227418) (duration: 00m 58s) [16:09:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:09:08] That's all from me [16:09:09] 10Operations, 10MediaWiki-Cache, 10serviceops-radar, 10Core Platform Team (Mainstash Multi-DC), and 3 others: Use a multi-dc aware store for ObjectCache's MainStash if needed. - https://phabricator.wikimedia.org/T212129 (10EvanProdromou) >>>! In T212129#5211137, @EvanProdromou wrote: >> I'd assume there w... 
[16:09:11] !log Morning SWAT done [16:09:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:09:22] PROBLEM - Check systemd state on cp4031 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [16:11:50] 10Operations, 10MediaWiki-Cache, 10serviceops-radar, 10Core Platform Team (Mainstash Multi-DC), and 3 others: Use a multi-dc aware store for ObjectCache's MainStash if needed. - https://phabricator.wikimedia.org/T212129 (10EvanProdromou) @Krinkle Also, it sounds like we'd be changing the contract as loosel... [16:20:29] (03CR) 10jenkins-bot: beta: Remove use of wgOverrideHostname in CommonSettings-labs.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521889 (owner: 10Krinkle) [16:22:36] RECOVERY - Work requests waiting in Zuul Gearman server on contint1001 is OK: OK: Less than 30.00% above the threshold [90.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1 [16:22:37] (03CR) 10jenkins-bot: Disable local uploads on wuuwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521497 (https://phabricator.wikimedia.org/T226764) (owner: 10Urbanecm) [16:33:36] PROBLEM - HHVM rendering on mw1347 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [16:34:54] RECOVERY - HHVM rendering on mw1347 is OK: HTTP OK: HTTP/1.1 200 OK - 77918 bytes in 2.872 second response time https://wikitech.wikimedia.org/wiki/Application_servers [16:51:06] 10Operations, 10Patch-For-Review: Decommission servermon - https://phabricator.wikimedia.org/T198939 (10MoritzMuehlenhoff) We discussed this in the SRE Infrastructure Foundations meeting; given that there are other issues with Servermon blocking the Buster migration of the Puppet masters, servermon/netmon1003... 
[16:59:22] 10Operations, 10MediaWiki-Cache, 10serviceops-radar, 10Core Platform Team (Mainstash Multi-DC), and 3 others: Use a multi-dc aware store for ObjectCache's MainStash if needed. - https://phabricator.wikimedia.org/T212129 (10Krinkle) Yeah, I think we need to choose between making the Stash a 1) non-ephemeral... [16:59:28] (03PS1) 10Elukey: profile::spark2: specify the spark.ui.port and its firewall range [puppet] - 10https://gerrit.wikimedia.org/r/521900 (https://phabricator.wikimedia.org/T170826) [17:03:27] (03CR) 10Elukey: [C: 03+2] profile::spark2: specify the spark.ui.port and its firewall range [puppet] - 10https://gerrit.wikimedia.org/r/521900 (https://phabricator.wikimedia.org/T170826) (owner: 10Elukey) [17:05:13] (03CR) 10Ayounsi: [C: 03+2] Add rpkicounter [puppet] - 10https://gerrit.wikimedia.org/r/520337 (owner: 10Ayounsi) [17:05:25] (03PS5) 10Ayounsi: Add rpkicounter [puppet] - 10https://gerrit.wikimedia.org/r/520337 [17:09:38] (03CR) 10Elukey: "Riccardo/Filippo: let me know (when you have time) if there are any outstanding problems :)" [puppet] - 10https://gerrit.wikimedia.org/r/521826 (https://phabricator.wikimedia.org/T220784) (owner: 10Elukey) [17:09:41] 10Operations, 10ops-eqiad: setup WMF5173 as gerrit1001 - https://phabricator.wikimedia.org/T227685 (10RobH) p:05Triage→03Normal [17:10:05] 10Operations, 10ops-eqiad: setup WMF5173 as gerrit1001 - https://phabricator.wikimedia.org/T227685 (10RobH) [17:15:41] (03CR) 10Volans: [C: 03+1] "LGTM, I've only one question: exception handling. Given I don't see any, how does it behave if an exception is raised? 
It's of to fail the" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/521826 (https://phabricator.wikimedia.org/T220784) (owner: 10Elukey) [17:18:48] (03PS1) 10Krinkle: Sort wmgMonologChannels alphabetically [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521901 [17:18:50] (03PS1) 10Krinkle: Remove dead 'wmgMonologChannels' entries [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521902 [17:18:53] (03PS1) 10Jbond: puppetmaster: remove severmon custom reporter [puppet] - 10https://gerrit.wikimedia.org/r/521903 (https://phabricator.wikimedia.org/T198939) [17:24:31] (03PS2) 10Krinkle: Sort wmgMonologChannels alphabetically [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521901 [17:24:33] (03PS2) 10Krinkle: Remove dead 'wmgMonologChannels' entries [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521902 [17:25:30] (03CR) 10Krinkle: "lol, git-review fished "Bug39996" random from the commit message and decided to use it as a topic." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521902 (owner: 10Krinkle) [17:27:53] (03PS1) 10RobH: gerrit1001 dns setup [dns] - 10https://gerrit.wikimedia.org/r/521904 (https://phabricator.wikimedia.org/T227685) [17:28:15] (03CR) 10jerkins-bot: [V: 04-1] gerrit1001 dns setup [dns] - 10https://gerrit.wikimedia.org/r/521904 (https://phabricator.wikimedia.org/T227685) (owner: 10RobH) [17:28:25] 10Operations, 10ops-eqiad: setup WMF5173 as gerrit1001 - https://phabricator.wikimedia.org/T227685 (10RobH) [17:31:46] (03PS8) 10Ppchelko: RESTRouter: Add initial Helm chart [deployment-charts] - 10https://gerrit.wikimedia.org/r/512923 (https://phabricator.wikimedia.org/T223953) (owner: 10Mobrovac) [17:32:13] (03PS1) 10Ayounsi: rpkicounter, fix https proxy typo [puppet] - 10https://gerrit.wikimedia.org/r/521906 [17:33:10] (03CR) 10Ayounsi: [C: 03+2] rpkicounter, fix https proxy typo [puppet] - 10https://gerrit.wikimedia.org/r/521906 (owner: 10Ayounsi) [17:34:55] 10Operations, 10Release Pipeline, 10serviceops, 10Core 
Platform Team (RESTBase Split (CDP2)), and 4 others: Deploy the RESTBase front-end service (RESTRouter) to Kubernetes - https://phabricator.wikimedia.org/T223953 (10Pchelolo) [17:35:19] (03PS2) 10RobH: gerrit1001 dns setup [dns] - 10https://gerrit.wikimedia.org/r/521904 (https://phabricator.wikimedia.org/T227685) [17:36:23] (03CR) 10RobH: [C: 03+2] gerrit1001 dns setup [dns] - 10https://gerrit.wikimedia.org/r/521904 (https://phabricator.wikimedia.org/T227685) (owner: 10RobH) [17:39:22] 10Operations, 10ops-eqiad: apply hostname label for WMF5173 as gerrit1001 - https://phabricator.wikimedia.org/T227692 (10RobH) [17:39:38] 10Operations, 10ops-eqiad: setup WMF5173 as gerrit1001 - https://phabricator.wikimedia.org/T227685 (10RobH) [17:45:01] (03PS9) 10Ppchelko: RESTRouter: Add initial Helm chart [deployment-charts] - 10https://gerrit.wikimedia.org/r/512923 (https://phabricator.wikimedia.org/T223953) (owner: 10Mobrovac) [17:45:04] (03CR) 10Jforrester: "Removal of the Zero channel is being done in Ibaeff25fa5ebc29 but sure." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/521902 (owner: 10Krinkle) [17:49:52] 10Operations, 10ops-eqiad: setup WMF5173 as gerrit1001 - https://phabricator.wikimedia.org/T227685 (10RobH) [17:50:21] (03PS1) 10BryanDavis: cloud: Disable DNSSEC for pdns-recursor [puppet] - 10https://gerrit.wikimedia.org/r/521910 (https://phabricator.wikimedia.org/T226088) [17:54:03] !log phabricator: hotfixing fatal error by pulling upstream fix ( see https://secure.phabricator.com/D20644 ) [17:54:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:55:24] 10Operations, 10SRE-Access-Requests: Requesting access to analytics-privatedata-users for mbsantos - https://phabricator.wikimedia.org/T227695 (10MSantos) [17:56:55] (03PS1) 10RobH: setting install params for gerrit1001 [puppet] - 10https://gerrit.wikimedia.org/r/521913 (https://phabricator.wikimedia.org/T227685) [17:57:36] (03CR) 10jerkins-bot: [V: 04-1] setting install params for gerrit1001 [puppet] - 10https://gerrit.wikimedia.org/r/521913 (https://phabricator.wikimedia.org/T227685) (owner: 10RobH) [17:58:18] (03PS1) 10Andrew Bogott: nova-fullstack: change to use Buster by default [puppet] - 10https://gerrit.wikimedia.org/r/521915 [17:58:41] (03PS2) 10RobH: setting install params for gerrit1001 [puppet] - 10https://gerrit.wikimedia.org/r/521913 (https://phabricator.wikimedia.org/T227685) [17:59:07] (03PS2) 10Andrew Bogott: nova-fullstack: change to use Buster by default [puppet] - 10https://gerrit.wikimedia.org/r/521915 [17:59:46] (03CR) 10RobH: [C: 03+2] setting install params for gerrit1001 [puppet] - 10https://gerrit.wikimedia.org/r/521913 (https://phabricator.wikimedia.org/T227685) (owner: 10RobH) [18:00:04] Deploy window Pre MediaWiki train sanity break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190710T1800) [18:01:26] (03PS3) 10Andrew Bogott: nova-fullstack: change to use Buster by default [puppet] - 10https://gerrit.wikimedia.org/r/521915 [18:02:47] (03CR) 
10Andrew Bogott: [C: 03+2] nova-fullstack: change to use Buster by default [puppet] - 10https://gerrit.wikimedia.org/r/521915 (owner: 10Andrew Bogott) [18:04:23] 10Operations, 10Security-Team: Remove Brian Wolff from security@ alias in exim - https://phabricator.wikimedia.org/T227697 (10sbassett) [18:04:34] 10Operations, 10Security-Team: Remove Brian Wolff from security@ alias in exim - https://phabricator.wikimedia.org/T227697 (10sbassett) p:05Triage→03Normal [18:04:59] 10Operations, 10Reading-Infrastructure-Team-Backlog, 10Wikimedia-Mailing-lists: Create new product-infrastructure mailing list - https://phabricator.wikimedia.org/T227698 (10Mholloway) [18:05:26] (03CR) 10BryanDavis: "PCC run at https://puppet-compiler.wmflabs.org/compiler1001/17304/" [puppet] - 10https://gerrit.wikimedia.org/r/521910 (https://phabricator.wikimedia.org/T226088) (owner: 10BryanDavis) [18:07:33] 10Operations, 10Maps (Kartotherian), 10Reading-Infrastructure-Team-Backlog (Kanban), 10Wikimedia-Incident: Create test in spec.yaml for the kartotherian / geoshape service - https://phabricator.wikimedia.org/T217910 (10MSantos) [18:09:24] (03CR) 10BBlack: [C: 03+1] cloud: Disable DNSSEC for pdns-recursor [puppet] - 10https://gerrit.wikimedia.org/r/521910 (https://phabricator.wikimedia.org/T226088) (owner: 10BryanDavis) [18:10:07] (03PS2) 10Andrew Bogott: cloud: Disable DNSSEC for pdns-recursor [puppet] - 10https://gerrit.wikimedia.org/r/521910 (https://phabricator.wikimedia.org/T226088) (owner: 10BryanDavis) [18:10:36] I'm on mwdebug1002. 
[18:11:22] (03CR) 10Andrew Bogott: [C: 03+2] cloud: Disable DNSSEC for pdns-recursor [puppet] - 10https://gerrit.wikimedia.org/r/521910 (https://phabricator.wikimedia.org/T226088) (owner: 10BryanDavis) [18:14:58] (03PS2) 10Andrew Bogott: encapi proxy: Allow all of localhost [puppet] - 10https://gerrit.wikimedia.org/r/521054 (owner: 10Alex Monk) [18:15:36] 10Operations, 10ops-eqiad: setup WMF5173 as gerrit1001 - https://phabricator.wikimedia.org/T227685 (10RobH) [18:15:52] !log jforrester@deploy1001 Synchronized php-1.34.0-wmf.13/includes/Linker.php: T227656 Fix visibility of IPs that aren't suppressed (duration: 00m 59s) [18:16:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:16:03] T227656: action=history shows "username removed" for anonymous editors on mediawiki.org instead of IP addresses - https://phabricator.wikimedia.org/T227656 [18:16:35] (03CR) 10Andrew Bogott: [C: 03+2] encapi proxy: Allow all of localhost [puppet] - 10https://gerrit.wikimedia.org/r/521054 (owner: 10Alex Monk) [18:16:56] 10Operations, 10ops-eqiad: setup WMF5173 as gerrit1001 - https://phabricator.wikimedia.org/T227685 (10RobH) [18:17:42] 10Operations, 10Security-Team: Remove Brian Wolff from security@ alias in exim - https://phabricator.wikimedia.org/T227697 (10Dzahn) a:03Dzahn [18:24:20] 10Operations, 10Security-Team: Remove Brian Wolff from security@ alias in exim - https://phabricator.wikimedia.org/T227697 (10Dzahn) 05Open→03Resolved [master 97b9217] (dzahn) exim: remove bawolff@ from security@ (T227697) done and applied on mx1001. bounces should have stopped now. 
[18:27:02] (03PS1) 10BBlack: Disable DNSSEC for pdns-recursor in prod as well [puppet] - 10https://gerrit.wikimedia.org/r/521921 (https://phabricator.wikimedia.org/T226088) [18:27:33] 10Operations, 10ops-eqiad: setup WMF5173 as gerrit1001 - https://phabricator.wikimedia.org/T227685 (10RobH) [18:28:29] 10Operations, 10ops-eqiad: setup WMF5173 as gerrit1001 - https://phabricator.wikimedia.org/T227685 (10Dzahn) @RobH Feel free to assign this to me once you are done with your steps. and thanks! [18:28:54] (03CR) 10BryanDavis: [C: 03+1] Disable DNSSEC for pdns-recursor in prod as well [puppet] - 10https://gerrit.wikimedia.org/r/521921 (https://phabricator.wikimedia.org/T226088) (owner: 10BBlack) [18:28:59] 10Operations, 10ops-eqiad, 10Gerrit: setup WMF5173 as gerrit1001 - https://phabricator.wikimedia.org/T227685 (10Dzahn) [18:31:59] (03CR) 10BBlack: [C: 03+2] Disable DNSSEC for pdns-recursor in prod as well [puppet] - 10https://gerrit.wikimedia.org/r/521921 (https://phabricator.wikimedia.org/T226088) (owner: 10BBlack) [18:44:18] (03PS1) 10RobH: set gerrit1001 to spare role [puppet] - 10https://gerrit.wikimedia.org/r/521922 (https://phabricator.wikimedia.org/T227685) [18:45:32] (03PS1) 10RobH: Revert "setting install params for gerrit1001" [puppet] - 10https://gerrit.wikimedia.org/r/521925 [18:47:54] 10Operations, 10ops-eqiad: apply hostname label for WMF5173 as gerrit1001 - https://phabricator.wikimedia.org/T227692 (10RobH) 05Open→03Declined [18:48:01] 10Operations, 10ops-eqiad, 10Gerrit, 10Patch-For-Review: setup WMF5173 as gerrit1001 - https://phabricator.wikimedia.org/T227685 (10RobH) [18:48:14] (03CR) 10RobH: [C: 03+2] Revert "setting install params for gerrit1001" [puppet] - 10https://gerrit.wikimedia.org/r/521925 (owner: 10RobH) [18:48:17] 10Operations, 10Cassandra, 10Core Platform Team, 10RESTBase: RESTBase k-r-v as Cassandra anti-pattern - https://phabricator.wikimedia.org/T144431 (10Eevans) [18:48:22] (03PS2) 10RobH: Revert "setting install params for 
gerrit1001" [puppet] - 10https://gerrit.wikimedia.org/r/521925 [18:51:45] (03PS1) 10RobH: Revert "gerrit1001 dns setup" [dns] - 10https://gerrit.wikimedia.org/r/521926 [18:52:25] (03CR) 10RobH: [C: 03+2] Revert "gerrit1001 dns setup" [dns] - 10https://gerrit.wikimedia.org/r/521926 (owner: 10RobH) [18:55:52] (03PS4) 10Ottomata: Use wgEventServiceStreamConfig to configure wgRCFeeds['eventbus'] [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521528 (https://phabricator.wikimedia.org/T211248) [18:57:17] hiya longma! I was about to do a config deploy, but i see the train is about to start. i'll wait til you are done. can you let me know when you are? [19:00:04] longma: That opportune time is upon us again. Time for a MediaWiki train - American version deploy. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190710T1900). [19:00:33] PROBLEM - puppet last run on gerrit1001 is CRITICAL: CRITICAL: Puppet has 4 failures. Last run 25 minutes ago with 4 failures. Failed resources (up to 3 shown): Package[gerrit/gerrit],Package[gervert/deploy],Package[apache2modsec/apache2modsec],File[/etc/acmecerts/gerrit] [19:01:22] 10Operations, 10ops-eqiad, 10Gerrit, 10Patch-For-Review: setup WMF5173 as gerrit1001 - https://phabricator.wikimedia.org/T227685 (10RobH) [19:01:34] ^ that gerrit1001 is a new server that wasn't quite ready yet for the role [19:01:35] 10Operations, 10ops-eqiad, 10Gerrit, 10Patch-For-Review: setup WMF5173 as gerrit1001 - https://phabricator.wikimedia.org/T227685 (10RobH) 05Open→03Declined reverted this, since it doesn't have enough ram [19:01:52] it's not the current prod server, so no worries about it [19:02:38] ottomata: The train is currently blocked so I guess that means you can go ahead? [19:02:39] ACKNOWLEDGEMENT - puppet last run on gerrit1001 is CRITICAL: CRITICAL: Puppet has 4 failures. Last run 25 minutes ago with 4 failures. 
Failed resources (up to 3 shown): Package[gerrit/gerrit],Package[gervert/deploy],Package[apache2modsec/apache2modsec],File[/etc/acmecerts/gerrit] daniel_zahn https://phabricator.wikimedia.org/T227685 [19:02:39] ACKNOWLEDGEMENT - DNS gerrit1001.mgmt on gerrit1001.mgmt is CRITICAL: Domain gerrit1001.mgmt.eqiad.wmnet was not found by the server daniel_zahn https://phabricator.wikimedia.org/T227685 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [19:03:03] PROBLEM - Host gerrit1001 is DOWN: PING CRITICAL - Packet loss = 100% [19:03:41] ACKNOWLEDGEMENT - Host gerrit1001 is DOWN: PING CRITICAL - Packet loss = 100% daniel_zahn https://phabricator.wikimedia.org/T227685 [19:04:17] oh ok [19:04:17] thanks! [19:04:32] !log jiji@deploy1001 Started deploy [cpjobqueue/deploy@8761480]: Migrating rest of hightraffic jobs to PHP7 - T219150 [19:04:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:04:37] T219150: Ramp up percentage of users on php7.2 to 100% on both API and appserver clusters - https://phabricator.wikimedia.org/T219150 [19:04:56] (03CR) 10Jbond: [C: 03+1] "LGTM but no IPv6 ;)" [dns] - 10https://gerrit.wikimedia.org/r/521895 (owner: 10Muehlenhoff) [19:05:32] !log jiji@deploy1001 Finished deploy [cpjobqueue/deploy@8761480]: Migrating rest of hightraffic jobs to PHP7 - T219150 (duration: 01m 00s) [19:05:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:08:59] (03PS4) 10Dzahn: phabricator: add forensic apache logging and enable on phab1003 [puppet] - 10https://gerrit.wikimedia.org/r/511955 [19:09:44] (03CR) 10Dzahn: [C: 03+2] phabricator: add forensic apache logging and enable on phab1003 [puppet] - 10https://gerrit.wikimedia.org/r/511955 (owner: 10Dzahn) [19:10:10] (03CR) 10Ottomata: [C: 03+2] Use wgEventServiceStreamConfig to configure wgRCFeeds['eventbus'] [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521528 (https://phabricator.wikimedia.org/T211248) (owner: 
10Ottomata) [19:11:10] (03Merged) 10jenkins-bot: Use wgEventServiceStreamConfig to configure wgRCFeeds['eventbus'] [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521528 (https://phabricator.wikimedia.org/T211248) (owner: 10Ottomata) [19:13:51] PROBLEM - Check the Netbox report-s- puppetdb for fail status. on netmon1002 is CRITICAL: puppetdb.PuppetDB CRITICAL https://wikitech.wikimedia.org/wiki/Netbox%23Reports [19:15:49] 10Operations, 10serviceops, 10Core Platform Team Backlog (Watching / External), 10Patch-For-Review, and 2 others: Use PHP7 to run all async jobs - https://phabricator.wikimedia.org/T219148 (10jijiki) [19:17:30] (03CR) 10jenkins-bot: Change bawikibooks logo to correct one according to community [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521865 (https://phabricator.wikimedia.org/T227418) (owner: 10Urbanecm) [19:17:32] (03CR) 10jenkins-bot: Use wgEventServiceStreamConfig to configure wgRCFeeds['eventbus'] [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521528 (https://phabricator.wikimedia.org/T211248) (owner: 10Ottomata) [19:17:51] (03CR) 10Dzahn: "confirmed it works. there is a forensic.log and it gets content" [puppet] - 10https://gerrit.wikimedia.org/r/511955 (owner: 10Dzahn) [19:21:01] (03CR) 10Alexandros Kosiaris: [C: 04-1] "Missing modules/profile/manifests/puppetmaster/common.pp: reports => 'servermon,puppetdb'," [puppet] - 10https://gerrit.wikimedia.org/r/521903 (https://phabricator.wikimedia.org/T198939) (owner: 10Jbond) [19:22:53] (03PS3) 10Ottomata: Migrate mediawiki.recentchange stream to eventgate-main [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521517 (https://phabricator.wikimedia.org/T211248) [19:24:45] RECOVERY - Check the Netbox report-s- puppetdb for fail status. 
on netmon1002 is OK: puppetdb.PuppetDB OK https://wikitech.wikimedia.org/wiki/Netbox%23Reports [19:25:02] (03PS2) 10Jbond: puppetmaster: remove severmon custom reporter [puppet] - 10https://gerrit.wikimedia.org/r/521903 (https://phabricator.wikimedia.org/T198939) [19:26:06] (03CR) 10MaxSem: [C: 04-1] "This should be swapped with the next patch because if you nuke the wiki before ZeroBanner it will continue trying to load configuration fr" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521893 (https://phabricator.wikimedia.org/T187716) (owner: 10Jforrester) [19:26:08] (03CR) 10Jbond: "> Patch Set 1: Code-Review-1" [puppet] - 10https://gerrit.wikimedia.org/r/521903 (https://phabricator.wikimedia.org/T198939) (owner: 10Jbond) [19:26:28] !log otto@deploy1001 Synchronized wmf-config/CommonSettings.php: Use wgEventServiceStreamConfig to configure wgRCFeeds eventbus. No-op in prod. - T211248 (duration: 00m 58s) [19:26:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:26:33] T211248: Modern Event Platform: Stream Intake Service: Migrate Mediawiki Eventbus events to eventgate-main - https://phabricator.wikimedia.org/T211248 [19:27:52] (03CR) 10Ottomata: [C: 03+2] Migrate mediawiki.recentchange stream to eventgate-main [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521517 (https://phabricator.wikimedia.org/T211248) (owner: 10Ottomata) [19:28:56] (03Merged) 10jenkins-bot: Migrate mediawiki.recentchange stream to eventgate-main [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521517 (https://phabricator.wikimedia.org/T211248) (owner: 10Ottomata) [19:29:11] (03CR) 10jenkins-bot: Migrate mediawiki.recentchange stream to eventgate-main [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521517 (https://phabricator.wikimedia.org/T211248) (owner: 10Ottomata) [19:31:04] (03CR) 10Dzahn: "i merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/511955 where this same code is used but just on the phabricator server and i" 
[puppet] - 10https://gerrit.wikimedia.org/r/511751 (owner: 10Ori.livneh) [19:32:22] !log otto@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Produce recentchange stream to eventgate-main - T211248 (duration: 00m 57s) [19:32:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:32:28] T211248: Modern Event Platform: Stream Intake Service: Migrate Mediawiki Eventbus events to eventgate-main - https://phabricator.wikimedia.org/T211248 [19:32:57] 10Operations, 10Puppet, 10Packaging, 10Patch-For-Review: upgrade puppet master servers - https://phabricator.wikimedia.org/T227587 (10jbond) p:05Triage→03Normal [19:33:17] (03CR) 10MaxSem: "zero.wikipedia.org is not an alias, it's a b/c redirect to www.wikipedia.org. Before nuking it, we need to assess current traffic levels." [dns] - 10https://gerrit.wikimedia.org/r/521886 (https://phabricator.wikimedia.org/T187716) (owner: 10Jforrester) [19:38:27] 10Operations, 10SRE-Access-Requests: Requesting access to machines [stat1004, stat1005 (now stat1007), and stat1006] and groups for Mayakpwiki - https://phabricator.wikimedia.org/T227633 (10kzimmerman) Approved as Maya's manager! 
[19:45:04] !log Updated the Wikidata property suggester with data from the 2019-07-01 JSON dump and applied the T132839 workarounds [19:45:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:45:11] T132839: [RfC] Property suggester suggests human properties for non-human items - https://phabricator.wikimedia.org/T132839 [19:45:23] sjoerddebruin: FYI ^ :) [19:46:00] 10Operations, 10Continuous-Integration-Infrastructure, 10serviceops, 10Release-Engineering-Team (CI & Testing services), 10Release-Engineering-Team-TODO (201907): contint1001 store docker images on separate partition or disk - https://phabricator.wikimedia.org/T207707 (10greg) [19:47:17] 10Operations, 10netops: RPKI Validation - https://phabricator.wikimedia.org/T220669 (10ayounsi) > This has been fixed now in https://www.juniper.net/documentation/en_US/junos/topics/topic-map/bgp-origin-as-validation.html Good news, thanks. > Next step is to review/merge https://gerrit.wikimedia.org/r/c/52033... [19:48:58] (03CR) 10CDanis: "> Patch Set 4:" [puppet] - 10https://gerrit.wikimedia.org/r/511751 (owner: 10Ori.livneh) [19:51:06] (03PS1) 10Urbanecm: Remove usergroup communityapps from officewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521933 (https://phabricator.wikimedia.org/T227680) [19:54:37] (03PS1) 10Dzahn: phabricator: disable forensic logging on phab1003 but keep hiera key [puppet] - 10https://gerrit.wikimedia.org/r/521935 [19:55:58] (03CR) 10Dzahn: [C: 03+2] phabricator: disable forensic logging on phab1003 but keep hiera key [puppet] - 10https://gerrit.wikimedia.org/r/521935 (owner: 10Dzahn) [20:00:04] cscott, arlolra, subbu, bearND, and halfak: #bothumor My software never has bugs. It just develops random features. Rise for Services – Parsoid / Citoid / Mobileapps / ORES / …. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190710T2000). 
[20:01:21] 10Operations, 10Reading-Infrastructure-Team-Backlog, 10Wikimedia-Mailing-lists: Create new product-infrastructure mailing list - https://phabricator.wikimedia.org/T227698 (10Dzahn) note to clinic duty: this is actually a "rename list" task rather than "new list" task. see https://wikitech.wikimedia.org/wi... [20:02:30] 10Operations, 10Reading-Infrastructure-Team-Backlog, 10Wikimedia-Mailing-lists: Create new product-infrastructure mailing list - https://phabricator.wikimedia.org/T227698 (10Dzahn) [20:03:01] 10Operations, 10Reading-Infrastructure-Team-Backlog, 10Wikimedia-Mailing-lists: rename mailing list "ri_team" to "product-infrastructure" - https://phabricator.wikimedia.org/T227698 (10Dzahn) [20:07:07] (03PS1) 10Jbond: example CR: https://github.com/rodjek/rspec-puppet/pull/742 [puppet] - 10https://gerrit.wikimedia.org/r/521939 [20:07:44] (03CR) 10jerkins-bot: [V: 04-1] example CR: https://github.com/rodjek/rspec-puppet/pull/742 [puppet] - 10https://gerrit.wikimedia.org/r/521939 (owner: 10Jbond) [20:07:55] PROBLEM - HHVM rendering on mw1235 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [20:08:43] PROBLEM - Nginx local proxy to apache on mw1235 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [20:08:49] PROBLEM - Apache HTTP on mw1235 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers [20:09:01] (03PS7) 10Jforrester: Disable ZeroBanner on all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483193 [20:09:03] (03PS4) 10Jforrester: Mark zerowiki as deleted and drop all configuration at once(!) 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/521893 (https://phabricator.wikimedia.org/T187716) [20:09:05] (03PS8) 10Jforrester: Drop the Wikipedia Zero debug log channel [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482099 (https://phabricator.wikimedia.org/T212865) [20:09:07] (03PS8) 10Jforrester: robots.php: Drop the special treatment for Wikipedia Zero [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482100 (https://phabricator.wikimedia.org/T212865) [20:09:09] (03PS8) 10Jforrester: Drop the ability to use ZeroBanner and ZeroPortal from production [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482102 (https://phabricator.wikimedia.org/T212865) [20:09:11] (03PS8) 10Jforrester: Stop configuring ZeroBanner and ZeroPortal, unused [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482103 (https://phabricator.wikimedia.org/T212865) [20:09:13] (03PS8) 10Jforrester: Stop loading i18n for ZeroBanner and ZeroPortal, unused [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482104 (https://phabricator.wikimedia.org/T212865) [20:10:26] 10Operations, 10Patch-For-Review: Decommission servermon - https://phabricator.wikimedia.org/T198939 (10Dzahn) a:05faidon→03Dzahn [20:16:06] (03PS1) 10Thcipriani: its-phabricator: new build with updated its-base [software/gerrit] (deploy/wmf/stable-2.15) - 10https://gerrit.wikimedia.org/r/521944 [20:16:26] (03CR) 10jerkins-bot: [V: 04-1] its-phabricator: new build with updated its-base [software/gerrit] (deploy/wmf/stable-2.15) - 10https://gerrit.wikimedia.org/r/521944 (owner: 10Thcipriani) [20:18:39] (03CR) 10Paladox: [C: 03+2] its-phabricator: new build with updated its-base [software/gerrit] (deploy/wmf/stable-2.15) - 10https://gerrit.wikimedia.org/r/521944 (owner: 10Thcipriani) [20:18:58] (03CR) 10jerkins-bot: [V: 04-1] its-phabricator: new build with updated its-base [software/gerrit] (deploy/wmf/stable-2.15) - 10https://gerrit.wikimedia.org/r/521944 (owner: 10Thcipriani) [20:20:07] (03PS1) 
10BryanDavis: cloud: Set has_lvs: false in hiera [puppet] - 10https://gerrit.wikimedia.org/r/521947 [20:21:40] (03PS2) 10Dzahn: deployment_server: remove servermon [puppet] - 10https://gerrit.wikimedia.org/r/502173 (https://phabricator.wikimedia.org/T198939) [20:22:05] (03CR) 10Dzahn: [C: 03+2] "no more new deploys for servermon. going to decom" [puppet] - 10https://gerrit.wikimedia.org/r/502173 (https://phabricator.wikimedia.org/T198939) (owner: 10Dzahn) [20:31:09] (03PS2) 10Dzahn: puppetmaster: remove servermon report [puppet] - 10https://gerrit.wikimedia.org/r/502175 (https://phabricator.wikimedia.org/T198939) [20:33:30] (03CR) 10Dzahn: "looks like i made an almost duplicate of https://gerrit.wikimedia.org/r/c/operations/puppet/+/521903 except that also removes 2 packages" [puppet] - 10https://gerrit.wikimedia.org/r/502175 (https://phabricator.wikimedia.org/T198939) (owner: 10Dzahn) [20:34:10] (03Abandoned) 10Dzahn: puppetmaster: remove servermon report [puppet] - 10https://gerrit.wikimedia.org/r/502175 (https://phabricator.wikimedia.org/T198939) (owner: 10Dzahn) [20:35:57] (03CR) 10Dzahn: [C: 03+1] "i made a duplicate at https://gerrit.wikimedia.org/r/c/operations/puppet/+/502175 so definitely +1 to that part. i just didn't know about " [puppet] - 10https://gerrit.wikimedia.org/r/521903 (https://phabricator.wikimedia.org/T198939) (owner: 10Jbond) [20:37:03] (03CR) 10Dzahn: [C: 03+1] "https://gerrit.wikimedia.org/r/q/topic:%22servermon%22+(status:open%20OR%20status:merged%20OR%20status:abandoned)" [puppet] - 10https://gerrit.wikimedia.org/r/521903 (https://phabricator.wikimedia.org/T198939) (owner: 10Jbond) [20:49:08] (03PS2) 10BPirkle: Specify CentralAuth and OAuth session storage separately from per-wiki session storage. 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/521409 (https://phabricator.wikimedia.org/T227097) [20:50:26] (03PS2) 10Dzahn: restbase: update rb1017 Cassandra instances for rack move [puppet] - 10https://gerrit.wikimedia.org/r/521525 (https://phabricator.wikimedia.org/T222960) (owner: 10Eevans) [20:51:10] (03CR) 10Dzahn: [C: 03+2] restbase: update rb1017 Cassandra instances for rack move [puppet] - 10https://gerrit.wikimedia.org/r/521525 (https://phabricator.wikimedia.org/T222960) (owner: 10Eevans) [20:52:39] (03CR) 10BPirkle: "Yes, thank you. OAuth needed a similar change, which I just added under T227696. Configuration change made here. It does not matter which " [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521409 (https://phabricator.wikimedia.org/T227097) (owner: 10BPirkle) [20:53:19] 10Operations, 10ops-eqiad, 10Cassandra, 10serviceops, and 4 others: Fix restbase1017's physical rack - https://phabricator.wikimedia.org/T222960 (10Dzahn) [20:54:25] 10Operations, 10ops-eqiad, 10Cassandra, 10serviceops, and 4 others: Fix restbase1017's physical rack - https://phabricator.wikimedia.org/T222960 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by dzahn on cumin1001.eqiad.wmnet for hosts: ` restbase1017.eqiad.wmnet ` The log can be found in `/va... [21:01:36] (03CR) 10Smalyshev: [C: 03+1] refactor wikidata entity dumps into wikibase + wikidata specific bits [puppet] - 10https://gerrit.wikimedia.org/r/517670 (https://phabricator.wikimedia.org/T221917) (owner: 10ArielGlenn) [21:01:40] (03PS6) 10Dzahn: icinga icon: Use correct icon for notes_url [puppet] - 10https://gerrit.wikimedia.org/r/520756 (owner: 10Jbond) [21:05:02] (03CR) 10Dzahn: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1001/17308/icinga1001.wikimedia.org/" [puppet] - 10https://gerrit.wikimedia.org/r/520756 (owner: 10Jbond) [21:06:23] PROBLEM - puppet last run on ms-be1034 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. 
Could be an interrupted request or a dependency cycle. [21:08:17] 10Operations, 10MediaWiki-Cache, 10serviceops-radar, 10Core Platform Team (Mainstash Multi-DC), and 3 others: Use a multi-dc aware store for ObjectCache's MainStash if needed. - https://phabricator.wikimedia.org/T212129 (10aaron) I think (1) is more useful and fills a needed gap of writes on GET/HEAD. What... [21:09:10] (03PS1) 10Ottomata: Allow analytics-privatedata-users group to access swift auth env file [puppet] - 10https://gerrit.wikimedia.org/r/521954 (https://phabricator.wikimedia.org/T219544) [21:12:10] PROBLEM - Check the Netbox report-s- puppetdb for fail status. on netmon1002 is CRITICAL: puppetdb.PuppetDB CRITICAL https://wikitech.wikimedia.org/wiki/Netbox%23Reports [21:14:51] chaomodus: ^ known? [21:15:25] no but i'll take the look [21:15:44] (03CR) 10Nuria: [C: 03+1] Allow analytics-privatedata-users group to access swift auth env file [puppet] - 10https://gerrit.wikimedia.org/r/521954 (https://phabricator.wikimedia.org/T219544) (owner: 10Ottomata) [21:15:48] (03CR) 10Dzahn: "[icinga1001:/usr/share/icinga/htdocs/images] $ file 1-notes.gif" [puppet] - 10https://gerrit.wikimedia.org/r/520756 (owner: 10Jbond) [21:15:54] did restbase1017 just go away? [21:18:02] chaomodus: reinstall script is running [21:18:11] ah it's good then [21:18:19] it'll fix itself as soon as its down [21:18:21] done [21:18:30] chaomodus: yes, i got restbase1017. and thanks for the netbox report thing.. [21:18:51] wait. you say one is causing the other? [21:19:19] that means that its not showing up in puppetdb but should be because it's marked as in production in netbox [21:19:29] if it's undergoing reinstall it should fix itself once it comes back to puppetdb [21:19:42] so the puppetdb PROBLEM is that it cant find the server being reinstalled? [21:19:54] aha. i did not see the relation between them [21:20:10] thanks. gotcha [21:21:04] (03CR) 10Ayounsi: [C: 03+1] "A few comments/questions but overall LGTM." 
(033 comments) [software/netbox-reports] - 10https://gerrit.wikimedia.org/r/510256 (https://phabricator.wikimedia.org/T221507) (owner: 10CRusnov) [21:26:50] (03PS2) 10BryanDavis: cloud: Add default LVS hiera settings [puppet] - 10https://gerrit.wikimedia.org/r/521947 [21:26:52] (03PS1) 10BryanDavis: cloud: set scap::version: present in hiera [puppet] - 10https://gerrit.wikimedia.org/r/521959 [21:28:47] (03CR) 10Thcipriani: [C: 03+1] cloud: set scap::version: present in hiera [puppet] - 10https://gerrit.wikimedia.org/r/521959 (owner: 10BryanDavis) [21:32:08] RECOVERY - puppet last run on ms-be1034 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:34:44] (03CR) 10Jforrester: "recheck" [software/gerrit] (deploy/wmf/stable-2.15) - 10https://gerrit.wikimedia.org/r/521944 (owner: 10Thcipriani) [21:35:07] !log mw1290 - restarting hhvm (socket timeout alert in icinga since about 5h) [21:35:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:36:21] !log mw1235 - restarting hhvm (socket timeout alert in icinga since about 1.5h) [21:36:22] RECOVERY - HHVM rendering on mw1290 is OK: HTTP OK: HTTP/1.1 200 OK - 77904 bytes in 0.635 second response time https://wikitech.wikimedia.org/wiki/Application_servers [21:36:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:36:44] jouncebot: next [21:36:44] In 1 hour(s) and 23 minute(s): Evening SWAT (Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190710T2300) [21:36:57] (03CR) 10Jforrester: [C: 03+2] "Let's go." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/483193 (owner: 10Jforrester) [21:37:00] RECOVERY - Apache HTTP on mw1290 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 616 bytes in 0.090 second response time https://wikitech.wikimedia.org/wiki/Application_servers [21:37:24] RECOVERY - Nginx local proxy to apache on mw1290 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 617 bytes in 0.045 second response time https://wikitech.wikimedia.org/wiki/Application_servers [21:37:57] (03PS2) 10Thcipriani: its-phabricator: new build with updated its-base [software/gerrit] (deploy/wmf/stable-2.15) - 10https://gerrit.wikimedia.org/r/521944 [21:38:20] RECOVERY - Apache HTTP on mw1235 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 616 bytes in 0.037 second response time https://wikitech.wikimedia.org/wiki/Application_servers [21:38:20] RECOVERY - Nginx local proxy to apache on mw1235 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 617 bytes in 0.048 second response time https://wikitech.wikimedia.org/wiki/Application_servers [21:38:20] RECOVERY - HHVM rendering on mw1235 is OK: HTTP OK: HTTP/1.1 200 OK - 77904 bytes in 0.183 second response time https://wikitech.wikimedia.org/wiki/Application_servers [21:38:22] RECOVERY - Check the Netbox report-s- puppetdb for fail status. on netmon1002 is OK: puppetdb.PuppetDB OK https://wikitech.wikimedia.org/wiki/Netbox%23Reports [21:38:42] (03Merged) 10jenkins-bot: Disable ZeroBanner on all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483193 (owner: 10Jforrester) [21:39:00] I'm testing on mwdebug1002. 
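The Netbox/PuppetDB alert discussed above (a host marked as in production in Netbox but missing from PuppetDB while it is being reinstalled) can be illustrated with a minimal sketch. This is not the actual netbox-reports code: the function name, the "active" status string, and the sample hosts other than restbase1017 are assumptions made for illustration only.

```python
def parity_failures(netbox_devices, puppetdb_hosts):
    """Return hosts that Netbox marks as in production but PuppetDB does not list.

    netbox_devices: dict of hostname -> status string (status values are assumed).
    puppetdb_hosts: iterable of hostnames currently known to PuppetDB.
    """
    known = set(puppetdb_hosts)
    return sorted(
        name
        for name, status in netbox_devices.items()
        if status == "active" and name not in known
    )


# Hypothetical snapshot taken while restbase1017 is being reimaged:
# it is still "active" in Netbox but has dropped out of PuppetDB.
netbox = {
    "restbase1017": "active",
    "examplehost1001": "active",   # hypothetical host
    "examplespare1001": "spare",   # hypothetical host, not in production
}
puppetdb = ["examplehost1001"]

missing = parity_failures(netbox, puppetdb)
print("CRITICAL" if missing else "OK", missing)
```

Once the reimaged host reports to PuppetDB again, the same check returns an empty list, which matches the RECOVERY seen in the channel.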
[21:41:19] !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: T212865 Disable ZeroBanner on all wikis (duration: 00m 59s) [21:41:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:41:24] T212865: Remove Wikipedia Zero wiki and Zero-related extensions from production - https://phabricator.wikimedia.org/T212865 [21:42:39] (03CR) 10jenkins-bot: Disable ZeroBanner on all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483193 (owner: 10Jforrester) [21:43:09] mwdebug1002 - WARNING: opcache free space is below 100 MB [21:43:14] hmm [21:44:30] Does that not get the opcache cron to clean it out? [21:45:03] (03CR) 10Jforrester: [C: 03+2] Mark zerowiki as deleted and drop all configuration at once(!) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521893 (https://phabricator.wikimedia.org/T187716) (owner: 10Jforrester) [21:46:08] (03Merged) 10jenkins-bot: Mark zerowiki as deleted and drop all configuration at once(!) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521893 (https://phabricator.wikimedia.org/T187716) (owner: 10Jforrester) [21:46:43] (03CR) 10Paladox: "recheck" [software/gerrit] (wmf/stable-2.16) - 10https://gerrit.wikimedia.org/r/521872 (owner: 10Paladox) [21:48:06] (03CR) 10jenkins-bot: Mark zerowiki as deleted and drop all configuration at once(!) 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/521893 (https://phabricator.wikimedia.org/T187716) (owner: 10Jforrester) [21:49:45] !log jforrester@deploy1001 Synchronized dblists/: T187716 Mark zerowiki as deleted in dblists (duration: 01m 00s) [21:49:47] James_F: ah, that made me find this "php7adm /opcache-free" but not a cron [21:49:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:49:50] T187716: Sunset Wikipedia Zero - https://phabricator.wikimedia.org/T187716 [21:50:12] !log mwdebug1002 - php7adm /opcache-free because icinga showed a warning for opcache free space below 100MB [21:50:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:51:09] mutante: Thanks. [21:51:40] !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: T187716 Drop all zerowiki configuration (duration: 00m 58s) [21:51:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:51:48] (03CR) 10Jforrester: [C: 03+2] Drop the Wikipedia Zero debug log channel [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482099 (https://phabricator.wikimedia.org/T212865) (owner: 10Jforrester) [21:52:04] James_F: yw (it was like that since 22 hours or so, not really new) [21:52:45] Sure, but anything that might break my ability to tell if a deploy will break the world is very welcome. 
:-) [21:52:53] (03Merged) 10jenkins-bot: Drop the Wikipedia Zero debug log channel [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482099 (https://phabricator.wikimedia.org/T212865) (owner: 10Jforrester) [21:52:55] (03PS5) 10Smalyshev: Set up dumps for mediainfo RDF generation [puppet] - 10https://gerrit.wikimedia.org/r/516444 (https://phabricator.wikimedia.org/T221917) [21:53:02] yea, i wanted to say you did not cause the opcache to go low :) [21:53:02] (03CR) 10Jforrester: [C: 03+2] robots.php: Drop the special treatment for Wikipedia Zero [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482100 (https://phabricator.wikimedia.org/T212865) (owner: 10Jforrester) [21:53:08] (03CR) 10jenkins-bot: Drop the Wikipedia Zero debug log channel [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482099 (https://phabricator.wikimedia.org/T212865) (owner: 10Jforrester) [21:53:39] * James_F grins. [21:54:03] (03Merged) 10jenkins-bot: robots.php: Drop the special treatment for Wikipedia Zero [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482100 (https://phabricator.wikimedia.org/T212865) (owner: 10Jforrester) [21:54:20] (03CR) 10Jforrester: [C: 03+2] Drop the ability to use ZeroBanner and ZeroPortal from production [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482102 (https://phabricator.wikimedia.org/T212865) (owner: 10Jforrester) [21:55:07] !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: T212865 Drop the Wikipedia Zero debug log channel (duration: 00m 58s) [21:55:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:55:13] T212865: Remove Wikipedia Zero wiki and Zero-related extensions from production - https://phabricator.wikimedia.org/T212865 [21:55:25] (03Merged) 10jenkins-bot: Drop the ability to use ZeroBanner and ZeroPortal from production [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482102 (https://phabricator.wikimedia.org/T212865) (owner: 10Jforrester) [21:56:39] (03CR) 
10jenkins-bot: robots.php: Drop the special treatment for Wikipedia Zero [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482100 (https://phabricator.wikimedia.org/T212865) (owner: 10Jforrester) [21:57:00] (03PS2) 10Jforrester: Drop zero.wikimedia.org [dns] - 10https://gerrit.wikimedia.org/r/521886 (https://phabricator.wikimedia.org/T187716) [21:57:01] (03PS1) 10Jforrester: Drop zero.wikipedia.org redirect to www.wikipedia.org [dns] - 10https://gerrit.wikimedia.org/r/521966 [21:59:17] (03PS1) 10Aaron Schulz: Set "allow_tcp_nagle_delay" to false in mc.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521967 [21:59:24] !log jforrester@deploy1001 Synchronized w/robots.php: T212865 Drop the special treatment for Wikipedia Zero (duration: 00m 58s) [21:59:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:01:19] (03CR) 10Dzahn: "nevermind, it was just browser cache. after hard reloading with shift i got the new icon. tested with https://icinga.wikimedia.org/cgi-bin" [puppet] - 10https://gerrit.wikimedia.org/r/520756 (owner: 10Jbond) [22:02:23] aha, looks like zero has zero traffic now [22:02:44] you could call it 0-day [22:03:11] !log jforrester@deploy1001 Synchronized wmf-config/mobile.php: T212865 Drop the ability to use ZeroBanner and ZeroPortal from production, mobile code (duration: 00m 57s) [22:03:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:03:17] T212865: Remove Wikipedia Zero wiki and Zero-related extensions from production - https://phabricator.wikimedia.org/T212865 [22:06:23] !log jforrester@deploy1001 Synchronized wmf-config/CommonSettings.php: T212865 Drop the ability to use ZeroBanner and ZeroPortal from production (duration: 00m 57s) [22:06:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:13:46] (03CR) 10Jforrester: [C: 03+2] Stop configuring ZeroBanner and ZeroPortal, unused [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482103 
(https://phabricator.wikimedia.org/T212865) (owner: 10Jforrester) [22:14:56] (03Merged) 10jenkins-bot: Stop configuring ZeroBanner and ZeroPortal, unused [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482103 (https://phabricator.wikimedia.org/T212865) (owner: 10Jforrester) [22:16:54] !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: T212865 Stop configuring ZeroBanner and ZeroPortal, unused (duration: 00m 58s) [22:16:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:16:59] T212865: Remove Wikipedia Zero wiki and Zero-related extensions from production - https://phabricator.wikimedia.org/T212865 [22:17:51] (03CR) 10Jforrester: [C: 03+2] Stop loading i18n for ZeroBanner and ZeroPortal, unused [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482104 (https://phabricator.wikimedia.org/T212865) (owner: 10Jforrester) [22:19:12] (03Merged) 10jenkins-bot: Stop loading i18n for ZeroBanner and ZeroPortal, unused [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482104 (https://phabricator.wikimedia.org/T212865) (owner: 10Jforrester) [22:33:26] (03PS3) 10Jforrester: Sort wmgMonologChannels alphabetically [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521901 (owner: 10Krinkle) [22:33:28] (03PS3) 10Jforrester: Remove dead 'wmgMonologChannels' entries [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521902 (owner: 10Krinkle) [22:33:46] (03PS3) 10Jforrester: Drop zero.wikimedia.org [dns] - 10https://gerrit.wikimedia.org/r/521886 (https://phabricator.wikimedia.org/T187716) [22:33:55] (03PS2) 10Jforrester: Drop zero.wikipedia.org redirect to www.wikipedia.org [dns] - 10https://gerrit.wikimedia.org/r/521966 [22:34:39] 10Operations, 10Cassandra, 10Core Platform Team (Cassandra Operational ), 10User-Eevans: puppetize turning off reserved space for cassandra /srv - https://phabricator.wikimedia.org/T132632 (10Eevans) [22:35:56] 10Operations, 10Cassandra, 10Core Platform Team (Cassandra Operational ), 
10Patch-For-Review, 10User-Eevans: enable authenticated access to Cassandra JMX - https://phabricator.wikimedia.org/T92471 (10Eevans) [22:36:05] I'm releasing the conch. [22:36:19] 10Operations, 10Cassandra, 10RESTBase, 10RESTBase-Cassandra, 10Core Platform Team (Cassandra Operational ): secure Cassandra/RESTBase cluster - https://phabricator.wikimedia.org/T94329 (10Eevans) [22:41:32] (03PS4) 10Jforrester: Remove /w/skin-1.5 symlink [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521562 (https://phabricator.wikimedia.org/T156319) (owner: 10Krinkle) [22:41:39] (03CR) 10Jforrester: [C: 03+2] Remove /w/skin-1.5 symlink [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521562 (https://phabricator.wikimedia.org/T156319) (owner: 10Krinkle) [22:42:04] (I'm taking it back for train unblocking.) [22:42:44] (03Merged) 10jenkins-bot: Remove /w/skin-1.5 symlink [mediawiki-config] - 10https://gerrit.wikimedia.org/r/521562 (https://phabricator.wikimedia.org/T156319) (owner: 10Krinkle) [22:46:18] !log jforrester@deploy1001 Synchronized w: T156319 Remove /w/skin-1.5 symlink (duration: 00m 58s) [22:46:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:46:24] T156319: mediawiki-config: Try and simplify/cleanup the pile of symlinks - https://phabricator.wikimedia.org/T156319 [22:46:39] !log downgrading cp4031 to mtail_3.0.0~rc5-1~bpo9+1wmf1 to fix varnishmtail T225604 [22:46:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:46:44] T225604: log spam from mtail 3.0.0~rc19 on wezen - https://phabricator.wikimedia.org/T225604 [22:49:08] 10Operations, 10ops-eqiad, 10DC-Ops: b2-eqiad pdu refresh - https://phabricator.wikimedia.org/T227538 (10RobH) [22:49:16] shdubsh: :) [22:57:40] !log jforrester@deploy1001 Synchronized php-1.34.0-wmf.13/includes/user/User.php: T227688 User: support setting custom fields + array autocreation in non-existent field (duration: 00m 58s) [22:57:44] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:57:45] T227688: PHP error from Special:OAuth: "Indirect modification of overloaded property User::$oAuthUserData" - https://phabricator.wikimedia.org/T227688 [22:59:51] !log jforrester@deploy1001 scap failed: average error rate on 3/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details) [22:59:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:00:04] MaxSem, RoanKattouw, and Niharika: How many deployers does it take to do Evening SWAT (Max 6 patches) deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190710T2300). [23:00:04] ebernhardson: A patch you scheduled for Evening SWAT (Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [23:00:18] Hmm. [23:01:19] All the errors were in wmf.11, of course. [23:02:16] hrm? lemme look [23:02:24] !log jforrester@deploy1001 Synchronized php-1.34.0-wmf.13/extensions/OAuth/includes/backend/MWOAuthUtils.php: T227688 OAuth: Do not rely on array autocreation for custom User properties; re-try (duration: 00m 58s) [23:02:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:02:39] ebernhardson: Not done yours yet, that's now. [23:03:41] ebernhardson: Live on mwdebug1002, but is there a sane way to test? [23:04:03] RECOVERY - Check systemd state on cp4031 is OK: OK - running: The system is fully operational [23:14:35] James_F: not really [23:14:53] James_F: sorry in another window. but yea no way to test really, its from job runners [23:14:55] OK, let's just sync. 
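The "scap failed: average error rate on 3/11 canaries increased by 10x" message above describes a before/after comparison per canary host. The sketch below shows one way such a check could work; it is not scap's implementation, and the function name, the `floor` parameter, the host names, and the sample rates are all assumptions.

```python
def canaries_over_threshold(before, after, factor=10.0, floor=1.0):
    """Return canary hosts whose error rate rose by at least `factor`.

    before/after: dict of hostname -> average errors per time window.
    floor: assumed minimum baseline, so a host going from 0 to 1 error
           is not reported as an infinite increase.
    """
    bad = []
    for host, old_rate in before.items():
        new_rate = after.get(host, 0.0)
        baseline = max(old_rate, floor)
        if new_rate >= factor * baseline:
            bad.append(host)
    return sorted(bad)


# Hypothetical data: 11 canaries, 3 of which spike after the sync.
before = {f"canary{i:02d}": 1.0 for i in range(1, 12)}
after = dict(before)
for host in ("canary02", "canary05", "canary09"):
    after[host] = 12.0

bad = canaries_over_threshold(before, after)
print(f"{len(bad)}/{len(before)} canaries increased by 10x: {bad}")
```

A check like this compares rates only, so errors coming from an unrelated branch (as noted in the channel, "All the errors were in wmf.11") would still trip it.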
[23:16:29] !log jforrester@deploy1001 Synchronized php-1.34.0-wmf.13/extensions/CirrusSearch/includes: T227691 RedirectsAndIncomingLinks: succeede or fail, but not both (duration: 01m 02s) [23:16:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:16:34] T227691: Fatal from CirrusSearch\Job\LinksUpdate: "Call to a member function getLogVariables() on null" - https://phabricator.wikimedia.org/T227691 [23:17:04] 10Operations, 10ops-eqiad, 10Cassandra, 10serviceops, and 4 others: Fix restbase1017's physical rack - https://phabricator.wikimedia.org/T222960 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['restbase1017.eqiad.wmnet'] ` Of which those **FAILED**: ` ['restbase1017.eqiad.wmnet'] ` [23:39:12] (03CR) 10Smalyshev: [C: 03+1] refactor wikidata entity dumps into wikibase + wikidata specific bits (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/517670 (https://phabricator.wikimedia.org/T221917) (owner: 10ArielGlenn) [23:39:29] (03PS1) 10BryanDavis: keyholder: hiera setting for require_encrypted_keys [puppet] - 10https://gerrit.wikimedia.org/r/522008