[00:36:45] (PS2) Smalyshev: Migrate CirrusSearch to extension.json officially [mediawiki-config] - https://gerrit.wikimedia.org/r/514994 (https://phabricator.wikimedia.org/T87892)
[01:11:07] RECOVERY - Check systemd state on stat1004 is OK: OK - running: The system is fully operational
[01:15:27] PROBLEM - Check systemd state on stat1004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[01:41:21] RECOVERY - Check systemd state on stat1004 is OK: OK - running: The system is fully operational
[01:45:39] PROBLEM - Check systemd state on stat1004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[02:11:35] RECOVERY - Check systemd state on stat1004 is OK: OK - running: The system is fully operational
[02:14:43] PROBLEM - puppet last run on lvs5003 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle.
[02:15:57] PROBLEM - Check systemd state on stat1004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[02:41:57] RECOVERY - puppet last run on lvs5003 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures
[03:08:23] Operations, Performance-Team, Traffic, Performance: Sometimes some pages load slowly for European users (due to some factor outside of Wikimedia cluster) - https://phabricator.wikimedia.org/T226048 (Pruem)
[03:24:57] Operations, Performance-Team, Traffic, Performance: Sometimes some pages load slowly for European users (due to some factor outside of Wikimedia cluster) - https://phabricator.wikimedia.org/T226048 (CDanis) My guess is that the beginning of this problem correlates with the beginning of the fetch...
[03:43:26] (CR) MZMcBride: "It's disheartening to see that this changeset got abandoned."
[puppet] - https://gerrit.wikimedia.org/r/511751 (owner: Ori.livneh)
[04:00:32] !log depooling maps1003 for reimage into new partition scheme - T224395
[04:00:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:00:40] T224395: Maps[12]004 /srv disk space is critical - https://phabricator.wikimedia.org/T224395
[04:04:00] Operations, Performance-Team, Traffic, Performance: Sometimes some pages load slowly for European users (due to some factor outside of Wikimedia cluster) - https://phabricator.wikimedia.org/T226048 (Pruem) This correlates to the previous appearance of the problem in early June, see [[https://graf...
[04:45:33] RECOVERY - Check systemd state on es2014 is OK: OK - running: The system is fully operational
[04:46:27] (PS1) Marostegui: db1112: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/517963 (https://phabricator.wikimedia.org/T225981)
[04:48:06] (CR) Marostegui: [C: +2] db1112: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/517963 (https://phabricator.wikimedia.org/T225981) (owner: Marostegui)
[04:48:27] (PS1) Marostegui: db-eqiad.php: Depool db1077 [mediawiki-config] - https://gerrit.wikimedia.org/r/517964 (https://phabricator.wikimedia.org/T225981)
[04:50:32] (CR) Marostegui: [C: +2] db-eqiad.php: Depool db1077 [mediawiki-config] - https://gerrit.wikimedia.org/r/517964 (https://phabricator.wikimedia.org/T225981) (owner: Marostegui)
[04:51:30] (Merged) jenkins-bot: db-eqiad.php: Depool db1077 [mediawiki-config] - https://gerrit.wikimedia.org/r/517964 (https://phabricator.wikimedia.org/T225981) (owner: Marostegui)
[04:51:45] (CR) jenkins-bot: db-eqiad.php: Depool db1077 [mediawiki-config] - https://gerrit.wikimedia.org/r/517964 (https://phabricator.wikimedia.org/T225981) (owner: Marostegui)
[04:52:58] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1077 T225981 (duration: 00m 59s)
[04:53:00] !log Stop
replication in sync on db1112 and db1077 to move db1124 under db1112 - T225981
[04:53:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:53:03] T225981: Replace db1077 with db1112 - https://phabricator.wikimedia.org/T225981
[04:53:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:54:30] Operations, serviceops, Core Platform Team Backlog (Later), Services (next): Migrate node-based services in production to node10 - https://phabricator.wikimedia.org/T210704 (KartikMistry) @Joe @MoritzMuehlenhoff You can also check, https://phabricator.wikimedia.org/diffusion/GCXS/browse/master/.p...
[04:56:15] ah https://wikitech.wikimedia.org/wiki/Deployments#Thursday,_June_20 8 Patches in SWAT? :/
[04:58:37] (PS1) Marostegui: Revert "db-eqiad.php: Depool db1077" [mediawiki-config] - https://gerrit.wikimedia.org/r/517965
[05:02:06] (CR) Marostegui: [C: +2] Revert "db-eqiad.php: Depool db1077" [mediawiki-config] - https://gerrit.wikimedia.org/r/517965 (owner: Marostegui)
[05:02:56] (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1077" [mediawiki-config] - https://gerrit.wikimedia.org/r/517965 (owner: Marostegui)
[05:03:10] (CR) jenkins-bot: Revert "db-eqiad.php: Depool db1077" [mediawiki-config] - https://gerrit.wikimedia.org/r/517965 (owner: Marostegui)
[05:04:34] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1077 T225981 (duration: 00m 55s)
[05:04:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:04:39] T225981: Replace db1077 with db1112 - https://phabricator.wikimedia.org/T225981
[05:07:15] (PS1) Marostegui: db-eqiad,db-codfw.php: Pool db1112 into s3 [mediawiki-config] - https://gerrit.wikimedia.org/r/517966 (https://phabricator.wikimedia.org/T225981)
[05:10:59] RECOVERY - Check systemd state on stat1004 is OK: OK - running: The system is fully operational
[05:15:21] PROBLEM - Check systemd state on
stat1004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[05:18:31] (CR) Giuseppe Lavagetto: [C: +1] db-eqiad,db-codfw.php: Pool db1112 into s3 [mediawiki-config] - https://gerrit.wikimedia.org/r/517966 (https://phabricator.wikimedia.org/T225981) (owner: Marostegui)
[05:18:55] (CR) Marostegui: [C: +2] db-eqiad,db-codfw.php: Pool db1112 into s3 [mediawiki-config] - https://gerrit.wikimedia.org/r/517966 (https://phabricator.wikimedia.org/T225981) (owner: Marostegui)
[05:19:47] (Merged) jenkins-bot: db-eqiad,db-codfw.php: Pool db1112 into s3 [mediawiki-config] - https://gerrit.wikimedia.org/r/517966 (https://phabricator.wikimedia.org/T225981) (owner: Marostegui)
[05:20:01] (CR) jenkins-bot: db-eqiad,db-codfw.php: Pool db1112 into s3 [mediawiki-config] - https://gerrit.wikimedia.org/r/517966 (https://phabricator.wikimedia.org/T225981) (owner: Marostegui)
[05:22:16] !log marostegui@deploy1001 Synchronized wmf-config/db-codfw.php: Slowly pool db1112 into s3 T225981 (duration: 00m 55s)
[05:22:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:22:21] T225981: Replace db1077 with db1112 - https://phabricator.wikimedia.org/T225981
[05:23:19] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Slowly pool db1112 into s3 T225981 (duration: 00m 55s)
[05:23:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:24:18] https://phabricator.wikimedia.org/T179884
[05:24:24] this is a serious issue
[05:25:37] it should be fixed ASAP
[05:25:47] still happening now
[05:27:17] https://commons.wikimedia.org/wiki/File:Tekija_(Kladovo).JPG
[05:33:13] (PS1) Marostegui: db1077: Allow reimage [puppet] - https://gerrit.wikimedia.org/r/517967 (https://phabricator.wikimedia.org/T225981)
[05:33:56] (CR) Marostegui: [C: +2] db1077: Allow reimage [puppet] - https://gerrit.wikimedia.org/r/517967
(https://phabricator.wikimedia.org/T225981) (owner: Marostegui)
[05:36:27] (PS1) Marostegui: db-eqiad.php: More traffic to db1112 [mediawiki-config] - https://gerrit.wikimedia.org/r/517968
[05:37:29] !log Deploy schema change on centralauth.oathauth_users T225643
[05:37:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:37:35] T225643: Schema change to oathauth_users - https://phabricator.wikimedia.org/T225643
[05:38:11] (CR) Marostegui: [C: +2] db-eqiad.php: More traffic to db1112 [mediawiki-config] - https://gerrit.wikimedia.org/r/517968 (owner: Marostegui)
[05:39:00] (Merged) jenkins-bot: db-eqiad.php: More traffic to db1112 [mediawiki-config] - https://gerrit.wikimedia.org/r/517968 (owner: Marostegui)
[05:39:14] (CR) jenkins-bot: db-eqiad.php: More traffic to db1112 [mediawiki-config] - https://gerrit.wikimedia.org/r/517968 (owner: Marostegui)
[05:40:25] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: More traffic to db1112 in s3 T225981 (duration: 00m 56s)
[05:40:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:40:30] T225981: Replace db1077 with db1112 - https://phabricator.wikimedia.org/T225981
[05:41:21] RECOVERY - Check systemd state on stat1004 is OK: OK - running: The system is fully operational
[05:43:17] marostegui, thanks
[05:45:43] PROBLEM - Check systemd state on stat1004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[05:46:04] Operations, cloud-services-team (Kanban): etcd: listen-peer-urls only supports IP addresses and no FQDNs - https://phabricator.wikimedia.org/T226095 (Joe) Hi @aborrero I think you were using `profile::etcd` instead than `profile::etcd::v3` which is the profile you should use with etcd3.
[05:47:19] (PS1) Marostegui: mariadb: Move db1077 from s3 to test-s4 [puppet] - https://gerrit.wikimedia.org/r/517969 (https://phabricator.wikimedia.org/T225981)
[05:47:38] (PS1) Marostegui: db-eqiad.php: More traffic to db1112 [mediawiki-config] - https://gerrit.wikimedia.org/r/517970
[05:48:56] Operations, cloud-services-team (Kanban): etcd: listen-peer-urls only supports IP addresses and no FQDNs - https://phabricator.wikimedia.org/T226095 (Joe) Open→Invalid p:Triage→Normal
[05:50:18] (CR) Marostegui: "PCC looks good: https://puppet-compiler.wmflabs.org/compiler1001/17037/" [puppet] - https://gerrit.wikimedia.org/r/517969 (https://phabricator.wikimedia.org/T225981) (owner: Marostegui)
[05:51:09] (CR) Marostegui: [C: +2] db-eqiad.php: More traffic to db1112 [mediawiki-config] - https://gerrit.wikimedia.org/r/517970 (owner: Marostegui)
[05:51:58] (Merged) jenkins-bot: db-eqiad.php: More traffic to db1112 [mediawiki-config] - https://gerrit.wikimedia.org/r/517970 (owner: Marostegui)
[05:52:12] (CR) jenkins-bot: db-eqiad.php: More traffic to db1112 [mediawiki-config] - https://gerrit.wikimedia.org/r/517970 (owner: Marostegui)
[05:54:55] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: More traffic to db1112 in s3 T225981 (duration: 00m 56s)
[05:55:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:55:00] T225981: Replace db1077 with db1112 - https://phabricator.wikimedia.org/T225981
[05:58:47] Operations, serviceops, Core Platform Team Backlog (Later), Services (next): Migrate node-based services in production to node10 - https://phabricator.wikimedia.org/T210704 (Joe) >>! In T210704#5267755, @KartikMistry wrote: >>>! In T210704#5267750, @Joe wrote: >> To correct myself: we already use...
[06:06:53] (PS1) Marostegui: db-eqiad.php: More traffic to db1112 [mediawiki-config] - https://gerrit.wikimedia.org/r/517971
[06:07:43] (CR) Marostegui: [C: +2] db-eqiad.php: More traffic to db1112 [mediawiki-config] - https://gerrit.wikimedia.org/r/517971 (owner: Marostegui)
[06:08:32] (Merged) jenkins-bot: db-eqiad.php: More traffic to db1112 [mediawiki-config] - https://gerrit.wikimedia.org/r/517971 (owner: Marostegui)
[06:08:47] (CR) jenkins-bot: db-eqiad.php: More traffic to db1112 [mediawiki-config] - https://gerrit.wikimedia.org/r/517971 (owner: Marostegui)
[06:09:38] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: More traffic to db1112 in s3 T225981 (duration: 00m 57s)
[06:09:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:09:42] T225981: Replace db1077 with db1112 - https://phabricator.wikimedia.org/T225981
[06:12:07] RECOVERY - Check systemd state on stat1004 is OK: OK - running: The system is fully operational
[06:12:53] (PS1) Marostegui: db-eqiad.php: More traffic to db1112 [mediawiki-config] - https://gerrit.wikimedia.org/r/517972
[06:14:04] (CR) Marostegui: [C: +2] db-eqiad.php: More traffic to db1112 [mediawiki-config] - https://gerrit.wikimedia.org/r/517972 (owner: Marostegui)
[06:14:53] (Merged) jenkins-bot: db-eqiad.php: More traffic to db1112 [mediawiki-config] - https://gerrit.wikimedia.org/r/517972 (owner: Marostegui)
[06:16:19] (CR) jenkins-bot: db-eqiad.php: More traffic to db1112 [mediawiki-config] - https://gerrit.wikimedia.org/r/517972 (owner: Marostegui)
[06:16:25] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: More traffic to db1112 in s3 T225981 (duration: 00m 55s)
[06:16:25] PROBLEM - Check systemd state on stat1004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[06:16:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:16:30] T225981: Replace db1077 with db1112 - https://phabricator.wikimedia.org/T225981
[06:18:20] !log rebooting sarin for some tests with updated intel-microcode for MDS (also covering Sandybridge server CPUs initially not supported by Intel)
[06:18:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:18:41] !log jmm@cumin2001 START - Cookbook sre.hosts.downtime
[06:18:42] !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[06:18:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:18:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:22:48] (PS1) Marostegui: db-eqiad.php: More traffic to db1112 [mediawiki-config] - https://gerrit.wikimedia.org/r/517973
[06:29:09] (CR) Marostegui: [C: +2] db-eqiad.php: More traffic to db1112 [mediawiki-config] - https://gerrit.wikimedia.org/r/517973 (owner: Marostegui)
[06:29:58] (Merged) jenkins-bot: db-eqiad.php: More traffic to db1112 [mediawiki-config] - https://gerrit.wikimedia.org/r/517973 (owner: Marostegui)
[06:30:13] (CR) jenkins-bot: db-eqiad.php: More traffic to db1112 [mediawiki-config] - https://gerrit.wikimedia.org/r/517973 (owner: Marostegui)
[06:31:02] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: More traffic to db1112 in s3 T225981 (duration: 00m 56s)
[06:31:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:31:07] T225981: Replace db1077 with db1112 - https://phabricator.wikimedia.org/T225981
[06:32:59] (PS1) Marostegui: db-eqiad,db-codfw.php: Remove db1077 [mediawiki-config] - https://gerrit.wikimedia.org/r/517974 (https://phabricator.wikimedia.org/T225981)
[06:35:55] PROBLEM - apertium apy on scb1003 is CRITICAL: connect to address 10.64.32.153 and port 2737: Connection refused https://wikitech.wikimedia.org/wiki/CX
[06:37:21] RECOVERY - apertium apy on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 5996 bytes in 5.520 second response time https://wikitech.wikimedia.org/wiki/CX
[06:41:37] (CR) Marostegui: [C: +2] db-eqiad,db-codfw.php: Remove db1077 [mediawiki-config] - https://gerrit.wikimedia.org/r/517974 (https://phabricator.wikimedia.org/T225981) (owner: Marostegui)
[06:42:26] (Merged) jenkins-bot: db-eqiad,db-codfw.php: Remove db1077 [mediawiki-config] - https://gerrit.wikimedia.org/r/517974 (https://phabricator.wikimedia.org/T225981) (owner: Marostegui)
[06:42:40] (CR) jenkins-bot: db-eqiad,db-codfw.php: Remove db1077 [mediawiki-config] - https://gerrit.wikimedia.org/r/517974 (https://phabricator.wikimedia.org/T225981) (owner: Marostegui)
[06:43:47] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool and remove from config db1077 T225981 (duration: 00m 54s)
[06:43:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:43:52] T225981: Replace db1077 with db1112 - https://phabricator.wikimedia.org/T225981
[06:44:29] (PS12) Elukey: Enable Kerberos in the Analytics Hadoop Test cluster [puppet] - https://gerrit.wikimedia.org/r/504280
[06:48:48] (PS1) Giuseppe Lavagetto: Add debian package build [software/python-poolcounter] - https://gerrit.wikimedia.org/r/517979
[06:50:22] (CR) Muehlenhoff: Add debian package build (1 comment) [software/python-poolcounter] - https://gerrit.wikimedia.org/r/517979 (owner: Giuseppe Lavagetto)
[06:50:25] Operations, Discovery-Search, Datacenter-Switchover-2018: Warn when CirrusSearch is not configured to use local DC for an extended time - https://phabricator.wikimedia.org/T204135 (dcausse) I changed the output to: ` { "wmfMasterDatacenter": "eqiad", "wmfEtcdLastModifiedIndex": 3672, "wmgCirrus...
[06:51:05] (CR) Marostegui: [C: +2] mariadb: Move db1077 from s3 to test-s4 [puppet] - https://gerrit.wikimedia.org/r/517969 (https://phabricator.wikimedia.org/T225981) (owner: Marostegui)
[06:52:25] (CR) Elukey: [C: +2] Enable Kerberos in the Analytics Hadoop Test cluster [puppet] - https://gerrit.wikimedia.org/r/504280 (owner: Elukey)
[06:52:32] (PS13) Elukey: Enable Kerberos in the Analytics Hadoop Test cluster [puppet] - https://gerrit.wikimedia.org/r/504280
[06:55:51] (PS1) Ayounsi: Icinga add a 30s timeout to check_ospf.py [puppet] - https://gerrit.wikimedia.org/r/517980 (https://phabricator.wikimedia.org/T225905)
[06:58:15] Operations, netops, Patch-For-Review: check_ospf.py fails on mr1-eqsin - https://phabricator.wikimedia.org/T225905 (ayounsi) > (timeout is not respected) I couldn't reproduce that bug. And it seems to be respected on all my tries now. CR above should solve the issue.
[07:00:55] !log installing intel-microcode updates to June 2019 release (microcode is unmodified for most CPUs except for Sandybridge/Core-X models)
[07:00:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:14:39] (PS1) Elukey: Add hiera host overrides to analytics1031 to bypass a broken disk [puppet] - https://gerrit.wikimedia.org/r/517981
[07:15:06] (CR) Elukey: [C: +2] Add hiera host overrides to analytics1031 to bypass a broken disk [puppet] - https://gerrit.wikimedia.org/r/517981 (owner: Elukey)
[07:15:53] !log Transfer dbprov1001:/srv/backups/tmp/db1112/sqldata to db1077 T225981
[07:15:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:15:59] T225981: Replace db1077 with db1112 - https://phabricator.wikimedia.org/T225981
[07:18:28] Operations, ops-eqiad, Cassandra, DC-Ops, and 4 others: restbase-dev1006 has a broken disk - https://phabricator.wikimedia.org/T224260 (MoritzMuehlenhoff) Can we please move forward with ordering a fixed disk?
This broken disk causes subtle errors for all fleet-wide Cumin/debdeploy runs touching...
[07:35:39] (PS2) Bmansurov: Labs: enable surveys for testing [mediawiki-config] - https://gerrit.wikimedia.org/r/517789 (https://phabricator.wikimedia.org/T225819)
[07:36:26] (CR) jerkins-bot: [V: -1] Labs: enable surveys for testing [mediawiki-config] - https://gerrit.wikimedia.org/r/517789 (https://phabricator.wikimedia.org/T225819) (owner: Bmansurov)
[07:36:57] PROBLEM - puppet last run on rdb2005 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[initramfs-tools]
[07:38:02] Operations, ops-eqiad: Degraded RAID on db1077 - https://phabricator.wikimedia.org/T226154 (ops-monitoring-bot)
[07:38:38] (PS1) Elukey: Add dfs.http.policy: 'HTTPS_ONLY' to the Hadoop test cluster [puppet] - https://gerrit.wikimedia.org/r/517983 (https://phabricator.wikimedia.org/T217412)
[07:38:46] Operations, ops-eqiad: Degraded RAID on db1077 - https://phabricator.wikimedia.org/T226154 (Marostegui) Open→Declined This is a known BBU issue: T225981 T225391#5261662
[07:39:26] (CR) Muehlenhoff: [C: +1] Add dfs.http.policy: 'HTTPS_ONLY' to the Hadoop test cluster [puppet] - https://gerrit.wikimedia.org/r/517983 (https://phabricator.wikimedia.org/T217412) (owner: Elukey)
[07:41:15] Operations, ops-eqiad, DBA, Patch-For-Review: db1077 crashed - https://phabricator.wikimedia.org/T225391 (Marostegui) And after the reboot the battery fully failed T226154: ` Battery/Capacitor Count: 0 `
[07:41:51] RECOVERY - Check systemd state on stat1004 is OK: OK - running: The system is fully operational
[07:42:21] RECOVERY - puppet last run on rdb2005 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[07:42:33] PROBLEM - puppet last run on pc1008 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures.
Failed resources (up to 3 shown): Package[initramfs-tools]
[07:43:06] Operations, ops-codfw: Degraded RAID on db2039 - https://phabricator.wikimedia.org/T226155 (ops-monitoring-bot)
[07:43:11] PROBLEM - puppet last run on restbase1026 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[initramfs-tools]
[07:43:53] PROBLEM - puppet last run on cloudcontrol1003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[initramfs-tools]
[07:44:46] Operations, ops-codfw: Degraded RAID on db2039 - https://phabricator.wikimedia.org/T226155 (Marostegui) Open→Declined This host is scheduled for decommissioning T225988, so no need to act on it. Just label the disk as broken so it doesn't get re-used
[07:45:15] (CR) Elukey: [C: +2] Add dfs.http.policy: 'HTTPS_ONLY' to the Hadoop test cluster [puppet] - https://gerrit.wikimedia.org/r/517983 (https://phabricator.wikimedia.org/T217412) (owner: Elukey)
[07:46:09] PROBLEM - Check systemd state on stat1004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[07:46:13] Operations, ops-codfw, DC-Ops, decommission: decommission db2039 - https://phabricator.wikimedia.org/T225988 (Marostegui)
[07:46:23] Operations, ops-codfw, DC-Ops, decommission: decommission db2039 - https://phabricator.wikimedia.org/T225988 (Marostegui) Please mark disk #3 as broken so it doesn't get re-used {T226155}
[07:46:47] Operations, serviceops, Core Platform Team Backlog (Later), Services (next): Migrate node-based services in production to node10 - https://phabricator.wikimedia.org/T210704 (akosiaris) >>! In T210704#5270288, @Joe wrote: >>>! In T210704#5267755, @KartikMistry wrote: >>>>! In T210704#5267750, @Joe...
[07:47:51] (PS4) Alaa Sarhan: Introduce config variables for new terms store in mediawiki-config [mediawiki-config] - https://gerrit.wikimedia.org/r/517819 (https://phabricator.wikimedia.org/T226086)
[07:48:06] (PS5) Alaa Sarhan: Switch property terms migration to WRITE_BOTH on test wikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/517820 (https://phabricator.wikimedia.org/T225051)
[07:48:16] (PS5) Alaa Sarhan: Switch property terms migration to WRITE_BOTH on wikidata production [mediawiki-config] - https://gerrit.wikimedia.org/r/517674 (https://phabricator.wikimedia.org/T225051)
[07:56:20] Operations, serviceops, Core Platform Team Backlog (Later), Services (next): Migrate node-based services in production to node10 - https://phabricator.wikimedia.org/T210704 (akosiaris) >>! In T210704#5270230, @KartikMistry wrote: > @Joe @MoritzMuehlenhoff You can also check, https://phabricator.w...
[07:57:12] Operations, Maps: Maps[12]004 /srv disk space is critical - https://phabricator.wikimedia.org/T224395 (ops-monitoring-bot) Script wmf-auto-reimage was launched by gehel on cumin1001.eqiad.wmnet for hosts: ` ['maps1003.eqiad.wmnet'] ` The log can be found in `/var/log/wmf-auto-reimage/201906200757_gehel_1...
[07:59:21] !log Stop MYSQL and reboot db2084
[07:59:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:09:02] (CR) Alexandros Kosiaris: [C: +1] haproxy: Disable global logging to syslog [puppet] - https://gerrit.wikimedia.org/r/517755 (https://phabricator.wikimedia.org/T225284) (owner: Effie Mouzeli)
[08:09:37] RECOVERY - puppet last run on pc1008 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[08:10:15] RECOVERY - puppet last run on restbase1026 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[08:10:55] RECOVERY - puppet last run on cloudcontrol1003 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[08:11:25] Operations, netops: Investigate cr2-eqord's disconnection from the rest of the network - https://phabricator.wikimedia.org/T224535 (ayounsi) Open→Resolved Opened T226158 for the tunnel. Everything else here is done.
[08:14:01] (PS1) Ayounsi: GRE tunnel between eqiad and eqord [dns] - https://gerrit.wikimedia.org/r/517989 (https://phabricator.wikimedia.org/T226158)
[08:14:11] PROBLEM - Check the Netbox report-s- puppetdb for fail status.
on netmon1002 is CRITICAL: puppetdb.PuppetDB CRITICAL https://wikitech.wikimedia.org/wiki/Netbox%23Reports
[08:16:48] (PS1) Elukey: Remove unnecessary config for the Hadoop testing cluster [puppet] - https://gerrit.wikimedia.org/r/517994
[08:18:00] (CR) Elukey: [C: +2] Remove unnecessary config for the Hadoop testing cluster [puppet] - https://gerrit.wikimedia.org/r/517994 (owner: Elukey)
[08:20:12] (PS1) Elukey: Add missing yarn container executor option to the Hadoop test cluster [puppet] - https://gerrit.wikimedia.org/r/517996
[08:20:49] (CR) Elukey: [C: +2] Add missing yarn container executor option to the Hadoop test cluster [puppet] - https://gerrit.wikimedia.org/r/517996 (owner: Elukey)
[08:28:24] (CR) Ayounsi: [C: +2] Icinga add a 30s timeout to check_ospf.py [puppet] - https://gerrit.wikimedia.org/r/517980 (https://phabricator.wikimedia.org/T225905) (owner: Ayounsi)
[08:28:36] (PS2) Ayounsi: Icinga add a 30s timeout to check_ospf.py [puppet] - https://gerrit.wikimedia.org/r/517980 (https://phabricator.wikimedia.org/T225905)
[08:33:47] Operations, netops, Patch-For-Review: check_ospf.py fails on mr1-eqsin - https://phabricator.wikimedia.org/T225905 (ayounsi) Open→Resolved a:ayounsi Check is now green! https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=mr1-eqsin&service=OSPF+status
[08:37:07] Operations, Maps: Maps[12]004 /srv disk space is critical - https://phabricator.wikimedia.org/T224395 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['maps1003.eqiad.wmnet'] ` and were **ALL** successful.
[08:39:25] !log Stop Mysql on db1124: s1, s3, s5 and s8 to upgrade mysql, this will generate lag on labs
[08:39:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:40:00] RECOVERY - Check the Netbox report-s- puppetdb for fail status.
on netmon1002 is OK: puppetdb.PuppetDB OK https://wikitech.wikimedia.org/wiki/Netbox%23Reports
[08:40:52] RECOVERY - EDAC syslog messages on wtp2020 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=wtp2020&var-datasource=codfw+prometheus/ops
[08:41:22] RECOVERY - Check systemd state on stat1004 is OK: OK - running: The system is fully operational
[08:45:40] PROBLEM - Check systemd state on stat1004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[08:48:50] (PS1) Elukey: profile::kerberos::keytabs: add parent_dir_grp option [puppet] - https://gerrit.wikimedia.org/r/517997 (https://phabricator.wikimedia.org/T212257)
[08:51:05] (CR) Elukey: "https://puppet-compiler.wmflabs.org/compiler1001/17040/" [puppet] - https://gerrit.wikimedia.org/r/517997 (https://phabricator.wikimedia.org/T212257) (owner: Elukey)
[08:55:26] Operations, Maps: Change maps codfw replication factor for v4 keyspace - https://phabricator.wikimedia.org/T226161 (Mathew.onipe)
[08:55:31] Operations, Maps: Change maps codfw replication factor for v4 keyspace - https://phabricator.wikimedia.org/T226161 (Mathew.onipe) p:Triage→Normal
[08:56:32] Question - I was planning to switch on a feature flag in next SWAT, but with group1 still on wmf.8, that's not the greatest idea...
[08:56:35] I could backport a few patches to wmf.8 (in an extension) - just not sure whether that'd be ok (IDK what procedure is now that train is delayed)
[08:56:59] (CR) Gehel: [C: -1] Add maps reboot cookbook (1 comment) [cookbooks] - https://gerrit.wikimedia.org/r/511819 (https://phabricator.wikimedia.org/T224072) (owner: Mathew.onipe)
[08:57:05] (CR) Elukey: [C: +2] profile::kerberos::keytabs: add parent_dir_grp option [puppet] - https://gerrit.wikimedia.org/r/517997 (https://phabricator.wikimedia.org/T212257) (owner: Elukey)
[09:05:34] matthiasmullie: maybe ask on #wikimedia-releng
[09:05:42] Operations, serviceops, Core Platform Team Backlog (Later), Services (next): Migrate node-based services in production to node10 - https://phabricator.wikimedia.org/T210704 (KartikMistry) >>! In T210704#5270430, @akosiaris wrote: >>>! In T210704#5270230, @KartikMistry wrote: >> @Joe @MoritzMuehle...
[09:06:46] right, thanks!
[09:10:45] Operations, Performance-Team, Traffic, Performance: Sometimes some pages load slowly for European users (due to some factor outside of Wikimedia cluster) - https://phabricator.wikimedia.org/T226048 (ema) Some additional observations: - We're currently running with TCP SACK disabled (T225998) - T...
[09:14:10] anyone has the time to review a .well-known addition?
I'm hoping to SWAT it today
[09:14:13] https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/516055
[09:15:54] Operations, Performance-Team, Traffic, Performance: Sometimes some pages load slowly for European users (due to some factor outside of Wikimedia cluster) - https://phabricator.wikimedia.org/T226048 (Tomybrz)
[09:17:35] !log cache nodes: resume rolling reboots for kernel and varnish upgrades T224694 T225998 T226048
[09:17:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:17:43] T226048: Sometimes some pages load slowly for European users (due to some factor outside of Wikimedia cluster) - https://phabricator.wikimedia.org/T226048
[09:17:43] T225998: Study performance impact of disabling TCP selective acknowledgments - https://phabricator.wikimedia.org/T225998
[09:17:43] T224694: cp3041 - Varnish frontend child restarted icinga alert - https://phabricator.wikimedia.org/T224694
[09:17:51] !log ema@cumin1001 START - Cookbook sre.hosts.upgrade-and-reboot
[09:17:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:19:41] Operations, Performance-Team, Traffic, Performance: Sometimes some pages load slowly for European users (due to some factor outside of Wikimedia cluster) - https://phabricator.wikimedia.org/T226048 (Tgr) Not limited to dewiki / Germnany (unsurprisingly), there have been a bunch of reports from hu...
[09:21:56] !log ema@cumin1001 START - Cookbook sre.hosts.upgrade-and-reboot
[09:22:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:24:16] !log ema@cumin1001 END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
[09:24:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:25:41] !log Remove dbprov1001:/srv/backups/tmp/db1112 - T225981
[09:25:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:25:46] T225981: Replace db1077 with db1112 - https://phabricator.wikimedia.org/T225981
[09:26:12] Operations, serviceops: create IRC channel for the Service Operations SRE subteam - https://phabricator.wikimedia.org/T211902 (Joe) Open→Resolved
[09:29:34] (PS1) Elukey: Revert "Remove unnecessary config for the Hadoop testing cluster" [puppet] - https://gerrit.wikimedia.org/r/517998
[09:29:40] (PS2) Elukey: Revert "Remove unnecessary config for the Hadoop testing cluster" [puppet] - https://gerrit.wikimedia.org/r/517998
[09:30:48] !log ema@cumin1001 END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
[09:30:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:30:59] (CR) Elukey: [C: +2] Revert "Remove unnecessary config for the Hadoop testing cluster" [puppet] - https://gerrit.wikimedia.org/r/517998 (owner: Elukey)
[09:40:35] Operations, Performance-Team, Traffic, Performance: Sometimes some pages load slowly for European users (due to some factor outside of Wikimedia cluster) - https://phabricator.wikimedia.org/T226048 (ArielGlenn) I saw many slow en wiki page loads yesterday, including missing skins. (Logged in user...
[09:40:54] RECOVERY - Check systemd state on stat1004 is OK: OK - running: The system is fully operational [09:44:17] !log ema@cumin1001 START - Cookbook sre.hosts.upgrade-and-reboot [09:44:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:44:26] jouncebot: now [09:44:26] No deployments scheduled for the next 1 hour(s) and 15 minute(s) [09:44:28] jouncebot: next [09:44:28] In 1 hour(s) and 15 minute(s): European Mid-day SWAT(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190620T1100) [09:44:31] matthiasmullie: ^^ [09:45:14] PROBLEM - Check systemd state on stat1004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [09:45:22] Ok, I'll go right ahead then - thanks! [09:50:49] !log ema@cumin1001 START - Cookbook sre.hosts.upgrade-and-reboot [09:50:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:50:56] (03PS3) 10Bmansurov: Labs: enable surveys for testing [mediawiki-config] - 10https://gerrit.wikimedia.org/r/517789 (https://phabricator.wikimedia.org/T225819) [09:51:27] !log ema@cumin1001 END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0) [09:51:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:56:05] !log ema@cumin1001 END (FAIL) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=99) [09:56:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:58:07] <_joe_> !log upgraded service-checker T225707 [09:58:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:58:11] T225707: Upgrade python-service-checker across the fleet - https://phabricator.wikimedia.org/T225707 [09:58:57] 10Operations, 10serviceops, 10Core Platform Team Backlog (Watching / External), 10SCB, 10Services (watching): Upgrade python-service-checker across the fleet - https://phabricator.wikimedia.org/T225707 (10Joe) 05Open→03Resolved a:03Joe [10:10:59] !log 
ema@cumin1001 START - Cookbook sre.hosts.upgrade-and-reboot [10:11:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:11:27] !log ema@cumin1001 START - Cookbook sre.hosts.upgrade-and-reboot [10:11:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:11:36] https://commons.wikimedia.org/wiki/Commons:Village_pump#Timouts [10:11:46] problems reported by 4 different users [10:14:16] yannf: thanks, this is probably https://phabricator.wikimedia.org/T226048 we're currently looking into it [10:15:02] ok thanks [10:16:28] 10Operations, 10Performance-Team, 10Traffic, 10Performance: Sometimes some pages load slowly for European users (due to some factor outside of Wikimedia cluster) - https://phabricator.wikimedia.org/T226048 (10Yann) https://commons.wikimedia.org/wiki/Commons:Village_pump#Timouts [10:17:14] !log ema@cumin1001 END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0) [10:17:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:21:40] 10Operations: Phase out DSA keys for SSH access (ssh-dss) - https://phabricator.wikimedia.org/T177371 (10akosiaris) @MoritzMuehlenhoff does the above still stand? Should we close this? 
[10:22:07] (03PS1) 10Elukey: hadoop: add HTTP kerberos keytabs [puppet] - 10https://gerrit.wikimedia.org/r/518001 (https://phabricator.wikimedia.org/T212257) [10:22:42] (03CR) 10Elukey: [C: 03+2] hadoop: add HTTP kerberos keytabs [puppet] - 10https://gerrit.wikimedia.org/r/518001 (https://phabricator.wikimedia.org/T212257) (owner: 10Elukey) [10:23:21] !log ema@cumin1001 END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0) [10:23:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:24:19] !log ema@cumin1001 START - Cookbook sre.hosts.upgrade-and-reboot [10:24:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:30:04] (03PS1) 10Elukey: hadoop: fix HTTP keytab name in Hadoop testing cluster [puppet] - 10https://gerrit.wikimedia.org/r/518002 (https://phabricator.wikimedia.org/T212257) [10:31:00] (03CR) 10Elukey: [C: 03+2] hadoop: fix HTTP keytab name in Hadoop testing cluster [puppet] - 10https://gerrit.wikimedia.org/r/518002 (https://phabricator.wikimedia.org/T212257) (owner: 10Elukey) [10:31:02] !log ema@cumin1001 END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0) [10:31:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:33:39] !log Deploy schema change on the fishbowl wikis list on T225643 [10:33:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:33:44] T225643: Schema change to oathauth_users - https://phabricator.wikimedia.org/T225643 [10:34:11] (03PS1) 10Elukey: hadoop: fix HTTP keytab property in Hadoop testing cluster config [puppet] - 10https://gerrit.wikimedia.org/r/518003 [10:34:25] (03CR) 10Elukey: [V: 03+2 C: 03+2] hadoop: fix HTTP keytab property in Hadoop testing cluster config [puppet] - 10https://gerrit.wikimedia.org/r/518003 (owner: 10Elukey) [10:35:12] (03CR) 10Matthias Mullie: [C: 03+2] [SDC] Enable depicts qualifiers on Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/517381 (owner: 
10Matthias Mullie) [10:35:22] (03CR) 10Matthias Mullie: [C: 03+2] Increase rate limits for newbies on Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/516633 (https://phabricator.wikimedia.org/T225148) (owner: 10Matthias Mullie) [10:39:29] 10Operations, 10Performance-Team, 10Traffic, 10Performance: Sometimes some pages load slowly for European users (due to some factor outside of Wikimedia cluster) - https://phabricator.wikimedia.org/T226048 (10Aklapper) @Yann: There is nothing new in that thread? Feel free to add `{{tracked|Txxxxxx}}` on-wi... [10:39:43] (03PS3) 10Matthias Mullie: Increase rate limits for newbies on Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/516633 (https://phabricator.wikimedia.org/T225148) [10:42:01] (03CR) 10Matthias Mullie: [C: 03+2] Increase rate limits for newbies on Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/516633 (https://phabricator.wikimedia.org/T225148) (owner: 10Matthias Mullie) [10:42:59] (03Merged) 10jenkins-bot: Increase rate limits for newbies on Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/516633 (https://phabricator.wikimedia.org/T225148) (owner: 10Matthias Mullie) [10:43:12] (03PS3) 10Matthias Mullie: [SDC] Enable depicts qualifiers on Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/517381 [10:43:14] (03CR) 10jenkins-bot: Increase rate limits for newbies on Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/516633 (https://phabricator.wikimedia.org/T225148) (owner: 10Matthias Mullie) [10:43:16] (03CR) 10Matthias Mullie: [C: 03+2] [SDC] Enable depicts qualifiers on Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/517381 (owner: 10Matthias Mullie) [10:44:15] (03Merged) 10jenkins-bot: [SDC] Enable depicts qualifiers on Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/517381 (owner: 10Matthias Mullie) [10:46:22] (03CR) 10jenkins-bot: [SDC] Enable depicts qualifiers on Commons [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/517381 (owner: 10Matthias Mullie) [10:48:33] !log mlitn@deploy1001 Started scap: [SDC] Enable depicts qualifiers on Commons & increase rate limits [10:48:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:54:12] (03PS4) 10Giuseppe Lavagetto: New library to interact with poolcounter from python [software/python-poolcounter] - 10https://gerrit.wikimedia.org/r/517828 [10:54:14] (03PS2) 10Giuseppe Lavagetto: Add debian package build [software/python-poolcounter] - 10https://gerrit.wikimedia.org/r/517979 [10:55:53] (03CR) 10jerkins-bot: [V: 04-1] New library to interact with poolcounter from python [software/python-poolcounter] - 10https://gerrit.wikimedia.org/r/517828 (owner: 10Giuseppe Lavagetto) [10:55:56] (03CR) 10jerkins-bot: [V: 04-1] Add debian package build [software/python-poolcounter] - 10https://gerrit.wikimedia.org/r/517979 (owner: 10Giuseppe Lavagetto) [10:58:29] !log rebooting scb100[12], mw2139 for MDS kernel update (their CPUs were previously unsupported by Intel, but are now covered with the new release) [10:58:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:58:42] !log jmm@cumin2001 START - Cookbook sre.hosts.downtime [10:58:44] !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [10:58:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:58:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:58:51] !log jmm@cumin2001 START - Cookbook sre.hosts.downtime [10:58:52] !log jmm@cumin2001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) [10:58:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:58:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:00:04] Amir1, Lucas_WMDE, awight, and Urbanecm: How many deployers does it take to do European Mid-day SWAT(Max 6 patches) deploy? 
(https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190620T1100). [11:00:04] tgr, kart_, bmansurov, alaa_wmde, and Amir1: A patch you scheduled for European Mid-day SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [11:00:09] here [11:00:40] o/ I can deploy mine and alaa's [11:00:45] o/ [11:00:57] are people self-deploying these days? [11:00:57] here [11:01:28] yes, it seems. [11:02:01] ok, should I start then? [11:02:20] tgr: yes. Let's go by order listed in Deployment page. [11:02:28] tgr: let me know when done. [11:02:54] 10Operations, 10Traffic, 10Patch-For-Review: Make cp1099 the new pinkunicorn - https://phabricator.wikimedia.org/T202966 (10MoritzMuehlenhoff) Or maybe use one of cp1071-cp1074, the servers which were used for the original ATS tests? These were bought in 2015 and are currently unused. [11:03:20] (03PS2) 10Gergő Tisza: Fix import group name [mediawiki-config] - 10https://gerrit.wikimedia.org/r/516053 [11:03:47] I don't have deployment rights, can someone else deploy mine? [11:03:49] (03CR) 10Gergő Tisza: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/516053 (owner: 10Gergő Tisza) [11:04:10] bmansurov: I can help with yours [11:04:18] Amir1: thanks! 
[11:04:51] (03Merged) 10jenkins-bot: Fix import group name [mediawiki-config] - 10https://gerrit.wikimedia.org/r/516053 (owner: 10Gergő Tisza) [11:05:13] (03PS1) 10Elukey: role::analytics_test_cluster::coord: set more use_kerberos flags [puppet] - 10https://gerrit.wikimedia.org/r/518010 [11:06:08] (03PS6) 10KartikMistry: Remove ExternalGuidanceEnableContextDetection [mediawiki-config] - 10https://gerrit.wikimedia.org/r/504261 (https://phabricator.wikimedia.org/T219819) [11:06:37] (03CR) 10jenkins-bot: Fix import group name [mediawiki-config] - 10https://gerrit.wikimedia.org/r/516053 (owner: 10Gergő Tisza) [11:06:40] (03CR) 10Elukey: [C: 03+2] role::analytics_test_cluster::coord: set more use_kerberos flags [puppet] - 10https://gerrit.wikimedia.org/r/518010 (owner: 10Elukey) [11:08:17] scap pull hangs on mwdebug1002 at the cdb-rebuild step [11:08:34] or maybe it's just very sloww? [11:08:35] oh [11:08:57] yeah, done now, took two minutes [11:09:08] !log mlitn@deploy1001 Finished scap: [SDC] Enable depicts qualifiers on Commons & increase rate limits (duration: 20m 34s) [11:09:08] don't recall that taking nontrivial time in the past [11:09:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:10:06] tgr: should I go ahead? [11:10:22] (03PS1) 10Elukey: role::analytics_test_cluster::coordinator: add another use_kerberos flag [puppet] - 10https://gerrit.wikimedia.org/r/518012 [11:10:25] just a sec [11:10:54] this was the debug host scap, not the normal scap [11:11:03] OK! [11:11:05] (03CR) 10Elukey: [C: 03+2] role::analytics_test_cluster::coordinator: add another use_kerberos flag [puppet] - 10https://gerrit.wikimedia.org/r/518012 (owner: 10Elukey) [11:12:26] !log tgr@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:516053|Fix import group name]] (duration: 00m 57s) [11:12:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:12:58] kart_: thx, you can go ahead [11:13:05] cool. 
[11:13:20] (03CR) 10KartikMistry: [C: 03+2] Remove ExternalGuidanceEnableContextDetection [mediawiki-config] - 10https://gerrit.wikimedia.org/r/504261 (https://phabricator.wikimedia.org/T219819) (owner: 10KartikMistry) [11:14:16] (03Merged) 10jenkins-bot: Remove ExternalGuidanceEnableContextDetection [mediawiki-config] - 10https://gerrit.wikimedia.org/r/504261 (https://phabricator.wikimedia.org/T219819) (owner: 10KartikMistry) [11:14:29] !log rebooting mw2235, mw2255, mw2271 for MDS kernel update [11:14:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:16:17] (03CR) 10jenkins-bot: Remove ExternalGuidanceEnableContextDetection [mediawiki-config] - 10https://gerrit.wikimedia.org/r/504261 (https://phabricator.wikimedia.org/T219819) (owner: 10KartikMistry) [11:17:42] (03PS1) 10Urbanecm: Fix wgMetaNamespaceTalk for aswikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/518014 (https://phabricator.wikimedia.org/T226027) [11:20:16] Amir1 or whoever is responsible for this window: I would like to get https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/518014 deployed today; it's a bugfix and I would like it to go out this week. Let me know, thanks! 
[11:20:24] !log kartik@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT [[gerrit:504261|Remove ExternalGuidanceEnableContentDetection]] (T219819) (duration: 01m 00s) [11:20:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:20:29] T219819: Enable context detection in all languages - https://phabricator.wikimedia.org/T219819 [11:20:30] PROBLEM - MediaWiki memcached error rate on graphite1004 is CRITICAL: CRITICAL: 80.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [11:20:50] Urbanecm: today is a little bit crazy [11:21:39] yeah, calendar's quite full [11:21:57] well, I still have morning SWAT, so just if time permits :) [11:22:17] moritzm: this memcached is due to upgrade or something else? [11:22:26] (03CR) 10Lucas Werkmeister (WMDE): Fix wgMetaNamespaceTalk for aswikisource (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/518014 (https://phabricator.wikimedia.org/T226027) (owner: 10Urbanecm) [11:22:28] Amir1: I'm done with patch. [11:23:36] the memcached errors right after the deployment are a bit strange [11:24:04] unrelated to any reboots AFAICT [11:24:32] the peak seems gone, checking in logstash [11:24:44] (03PS2) 10Urbanecm: Fix wgMetaNamespaceTalk for aswikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/518014 (https://phabricator.wikimedia.org/T226027) [11:24:47] (03CR) 10Urbanecm: Fix wgMetaNamespaceTalk for aswikisource (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/518014 (https://phabricator.wikimedia.org/T226027) (owner: 10Urbanecm) [11:24:50] RECOVERY - MediaWiki memcached error rate on graphite1004 is OK: OK: Less than 40.00% above the threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [11:25:19] elukey: Should I move forward? 
[11:25:50] Amir1: lemme check a thing one sec, just to be sure [11:26:15] so on mw1300 mcrouter seemed to complain about connect errors to mw2271.codfw.wmnet. [11:26:24] that is a mcrouter codfw proxy [11:26:59] yep same thing on another host [11:27:16] elukey@mw2271:~$ uptime [11:27:17] 11:27:10 up 9 min, 1 user, load average: 0.04, 0.19, 0.16 [11:27:59] moritzm: --^ was it part of a reboot batch? [11:28:29] (03CR) 10Ladsgroup: [C: 03+2] "noop for prod" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/517789 (https://phabricator.wikimedia.org/T225819) (owner: 10Bmansurov) [11:28:39] ah, indeed! I totally forgot about the codfw proxies! [11:28:47] Amir1: you can proceed :) [11:28:59] in fact that explains it totally, sorry for my incorrect earlier statement [11:29:02] bmansurov: for beta cluster patches, they can go in at any time, just let a deployer know [11:29:14] Thanks! [11:29:17] Amir1: ok, thanks! [11:29:21] (03Merged) 10jenkins-bot: Labs: enable surveys for testing [mediawiki-config] - 10https://gerrit.wikimedia.org/r/517789 (https://phabricator.wikimedia.org/T225819) (owner: 10Bmansurov) [11:29:33] moritzm: all good! I just wanted to make sure that we knew the root cause, I am always scared when I see the memcached alerts :D [11:29:36] (03CR) 10jenkins-bot: Labs: enable surveys for testing [mediawiki-config] - 10https://gerrit.wikimedia.org/r/517789 (https://phabricator.wikimedia.org/T225819) (owner: 10Bmansurov) [11:29:56] yep! 
[11:30:19] bmansurov: so it's merged and rebased, it'll be live in half an hour in beta cluster automatically (it's outside of my control, sorry) [11:30:33] Amir1: all good [11:31:01] (03CR) 10Ladsgroup: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/517819 (https://phabricator.wikimedia.org/T226086) (owner: 10Alaa Sarhan) [11:32:48] (03PS1) 10Elukey: hive::server: add missing use_kerberos flag [puppet/cdh] - 10https://gerrit.wikimedia.org/r/518016 [11:33:06] (03PS5) 10Ladsgroup: Introduce config variables for new terms store in mediawiki-config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/517819 (https://phabricator.wikimedia.org/T226086) (owner: 10Alaa Sarhan) [11:33:08] (03CR) 10Elukey: [V: 03+2 C: 03+2] hive::server: add missing use_kerberos flag [puppet/cdh] - 10https://gerrit.wikimedia.org/r/518016 (owner: 10Elukey) [11:33:13] (03CR) 10Ladsgroup: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/517819 (https://phabricator.wikimedia.org/T226086) (owner: 10Alaa Sarhan) [11:34:13] (03Merged) 10jenkins-bot: Introduce config variables for new terms store in mediawiki-config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/517819 (https://phabricator.wikimedia.org/T226086) (owner: 10Alaa Sarhan) [11:34:15] (03PS1) 10Elukey: Update cdh submodule to its latest version [puppet] - 10https://gerrit.wikimedia.org/r/518017 [11:34:27] (03CR) 10jenkins-bot: Introduce config variables for new terms store in mediawiki-config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/517819 (https://phabricator.wikimedia.org/T226086) (owner: 10Alaa Sarhan) [11:35:25] (03CR) 10Elukey: [V: 03+2 C: 03+2] Update cdh submodule to its latest version [puppet] - 10https://gerrit.wikimedia.org/r/518017 (owner: 10Elukey) [11:36:38] Testing prod [11:39:43] !log ladsgroup@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:517819|Introduce config variables for new terms store in mediawiki-config (T226086)]] 
(duration: 00m 57s) [11:39:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:39:48] T226086: Introduce config variables for new terms store in mediawiki-config - https://phabricator.wikimedia.org/T226086 [11:42:14] !log ladsgroup@deploy1001 Synchronized wmf-config/Wikibase.php: SWAT: [[gerrit:517819|Introduce config variables for new terms store in mediawiki-config (T226086)]], Part II (duration: 00m 57s) [11:42:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:42:34] (03PS6) 10Ladsgroup: Switch property terms migration to WRITE_BOTH on test wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/517820 (https://phabricator.wikimedia.org/T225051) (owner: 10Alaa Sarhan) [11:43:31] (03CR) 10Ladsgroup: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/517820 (https://phabricator.wikimedia.org/T225051) (owner: 10Alaa Sarhan) [11:44:26] (03Merged) 10jenkins-bot: Switch property terms migration to WRITE_BOTH on test wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/517820 (https://phabricator.wikimedia.org/T225051) (owner: 10Alaa Sarhan) [11:44:52] (03PS1) 10Effie Mouzeli: role::mediawiki::jobrunner: Add php_fpm_exporter [puppet] - 10https://gerrit.wikimedia.org/r/518018 [11:47:17] (03CR) 10jenkins-bot: Switch property terms migration to WRITE_BOTH on test wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/517820 (https://phabricator.wikimedia.org/T225051) (owner: 10Alaa Sarhan) [11:49:57] !log ladsgroup@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:517820|Switch property terms migration to WRITE_BOTH on test wikidata (T225051)]] (duration: 00m 58s) [11:50:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:50:03] T225051: Switch `tmpPropertyTermsMigrationStage` to MIGRATION_WRITE_BOTH - https://phabricator.wikimedia.org/T225051 [11:53:40] (03PS1) 10Marostegui: install_server: Do not reimage 
db1077 [puppet] - 10https://gerrit.wikimedia.org/r/518019 [11:54:01] !log EU SWAT is done [11:54:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:54:49] (03CR) 10Marostegui: [C: 03+2] install_server: Do not reimage db1077 [puppet] - 10https://gerrit.wikimedia.org/r/518019 (owner: 10Marostegui) [11:56:23] (03PS1) 10Arturo Borrero Gonzalez: toolforge: k8s: etcd: restart etcd service when certs change [puppet] - 10https://gerrit.wikimedia.org/r/518020 (https://phabricator.wikimedia.org/T226098) [11:56:38] (03Abandoned) 10Arturo Borrero Gonzalez: etcd::ssl: restart etcd service when the SSL cert changes [puppet] - 10https://gerrit.wikimedia.org/r/512338 (https://phabricator.wikimedia.org/T169287) (owner: 10Arturo Borrero Gonzalez) [11:57:14] (03Abandoned) 10Arturo Borrero Gonzalez: etcd: make monitoring optional [puppet] - 10https://gerrit.wikimedia.org/r/514710 (owner: 10Arturo Borrero Gonzalez) [11:57:56] (03Abandoned) 10Arturo Borrero Gonzalez: profile: etcd: make peer list configurable [puppet] - 10https://gerrit.wikimedia.org/r/514751 (https://phabricator.wikimedia.org/T215531) (owner: 10Arturo Borrero Gonzalez) [12:04:52] (03CR) 10Arturo Borrero Gonzalez: [C: 04-1] "This approach doesn't work as is, since the certificates aren't resources in the puppet catalog for the server:" [puppet] - 10https://gerrit.wikimedia.org/r/518020 (https://phabricator.wikimedia.org/T226098) (owner: 10Arturo Borrero Gonzalez) [12:07:26] (03PS1) 10Effie Mouzeli: mediawiki::php: increase opcache on jobrunners [puppet] - 10https://gerrit.wikimedia.org/r/518023 (https://phabricator.wikimedia.org/T224857) [12:11:00] RECOVERY - Check systemd state on stat1004 is OK: OK - running: The system is fully operational [12:15:18] PROBLEM - Check systemd state on stat1004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [12:24:49] Amir1: swat looks all done yes? 
[12:25:10] I need to do a quick mw config [12:33:50] jijiki: looks so, as per the SAL entry earlier [12:34:07] yeah I am just making sure, I read SAL :) [12:34:26] or SAL read me :p [12:36:31] !log updated jenkins package on apt.wikimedia.org to 2.176.1 for jessie and stretch (T226159) [12:36:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:36:37] T226159: Upgrade jenkins instances to 2.171.1 - https://phabricator.wikimedia.org/T226159 [12:36:42] ^ hashar [12:36:47] !log ema@cumin1001 START - Cookbook sre.hosts.upgrade-and-reboot [12:36:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:37:01] moritzm: thank you :] [12:37:23] moritzm: I wasn't sure which tag to add in phabricator so I went ahead and subscribed just you. Hope you dont mind [12:37:26] the stable repo pointed to 2.176, though [12:37:35] sure, np [12:37:41] !log ema@cumin1001 START - Cookbook sre.hosts.upgrade-and-reboot [12:37:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:40:27] !log Upgrading java/jenkins on releases* hosts # T226159 [12:40:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:41:16] RECOVERY - Check systemd state on stat1004 is OK: OK - running: The system is fully operational [12:41:41] Unpacking puppet-common (5.5.10-2~deb9u1) over (4.8.2-5) ... [12:41:43] pff [12:42:47] puppet-common is just a transitional package, you can simply remove it [12:42:56] :) [12:43:35] 10Operations, 10Performance-Team, 10Traffic, 10Performance: Sometimes some pages load slowly for European users (due to some factor outside of Wikimedia cluster) - https://phabricator.wikimedia.org/T226048 (10Bouzinac) Interested in the solving of this pb. What I can say is that a common template (https://... [12:44:30] (03CR) 10Effie Mouzeli: "Should we discuss a bit if we want to allow php there? Is it something that used to be allowed and then we stopped?" 
[puppet] - 10https://gerrit.wikimedia.org/r/517926 (owner: 10Krinkle) [12:44:43] !log ema@cumin1001 END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0) [12:44:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:44:57] !log Upgrading packages on contint1001 [12:45:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:45:38] PROBLEM - Check systemd state on stat1004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [12:48:14] PROBLEM - IPsec on cp1080 is CRITICAL: Strongswan CRITICAL - ok: 34 not-conn: cp2017_v4, cp2017_v6 https://wikitech.wikimedia.org/wiki/Monitoring/strongswan [12:48:16] PROBLEM - IPsec on cp1076 is CRITICAL: Strongswan CRITICAL - ok: 34 not-conn: cp2017_v4, cp2017_v6 https://wikitech.wikimedia.org/wiki/Monitoring/strongswan [12:48:20] PROBLEM - IPsec on cp1082 is CRITICAL: Strongswan CRITICAL - ok: 34 not-conn: cp2017_v4, cp2017_v6 https://wikitech.wikimedia.org/wiki/Monitoring/strongswan [12:48:20] PROBLEM - IPsec on cp1090 is CRITICAL: Strongswan CRITICAL - ok: 34 not-conn: cp2017_v4, cp2017_v6 https://wikitech.wikimedia.org/wiki/Monitoring/strongswan [12:48:20] PROBLEM - IPsec on cp1086 is CRITICAL: Strongswan CRITICAL - ok: 34 not-conn: cp2017_v4, cp2017_v6 https://wikitech.wikimedia.org/wiki/Monitoring/strongswan [12:48:36] PROBLEM - IPsec on cp5003 is CRITICAL: Strongswan CRITICAL - ok: 38 not-conn: cp2017_v4, cp2017_v6 https://wikitech.wikimedia.org/wiki/Monitoring/strongswan [12:48:36] PROBLEM - IPsec on cp5005 is CRITICAL: Strongswan CRITICAL - ok: 38 not-conn: cp2017_v4, cp2017_v6 https://wikitech.wikimedia.org/wiki/Monitoring/strongswan [12:48:36] PROBLEM - IPsec on cp5002 is CRITICAL: Strongswan CRITICAL - ok: 38 not-conn: cp2017_v4, cp2017_v6 https://wikitech.wikimedia.org/wiki/Monitoring/strongswan [12:48:44] PROBLEM - IPsec on cp5006 is CRITICAL: Strongswan CRITICAL - ok: 38 not-conn: cp2017_v4, cp2017_v6 
https://wikitech.wikimedia.org/wiki/Monitoring/strongswan [12:48:56] PROBLEM - IPsec on cp5004 is CRITICAL: Strongswan CRITICAL - ok: 38 not-conn: cp2017_v4, cp2017_v6 https://wikitech.wikimedia.org/wiki/Monitoring/strongswan [12:49:23] that's me, sorry for the spam ^ [12:49:30] PROBLEM - IPsec on cp5001 is CRITICAL: Strongswan CRITICAL - ok: 38 not-conn: cp2017_v4, cp2017_v6 https://wikitech.wikimedia.org/wiki/Monitoring/strongswan [12:49:30] PROBLEM - IPsec on cp1078 is CRITICAL: Strongswan CRITICAL - ok: 34 not-conn: cp2017_v4, cp2017_v6 https://wikitech.wikimedia.org/wiki/Monitoring/strongswan [12:49:32] PROBLEM - IPsec on cp1088 is CRITICAL: Strongswan CRITICAL - ok: 34 not-conn: cp2017_v4, cp2017_v6 https://wikitech.wikimedia.org/wiki/Monitoring/strongswan [12:49:32] PROBLEM - IPsec on cp1084 is CRITICAL: Strongswan CRITICAL - ok: 34 not-conn: cp2017_v4, cp2017_v6 https://wikitech.wikimedia.org/wiki/Monitoring/strongswan [12:50:54] !log powercycle cp2017, stuck rebooting [12:50:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:52:04] (03PS3) 10Marostegui: db-eqiad,db-codfw.php: Change last parsercache key [mediawiki-config] - 10https://gerrit.wikimedia.org/r/517807 (https://phabricator.wikimedia.org/T210725) [12:53:46] (03CR) 10Effie Mouzeli: [V: 03+1] "LGTM https://puppet-compiler.wmflabs.org/compiler1001/17042/" [puppet] - 10https://gerrit.wikimedia.org/r/518018 (owner: 10Effie Mouzeli) [12:54:06] RECOVERY - IPsec on cp1090 is OK: Strongswan OK - 36 ESP OK https://wikitech.wikimedia.org/wiki/Monitoring/strongswan [12:54:06] RECOVERY - IPsec on cp1086 is OK: Strongswan OK - 36 ESP OK https://wikitech.wikimedia.org/wiki/Monitoring/strongswan [12:54:24] effie: sorry I was afk for meeting, is everything alright now? 
[12:54:24] RECOVERY - IPsec on cp5005 is OK: Strongswan OK - 40 ESP OK https://wikitech.wikimedia.org/wiki/Monitoring/strongswan [12:54:24] RECOVERY - IPsec on cp5003 is OK: Strongswan OK - 40 ESP OK https://wikitech.wikimedia.org/wiki/Monitoring/strongswan [12:54:24] RECOVERY - IPsec on cp5002 is OK: Strongswan OK - 40 ESP OK https://wikitech.wikimedia.org/wiki/Monitoring/strongswan [12:54:32] RECOVERY - IPsec on cp5006 is OK: Strongswan OK - 40 ESP OK https://wikitech.wikimedia.org/wiki/Monitoring/strongswan [12:54:44] RECOVERY - IPsec on cp5004 is OK: Strongswan OK - 40 ESP OK https://wikitech.wikimedia.org/wiki/Monitoring/strongswan [12:54:56] Amir1: nah just doublechecking, tx! [12:55:18] RECOVERY - IPsec on cp5001 is OK: Strongswan OK - 40 ESP OK https://wikitech.wikimedia.org/wiki/Monitoring/strongswan [12:55:18] RECOVERY - IPsec on cp1078 is OK: Strongswan OK - 36 ESP OK https://wikitech.wikimedia.org/wiki/Monitoring/strongswan [12:55:18] RECOVERY - IPsec on cp1088 is OK: Strongswan OK - 36 ESP OK https://wikitech.wikimedia.org/wiki/Monitoring/strongswan [12:55:20] RECOVERY - IPsec on cp1084 is OK: Strongswan OK - 36 ESP OK https://wikitech.wikimedia.org/wiki/Monitoring/strongswan [12:55:28] RECOVERY - IPsec on cp1080 is OK: Strongswan OK - 36 ESP OK https://wikitech.wikimedia.org/wiki/Monitoring/strongswan [12:55:30] RECOVERY - IPsec on cp1076 is OK: Strongswan OK - 36 ESP OK https://wikitech.wikimedia.org/wiki/Monitoring/strongswan [12:55:32] RECOVERY - IPsec on cp1082 is OK: Strongswan OK - 36 ESP OK https://wikitech.wikimedia.org/wiki/Monitoring/strongswan [12:56:24] !log ema@cumin1001 END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0) [12:56:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:57:21] (03CR) 10Effie Mouzeli: [V: 03+1] "LGTM https://puppet-compiler.wmflabs.org/compiler1002/17043/mw1296.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/518023 (https://phabricator.wikimedia.org/T224857) 
(owner: 10Effie Mouzeli) [12:58:36] (03CR) 10Ladsgroup: Introduce config variables for new terms store in mediawiki-config (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/517819 (https://phabricator.wikimedia.org/T226086) (owner: 10Alaa Sarhan) [12:59:01] !log Disable puppet on jobrunners to merge 518023 and 518018 [12:59:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:00:25] (03PS1) 10Ladsgroup: Fix variable naming [mediawiki-config] - 10https://gerrit.wikimedia.org/r/518028 (https://phabricator.wikimedia.org/T226086) [13:01:30] (03CR) 10Ladsgroup: "https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/518028" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/517819 (https://phabricator.wikimedia.org/T226086) (owner: 10Alaa Sarhan) [13:02:37] 10Operations, 10Dumps-Generation, 10SDC General, 10Wikidata: Capacity planning for Commons Structured Data - https://phabricator.wikimedia.org/T226093 (10ArielGlenn) @jcrespo I'm adding you too, please remove yourself if you're already covered by other tasks. @MarkTraceur The number of new revisions to wi... 
[13:02:39] (03CR) 10Effie Mouzeli: [V: 03+1 C: 03+2] role::mediawiki::jobrunner: Add php_fpm_exporter [puppet] - 10https://gerrit.wikimedia.org/r/518018 (owner: 10Effie Mouzeli) [13:02:53] (03PS2) 10Effie Mouzeli: role::mediawiki::jobrunner: Add php_fpm_exporter [puppet] - 10https://gerrit.wikimedia.org/r/518018 [13:04:43] !log ema@cumin1001 START - Cookbook sre.hosts.upgrade-and-reboot [13:04:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:07:26] hmm looks like jenkins is taking its time a bit [13:07:36] (03PS1) 10Marostegui: dbproxy: Depool labsdb1011 [puppet] - 10https://gerrit.wikimedia.org/r/518029 (https://phabricator.wikimedia.org/T222978) [13:07:46] 10Operations, 10Core Platform Team, 10MassMessage, 10WMF-JobQueue: Jobs not being executed on 1.34.0-wmf.10 - https://phabricator.wikimedia.org/T226109 (10Agusbou2015) When will the fix be ready? [13:08:06] PROBLEM - puppet last run on contint1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[jenkins] [13:08:36] oh great [13:09:12] hashar: is there an issue with jenkins? 
[13:09:17] I rebased https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/518018/ [13:09:35] and it does not look like it is moving forward [13:09:40] bah [13:09:42] ;( [13:09:59] bah in greek means no :p [13:10:13] from what I can see there are very large commits being processed at the moment [13:10:26] ok, I will wait then [13:10:34] (03CR) 10Lucas Werkmeister (WMDE): [C: 03+1] Fix variable naming [mediawiki-config] - 10https://gerrit.wikimedia.org/r/518028 (https://phabricator.wikimedia.org/T226086) (owner: 10Ladsgroup) [13:10:36] jijiki: all executors are busy unfortunately, so gotta wait for a job to complete and free up a slot [13:10:55] (03CR) 10Marostegui: "Puppet looks good: https://puppet-compiler.wmflabs.org/compiler1002/17044/" [puppet] - 10https://gerrit.wikimedia.org/r/518029 (https://phabricator.wikimedia.org/T222978) (owner: 10Marostegui) [13:11:10] !log ema@cumin1001 END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0) [13:11:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:11:15] but I am hoping to have less jobs triggered and thus less congestion [13:11:44] RECOVERY - Check systemd state on stat1004 is OK: OK - running: The system is fully operational [13:13:02] 10Operations, 10Core Platform Team, 10MassMessage, 10WMF-JobQueue: Jobs not being executed on 1.34.0-wmf.10 - https://phabricator.wikimedia.org/T226109 (10Reedy) >>! In T226109#5271093, @Agusbou2015 wrote: > When will the fix be ready? Sometime after someone works out what's broken [13:14:44] (03PS1) 10Elukey: profile::kerberos::kerberos-puppet-wrapper: add principals [puppet] - 10https://gerrit.wikimedia.org/r/518030 (https://phabricator.wikimedia.org/T212257) [13:16:04] PROBLEM - Check systemd state on stat1004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[13:16:24] !log ema@cumin1001 START - Cookbook sre.hosts.upgrade-and-reboot [13:16:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:18:14] (03CR) 10Jhedden: "@Anomie this task T101631 is specific to the deleted revisions. I quoted your question in the task, but you may want to open a new task sp" [puppet] - 10https://gerrit.wikimedia.org/r/515062 (https://phabricator.wikimedia.org/T101631) (owner: 10Jhedden) [13:19:26] (03PS2) 10Marostegui: dbproxy: Depool labsdb1011 [puppet] - 10https://gerrit.wikimedia.org/r/518029 (https://phabricator.wikimedia.org/T222978) [13:20:02] (03CR) 10Marostegui: [C: 03+2] dbproxy: Depool labsdb1011 [puppet] - 10https://gerrit.wikimedia.org/r/518029 (https://phabricator.wikimedia.org/T222978) (owner: 10Marostegui) [13:20:45] !log Reload haproxy on dbproxy1010 and dbproxy1011 to depool labsdb1011 - T222978 [13:20:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:20:52] T222978: Compress and defragment tables on labsdb hosts - https://phabricator.wikimedia.org/T222978 [13:21:30] !log depool mw1311 [13:21:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:22:56] !log ema@cumin1001 END (FAIL) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=99) [13:23:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:23:08] !log Stop replication on labsdb1011 to defragment tables T222978 [13:23:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:28:48] !log ema@cumin1001 START - Cookbook sre.hosts.upgrade-and-reboot [13:28:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:29:14] 10Operations, 10Performance-Team, 10Traffic, 10Performance: Sometimes some pages load slowly for European users (due to some factor outside of Wikimedia cluster) - https://phabricator.wikimedia.org/T226048 (10Gestumblindi) Seems to be better. Solved by the rebooting mentioned by @ema ? 
I don't experience t... [13:29:21] (03CR) 10Giuseppe Lavagetto: [C: 03+1] mediawiki::php: increase opcache on jobrunners [puppet] - 10https://gerrit.wikimedia.org/r/518023 (https://phabricator.wikimedia.org/T224857) (owner: 10Effie Mouzeli) [13:31:11] !log ema@cumin1001 START - Cookbook sre.hosts.upgrade-and-reboot [13:31:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:35:03] !log ema@cumin1001 END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0) [13:35:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:38:15] 10Operations, 10ops-codfw, 10DBA: Degraded RAID on db2058 - https://phabricator.wikimedia.org/T225902 (10Papaul) a:05Papaul→03Marostegui Disk replaced [13:38:28] !log ema@cumin1001 END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0) [13:38:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:39:25] 10Operations, 10ops-codfw, 10DBA: Degraded RAID on db2043 - https://phabricator.wikimedia.org/T225889 (10Papaul) a:05Papaul→03Marostegui Disk replaced [13:43:48] (03CR) 10Giuseppe Lavagetto: "Generally LGTM, I just have one clarification to ask regarding file permissions." 
(034 comments) [puppet] - 10https://gerrit.wikimedia.org/r/517888 (https://phabricator.wikimedia.org/T212130) (owner: 10Fsero) [13:43:56] (03PS2) 10Elukey: profile::kerberos::kerberos-puppet-wrapper: add principals [puppet] - 10https://gerrit.wikimedia.org/r/518030 (https://phabricator.wikimedia.org/T212257) [13:46:07] (03CR) 10Elukey: [C: 03+2] profile::kerberos::kerberos-puppet-wrapper: add principals [puppet] - 10https://gerrit.wikimedia.org/r/518030 (https://phabricator.wikimedia.org/T212257) (owner: 10Elukey) [13:50:03] !log ema@cumin1001 START - Cookbook sre.hosts.upgrade-and-reboot [13:50:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:55:57] (03PS5) 10Giuseppe Lavagetto: New library to interact with poolcounter from python [software/python-poolcounter] - 10https://gerrit.wikimedia.org/r/517828 [13:55:59] (03PS3) 10Giuseppe Lavagetto: Add debian package build [software/python-poolcounter] - 10https://gerrit.wikimedia.org/r/517979 [13:56:21] !log ema@cumin1001 END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0) [13:56:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:58:28] !log ema@cumin1001 START - Cookbook sre.hosts.upgrade-and-reboot [13:58:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:00:05] Amir1: Dear deployers, time to do the Deploy property terms migration WRITE_BOTH on production wikidata deploy. Dont look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190620T1400). 
[14:00:56] o/ [14:01:24] ACKNOWLEDGEMENT - HP RAID on db2043 is CRITICAL: CRITICAL: Slot 0: OK: 1I:1:1, 1I:1:2, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:9, 1I:1:10, 1I:1:11, 1I:1:12 - Failed: 1I:1:3 - Controller: OK - Battery/Capacitor: OK nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T226186 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering [14:01:27] 10Operations, 10ops-codfw: Degraded RAID on db2043 - https://phabricator.wikimedia.org/T226186 (10ops-monitoring-bot) [14:01:46] (03CR) 10Ladsgroup: [C: 03+2] Fix variable naming [mediawiki-config] - 10https://gerrit.wikimedia.org/r/518028 (https://phabricator.wikimedia.org/T226086) (owner: 10Ladsgroup) [14:02:44] (03Merged) 10jenkins-bot: Fix variable naming [mediawiki-config] - 10https://gerrit.wikimedia.org/r/518028 (https://phabricator.wikimedia.org/T226086) (owner: 10Ladsgroup) [14:02:58] (03CR) 10jenkins-bot: Fix variable naming [mediawiki-config] - 10https://gerrit.wikimedia.org/r/518028 (https://phabricator.wikimedia.org/T226086) (owner: 10Ladsgroup) [14:03:35] 10Operations, 10Performance-Team, 10Traffic, 10Performance: Sometimes some pages load slowly for European users (due to some factor outside of Wikimedia cluster) - https://phabricator.wikimedia.org/T226048 (10Wurgl) Agree to @Gestumblindi ! I just tried and everything is less than 3 seconds. 
[14:03:49] (03CR) 10Fsero: "thanks for the review :)" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/517888 (https://phabricator.wikimedia.org/T212130) (owner: 10Fsero) [14:04:04] !log ema@cumin1001 END (FAIL) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=99) [14:04:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:04:29] (03CR) 10Fsero: k8s, deploy: introducing helmfile for manage charts (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/517888 (https://phabricator.wikimedia.org/T212130) (owner: 10Fsero) [14:04:43] 10Operations, 10ops-codfw, 10DBA: Degraded RAID on db2058 - https://phabricator.wikimedia.org/T225902 (10Marostegui) Thanks! It is rebuilding ` root@db2058:~# hpssacli controller all show config Smart Array P420i in Slot 0 (Embedded) (sn: 0014380337DC560) Port Name: 1I Port Name: 2I Gen8 Ser... [14:05:12] 10Operations, 10ops-codfw, 10DBA: Degraded RAID on db2058 - https://phabricator.wikimedia.org/T225902 (10Marostegui) a:05Marostegui→03Papaul The disk failed, can we try another one? Thanks! ` physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS, 600 GB, Failed) ` [14:05:39] 10Operations, 10Citoid, 10serviceops, 10Patch-For-Review, and 2 others: allow zotero container nodejs server to define the amount of heap used instead of the fixed limit of 1.7Gi - https://phabricator.wikimedia.org/T213414 (10fsero) 05Open→03Resolved [14:06:43] !log ema@cumin1001 START - Cookbook sre.hosts.upgrade-and-reboot [14:06:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:06:52] 10Operations, 10ops-codfw, 10DBA: Degraded RAID on db2058 - https://phabricator.wikimedia.org/T225902 (10Marostegui) Sorry, this was for db2043 [14:07:07] 10Operations, 10ops-codfw, 10DBA: Degraded RAID on db2058 - https://phabricator.wikimedia.org/T225902 (10Marostegui) a:05Papaul→03Marostegui [14:07:24] (03CR) 10Reedy: [C: 03+1] "Boo we have to add another proper docroot folder again. 
But c'est la vie" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/516055 (https://phabricator.wikimedia.org/T223835) (owner: 10Gergő Tisza) [14:07:59] 10Operations, 10ops-codfw, 10DBA: Degraded RAID on db2043 - https://phabricator.wikimedia.org/T225889 (10Marostegui) a:05Marostegui→03Papaul The disk has failed - can we try a different one? ` root@db2043:~# hpssacli controller all show config Smart Array P420i in Slot 0 (Embedded) (sn: 0014380337FA... [14:08:37] (03PS2) 10Ladsgroup: Set EntityUsageTable addUsage batch size to 150 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/517669 (https://phabricator.wikimedia.org/T225500) [14:09:01] 10Operations, 10Prod-Kubernetes, 10serviceops, 10Kubernetes, and 2 others: improve docker registry architecture - https://phabricator.wikimedia.org/T209271 (10fsero) [14:09:04] 10Operations, 10Prod-Kubernetes, 10serviceops, 10Kubernetes, 10User-fsero: placeholder task for migration problems - https://phabricator.wikimedia.org/T222210 (10fsero) 05Open→03Resolved new registry has been in production for some time without issues, there are some leftovers that need to be address... 
[14:09:06] 10Operations, 10ops-codfw: Degraded RAID on db2043 - https://phabricator.wikimedia.org/T226186 (10Marostegui) 05Open→03Declined Duplicate of T225889 [14:09:09] !log ladsgroup@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:518028|Switch property terms migration to WRITE_BOTH on test wikidata (T225051)]] (duration: 00m 56s) [14:09:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:09:14] T225051: Switch `tmpPropertyTermsMigrationStage` to MIGRATION_WRITE_BOTH - https://phabricator.wikimedia.org/T225051 [14:10:59] !log akosiaris@puppetmaster1001 conftool action : set/pooled=no; selector: name=kubernetes2001.* [14:11:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:11:22] !log ema@cumin1001 START - Cookbook sre.hosts.upgrade-and-reboot [14:11:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:12:21] !log akosiaris@puppetmaster1001 conftool action : set/pooled=yes; selector: dc=codfw,cluster=kubernetes,service=eventgate-main,name=kubernetes2001.codfw.wmnet [14:12:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:13:38] !log akosiaris@puppetmaster1001 conftool action : set/pooled=yes; selector: dc=codfw,cluster=kubernetes,service=eventgate-analytics,name=kubernetes2001.codfw.wmnet [14:13:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:14:01] !log ema@cumin1001 END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0) [14:14:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:14:13] (03CR) 10Ladsgroup: [C: 03+2] Set EntityUsageTable addUsage batch size to 150 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/517669 (https://phabricator.wikimedia.org/T225500) (owner: 10Ladsgroup) [14:14:18] !log akosiaris@puppetmaster1001 conftool action : set/pooled=no; selector: 
dc=codfw,cluster=kubernetes,service=eventgate-main,name=kubernetes2001.codfw.wmnet [14:14:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:14:27] !log ladsgroup@deploy1001 Synchronized wmf-config/Wikibase.php: SWAT: [[gerrit:518028|Switch property terms migration to WRITE_BOTH on test wikidata (T225051)]] (duration: 00m 56s) [14:14:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:14:32] T225051: Switch `tmpPropertyTermsMigrationStage` to MIGRATION_WRITE_BOTH - https://phabricator.wikimedia.org/T225051 [14:15:33] (03CR) 10Effie Mouzeli: [V: 03+1 C: 03+2] mediawiki::php: increase opcache on jobrunners [puppet] - 10https://gerrit.wikimedia.org/r/518023 (https://phabricator.wikimedia.org/T224857) (owner: 10Effie Mouzeli) [14:15:50] (03CR) 10Reedy: [C: 04-1] "Per discussion on IRC, these might want to be in docroot/wwwportal" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/516055 (https://phabricator.wikimedia.org/T223835) (owner: 10Gergő Tisza) [14:15:57] (03PS2) 10Effie Mouzeli: mediawiki::php: increase opcache on jobrunners [puppet] - 10https://gerrit.wikimedia.org/r/518023 (https://phabricator.wikimedia.org/T224857) [14:15:59] !log akosiaris@puppetmaster1001 conftool action : set/pooled=yes; selector: name=kubernetes2001.* [14:16:00] (03Merged) 10jenkins-bot: Set EntityUsageTable addUsage batch size to 150 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/517669 (https://phabricator.wikimedia.org/T225500) (owner: 10Ladsgroup) [14:16:05] !log akosiaris@puppetmaster1001 conftool action : set/pooled=no; selector: dc=codfw,cluster=kubernetes,service=eventgate-analytics,name=kubernetes2001.codfw.wmnet [14:16:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:16:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:16:30] (03CR) 10jenkins-bot: Set EntityUsageTable addUsage batch size to 150 [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/517669 (https://phabricator.wikimedia.org/T225500) (owner: 10Ladsgroup) [14:18:18] !log start of ladsgroup@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/rebuildPropertyTerms.php --wiki=testwikidatawiki --batch-size=100 --sleep=3 (T225052) [14:18:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:18:23] T225052: Run Property Terms Rebuild script - https://phabricator.wikimedia.org/T225052 [14:18:29] !log ema@cumin1001 END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0) [14:18:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:21:20] (03PS2) 10Jhedden: wiki replicas: unfilter deleted rev_len versions [puppet] - 10https://gerrit.wikimedia.org/r/515062 (https://phabricator.wikimedia.org/T101631) [14:22:04] 10Operations, 10ops-codfw, 10DBA: Degraded RAID on db2043 - https://phabricator.wikimedia.org/T225889 (10Marostegui) @Papaul has removed and inserted back the disk and it is rebuilding again. Let's see if it goes fine this time or we have to replace it completely ` root@db2043:~# hpssacli controller all sho... [14:22:06] 10Operations, 10Core Platform Team, 10MassMessage, 10WMF-JobQueue: Jobs not being executed on 1.34.0-wmf.10 - https://phabricator.wikimedia.org/T226109 (10Agusbou2015) >>! In T226109#5271108, @Reedy wrote: >>>! In T226109#5271093, @Agusbou2015 wrote: >> When will the fix be ready? > > Sometime after someo... [14:22:09] 10Operations, 10DNS, 10Matrix, 10Traffic, and 2 others: Configure wikimedia.org to enable *:wikimedia.org Matrix user IDs - https://phabricator.wikimedia.org/T223835 (10Joe) @tgr just to be sure, you just want the url https://wikimedia.org/.well_known to be served from a static file? 
[14:22:12] 10Operations, 10ops-eqiad, 10DC-Ops, 10Epic, 10cloud-services-team (Kanban): relocate/reimage cloudvirt1014 with 10G interfaces - https://phabricator.wikimedia.org/T226188 (10Andrew) [14:22:15] 10Operations, 10ops-eqiad, 10DC-Ops, 10Epic, 10cloud-services-team (Kanban): Move cloudvirt hosts to 10Gb ethernet - https://phabricator.wikimedia.org/T216195 (10Andrew) [14:22:18] 10Operations, 10Wikibase-Containers, 10Wikidata, 10wikidata-tech-focus, 10Release Pipeline (Blubber): Create a wmf production ready nginx image - https://phabricator.wikimedia.org/T209292 (10fsero) [14:22:48] !log ladsgroup@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:517669|Set EntityUsageTable addUsage batch size to 150 (T225500)]] (duration: 00m 56s) [14:22:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:22:55] T225500: Decrease EntityUsageTable addUsage batch size to 100 - https://phabricator.wikimedia.org/T225500 [14:23:59] (03CR) 10Jhedden: [C: 03+2] wiki replicas: unfilter deleted rev_len versions [puppet] - 10https://gerrit.wikimedia.org/r/515062 (https://phabricator.wikimedia.org/T101631) (owner: 10Jhedden) [14:24:21] (03PS3) 10Jhedden: wiki replicas: unfilter deleted rev_len versions [puppet] - 10https://gerrit.wikimedia.org/r/515062 (https://phabricator.wikimedia.org/T101631) [14:24:31] (03CR) 10Jhedden: [V: 03+2 C: 03+2] wiki replicas: unfilter deleted rev_len versions [puppet] - 10https://gerrit.wikimedia.org/r/515062 (https://phabricator.wikimedia.org/T101631) (owner: 10Jhedden) [14:28:17] !log end of ladsgroup@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/rebuildPropertyTerms.php --wiki=testwikidatawiki --batch-size=100 --sleep=3 (T225052) [14:28:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:28:22] T225052: Run Property Terms Rebuild script - https://phabricator.wikimedia.org/T225052 [14:30:06] (03CR) 10Thcipriani: [V: 03+2 C: 03+2] Merge remote-tracking 
branch 'upstream/v2.15.14' into wmf/stable-2.15 [software/gerrit] (wmf/stable-2.15) - 10https://gerrit.wikimedia.org/r/516727 (owner: 10Paladox) [14:34:18] 10Operations, 10Traffic, 10docker-pkg, 10serviceops: Getting registry metadata from a public client fails on our registry - https://phabricator.wikimedia.org/T220085 (10fsero) works for me ` >>> import docker >>> client = docker.from_env(version='auto') >>> print(client.images.get_registry_data('quay.io... [14:34:21] (03PS1) 10Thcipriani: Gerrit v2.15.14 [software/gerrit] (deploy/wmf/stable-2.15) - 10https://gerrit.wikimedia.org/r/518039 [14:34:52] (03CR) 10Paladox: [C: 03+2] Gerrit v2.15.14 [software/gerrit] (deploy/wmf/stable-2.15) - 10https://gerrit.wikimedia.org/r/518039 (owner: 10Thcipriani) [14:35:42] RECOVERY - Device not healthy -SMART- on db2058 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=db2058&var-datasource=codfw+prometheus/ops [14:35:52] PROBLEM - Nginx local proxy to apache on mw1341 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.010 second response time https://wikitech.wikimedia.org/wiki/Application_servers [14:36:55] !log T101631 updating replica views on labsdb1012 [14:37:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:37:00] T101631: rev_len should be available also for deleted revisions in database replicas - https://phabricator.wikimedia.org/T101631 [14:37:18] RECOVERY - Nginx local proxy to apache on mw1341 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 617 bytes in 0.044 second response time https://wikitech.wikimedia.org/wiki/Application_servers [14:37:23] (03PS2) 10Gergő Tisza: Add .well-known/matrix for wikimedia.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/516055 (https://phabricator.wikimedia.org/T223835) [14:40:05] 10Operations, 10Performance-Team, 10Traffic, 10Performance: Sometimes some pages load slowly for European users (due to some 
factor outside of Wikimedia cluster) - https://phabricator.wikimedia.org/T226048 (10PM3) +1, everything running smoothly now, including API queries. [14:40:55] 10Operations, 10Performance-Team, 10Traffic, 10Performance: Sometimes pages load slowly for European users (due to some factor outside of Wikimedia cluster) - https://phabricator.wikimedia.org/T226048 (10PM3) [14:41:22] RECOVERY - Check systemd state on stat1004 is OK: OK - running: The system is fully operational [14:41:35] !log akosiaris@puppetmaster1001 conftool action : set/pooled=no; selector: dc=codfw,cluster=kubernetes,service=eventgate-main,name=kubernetes2001.codfw.wmnet [14:41:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:41:51] 10Operations, 10DNS, 10Matrix, 10Traffic, and 2 others: Configure wikimedia.org to enable *:wikimedia.org Matrix user IDs - https://phabricator.wikimedia.org/T223835 (10Tgr) >>! In T223835#5271300, @Joe wrote: > @tgr just to be sure, you just want the url https://wikimedia.org/.well_known to be served from... [14:45:44] PROBLEM - Check systemd state on stat1004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [14:46:22] (03CR) 10Alexandros Kosiaris: [C: 03+2] "Thanks for reverting this. The cleanup is appreciated" [dns] - 10https://gerrit.wikimedia.org/r/516056 (https://phabricator.wikimedia.org/T223835) (owner: 10Gergő Tisza) [14:46:26] (03PS2) 10Alexandros Kosiaris: Revert "Matrix wikimedia.org IDs domain authorization" [dns] - 10https://gerrit.wikimedia.org/r/516056 (https://phabricator.wikimedia.org/T223835) (owner: 10Gergő Tisza) [14:47:37] !log T101631 updating replica views on labsdb1011 [14:47:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:47:42] T101631: rev_len should be available also for deleted revisions in database replicas - https://phabricator.wikimedia.org/T101631 [14:52:29] (03CR) 10Krinkle: "All canonical domains or just wikimedia.org?" 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/516055 (https://phabricator.wikimedia.org/T223835) (owner: 10Gergő Tisza) [14:52:40] PROBLEM - puppet last run on stat1007 is CRITICAL: CRITICAL: Puppet has 38 failures. Last run 3 minutes ago with 38 failures. Failed resources (up to 3 shown): Package[tzdata],Package[apport],Package[command-not-found],Package[command-not-found-data] [14:54:48] !log T101631 updating replica views on labsdb1010 [14:54:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:54:54] T101631: rev_len should be available also for deleted revisions in database replicas - https://phabricator.wikimedia.org/T101631 [14:55:14] 10Operations, 10Dumps-Generation, 10SDC General, 10Wikidata: Capacity planning for Commons Structured Data - https://phabricator.wikimedia.org/T226093 (10MarkTraceur) @ArielGlenn https://grafana.wikimedia.org/d/000000175/wikidata-datamodel-statements?refresh=30m&panelId=4&fullscreen&orgId=1 <-- average sta... [14:55:17] (03CR) 10Gergő Tisza: "Just wikimedia.org. This will be used by Matrix software looking up usernames / room names / etc. matching *:wikimedia.org. 
There is no ne" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/516055 (https://phabricator.wikimedia.org/T223835) (owner: 10Gergő Tisza) [14:56:34] !log akosiaris@puppetmaster1001 conftool action : set/pooled=yes; selector: name=kubernetes2005.* [14:56:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:56:45] !log akosiaris@puppetmaster1001 conftool action : set/pooled=yes; selector: name=kubernetes2006.* [14:56:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:57:23] !log akosiaris@puppetmaster1001 conftool action : set/pooled=yes; selector: name=kubernetes1006.* [14:57:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:57:33] !log akosiaris@puppetmaster1001 conftool action : set/pooled=yes; selector: name=kubernetes1005.* [14:57:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:57:41] !log enable puppet on jobrunners [14:57:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:58:11] !log make sure all kubernetes hosts (except kubernetes2001 which is used to investigate some outgoing packet discards) are pooled and with the exact same weight [14:58:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:00:26] 10Operations, 10Continuous-Integration-Infrastructure, 10serviceops, 10Release-Engineering-Team (Kanban): contint1001 store docker images on separate partition or disk - https://phabricator.wikimedia.org/T207707 (10thcipriani) >>! In T207707#5239428, @hashar wrote: > The new disks can be shown as sdc and s... 
[15:00:28] (03PS1) 10Esanders: Enable new mobile contexts on bn/fa/hewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/518051 (https://phabricator.wikimedia.org/T221314) [15:01:47] !log T101631 updating replica views on labsdb1009 [15:01:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:01:53] T101631: rev_len should be available also for deleted revisions in database replicas - https://phabricator.wikimedia.org/T101631 [15:03:16] !log Repool mw1311 [15:03:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:05:57] !log Rolling restart php-fpm on jobrunners to pick up new opcache settings - 518023 [15:06:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:06:47] 10Operations, 10Continuous-Integration-Infrastructure, 10serviceops, 10Release-Engineering-Team (Kanban): contint1001 store docker images on separate partition or disk - https://phabricator.wikimedia.org/T207707 (10hashar) From a discussion with @mmodell, we might need an extra partition soonish as well.... [15:06:50] 10Operations, 10ops-codfw, 10Cloud-Services: rack/setup codfw: cloudbackup2001.codfw.wmnet and cloudbackup2002.codfw.wmnet - https://phabricator.wikimedia.org/T224528 (10Papaul) PowerEdge Virtual Disk 0: RAID1, 223GB, Ready Virtual Disk 1: RAID6, 106.918TB, Read... 
[15:07:21] 10Operations, 10ops-codfw, 10Cloud-Services: rack/setup codfw: cloudbackup2001.codfw.wmnet and cloudbackup2002.codfw.wmnet - https://phabricator.wikimedia.org/T224528 (10Papaul) [15:10:59] 10Operations, 10ops-eqiad, 10DC-Ops, 10Epic, 10cloud-services-team (Kanban): Move cloudvirt hosts to 10Gb ethernet - https://phabricator.wikimedia.org/T216195 (10Andrew) [15:11:20] 10Operations, 10ops-eqiad, 10DC-Ops, 10Epic, 10cloud-services-team (Kanban): relocate/reimage cloudvirt1014 with 10G interfaces - https://phabricator.wikimedia.org/T226188 (10Andrew) a:03Cmjohnson [15:11:42] RECOVERY - Check systemd state on stat1004 is OK: OK - running: The system is fully operational [15:12:01] what's up with stat1004 it has been flapping all the time today? [15:13:06] elukey: ^ [15:18:11] marostegui: I think it is the prometheus exporter, didn't have the time to investigate [15:18:35] yeah, I don't think it is a big deal, I was surprised it has been like that all day [15:18:40] you want me to create a task? [15:19:46] RECOVERY - puppet last run on stat1007 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [15:21:37] marostegui: probably not needed, somebody needs to take a look [15:21:50] jijiki: ^ maybe? [15:24:51] (03CR) 10Reedy: [C: 03+1] "Looks better to me now" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/516055 (https://phabricator.wikimedia.org/T223835) (owner: 10Gergő Tisza) [15:26:53] 10Operations, 10ops-codfw, 10DBA: Degraded RAID on db2058 - https://phabricator.wikimedia.org/T225902 (10Marostegui) 05Open→03Resolved The RAID is back to Optimal! ` root@db2058:~# hpssacli controller all show config Smart Array P420i in Slot 0 (Embedded) (sn: 0014380337DC560) Port Name: 1I... 
[15:27:01] marostegui: I'll do it later on, I think that it is not super urgent atm [15:27:02] PROBLEM - HHVM rendering on mw1282 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1309 bytes in 0.003 second response time https://wikitech.wikimedia.org/wiki/Application_servers [15:27:24] PROBLEM - Apache HTTP on mw1282 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.001 second response time https://wikitech.wikimedia.org/wiki/Application_servers [15:27:48] 10Operations, 10ops-eqiad, 10DBA: eqiad: rack/setup/install (4) dbproxy systems. - https://phabricator.wikimedia.org/T225704 (10Marostegui) @RobH if you add the production DNS entries, I can take care of the installations myself [15:28:30] RECOVERY - HHVM rendering on mw1282 is OK: HTTP OK: HTTP/1.1 200 OK - 74706 bytes in 0.495 second response time https://wikitech.wikimedia.org/wiki/Application_servers [15:28:38] 10Operations, 10Traffic, 10docker-pkg, 10serviceops: Getting registry metadata from a public client fails on our registry - https://phabricator.wikimedia.org/T220085 (10Joe) >>! In T220085#5271335, @fsero wrote: > works for me using python 2.7 and docker==3.7.2 > > ` >>>> import docker >>>> client = dock... 
[15:28:50] RECOVERY - Apache HTTP on mw1282 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 616 bytes in 0.044 second response time https://wikitech.wikimedia.org/wiki/Application_servers [15:31:39] (03PS1) 10Elukey: role::analytics_test_cluster::coordinator: use HTTP-oozie.keytab [puppet] - 10https://gerrit.wikimedia.org/r/518058 [15:32:07] (03CR) 10Elukey: [C: 03+2] role::analytics_test_cluster::coordinator: use HTTP-oozie.keytab [puppet] - 10https://gerrit.wikimedia.org/r/518058 (owner: 10Elukey) [15:33:56] RECOVERY - HP RAID on db2058 is OK: OK: Slot 0: OK: 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:9, 1I:1:10, 1I:1:11, 1I:1:12 - Controller: OK - Battery/Capacitor: OK https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering [15:34:38] marostegui: look about? [15:35:03] jijiki: stat1004, I'll take care of it don't worry [15:35:16] (the prometheus exporter has been flapping today) [15:36:21] oh sorry [15:36:42] marostegui: stat1004 is elukey's pet [15:37:06] 10Operations, 10ops-eqiad, 10Cloud-Services, 10cloud-services-team (Kanban): rack/setup/install (3) new osd ceph nodes - https://phabricator.wikimedia.org/T224188 (10Bstorm) @ayounsi Ceph docs are vague at best or tend to ask you to read dissertations eventually. Overall, everything comes back to "test it... [15:39:45] (03PS1) 10RobH: setting production dns for new dbproxy systems [dns] - 10https://gerrit.wikimedia.org/r/518059 (https://phabricator.wikimedia.org/T225704) [15:39:57] jijiki: I wouldn't call it in that way :D [15:40:06] (03CR) 10jerkins-bot: [V: 04-1] setting production dns for new dbproxy systems [dns] - 10https://gerrit.wikimedia.org/r/518059 (https://phabricator.wikimedia.org/T225704) (owner: 10RobH) [15:40:13] I want to spare you the agony of debugging it :D [15:40:32] elukey: it was a secret affair ? stats1003 had no idea? 
[15:40:42] !log krinkle@deploy1001: pull down 98399b1032a0 to wmf.10 (test-only change) [15:40:46] this is turning into a soap opera [15:40:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:41:22] (03PS2) 10RobH: setting production dns for new dbproxy systems [dns] - 10https://gerrit.wikimedia.org/r/518059 (https://phabricator.wikimedia.org/T225704) [15:41:58] (03CR) 10RobH: [C: 03+2] setting production dns for new dbproxy systems [dns] - 10https://gerrit.wikimedia.org/r/518059 (https://phabricator.wikimedia.org/T225704) (owner: 10RobH) [15:43:29] 10Operations, 10ops-eqiad, 10DBA: eqiad: rack/setup/install (4) dbproxy systems. - https://phabricator.wikimedia.org/T225704 (10RobH) [15:43:50] 10Operations, 10ops-eqiad, 10DBA: eqiad: rack/setup/install (4) dbproxy systems. - https://phabricator.wikimedia.org/T225704 (10RobH) a:05RobH→03Marostegui Assigned to @Marostegui per irc sync up (dns records are live.) [15:44:00] 10Operations, 10DBA: eqiad: rack/setup/install (4) dbproxy systems. - https://phabricator.wikimedia.org/T225704 (10RobH) [15:45:50] ACKNOWLEDGEMENT - HP RAID on db2043 is CRITICAL: CRITICAL: Slot 0: OK: 1I:1:1, 1I:1:2, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:9, 1I:1:10, 1I:1:11, 1I:1:12 - Failed: 1I:1:3 - Controller: OK - Battery/Capacitor: OK nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T226194 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering [15:45:53] 10Operations, 10ops-codfw: Degraded RAID on db2043 - https://phabricator.wikimedia.org/T226194 (10ops-monitoring-bot) [15:51:46] Amir1: done with deploy? [15:59:22] (03CR) 10Andrew Bogott: [C: 03+1] "This is clearly better!" 
[puppet] - 10https://gerrit.wikimedia.org/r/511686 (owner: 10Jbond) [15:59:35] 10Operations, 10ops-codfw: Degraded RAID on db2043 - https://phabricator.wikimedia.org/T226194 (10Marostegui) 05Open→03Declined Duplicate of T225889 [16:00:04] godog and _joe_: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for Puppet SWAT(Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190620T1600). [16:00:04] No GERRIT patches in the queue for this window AFAICS. [16:03:56] (03PS1) 10Ema: Add debian/patches/0031-vbt-close-stolen.patch [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/518062 [16:03:59] (03PS1) 10Ema: Add debian/patches/0032-vbe_dir_finish-no-VBT_Wait.patch [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/518063 [16:04:01] (03PS1) 10Ema: Add debian/patches/0033-recycled-honor-first_byte_timeout.patch [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/518064 [16:04:03] (03PS1) 10Ema: Add debian/patches/0034-r02135.vtc-fixes.patch [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/518065 [16:08:21] (03CR) 10jerkins-bot: [V: 04-1] Add debian/patches/0033-recycled-honor-first_byte_timeout.patch [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/518064 (owner: 10Ema) [16:13:36] * Krinkle deploys https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/518013/ [16:15:41] (03PS1) 10Elukey: role::analytics_test_cluster::hadoop::master: enable kerberos for secrets [puppet] - 10https://gerrit.wikimedia.org/r/518068 [16:16:00] (03PS1) 10Mforns: analytics::refinery::job::data_purge add deletion for data_quality_hourly [puppet] - 10https://gerrit.wikimedia.org/r/518069 (https://phabricator.wikimedia.org/T215863) [16:16:34] !log scb1001 is producing 120,000 errors per minute as of 16:09 UTC minute ago (under 500/min before that) [16:16:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:16:41] (03CR) 10Elukey: [C: 03+2] 
role::analytics_test_cluster::hadoop::master: enable kerberos for secrets [puppet] - 10https://gerrit.wikimedia.org/r/518068 (owner: 10Elukey) [16:16:42] mobrovac: ^ [16:16:55] (03CR) 10jerkins-bot: [V: 04-1] analytics::refinery::job::data_purge add deletion for data_quality_hourly [puppet] - 10https://gerrit.wikimedia.org/r/518069 (https://phabricator.wikimedia.org/T215863) (owner: 10Mforns) [16:17:10] euh? [16:17:13] looking Krinkle [16:18:12] (03CR) 10CRusnov: "I shall merge this unless there are objections. If there are nitpicks i can address them in a further patch if desired." [software/netbox-reports] - 10https://gerrit.wikimedia.org/r/513003 (https://phabricator.wikimedia.org/T216469) (owner: 10CRusnov) [16:18:22] (03PS1) 10Niharika29: Deploy partial blocks on hewikivoyage on community request [mediawiki-config] - 10https://gerrit.wikimedia.org/r/518073 (https://phabricator.wikimedia.org/T218626) [16:19:08] !log krinkle@deploy1001 Synchronized php-1.34.0-wmf.10/includes/specials/pagers/ImageListPager.php: T226102 / 294500d6e1d70b2 (duration: 00m 58s) [16:19:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:19:14] T226102: BadMethodCallException on Commons: /includes/specials/pagers/ImageListPager.php: Call to a member function getUrl() on a non-object (boolean) - https://phabricator.wikimedia.org/T226102 [16:19:19] (03PS2) 10Mforns: analytics::refinery::job::data_purge add deletion for data_quality_hourly [puppet] - 10https://gerrit.wikimedia.org/r/518069 (https://phabricator.wikimedia.org/T215863) [16:20:35] 10Operations, 10Core Platform Team, 10MassMessage, 10WMF-JobQueue: Jobs not being executed on 1.34.0-wmf.10 - https://phabricator.wikimedia.org/T226109 (10RhinosF1) >>! In T226109#5271297, @Agusbou2015 wrote: >>>! In T226109#5271108, @Reedy wrote: >>>>! In T226109#5271093, @Agusbou2015 wrote: >>> When will... 
[16:20:41] 10Operations, 10Core Platform Team, 10MassMessage, 10WMF-JobQueue: Jobs not being executed on 1.34.0-wmf.10 - https://phabricator.wikimedia.org/T226109 (10RhinosF1) >>! In T226109#5271297, @Agusbou2015 wrote: >>>! In T226109#5271108, @Reedy wrote: >>>>! In T226109#5271093, @Agusbou2015 wrote: >>> When will... [16:21:01] (03PS1) 10Arturo Borrero Gonzalez: toolforge: k8s: etcd: use domain names instead of IP addresses [puppet] - 10https://gerrit.wikimedia.org/r/518075 (https://phabricator.wikimedia.org/T226098) [16:22:52] 10Operations, 10Core Platform Team, 10MassMessage, 10WMF-JobQueue: Jobs not being executed on 1.34.0-wmf.10 - https://phabricator.wikimedia.org/T226109 (10Krinkle) >>! In T226109#5271093, @Agusbou2015 wrote: > When will the fix be ready? For your information, if you are still experiencing issues today, pl... [16:24:51] (03PS1) 10Ottomata: Revert page-properties-change stream back to eventlogging-service-eventbus [mediawiki-config] - 10https://gerrit.wikimedia.org/r/518076 [16:25:14] 10Operations, 10Core Platform Team, 10MassMessage, 10WMF-JobQueue: Jobs not being executed on 1.34.0-wmf.10 - https://phabricator.wikimedia.org/T226109 (10mmodell) Given that this is a train blocker, 12 days isn't acceptable. 
[16:26:09] (03CR) 10Ottomata: [C: 03+2] Revert page-properties-change stream back to eventlogging-service-eventbus [mediawiki-config] - 10https://gerrit.wikimedia.org/r/518076 (owner: 10Ottomata) [16:26:28] (03CR) 10jenkins-bot: Revert page-properties-change stream back to eventlogging-service-eventbus [mediawiki-config] - 10https://gerrit.wikimedia.org/r/518076 (owner: 10Ottomata) [16:27:57] !log otto@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Revert page-properties-change back to eventbus, new schema does not work with change prop (duration: 00m 55s) [16:28:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:30:14] 10Operations, 10DC-Ops, 10Traffic: poll power data for redeployment of esams/knams - https://phabricator.wikimedia.org/T225720 (10RobH) Ok, in checking, EQIAD seems to enter its PEAK usage around 20:00 GMT (so about a half an hour from now at 10:00 Pacific.) I'll pull both the 'show chassis power' on cr1 an... [16:31:06] 10Operations, 10Core Platform Team, 10MassMessage, 10WMF-JobQueue: Jobs not being executed on 1.34.0-wmf.10 - https://phabricator.wikimedia.org/T226109 (10Aklapper) >>! In T226109#5271628, @RhinosF1 wrote: > They will be. Based on the last metrics it takes ~12 days for a UBN to be fixed. [offtopic] No, "~... 
[16:37:32] !log otto@deploy1001 Synchronized wmf-config/InitialiseSettings.php: ACTUALLY Revert page-properties-change back to eventbus, new schema does not work with change prop (duration: 00m 57s) [16:37:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:44:58] (03PS1) 10Bstorm: cloudstore: add sudo config for the nagios user [puppet] - 10https://gerrit.wikimedia.org/r/518079 (https://phabricator.wikimedia.org/T225265) [16:47:47] (03PS2) 10Bstorm: cloudstore: add sudo config for the nagios user [puppet] - 10https://gerrit.wikimedia.org/r/518079 (https://phabricator.wikimedia.org/T225265) [16:52:11] 10Operations, 10DC-Ops, 10Traffic: poll power data for redeployment of esams/knams - https://phabricator.wikimedia.org/T225720 (10RobH) commands to run: juniper: show chassis power dell (via idrac): racadm getsensorinfo [16:53:06] !log otto@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Revert page-properties-change back to eventbus, new schema does not work with change prop - deploy take 3 (duration: 00m 56s) [16:53:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:54:15] 10Operations, 10ops-codfw, 10DBA: Degraded RAID on db2043 - https://phabricator.wikimedia.org/T225889 (10Marostegui) And the disk failed again [16:58:52] (03CR) 10Jhedden: [C: 03+2] cloudstore: add sudo config for the nagios user [puppet] - 10https://gerrit.wikimedia.org/r/518079 (https://phabricator.wikimedia.org/T225265) (owner: 10Bstorm) [17:00:04] cscott, arlolra, subbu, and halfak: I, the Bot under the Fountain, allow thee, The Deployer, to do Services – Graphoid / Parsoid / Citoid / ORES deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190620T1700). 
[17:06:17] 10Operations, 10DC-Ops, 10Traffic: poll power data for redeployment of esams/knams - https://phabricator.wikimedia.org/T225720 (10RobH) Power data: Power drawn live @ approximately 10:00-10:10 AM Pacific: cr1-eqiad: System: Zone 0: Capacity: 4100 W (maximum 4100 W) Allocated power:... [17:13:00] 10Operations, 10Commons, 10Wikimedia-Site-requests, 10media-storage, 10User-Urbanecm: Server-side upload request for Hurtigruten minutt for minutt videos - https://phabricator.wikimedia.org/T223052 (10Urbanecm) Even worse failure... ` urbanecm@mwmaint1002:~$ mwscript importImages.php --wiki=commonswiki... [17:14:58] 10Operations, 10DC-Ops, 10Traffic: poll power data for redeployment of esams/knams - https://phabricator.wikimedia.org/T225720 (10RobH) so for the QFX5100 (thanks @papaul) the command is: ` show chassis environment pem ` The 10G switches are PEM 2/4/7, so I'll just include them all: asw2-b-eqiad: ` FPC... [17:18:46] Hey folks. I'm onboarding a new engineer. What's the process for getting his public ssh key in the right spot these days? [17:21:53] right spot for what? 
:) [17:22:48] !log mholloway-shell@deploy1001 Started deploy [recommendation-api/deploy@7dc63ab]: Deploy Suggested Edits endpoints (T209997, T224233) [17:22:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:22:55] T224233: Enhance the existing article description suggested edit APIs to use the new approach used by the image caption suggested edit APIs - https://phabricator.wikimedia.org/T224233 [17:22:56] T209997: Create a new API endpoint which returns Commons images in need of a caption or caption translation - https://phabricator.wikimedia.org/T209997 [17:25:43] !log mholloway-shell@deploy1001 Finished deploy [recommendation-api/deploy@7dc63ab]: Deploy Suggested Edits endpoints (T209997, T224233) (duration: 02m 55s) [17:25:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:27:45] !log arlolra@deploy1001 Started deploy [parsoid/deploy@1084a7b]: Updating Parsoid to 4fa8d01 [17:27:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:29:54] halfak: https://wikitech.wikimedia.org/wiki/Production_shell_access#New_users [17:30:03] :) [17:31:59] 10Operations, 10SRE-Access-Requests: Requesting access to deployment hosts for Andy Craze - https://phabricator.wikimedia.org/T226204 (10ACraze) [17:34:03] !log arlolra@deploy1001 Finished deploy [parsoid/deploy@1084a7b]: Updating Parsoid to 4fa8d01 (duration: 06m 17s) [17:34:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:37:31] !log mholloway-shell@deploy1001 Started deploy [mobileapps/deploy@fd98900]: Deploy media-list endpoint (T225443) and service template upgrade to v0.7.0 [17:37:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:37:37] T225443: Media endpoint does not refresh structured captions - https://phabricator.wikimedia.org/T225443 [17:43:09] !log mholloway-shell@deploy1001 Finished deploy [mobileapps/deploy@fd98900]: Deploy media-list endpoint (T225443) 
and service template upgrade to v0.7.0 (duration: 05m 38s) [17:43:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:43:17] T225443: Media endpoint does not refresh structured captions - https://phabricator.wikimedia.org/T225443 [17:49:13] 10Operations, 10SRE-Access-Requests: Requesting access to deployment hosts for Andy Craze - https://phabricator.wikimedia.org/T226204 (10Halfak) I support this request. Andy will be working with us on #ORES and other #scoring-platform-team stuff. We ran into this issue today when I was walking Andy through o... [18:00:04] MaxSem, RoanKattouw, and Niharika: #bothumor I � Unicode. All rise for Morning SWAT (Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190620T1800). [18:00:04] MatmaRex, tgr, kostajh, and Niharika: A patch you scheduled for Morning SWAT (Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. 
[18:00:13] \o [18:00:16] hello [18:02:02] o/ [18:02:51] !log Updated Parsoid to 4fa8d01 (T211251) [18:02:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:02:57] T211251: Cannot read property 'nodeName' of undefined - https://phabricator.wikimedia.org/T211251 [18:03:03] (03PS1) 10Elukey: Add cdh::systemd_timer [puppet/cdh] - 10https://gerrit.wikimedia.org/r/518097 (https://phabricator.wikimedia.org/T212259) [18:04:14] JDI time I guess [18:04:19] I can swat [18:04:36] (03PS4) 10ArielGlenn: refactor wikidata entity dumps into wikibase + wikidata specific bits [puppet] - 10https://gerrit.wikimedia.org/r/517670 (https://phabricator.wikimedia.org/T221917) [18:05:07] (03PS2) 10Gergő Tisza: Centralize enwiki's VisualEditor feedback page [mediawiki-config] - 10https://gerrit.wikimedia.org/r/517924 (https://phabricator.wikimedia.org/T224851) (owner: 10Bartosz Dziewoński) [18:05:25] (03PS3) 10Gergő Tisza: Ensure no lossy WTE→VE switching in public wikis (no-op) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/516567 (owner: 10Bartosz Dziewoński) [18:05:44] thanks [18:05:44] MatmaRex: can the two chained config patches go together? [18:06:02] tgr: yeah [18:07:20] oops, did I just break that chain? 
[18:07:33] (03PS4) 10Gergő Tisza: Ensure no lossy WTE→VE switching in public wikis (no-op) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/516567 (owner: 10Bartosz Dziewoński) [18:07:47] (03CR) 10Gergő Tisza: [C: 03+2] Centralize enwiki's VisualEditor feedback page [mediawiki-config] - 10https://gerrit.wikimedia.org/r/517924 (https://phabricator.wikimedia.org/T224851) (owner: 10Bartosz Dziewoński) [18:08:02] (03CR) 10Gergő Tisza: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/516567 (owner: 10Bartosz Dziewoński) [18:08:11] (03CR) 10Gergő Tisza: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/517924 (https://phabricator.wikimedia.org/T224851) (owner: 10Bartosz Dziewoński) [18:08:48] (03Merged) 10jenkins-bot: Centralize enwiki's VisualEditor feedback page [mediawiki-config] - 10https://gerrit.wikimedia.org/r/517924 (https://phabricator.wikimedia.org/T224851) (owner: 10Bartosz Dziewoński) [18:09:00] (03Merged) 10jenkins-bot: Ensure no lossy WTE→VE switching in public wikis (no-op) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/516567 (owner: 10Bartosz Dziewoński) [18:09:04] (03CR) 10jenkins-bot: Centralize enwiki's VisualEditor feedback page [mediawiki-config] - 10https://gerrit.wikimedia.org/r/517924 (https://phabricator.wikimedia.org/T224851) (owner: 10Bartosz Dziewoński) [18:10:46] (03CR) 10Nuria: analytics::refinery::job::data_purge add deletion for data_quality_hourly (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/518069 (https://phabricator.wikimedia.org/T215863) (owner: 10Mforns) [18:11:02] (03PS1) 10Krinkle: mediawiki: Use HTTPS for /nl-portal and /be-portal redirects [puppet] - 10https://gerrit.wikimedia.org/r/518099 [18:11:04] (03CR) 10jenkins-bot: Ensure no lossy WTE→VE switching in public wikis (no-op) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/516567 (owner: 10Bartosz Dziewoński) [18:12:08] MatmaRex: on mwdebug1002 [18:12:34] RECOVERY - puppet last run on contint1001 is OK: 
OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [18:13:11] tgr: both seem good [18:14:36] (03PS2) 10Gergő Tisza: Enable new mobile contexts on bn/fa/hewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/518051 (https://phabricator.wikimedia.org/T221314) (owner: 10Esanders) [18:15:10] !log tgr@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:517924|Centralize enwikis VisualEditor feedback page (T224851)]] (duration: 00m 57s) [18:15:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:15:15] T224851: Please centralize enwiki's feedback for VisualEditor - https://phabricator.wikimedia.org/T224851 [18:16:06] (03CR) 10Gergő Tisza: [C: 03+2] Enable new mobile contexts on bn/fa/hewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/518051 (https://phabricator.wikimedia.org/T221314) (owner: 10Esanders) [18:16:53] !log tgr@deploy1001 Synchronized wmf-config/CommonSettings.php: SWAT: [[gerrit:516567|Ensure no lossy WTE→VE switching in public wikis (no-op)]] (duration: 00m 58s) [18:16:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:17:03] (03Merged) 10jenkins-bot: Enable new mobile contexts on bn/fa/hewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/518051 (https://phabricator.wikimedia.org/T221314) (owner: 10Esanders) [18:17:18] (03CR) 10jenkins-bot: Enable new mobile contexts on bn/fa/hewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/518051 (https://phabricator.wikimedia.org/T221314) (owner: 10Esanders) [18:18:37] MatmaRex: first two live, third on mwdebug1002 [18:20:40] tgr: hmm [18:21:04] tgr: i just realized that we're running wmf.8 still rather than wmf.10 on those wikis. [18:21:15] tgr: so i guess the change is a no-op until wmf.10 goes live [18:22:26] ok [18:22:47] i think this is fine [18:22:48] thanks! 
[18:23:07] (03PS3) 10Gergő Tisza: Add .well-known/matrix for wikimedia.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/516055 (https://phabricator.wikimedia.org/T223835) [18:24:15] (03CR) 10Gergő Tisza: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/516055 (https://phabricator.wikimedia.org/T223835) (owner: 10Gergő Tisza) [18:25:03] (03Merged) 10jenkins-bot: Add .well-known/matrix for wikimedia.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/516055 (https://phabricator.wikimedia.org/T223835) (owner: 10Gergő Tisza) [18:26:19] (03CR) 10jenkins-bot: Add .well-known/matrix for wikimedia.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/516055 (https://phabricator.wikimedia.org/T223835) (owner: 10Gergő Tisza) [18:29:00] (03PS2) 10Gergő Tisza: Deploy partial blocks on hewikivoyage on community request [mediawiki-config] - 10https://gerrit.wikimedia.org/r/518073 (https://phabricator.wikimedia.org/T218626) (owner: 10Niharika29) [18:29:17] (03PS5) 10ArielGlenn: refactor wikidata entity dumps into wikibase + wikidata specific bits [puppet] - 10https://gerrit.wikimedia.org/r/517670 (https://phabricator.wikimedia.org/T221917) [18:29:31] !log tgr@deploy1001 Synchronized docroot/wwwportal/.well-known/: SWAT: [[gerrit:516055|Add .well-known/matrix for wikimedia.org (Bug: T223835)]] (duration: 00m 57s) [18:29:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:29:35] T223835: Configure wikimedia.org to enable *:wikimedia.org Matrix user IDs - https://phabricator.wikimedia.org/T223835 [18:29:58] tgr: You might want to purgeList the urls incase you or others have hit them and a 404s been cached [18:30:48] yeah, I just realized I should not have tested them pre-deploy [18:31:56] 10Operations, 10DC-Ops, 10Traffic: poll power data for redeployment of esams/knams - https://phabricator.wikimedia.org/T225720 (10RobH) [18:32:07] echo "https://url" | mwscript purgeList.php --wiki=aawiki [18:32:33] :) 
[18:32:43] please do poke me when you are done with SWAT. I will restart Jenkins [18:33:14] Niharika: around for SWAT? [18:33:34] 10Operations, 10DC-Ops, 10Traffic: poll power data for redeployment of esams/knams - https://phabricator.wikimedia.org/T225720 (10RobH) [18:36:23] 10Operations, 10DC-Ops, 10Traffic: poll power data for redeployment of esams/knams - https://phabricator.wikimedia.org/T225720 (10RobH) Updated from irc chat and @bblack. Peak eqiad time is actually 01:30 GMT (18:30 Pacific). For total Linux hosts: 9x lvs/misc/ganeti type nodes, 16x cache nodes. Basically... [18:37:31] kostajh: on mwdebug1002 [18:37:37] tgr: looking [18:37:38] tgr: Right here. [18:39:57] (03CR) 10Gergő Tisza: [C: 03+2] Deploy partial blocks on hewikivoyage on community request [mediawiki-config] - 10https://gerrit.wikimedia.org/r/518073 (https://phabricator.wikimedia.org/T218626) (owner: 10Niharika29) [18:40:55] (03Merged) 10jenkins-bot: Deploy partial blocks on hewikivoyage on community request [mediawiki-config] - 10https://gerrit.wikimedia.org/r/518073 (https://phabricator.wikimedia.org/T218626) (owner: 10Niharika29) [18:41:09] (03CR) 10jenkins-bot: Deploy partial blocks on hewikivoyage on community request [mediawiki-config] - 10https://gerrit.wikimedia.org/r/518073 (https://phabricator.wikimedia.org/T218626) (owner: 10Niharika29) [18:41:50] kostajh: group2 wikis are still on wmf.8, FWIW [18:41:59] tgr: yep [18:43:34] tgr: looks good! [18:43:44] tgr: Looks like group1 wikis are also on wmf.8. Did the train not run yesterday? [18:45:33] seems like there was a rollback due to the job queue loss issue? [18:47:04] (03CR) 10Ottomata: "Hm, I'm trying to remember why we made the cdh:exec be in the cdh module in the first place. 
Looking at it now, it doesn't have anything " [puppet/cdh] - 10https://gerrit.wikimedia.org/r/518097 (https://phabricator.wikimedia.org/T212259) (owner: 10Elukey) [18:47:06] !log tgr@deploy1001 Synchronized php-1.34.0-wmf.10/extensions/GrowthExperiments/extension.json: SWAT: [[gerrit:518047|HomepageModule: Use newer schema with start module name (Bug: T222836)]] (duration: 00m 58s) [18:47:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:47:11] T222836: Mobile Homepage: instrumentation - https://phabricator.wikimedia.org/T222836 [18:48:04] 10Operations, 10SRE-Access-Requests: Access Q re maint1002 - https://phabricator.wikimedia.org/T225253 (10Iflorez) Hello, Any tips and direction are appreciated. I tried the above on Stat6 and also tried suggestions from https://wikitech.wikimedia.org/wiki/Analytics/Data_access unsuccessfully. My next step is... [18:48:45] Niharika: on mwdebug1002 [18:50:07] tgr: Looks good. [18:52:45] !log tgr@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:518073|Deploy partial blocks on hewikivoyage on community request (Bug: T218626)]] (duration: 00m 58s) [18:52:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:52:51] T218626: [Epic] Partial block rollout - https://phabricator.wikimedia.org/T218626 [18:52:57] Thanks tgr! [18:53:06] hashar: all yours [18:53:12] cool :) [18:54:00] !log upgrading and restarting jenkins [18:54:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:55:35] 10Operations, 10Core Platform Team, 10MassMessage, 10WMF-JobQueue: Jobs not being executed on 1.34.0-wmf.10 - https://phabricator.wikimedia.org/T226109 (10Agusbou2015) Any progress here? 
[18:55:47] (03PS1) 10Smalyshev: Remove BETA from RDF dump filenames [puppet] - 10https://gerrit.wikimedia.org/r/518108 (https://phabricator.wikimedia.org/T226153) [18:57:22] (03PS6) 10ArielGlenn: refactor wikidata entity dumps into wikibase + wikidata specific bits [puppet] - 10https://gerrit.wikimedia.org/r/517670 (https://phabricator.wikimedia.org/T221917) [18:58:00] was easy [18:58:45] 10Operations, 10Core Platform Team, 10MassMessage, 10WMF-JobQueue: Jobs not being executed on 1.34.0-wmf.10 - https://phabricator.wikimedia.org/T226109 (10Krinkle) >>! In T226109#5272141, @Agusbou2015 wrote: > Any progress here? >>! In T226109#5271636, @Krinkle wrote: >>>! In T226109#5271093, @Agusbou2015... [18:59:04] 10Operations, 10Core Platform Team, 10MassMessage, 10WMF-JobQueue: Jobs not being executed on 1.34.0-wmf.10 - https://phabricator.wikimedia.org/T226109 (10Aklapper) @Agusbou2015: Please stop posting unhelpful repetitive messages. Thanks. [19:00:04] twentyafterfour: It is that lovely time of the day again! You are hereby commanded to deploy MediaWiki train - American version. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190620T1900). [19:01:40] (03PS1) 10Kosta Harlan: Betalabs: Enable GrowthExperiments features for arwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/518109 (https://phabricator.wikimedia.org/T226205) [19:05:30] (03PS7) 10ArielGlenn: refactor wikidata entity dumps into wikibase + wikidata specific bits [puppet] - 10https://gerrit.wikimedia.org/r/517670 (https://phabricator.wikimedia.org/T221917) [19:09:51] * apergos raises an eyebrow [19:11:52] twentyafterfour: do you know what is the status of the train? [19:11:57] has anyone worked out the blockers yet? [19:12:03] (or is working them out?) [19:12:45] twentyafterfour: i'm asking because we want to enable some VE things that only exist in wmf.10 today. if the train is still blocked, then i'll probably be backporting them individually. 
[19:13:54] (03CR) 10Jforrester: "This won't work, because the StructuredDiscussions code has been removed from enwiki (even though it's meant to be live on all prod wikis)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/517924 (https://phabricator.wikimedia.org/T224851) (owner: 10Bartosz Dziewoński) [19:15:37] job queue issues still unsolved afaik [19:17:27] MatmaRex: train is blocked right now and rolled back [19:18:03] MatmaRex: I can help you with backports if that would be useful [19:20:52] twentyafterfour: i can do it, i just wanted to know if i need to. thanks :) [19:31:33] (03CR) 10Jforrester: "Wonderful to see this dropped." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/517871 (https://phabricator.wikimedia.org/T222268) (owner: 10Ottomata) [19:32:33] (03CR) 10Bartosz Dziewoński: "argh" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/517924 (https://phabricator.wikimedia.org/T224851) (owner: 10Bartosz Dziewoński) [19:33:19] (03PS1) 10Bartosz Dziewoński: Revert "Centralize enwiki's VisualEditor feedback page" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/518117 (https://phabricator.wikimedia.org/T224851) [19:35:41] (03PS2) 10Bartosz Dziewoński: Revert "Centralize enwiki's VisualEditor feedback page" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/518117 (https://phabricator.wikimedia.org/T224851) [19:35:53] (03PS3) 10Bartosz Dziewoński: Revert "Centralize enwiki's VisualEditor feedback page" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/518117 (https://phabricator.wikimedia.org/T224851) [20:00:18] (03PS6) 10EBernhardson: LVS for cloudelastic [puppet] - 10https://gerrit.wikimedia.org/r/512925 (https://phabricator.wikimedia.org/T224324) [20:00:20] (03PS1) 10EBernhardson: Define ferm classes for lvs owned ips [puppet] - 10https://gerrit.wikimedia.org/r/518130 [20:01:17] (03CR) 10jerkins-bot: [V: 04-1] Define ferm classes for lvs owned ips [puppet] - 10https://gerrit.wikimedia.org/r/518130 (owner: 10EBernhardson) 
[20:02:06] (03CR) 10EBernhardson: LVS for cloudelastic (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/512925 (https://phabricator.wikimedia.org/T224324) (owner: 10EBernhardson) [20:05:34] (03PS2) 10EBernhardson: Define ferm classes for lvs owned ips [puppet] - 10https://gerrit.wikimedia.org/r/518130 [20:06:18] (03CR) 10jerkins-bot: [V: 04-1] Define ferm classes for lvs owned ips [puppet] - 10https://gerrit.wikimedia.org/r/518130 (owner: 10EBernhardson) [20:12:10] 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/ codfw: ganeti2009 - ganeti201[0-8] - https://phabricator.wikimedia.org/T224603 (10Papaul) [20:20:41] (03CR) 10Jforrester: [C: 03+2] Revert "Centralize enwiki's VisualEditor feedback page" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/518117 (https://phabricator.wikimedia.org/T224851) (owner: 10Bartosz Dziewoński) [20:21:42] (03Merged) 10jenkins-bot: Revert "Centralize enwiki's VisualEditor feedback page" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/518117 (https://phabricator.wikimedia.org/T224851) (owner: 10Bartosz Dziewoński) [20:21:56] (03CR) 10jenkins-bot: Revert "Centralize enwiki's VisualEditor feedback page" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/518117 (https://phabricator.wikimedia.org/T224851) (owner: 10Bartosz Dziewoński) [20:23:18] !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Revert Centralize enwiki's VisualEditor feedback page T224851 (duration: 00m 59s) [20:23:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:23:24] T224851: Please centralize enwiki's feedback for VisualEditor - https://phabricator.wikimedia.org/T224851 [20:27:00] 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/ codfw: ganeti2009 - ganeti201[0-8] - https://phabricator.wikimedia.org/T224603 (10Papaul) add interface range ganeti in both row C and D row C ` interface-range ganeti { member ge-1/0/19; native-vlan-id 2... 
[20:37:19] James_F: thanks [20:37:43] MatmaRex: Sorry I didn't see the change until it was live. [20:37:58] Clearly I shouldn't have taken Wednesday off. :-) [20:38:40] James_F: heh. i should have noticed when testing. but i only opened the dialog and checked that the link there goes to the right place, didn't actually try submitting feedback [20:38:53] * James_F nods. [20:39:11] I guess I can ask RoanKattouw how on Earth we can fix this. :-) [20:39:36] (Some wikis are already broken, e.g. meta.) [20:41:00] James_F: i imagine that with some effort we'd be able to load the module from the target wiki rather than the local wiki [20:41:31] unrelatedly: [20:41:39] That sounds like something Security would be Unhappy® about. [20:41:51] can someone confirm that bumping the submodule like this is going to work? https://gerrit.wikimedia.org/r/c/mediawiki/core/+/518141 (i verified that the code is compatible) [20:42:21] MatmaRex: Yeah, it's fine. Want it pushed out now whilst nothing's happening? [20:42:46] (Well, really you should make a merge-commit for wmf.8.) [20:43:55] James_F: hm, i think we could [20:44:03] James_F: also, yeah, probably a good idea [20:44:25] so, i submit a commit to mw/ext/VE branch wmf.8 that merges in the wmf.10 branch of that repo? [20:44:43] (03PS3) 10CDanis: Diff support. [software/conftool] - 10https://gerrit.wikimedia.org/r/515323 [20:44:47] Yes. And then the core wmf.8 patch will appear by magic. [20:44:48] and then i submit a commit to mw/core that bumps the submodule to the new commit (or perhaps this even happens automatically when we merge that?) [20:45:08] No, no need to manually make a core commit, submodules get magic bumps. 
[20:50:26] James_F: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/VisualEditor/+/518142 [20:53:01] James_F: also https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/VisualEditor/+/518144 (i was going to ignore this, since it's a tiny visual issue, but since it's easy to fix with your proper approach, we might as well) [20:53:19] James_F: if you're busy, i'll schedule these for SWAT, i was planning to do that anyway [21:14:39] Oy, CI, why are you so slow? [21:14:45] lol [21:15:14] Aka "Oi, SRE, screw the opcache issues, let's drop HHVM tonight". :-) [21:15:27] +1 [21:15:38] Yes, it's bad, but so are lots of other things. [21:15:48] Like the cron job restarting our HHVM instances. [21:20:12] MatmaRex: Live on mwdebug1002 for wmf.8 (except the MF change which is still landing). [21:21:10] James_F: seems good [21:21:22] example page: https://bn.m.wikipedia.org/wiki/রামগোপাল_ঘোষ#/editor/1 [21:22:04] OK, will sync. [21:22:59] Some minor i18n is going to be flaky. [21:23:21] I guess I could do a full scap afterwards. It's not like we're going to be able to move the train. :-( [21:23:51] !log jforrester@deploy1001 Synchronized php-1.34.0-wmf.8/extensions/VisualEditor/: Pull VisualEditor wmf.8 all the way to wmf.10 (duration: 01m 08s) [21:23:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:24:52] MatmaRex: Should be working? [21:25:20] MatmaRex: Should the config go out now too? [21:25:32] James_F: i don't think the i18 changes were important, we mostly removed messages [21:25:42] James_F: the config already went out? not sure what you mean [21:25:44] Yeah, I was thinking for GuidedTour though. [21:26:34] Sorry, yes, it did. 
[21:27:46] (03PS3) 10EBernhardson: Define ferm classes for lvs owned ips [puppet] - 10https://gerrit.wikimedia.org/r/518130 [21:28:35] (03CR) 10jerkins-bot: [V: 04-1] Define ferm classes for lvs owned ips [puppet] - 10https://gerrit.wikimedia.org/r/518130 (owner: 10EBernhardson) [21:30:09] James_F: is it supposed to be live? when not using mwdebug, i still see the old version [21:30:41] hmmmmmmmm [21:31:11] MatmaRex: Caching? [21:32:07] MatmaRex: I get it logged in and out in non-debug. [21:32:57] James_F: i am actually getting wgVisualEditorConfig.enableNewMobileContext==false [21:34:17] Well that's interesting. [21:34:59] !log jforrester@deploy1001 Synchronized php-1.34.0-wmf.8/extensions/VisualEditor/modules/ve-mw/init/targets/ve.init.mw.MobileArticleTarget.js: Revert 'MobileArticleTarget: Update loading interface for new design' (duration: 00m 57s) [21:35:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:35:06] quite [21:35:13] i'm trying to figure out how that is possible [21:35:25] It's not an A/B test is it? [21:35:39] (03CR) 10EBernhardson: "This is a plausible approach, but not sure if desired. I could alternatively use this data to define some new values in hieradata/common.y" [puppet] - 10https://gerrit.wikimedia.org/r/518130 (owner: 10EBernhardson) [21:36:04] no [21:36:10] (03CR) 10CDanis: Diff support. (032 comments) [software/conftool] - 10https://gerrit.wikimedia.org/r/515323 (owner: 10CDanis) [21:36:34] https://bn.m.wikipedia.org/w/load.php?lang=bn&modules=startup&only=scripts&skin=minerva&target=mobile&debug=true [21:36:36] "enableNewMobileContext": false, [21:36:45] am i looking at the right wiki? am i otherwise crazy? [21:37:42] it's true on mwdebug1002, false without mwdebug [21:37:54] James_F: can you make sure that the config is actually synced? D: [21:38:30] RECOVERY - EDAC syslog messages on db2084 is OK: All metrics within thresholds.
https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=db2084&var-datasource=codfw+prometheus/ops [21:38:47] Err. [21:40:21] It's definitely set on mwmaint1002's copy. [21:40:45] But there's nothing in the SAL? [21:41:34] * James_F syncs. [21:42:27] !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Ensure that wmgVisualEditorEnableNewMobileContext IS part is set on all servers (duration: 00m 59s) [21:42:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:43:44] !log jforrester@deploy1001 Synchronized wmf-config/CommonSettings.php: Ensure that wmgVisualEditorEnableNewMobileContext CS part is set on all servers (duration: 00m 59s) [21:43:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:44:49] 10Operations, 10ops-eqiad: Degraded RAID on analytics1039 - https://phabricator.wikimedia.org/T226213 (10ops-monitoring-bot) [21:45:53] MatmaRex: OK, it works fine in debug mode. Possibly this is a race condition inside VE's code setting things up. [21:46:35] James_F: load.php…&modules=startup&debug=true output is now correct, but non-debug still has the wrong value [21:46:47] https://bn.m.wikipedia.org/w/load.php?lang=bn&modules=startup&only=scripts&skin=minerva&target=mobile [21:46:52] "enableNewMobileContext":!1 [21:46:57] where !1 stands for false [21:47:05] i assume this is cached or something? [21:47:41] i will go make a tea and be back in 5 minutes to see if it fixes itself [21:47:43] Ooooh, this might be a Varnish cache of an RL result. [21:49:32] !log Manually purged https://bn.m.wikipedia.org/w/load.php?lang=bn&modules=startup&only=scripts&skin=minerva&target=mobile from Varnish [21:49:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:50:47] MatmaRex: That seems to have fixed it. Someone probably loaded it mid-deployment and the bad value stuck in some of the Varnishes.
[21:52:07] (03CR) 10CDanis: [C: 03+2] "Riccardo, since you're out for a few more days I'm self-merging. Happy to address any further comments in followup patches!" [software/conftool] - 10https://gerrit.wikimedia.org/r/515323 (owner: 10CDanis) [21:52:41] 10Operations, 10SRE-Access-Requests: Access Q re maint1002 - https://phabricator.wikimedia.org/T225253 (10Krenair) Ah, I see the old analytics-store got removed. Baring in mind eswiki [[ https://gerrit.wikimedia.org/r/plugins/gitiles/operations/mediawiki-config/+/refs/heads/master/dblists/s7.dblist#3 | would l... [21:54:42] (03Merged) 10jenkins-bot: Diff support. [software/conftool] - 10https://gerrit.wikimedia.org/r/515323 (owner: 10CDanis) [22:00:18] !log jforrester@deploy1001 Started scap: Full scap for new i18n in VisualEditor [22:00:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:01:39] James_F: i'm back. thanks, indeed. that someone might have been me [22:01:57] everything looks great now. thank you! [22:03:07] MatmaRex: Any time. [22:04:57] 10Operations, 10Dumps-Generation, 10SDC General, 10Wikidata: Capacity planning for Commons Structured Data - https://phabricator.wikimedia.org/T226093 (10Ramsey-WMF) @ArielGlenn here's a *tentative* roadmap that provides a high-level view of the SDC work we have planned for the rest of the calendar year. A... [22:05:35] 10Operations, 10SRE-Access-Requests: Access Q re maint1002 - https://phabricator.wikimedia.org/T225253 (10Iflorez) Hello Krenair! I tried the above suggestion on s7 with eswiki, enwiki, and cawiki. I received this message: > Could not open required defaults file: /etc/mysql/conf.d/research-client.cnf > Fatal... [22:08:29] 10Operations, 10SRE-Access-Requests: Access Q re maint1002 - https://phabricator.wikimedia.org/T225253 (10Krenair) enwiki would be s1, eswiki s7, and cawiki s7. Not sure what the problem with the defaults file is - are you able to open the file? It will contain a password so do not paste it here. 
[22:09:14] 10Operations, 10Core Platform Team, 10MassMessage, 10WMF-JobQueue: Jobs not being executed on 1.34.0-wmf.10 - https://phabricator.wikimedia.org/T226109 (10mobrovac) @Pchelolo and I looked through all the logs and graphs and all the patchsets between `.8` and `.10` for EventBus and MassMessage (in detail) a... [22:12:04] win stick 1 off [22:21:25] PROBLEM - recommendation_api endpoints health on scb2005 is CRITICAL: /{domain}/v1/article/creation/morelike/{seed} (article.creation.morelike - good article title) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [22:21:39] PROBLEM - Nginx local proxy to apache on mw1339 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.010 second response time https://wikitech.wikimedia.org/wiki/Application_servers [22:21:43] PROBLEM - Apache HTTP on mw1339 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.002 second response time https://wikitech.wikimedia.org/wiki/Application_servers [22:22:45] RECOVERY - recommendation_api endpoints health on scb2005 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/recommendation_api [22:23:05] RECOVERY - Nginx local proxy to apache on mw1339 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 617 bytes in 0.054 second response time https://wikitech.wikimedia.org/wiki/Application_servers [22:23:11] RECOVERY - Apache HTTP on mw1339 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 616 bytes in 0.076 second response time https://wikitech.wikimedia.org/wiki/Application_servers [22:31:13] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1004 is CRITICAL: CRITICAL: 70.00% of data above the critical threshold [50.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen [22:31:19] !log Scap is stuck in scap-cdb-rebuild with one server left to sync. 
[22:31:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:31:47] !log jforrester@deploy1001 Finished scap: Full scap for new i18n in VisualEditor (duration: 31m 29s) [22:31:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:32:03] Ha, and of course it just fixed itself as soon as I logged that. [22:32:22] Presumably a long-running incoming query. [22:34:05] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1004 is OK: OK: Less than 70.00% above the threshold [25.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen [22:37:47] (03PS1) 10CRusnov: Add new dumpbackup.py script [software/netbox-deploy] - 10https://gerrit.wikimedia.org/r/518166 [22:38:15] (03CR) 10Smalyshev: [C: 04-1] "Mostly looks good except --help part seems to be missing." (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/517670 (https://phabricator.wikimedia.org/T221917) (owner: 10ArielGlenn) [22:49:19] (03Abandoned) 10Smalyshev: Migrate CirrusSearch to extension.json officially [mediawiki-config] - 10https://gerrit.wikimedia.org/r/514994 (https://phabricator.wikimedia.org/T87892) (owner: 10Smalyshev) [22:50:31] 10Operations, 10Core Platform Team, 10MassMessage, 10WMF-JobQueue: Jobs not being executed on 1.34.0-wmf.10 - https://phabricator.wikimedia.org/T226109 (10Reedy) >>! In T226109#5272506, @mobrovac wrote: > @Pchelolo and I looked through all the logs and graphs and all the patchsets between `.8` and `.10` fo... [23:00:04] MaxSem, RoanKattouw, and Niharika: How many deployers does it take to do Evening SWAT (Max 6 patches) deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190620T2300). [23:00:04] James_F: A patch you scheduled for Evening SWAT (Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [23:01:01] I'll do it. 
[23:01:07] (03CR) 10Jforrester: [C: 03+2] Enable TimedMediaHandler's new video player Beta Feature [mediawiki-config] - 10https://gerrit.wikimedia.org/r/354390 (https://phabricator.wikimedia.org/T148103) (owner: 10Jforrester) [23:01:14] (03PS5) 10Jforrester: Enable TimedMediaHandler's new video player Beta Feature [mediawiki-config] - 10https://gerrit.wikimedia.org/r/354390 (https://phabricator.wikimedia.org/T148103) [23:01:19] (03CR) 10Jforrester: [C: 03+2] "…" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/354390 (https://phabricator.wikimedia.org/T148103) (owner: 10Jforrester) [23:01:53] !log pool maps1003 - node is ready to receive requests - T224395 [23:01:54] !log jforrester@deploy1001 Synchronized php-1.34.0-wmf.8/extensions/TimedMediaHandler/resources/videojs/: Latest VideoJS for T222763 (duration: 00m 59s) [23:01:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:01:57] T224395: Maps[12]004 /srv disk space is critical - https://phabricator.wikimedia.org/T224395 [23:02:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:02:02] T222763: VideoJS loads subtitles on read rather than play (and once per video instance on-page, rather than aggregated) - https://phabricator.wikimedia.org/T222763 [23:02:16] 10Operations, 10Core Platform Team, 10MassMessage, 10WMF-JobQueue: Jobs not being executed on 1.34.0-wmf.10 - https://phabricator.wikimedia.org/T226109 (10Jdforrester-WMF) [23:02:23] (03Merged) 10jenkins-bot: Enable TimedMediaHandler's new video player Beta Feature [mediawiki-config] - 10https://gerrit.wikimedia.org/r/354390 (https://phabricator.wikimedia.org/T148103) (owner: 10Jforrester) [23:03:06] (03CR) 10jenkins-bot: Enable TimedMediaHandler's new video player Beta Feature [mediawiki-config] - 10https://gerrit.wikimedia.org/r/354390 (https://phabricator.wikimedia.org/T148103) (owner: 10Jforrester) [23:04:22] (03PS2) 10CRusnov: Add new dumpbackup.py script [software/netbox-deploy] - 
10https://gerrit.wikimedia.org/r/518166 (https://phabricator.wikimedia.org/T223292) [23:05:35] 10Operations, 10WMF-CTO-Team-Backlog: Migrate "Operations" page folder to "SRE" on officewiki - https://phabricator.wikimedia.org/T226220 (10JAufrecht) [23:06:52] !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Enable TimedMediaHandler's new video player Beta Feature T148103 (duration: 00m 57s) [23:06:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:06:57] T148103: Provide a desktop beta feature of replacing Kaltura player with Video.js - https://phabricator.wikimedia.org/T148103 [23:08:49] OK, SWAT done. [23:16:07] (03Abandoned) 10Reedy: Increase memory limit for Scribunto to 100MB [mediawiki-config] - 10https://gerrit.wikimedia.org/r/511061 (https://phabricator.wikimedia.org/T223737) (owner: 10Reedy) [23:16:30] 10Operations, 10Wikimedia-Site-requests, 10serviceops, 10Performance, 10Performance-Team (Radar): Increase Memory Limit for Scribunto - https://phabricator.wikimedia.org/T223737 (10Reedy) [23:21:21] (03CR) 10Smalyshev: [C: 04-1] refactor wikidata entity dumps into wikibase + wikidata specific bits (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/517670 (https://phabricator.wikimedia.org/T221917) (owner: 10ArielGlenn) [23:22:57] (03PS2) 10Reedy: Prevent $wgFlaggedRevsNamespaces from having NS listed twice [mediawiki-config] - 10https://gerrit.wikimedia.org/r/516443 (https://phabricator.wikimedia.org/T225276) [23:23:44] (03CR) 10Smalyshev: [C: 04-1] refactor wikidata entity dumps into wikibase + wikidata specific bits (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/517670 (https://phabricator.wikimedia.org/T221917) (owner: 10ArielGlenn) [23:28:42] (03PS1) 10DannyS712: Add "mass-upload" to autopatrollers and patrollers on commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/518171 (https://phabricator.wikimedia.org/T226217) [23:31:46] (03PS3) 10Smalyshev: Set up dumps for 
mediainfo RDF generation [puppet] - 10https://gerrit.wikimedia.org/r/516444 (https://phabricator.wikimedia.org/T221917) [23:31:58] (03PS2) 10DannyS712: Add "mass-upload" to autopatrollers and patrollers on commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/518171 (https://phabricator.wikimedia.org/T226217) [23:32:18] (03CR) 10jerkins-bot: [V: 04-1] Set up dumps for mediainfo RDF generation [puppet] - 10https://gerrit.wikimedia.org/r/516444 (https://phabricator.wikimedia.org/T221917) (owner: 10Smalyshev) [23:32:50] (03CR) 10jerkins-bot: [V: 04-1] Add "mass-upload" to autopatrollers and patrollers on commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/518171 (https://phabricator.wikimedia.org/T226217) (owner: 10DannyS712) [23:35:29] (03PS4) 10Smalyshev: Set up dumps for mediainfo RDF generation [puppet] - 10https://gerrit.wikimedia.org/r/516444 (https://phabricator.wikimedia.org/T221917) [23:38:02] (03PS3) 10DannyS712: Add "mass-upload" to autopatrollers and patrollers on commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/518171 (https://phabricator.wikimedia.org/T226217)