[00:03:32] (CR) Krinkle: [C: +1] "Live on beta cluster. Confirmed local static pages now carry CC headers. Whereas links like http://performance-beta.wmflabs.org/arclamp/sv" [puppet] - https://gerrit.wikimedia.org/r/499537 (https://phabricator.wikimedia.org/T219417) (owner: Gilles)
[00:18:02] (PS12) Alex Monk: tlsproxy::localssl: No hardcoding of prod webproxy hostname [puppet] - https://gerrit.wikimedia.org/r/500406
[00:39:33] PROBLEM - puppet last run on wtp1029 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[00:48:37] (PS1) Mholloway: Revert "Cleanup: Remove obsolete WikimediaEditorTasks beta cluster prefs" [mediawiki-config] - https://gerrit.wikimedia.org/r/501841 (https://phabricator.wikimedia.org/T220153)
[00:50:06] (Abandoned) Mholloway: Revert "Cleanup: Remove obsolete WikimediaEditorTasks beta cluster prefs" [mediawiki-config] - https://gerrit.wikimedia.org/r/501841 (https://phabricator.wikimedia.org/T220153) (owner: Mholloway)
[00:56:56] (PS1) Mholloway: WikimediaEditorTasks: Replace needed Beta Cluster config [mediawiki-config] - https://gerrit.wikimedia.org/r/501845 (https://phabricator.wikimedia.org/T220153)
[00:58:33] (CR) jerkins-bot: [V: -1] WikimediaEditorTasks: Replace needed Beta Cluster config [mediawiki-config] - https://gerrit.wikimedia.org/r/501845 (https://phabricator.wikimedia.org/T220153) (owner: Mholloway)
[00:59:33] (PS2) Mholloway: WikimediaEditorTasks: Replace needed Beta Cluster config [mediawiki-config] - https://gerrit.wikimedia.org/r/501845 (https://phabricator.wikimedia.org/T220153)
[01:05:57] RECOVERY - puppet last run on wtp1029 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures
[01:33:18] (CR) Alex Monk: "I'm not sure why but this doesn't quite seem to work." [puppet] - https://gerrit.wikimedia.org/r/501587 (https://phabricator.wikimedia.org/T171188) (owner: Alex Monk)
[01:36:03] Operations, Puppet, Cloud-Services, Patch-For-Review, cloud-services-team (Kanban): Move the main WMCS puppetmaster into the Labs realm - https://phabricator.wikimedia.org/T171188 (Krenair) I've got puppetmaster set up on puppetmaster.cloudinfra.wmflabs.org now, hosted at cloud-puppetmaster-0...
[02:25:11] PROBLEM - puppet last run on an-worker1087 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[02:30:13] PROBLEM - Postgres Replication Lag on maps1002 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 22099408 and 2 seconds
[02:35:23] RECOVERY - Postgres Replication Lag on maps1002 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 57896 and 5 seconds
[02:39:31] PROBLEM - MariaDB Slave Lag: m3 on db2042 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 477.59 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_slave
[02:40:29] PROBLEM - MariaDB Slave Lag: m3 on db2078 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 502.43 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_slave
[02:47:15] RECOVERY - MariaDB Slave Lag: m3 on db2042 is OK: OK slave_sql_lag Replication lag: 26.70 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_slave
[02:48:13] RECOVERY - MariaDB Slave Lag: m3 on db2078 is OK: OK slave_sql_lag Replication lag: 0.00 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_slave
[02:51:33] RECOVERY - puppet last run on an-worker1087 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures
[03:57:59] PROBLEM - puppet last run on cp1082 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[04:00:07] (CR) Vgutierrez: [C: -1] tlsproxy::localssl: No hardcoding of prod webproxy hostname (2 comments) [puppet] - https://gerrit.wikimedia.org/r/500406 (owner: Alex Monk)
[04:20:01] RECOVERY - High lag on wdqs1003 is OK: (C)3600 ge (W)1200 ge 1117 https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
[04:24:25] RECOVERY - puppet last run on cp1082 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[04:48:57] PROBLEM - puppet last run on maps1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[05:20:41] RECOVERY - puppet last run on maps1002 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[06:32:37] PROBLEM - puppet last run on mc1035 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/redis/redis-common.conf]
[06:58:59] RECOVERY - puppet last run on mc1035 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[08:26:50] (CR) Gilles: [C: +1] "I'm completely in favour of having only access to things that are strictly necessary :)" [puppet] - https://gerrit.wikimedia.org/r/501578 (https://phabricator.wikimedia.org/T220175) (owner: Elukey)
[08:32:40] Operations, SRE-Access-Requests, Patch-For-Review, User-Elukey: Requesting ability to scap-deploy on stat1007 for gilles - https://phabricator.wikimedia.org/T220175 (Gilles)
[08:33:08] Operations, SRE-Access-Requests, Patch-For-Review, User-Elukey: Requesting ability to scap-deploy on stat1007 for gilles - https://phabricator.wikimedia.org/T220175 (Gilles)
[08:44:44] (CR) Umherirrender: "recheck" [puppet] - https://gerrit.wikimedia.org/r/501791 (owner: Andrew Bogott)
[08:44:51] (CR) Umherirrender: "recheck" [puppet] - https://gerrit.wikimedia.org/r/499669 (https://phabricator.wikimedia.org/T102367) (owner: BryanDavis)
[08:45:40] (CR) jerkins-bot: [V: -1] site.pp: Make cloudvirt1008 a cloudvirt host [puppet] - https://gerrit.wikimedia.org/r/501791 (owner: Andrew Bogott)
[10:09:59] !log Purging ruwiki namespaces > 0
[10:10:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:42:33] Does anybody know if MassMessage is badly broken or something? https://meta.wikimedia.org/wiki/Special:Log/massmessage doesn't give error messages and the queue looked empty?
[10:44:41] Elitre: 10:42, 6 April 2019 Delivery of "Update on the Affiliate-selected Board seats 2019 process" to User talk:Elitre (WMF) failed with an error code of editconflict ?
[10:45:01] or was no message sent at all?
[10:45:27] that message took several minutes to show up.
[10:45:43] I saw nothing queued, and couldn't figure out why I wasn't getting anything.
[10:45:58] I'll attempt actual delivery now and see... TY <3
[10:46:16] but did the other messages sent arrived to their destination?
[10:47:07] yup!
[10:47:58] :)
[10:48:06] JobQueue being lazy on Saturday probably
[11:07:37] (PS5) MarcoAurelio: maintain-views: Note explicit exclusion of `oathauth_users` from replicas [puppet] - https://gerrit.wikimedia.org/r/496063 (https://phabricator.wikimedia.org/T218165)
[11:35:08] (PS13) Alex Monk: tlsproxy::localssl: No hardcoding of prod webproxy hostname [puppet] - https://gerrit.wikimedia.org/r/500406
[11:35:10] (PS1) Alex Monk: role::swift::proxy: Move TLS stuff out into profile [puppet] - https://gerrit.wikimedia.org/r/501890
[12:32:33] Operations, LDAP-Access-Requests: LDAP access to the wmf group for Evan Prodromou - https://phabricator.wikimedia.org/T220226 (MarcoAurelio)
[13:07:05] PROBLEM - puppet last run on wdqs1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:38:45] RECOVERY - puppet last run on wdqs1004 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[13:59:27] Puppet, Cloud-Services, cloud-services-team (Kanban): Consider ways to make puppetmaster CA changes smoother on the puppet client end - https://phabricator.wikimedia.org/T220268 (Krenair)
[14:11:58] (PS2) Andrew Bogott: site.pp: Make cloudvirt1008 a cloudvirt host [puppet] - https://gerrit.wikimedia.org/r/501791
[14:13:18] (CR) Andrew Bogott: [C: +2] site.pp: Make cloudvirt1008 a cloudvirt host [puppet] - https://gerrit.wikimedia.org/r/501791 (owner: Andrew Bogott)
[15:49:21] (PS1) Urbanecm: Change arwiki's default user preferences [mediawiki-config] - https://gerrit.wikimedia.org/r/501926 (https://phabricator.wikimedia.org/T220186)
[15:53:22] PROBLEM - puppet last run on ores1009 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[16:12:20] PROBLEM - puppet last run on elastic1052 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[16:19:26] RECOVERY - puppet last run on ores1009 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[16:43:14] RECOVERY - puppet last run on elastic1052 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures
[16:55:31] PROBLEM - puppet last run on analytics1056 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[17:21:41] RECOVERY - puppet last run on analytics1056 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[17:28:19] PROBLEM - puppet last run on cloudvirt1017 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[17:33:45] Operations, Analytics, Research-management, Patch-For-Review, User-Elukey: Remove computational bottlenecks in stats machine via adding a GPU that can be used to train ML models - https://phabricator.wikimedia.org/T148843 (elukey) The https://rocm.github.io/ROCmInstall.html module lists among...
[17:39:23] RECOVERY - EDAC syslog messages on wtp2013 is OK: (C)4 ge (W)2 ge 0 https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=wtp2013&var-datasource=codfw+prometheus/ops
[17:41:07] RECOVERY - Memory correctable errors -EDAC- on wtp2013 is OK: (C)4 ge (W)2 ge 0 https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=wtp2013&var-datasource=codfw+prometheus/ops
[17:59:57] RECOVERY - puppet last run on cloudvirt1017 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[18:13:43] (PS13) Ammarpad: Enable blocking feature of AbuseFilter in zh.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/486103 (https://phabricator.wikimedia.org/T210364)
[18:14:51] (PS15) Ammarpad: Add 'Author' namespace in Sanskrit Wikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/486221 (https://phabricator.wikimedia.org/T214553)
[18:32:29] (CR) Zoranzoki21: [C: +1] "poke Jenkins" [mediawiki-config] - https://gerrit.wikimedia.org/r/486103 (https://phabricator.wikimedia.org/T210364) (owner: Ammarpad)
[19:18:02] (PS1) Andrew Bogott: cloudvirt1008: experimental partman recipe [puppet] - https://gerrit.wikimedia.org/r/501949
[19:19:58] (CR) Andrew Bogott: [C: +2] cloudvirt1008: experimental partman recipe [puppet] - https://gerrit.wikimedia.org/r/501949 (owner: Andrew Bogott)
[19:41:03] (PS1) Andrew Bogott: cloudvirt1008: Update device name for new partman recipe [puppet] - https://gerrit.wikimedia.org/r/501956
[19:41:46] (CR) Andrew Bogott: [C: +2] cloudvirt1008: Update device name for new partman recipe [puppet] - https://gerrit.wikimedia.org/r/501956 (owner: Andrew Bogott)
[20:22:00] PROBLEM - puppet last run on elastic1036 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[20:53:12] RECOVERY - puppet last run on elastic1036 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[21:46:30] PROBLEM - puppet last run on db1081 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:49:47] ^ our 503 error friend, runs fine fwiw
[21:51:44] RECOVERY - puppet last run on db1081 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[22:26:53] Operations, Gerrit, serviceops, Release-Engineering-Team (Backlog): Deploy multi-site plugin to cobalt and gerrit2001 - https://phabricator.wikimedia.org/T217174 (greg)
[23:32:00] Puppet, Cloud-Services, cloud-services-team (Kanban): Consider ways to make puppetmaster CA changes smoother on the puppet client end - https://phabricator.wikimedia.org/T220268 (Krenair)
[23:50:12] PROBLEM - Check systemd state on ms-be1037 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[23:50:42] PROBLEM - puppet last run on db1076 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues