[00:01:59] (PS2) Krinkle: Archive repository [puppet/varnishkafka] - https://gerrit.wikimedia.org/r/440644
[00:02:11] (CR) jerkins-bot: [V: -1] Archive repository [puppet/varnishkafka] - https://gerrit.wikimedia.org/r/440644 (owner: Krinkle)
[00:02:51] PROBLEM - proton endpoints health on proton1002 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Respond file not found for a nonexistent title) timed out before a response was received
[00:03:17] (PS3) Krinkle: Archive repository [puppet/varnishkafka] - https://gerrit.wikimedia.org/r/440644
[00:03:26] (CR) jerkins-bot: [V: -1] Archive repository [puppet/varnishkafka] - https://gerrit.wikimedia.org/r/440644 (owner: Krinkle)
[00:03:51] RECOVERY - proton endpoints health on proton1002 is OK: All endpoints are healthy
[00:03:52] PROBLEM - recommendation_api endpoints health on scb2003 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target) is CRITICAL: Test normal source and target returned the unexpected status 404 (expecting: 200)
[00:04:32] PROBLEM - recommendation_api endpoints health on scb2004 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target) is CRITICAL: Test normal source and target returned the unexpected status 404 (expecting: 200)
[00:04:52] Operations, Cleanup: Archive operations/puppet/varnishkafka repository - https://phabricator.wikimedia.org/T197503#4294431 (Krinkle)
[00:04:59] (PS4) Krinkle: Archive repository [puppet/varnishkafka] - https://gerrit.wikimedia.org/r/440644 (https://phabricator.wikimedia.org/T197503)
[00:05:09] (CR) jerkins-bot: [V: -1] Archive repository [puppet/varnishkafka] - https://gerrit.wikimedia.org/r/440644 (https://phabricator.wikimedia.org/T197503) (owner: Krinkle)
[00:15:56] !log krinkle@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Ica4cb6644 (duration: 00m 59s)
[00:15:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:18:20] (CR) jenkins-bot: logging: Raise minimum level for 'preferences' to INFO [mediawiki-config] - https://gerrit.wikimedia.org/r/440553 (owner: Krinkle)
[01:09:36] (CR) Krinkle: Make mc-labs.php settings more similar to mc.php (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/440643 (owner: Aaron Schulz)
[01:23:11] PROBLEM - SSH on ms-be1035 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:27:22] RECOVERY - SSH on ms-be1035 is OK: SSH OK - OpenSSH_7.4p1 Debian-10+deb9u2 (protocol 2.0)
[02:18:51] RECOVERY - Check systemd state on kubernetes2003 is OK: OK - running: The system is fully operational
[02:22:12] PROBLEM - Check systemd state on kubernetes2003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[04:54:12] PROBLEM - proton endpoints health on proton1002 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Foo page from en.wp.org in letter format) timed out before a response was received
[04:55:21] RECOVERY - proton endpoints health on proton1002 is OK: All endpoints are healthy
[05:54:26] Krinkle: thanks for the patch and the task, my next step would have been to figure out the exact procedure for those modules. I thought to just add a big note in readme, but maybe wipe is better?
[06:28:32] PROBLEM - puppet last run on aqs1005 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/profile.d/bash_autologout.sh]
[06:28:41] PROBLEM - puppet last run on mw1323 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/logrotate.d/mediawiki_apache]
[06:30:11] PROBLEM - puppet last run on analytics1071 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/gen_fingerprints]
[06:32:11] PROBLEM - puppet last run on analytics1061 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/profile.d/mysql-ps1.sh]
[06:57:32] RECOVERY - puppet last run on analytics1061 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[06:59:01] RECOVERY - puppet last run on aqs1005 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[06:59:11] RECOVERY - puppet last run on mw1323 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[07:00:32] RECOVERY - puppet last run on analytics1071 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[07:35:28] (CR) Aaron Schulz: Make mc-labs.php settings more similar to mc.php (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/440643 (owner: Aaron Schulz)
[08:30:01] PROBLEM - proton endpoints health on proton1001 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Respond file not found for a nonexistent title) is CRITICAL: Test Respond file not found for a nonexistent title returned the unexpected status 503 (expecting: 404)
[08:31:01] RECOVERY - proton endpoints health on proton1001 is OK: All endpoints are healthy
[09:19:02] RECOVERY - Check systemd state on kubernetes2003 is OK: OK - running: The system is fully operational
[09:19:22] PROBLEM - proton endpoints health on proton2001 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Foo page from en.wp.org in letter format) timed out before a response was received
[09:20:22] RECOVERY - proton endpoints health on proton2001 is OK: All endpoints are healthy
[09:21:13] (PS1) 星耀晨曦: Enable zh-my variant on zhwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/440651 (https://phabricator.wikimedia.org/T193983)
[09:22:21] PROBLEM - Check systemd state on kubernetes2003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[09:33:41] PROBLEM - proton endpoints health on proton1002 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Foo page from en.wp.org in letter format) timed out before a response was received
[09:34:42] RECOVERY - proton endpoints health on proton1002 is OK: All endpoints are healthy
[09:35:21] (PS1) Urbanecm: Remove BN alias for NS_USER on dewikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/440652 (https://phabricator.wikimedia.org/T196905)
[10:11:01] PROBLEM - proton endpoints health on proton1002 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Foo page from en.wp.org in letter format) is CRITICAL: Test Print the Foo page from en.wp.org in letter format returned the unexpected status 503 (expecting: 200): /{domain}/v1/pdf/{title}/{format}/{type} (Print the Bar page from en.wp.org in A4 format using optimized for reading on mobile devices) timed out before a response was received
[10:12:01] RECOVERY - proton endpoints health on proton1002 is OK: All endpoints are healthy
[10:29:41] PROBLEM - proton endpoints health on proton2001 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Bar page from en.wp.org in A4 format using optimized for reading on mobile devices) timed out before a response was received
[10:30:41] RECOVERY - proton endpoints health on proton2001 is OK: All endpoints are healthy
[10:34:02] PROBLEM - proton endpoints health on proton2001 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Foo page from en.wp.org in letter format) timed out before a response was received
[10:35:11] RECOVERY - proton endpoints health on proton2001 is OK: All endpoints are healthy
[11:22:11] Operations, Electron-PDFs, Proton, Readers-Web-Backlog, and 3 others: New service request: chromium-render/deploy - https://phabricator.wikimedia.org/T186748#4294644 (akosiaris) Hello, >>! In T186748#4282998, @pmiazga wrote: > @akosiaris - I did couple checks, looks like there are couple errors...
[11:33:09] (PS1) Urbanecm: Change logo resources for ruwikiquote [mediawiki-config] - https://gerrit.wikimedia.org/r/440655 (https://phabricator.wikimedia.org/T197508)
[11:33:11] (PS1) Urbanecm: Use uploaded HD logos for ruwikiquote [mediawiki-config] - https://gerrit.wikimedia.org/r/440656 (https://phabricator.wikimedia.org/T197508)
[11:42:21] PROBLEM - proton endpoints health on proton1002 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Bar page from en.wp.org in A4 format using optimized for reading on mobile devices) is CRITICAL: Test Print the Bar page from en.wp.org in A4 format using optimized for reading on mobile devices returned the unexpected status 503 (expecting: 200)
[11:43:22] RECOVERY - proton endpoints health on proton1002 is OK: All endpoints are healthy
[12:07:36] (PS1) Volans: Packages details page: don't wrap badges [software/debmonitor] - https://gerrit.wikimedia.org/r/440657 (https://phabricator.wikimedia.org/T167504)
[12:07:49] (PS1) Volans: Query optimizations [software/debmonitor] - https://gerrit.wikimedia.org/r/440659 (https://phabricator.wikimedia.org/T191299)
[12:08:42] (CR) jerkins-bot: [V: -1] Packages details page: don't wrap badges [software/debmonitor] - https://gerrit.wikimedia.org/r/440657 (https://phabricator.wikimedia.org/T167504) (owner: Volans)
[12:09:12] (CR) jerkins-bot: [V: -1] Query optimizations [software/debmonitor] - https://gerrit.wikimedia.org/r/440659 (https://phabricator.wikimedia.org/T191299) (owner: Volans)
[13:19:12] !log reindexing Croatian wikis on elastic@codfw (T196658)
[13:19:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:19:14] T196658: Re-index Croatian, Serbo-Croatian, and Bosnian Wikis - https://phabricator.wikimedia.org/T196658
[13:48:51] PROBLEM - proton endpoints health on proton1002 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Foo page from en.wp.org in letter format) timed out before a response was received: /{domain}/v1/pdf/{title}/{format}/{type} (Respond file not found for a nonexistent title) is CRITICAL: Test Respond file not found for a nonexistent title returned the unexpected status 503 (expecting: 404)
[13:49:52] RECOVERY - proton endpoints health on proton1002 is OK: All endpoints are healthy
[14:19:22] RECOVERY - Check systemd state on kubernetes2003 is OK: OK - running: The system is fully operational
[14:22:42] PROBLEM - Check systemd state on kubernetes2003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[15:12:31] PROBLEM - proton endpoints health on proton1002 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Respond file not found for a nonexistent title) timed out before a response was received
[15:13:31] RECOVERY - proton endpoints health on proton1002 is OK: All endpoints are healthy
[17:19:22] RECOVERY - Check systemd state on kubernetes2003 is OK: OK - running: The system is fully operational
[17:22:41] PROBLEM - Check systemd state on kubernetes2003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[17:28:41] PROBLEM - proton endpoints health on proton1002 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Respond file not found for a nonexistent title) timed out before a response was received
[17:33:01] RECOVERY - proton endpoints health on proton1002 is OK: All endpoints are healthy
[18:38:32] PROBLEM - Apache HTTP on mw2144 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[18:39:31] RECOVERY - Apache HTTP on mw2144 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 616 bytes in 0.150 second response time
[19:24:36] !log reindexing Croatian wikis on elastic@eqiad (T196658)
[19:24:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:24:38] T196658: Re-index Croatian, Serbo-Croatian, and Bosnian Wikis - https://phabricator.wikimedia.org/T196658
[19:48:51] RECOVERY - Check systemd state on kubernetes2003 is OK: OK - running: The system is fully operational
[19:52:02] PROBLEM - Check systemd state on kubernetes2003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[20:05:41] PROBLEM - proton endpoints health on proton1002 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Respond file not found for a nonexistent title) is CRITICAL: Test Respond file not found for a nonexistent title returned the unexpected status 503 (expecting: 404)
[20:06:42] RECOVERY - proton endpoints health on proton1002 is OK: All endpoints are healthy
[20:24:47] (PS4) Gergő Tisza: Add techadmin to privileged groups [mediawiki-config] - https://gerrit.wikimedia.org/r/421122 (https://phabricator.wikimedia.org/T190015)
[20:24:49] (PS5) Gergő Tisza: Temporarily preserve sysops' JS editing ability [mediawiki-config] - https://gerrit.wikimedia.org/r/421123 (https://phabricator.wikimedia.org/T190015)
[20:24:51] (PS6) Gergő Tisza: Remove sitewide and user CSS/JS editing from old groups [mediawiki-config] - https://gerrit.wikimedia.org/r/421124 (https://phabricator.wikimedia.org/T190015)
[20:24:53] (PS7) Gergő Tisza: Enforce that techadmin is the only group that can edit non-own CSS/JS [mediawiki-config] - https://gerrit.wikimedia.org/r/421125 (https://phabricator.wikimedia.org/T190015)
[20:24:55] (PS1) Gergő Tisza: Configure group management for techadmin [mediawiki-config] - https://gerrit.wikimedia.org/r/440676
[20:42:49] !log disabled puppet on maps-test2004 for testing new map styles setup
[20:42:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:48:51] RECOVERY - Check systemd state on kubernetes2003 is OK: OK - running: The system is fully operational
[20:52:11] PROBLEM - Check systemd state on kubernetes2003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[21:31:41] PROBLEM - proton endpoints health on proton2001 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Respond file not found for a nonexistent title) timed out before a response was received
[21:33:51] RECOVERY - proton endpoints health on proton2001 is OK: All endpoints are healthy
[22:35:18] (CR) Krinkle: [C: 2] Make mc-labs.php settings more similar to mc.php [mediawiki-config] - https://gerrit.wikimedia.org/r/440643 (owner: Aaron Schulz)
[22:36:43] (Merged) jenkins-bot: Make mc-labs.php settings more similar to mc.php [mediawiki-config] - https://gerrit.wikimedia.org/r/440643 (owner: Aaron Schulz)
[22:37:04] (CR) Krinkle: "Per T156938, I believe we're going with mcrouter instead, right?" [puppet] - https://gerrit.wikimedia.org/r/415789 (owner: Aaron Schulz)
[22:37:57] (CR) jenkins-bot: Make mc-labs.php settings more similar to mc.php [mediawiki-config] - https://gerrit.wikimedia.org/r/440643 (owner: Aaron Schulz)
[22:48:41] RECOVERY - Check systemd state on kubernetes2003 is OK: OK - running: The system is fully operational
[22:51:52] PROBLEM - Check systemd state on kubernetes2003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[22:54:51] PROBLEM - proton endpoints health on proton1001 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Foo page from en.wp.org in letter format) is CRITICAL: Test Print the Foo page from en.wp.org in letter format returned the unexpected status 503 (expecting: 200)
[22:56:52] RECOVERY - proton endpoints health on proton1001 is OK: All endpoints are healthy
[23:48:41] RECOVERY - Check systemd state on kubernetes2003 is OK: OK - running: The system is fully operational
[23:52:01] PROBLEM - Check systemd state on kubernetes2003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[23:56:12] PROBLEM - proton endpoints health on proton2002 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Foo page from en.wp.org in letter format) is CRITICAL: Test Print the Foo page from en.wp.org in letter format returned the unexpected status 503 (expecting: 200)
[23:57:21] RECOVERY - proton endpoints health on proton2002 is OK: All endpoints are healthy