[00:44:13] (CR) Hoo man: [C: 1] snapshot: no $hostname checks in node groups, use role [puppet] - https://gerrit.wikimedia.org/r/250082 (owner: Dzahn)
[00:44:16] (PS1) Papaul: Add DNS entries for the CODFW labtest servers Bug:T117107 [dns] - https://gerrit.wikimedia.org/r/250164 (https://phabricator.wikimedia.org/T117107)
[00:45:27] (PS2) Alex Monk: Add DNS entries for the CODFW labtest servers [dns] - https://gerrit.wikimedia.org/r/250164 (https://phabricator.wikimedia.org/T117107) (owner: Papaul)
[00:50:53] operations, ops-codfw, Labs, Labs-Infrastructure, Patch-For-Review: on-site tasks for labs deployment cluster - https://phabricator.wikimedia.org/T117107#1770886 (Papaul)
[01:22:54] PROBLEM - Outgoing network saturation on labstore1003 is CRITICAL: CRITICAL: 25.93% of data above the critical threshold [100000000.0]
[01:55:03] RECOVERY - Outgoing network saturation on labstore1003 is OK: OK: Less than 10.00% above the threshold [75000000.0]
[02:23:04] !log l10nupdate@tin Synchronized php-1.27.0-wmf.4/cache/l10n: l10nupdate for 1.27.0-wmf.4 (duration: 06m 26s)
[02:23:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:26:22] !log l10nupdate@tin LocalisationUpdate completed (1.27.0-wmf.4) at 2015-10-31 02:26:22+00:00
[02:26:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:54:18] !log ori@tin Synchronized php-1.27.0-wmf.4/vendor/monolog/monolog/src/Monolog/Logger.php: Iccfda4768 (duration: 00m 19s)
[02:54:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[03:40:55] (PS1) Ori.livneh: Clean up some file_exists checks [mediawiki-config] - https://gerrit.wikimedia.org/r/250170
[04:14:13] (PS1) Ori.livneh: add /etc/wikimedia-cluster [puppet] - https://gerrit.wikimedia.org/r/250172
[04:15:04] (CR) jenkins-bot: [V: -1] add /etc/wikimedia-cluster [puppet] - https://gerrit.wikimedia.org/r/250172 (owner: 
Ori.livneh)
[04:17:04] (PS2) Ori.livneh: add /etc/wikimedia-cluster [puppet] - https://gerrit.wikimedia.org/r/250172
[04:21:14] (PS3) Ori.livneh: add /etc/wikimedia-cluster [puppet] - https://gerrit.wikimedia.org/r/250172
[04:25:19] (CR) Ori.livneh: [C: 2] add /etc/wikimedia-cluster [puppet] - https://gerrit.wikimedia.org/r/250172 (owner: Ori.livneh)
[04:26:11] (PS2) Ori.livneh: [WIP] cut down on system calls [mediawiki-config] - https://gerrit.wikimedia.org/r/250170
[04:46:28] (PS1) Ori.livneh: [WIP] Add clusterRequire [mediawiki-config] - https://gerrit.wikimedia.org/r/250173
[05:15:43] !log l10nupdate@tin ResourceLoader cache refresh completed at Sat Oct 31 05:15:43 UTC 2015 (duration 15m 42s)
[05:15:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[05:29:27] (PS3) Ori.livneh: [WIP] cut down on system calls [mediawiki-config] - https://gerrit.wikimedia.org/r/250170
[05:31:33] PROBLEM - Router interfaces on cr1-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 35, down: 1, dormant: 0, excluded: 0, unused: 0; xe-1/0/0: down - Peering: ! Equinix Chicago (SR 17915277) {#11374} [10Gbps DF]
[05:32:18] (PS4) Ori.livneh: [WIP] cut down on system calls [mediawiki-config] - https://gerrit.wikimedia.org/r/250170
[05:36:54] RECOVERY - Router interfaces on cr1-eqord is OK: OK: host 208.80.154.198, interfaces up: 37, down: 0, dormant: 0, excluded: 0, unused: 0
[05:38:58] (PS5) Ori.livneh: Cut down on system calls [mediawiki-config] - https://gerrit.wikimedia.org/r/250170
[05:49:22] (CR) Ori.livneh: [C: 2] "no-op." [mediawiki-config] - https://gerrit.wikimedia.org/r/250170 (owner: Ori.livneh)
[05:49:28] (Merged) jenkins-bot: Cut down on system calls [mediawiki-config] - https://gerrit.wikimedia.org/r/250170 (owner: Ori.livneh)
[05:57:12] PROBLEM - YARN NodeManager Node-State on analytics1032 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[05:58:52] RECOVERY - YARN NodeManager Node-State on analytics1032 is OK: OK: YARN NodeManager analytics1032.eqiad.wmnet:8041 Node-State: RUNNING
[06:10:33] PROBLEM - puppet last run on db2017 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:26:53] PROBLEM - Disk space on elastic1008 is CRITICAL: DISK CRITICAL - free space: / 0 MB (0% inode=95%)
[06:30:56] PROBLEM - puppet last run on mw2024 is CRITICAL: CRITICAL: puppet fail
[06:31:13] PROBLEM - puppet last run on ms-fe1004 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:13] PROBLEM - puppet last run on mw1120 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:22] PROBLEM - puppet last run on chromium is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:33] PROBLEM - puppet last run on cp2001 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:53] PROBLEM - puppet last run on analytics1047 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:02] PROBLEM - puppet last run on mw2073 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:13] PROBLEM - puppet last run on mw2207 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:13] PROBLEM - puppet last run on mw1170 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:25] PROBLEM - puppet last run on mw2129 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:33] PROBLEM - puppet last run on mw2158 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:46] PROBLEM - puppet last run on mw2021 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:47] PROBLEM - puppet last run on mw1110 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:53] PROBLEM - puppet last run on lvs1003 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:33:14] PROBLEM - puppet last run on mw1061 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:33:34] PROBLEM - puppet last run on mw2045 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:33:42] PROBLEM - puppet last run on mw2126 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:37:42] RECOVERY - puppet last run on db2017 is OK: OK: Puppet is currently 
enabled, last run 1 minute ago with 0 failures
[06:48:45] PROBLEM - puppet last run on cp2016 is CRITICAL: CRITICAL: puppet fail
[06:55:53] RECOVERY - puppet last run on mw2021 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures
[06:56:02] RECOVERY - puppet last run on lvs1003 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures
[06:56:03] RECOVERY - puppet last run on ms-fe1004 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures
[06:56:03] RECOVERY - puppet last run on mw1120 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures
[06:56:13] RECOVERY - puppet last run on chromium is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures
[06:56:22] RECOVERY - puppet last run on mw1061 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures
[06:56:52] RECOVERY - puppet last run on analytics1047 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:12] RECOVERY - puppet last run on mw1170 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:13] RECOVERY - puppet last run on mw2207 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:23] RECOVERY - puppet last run on mw2129 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures
[06:57:32] RECOVERY - puppet last run on mw2158 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures
[06:57:43] RECOVERY - puppet last run on mw1110 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures
[06:58:23] RECOVERY - puppet last run on cp2001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:33] RECOVERY - puppet last run on mw2045 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:42] RECOVERY - puppet last run on mw2126 is OK: OK: Puppet is 
currently enabled, last run 1 minute ago with 0 failures
[06:58:44] RECOVERY - puppet last run on mw2073 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:59:33] RECOVERY - puppet last run on mw2024 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:15:33] RECOVERY - puppet last run on cp2016 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures
[08:15:04] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [500.0]
[08:27:33] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[08:57:45] !log elastic1008 deleting /var/log/elasticsearch/production-search-eqiad_index_indexing_slowlog.log.[2-7]
[08:57:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:58:32] RECOVERY - Disk space on elastic1008 is OK: DISK OK
[09:04:43] PROBLEM - Kafka Broker Replica Max Lag on kafka1018 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [5000000.0]
[09:06:13] PROBLEM - Kafka Broker Replica Max Lag on kafka1014 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [5000000.0]
[09:09:26] operations, Math: Install texlive-extra-utils on mw appservers - https://phabricator.wikimedia.org/T109195#1771045 (Physikerwelt)
[09:13:13] PROBLEM - Kafka Broker Replica Max Lag on kafka1014 is CRITICAL: CRITICAL: 14.29% of data above the critical threshold [5000000.0]
[09:15:22] RECOVERY - Kafka Broker Replica Max Lag on kafka1018 is OK: OK: Less than 1.00% above the threshold [1000000.0]
[09:16:52] RECOVERY - Kafka Broker Replica Max Lag on kafka1014 is OK: OK: Less than 1.00% above the threshold [1000000.0]
[09:18:43] PROBLEM - puppet last run on cp3040 is CRITICAL: CRITICAL: puppet fail
[09:45:12] PROBLEM - check_mysql on db1008 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 629
[09:45:32] 
RECOVERY - puppet last run on cp3040 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[09:50:12] RECOVERY - check_mysql on db1008 is OK: Uptime: 8528540 Threads: 2 Questions: 99749173 Slow queries: 57846 Opens: 98414 Flush tables: 2 Open tables: 64 Queries per second avg: 11.695 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0
[11:03:02] PROBLEM - puppet last run on mw2125 is CRITICAL: CRITICAL: puppet fail
[11:31:42] RECOVERY - puppet last run on mw2125 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures
[12:32:04] operations, Easy, Patch-For-Review: server admin log should include year in date (again) - https://phabricator.wikimedia.org/T85803#1771201 (Aklapper) @Elee: Any news here? Are you still working on this (as you're set as assignee)?
[12:56:32] PROBLEM - puppet last run on mw2106 is CRITICAL: CRITICAL: Puppet has 1 failures
[13:21:42] RECOVERY - puppet last run on mw2106 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures
[13:38:12] PROBLEM - puppet last run on es2001 is CRITICAL: CRITICAL: puppet fail
[13:50:42] PROBLEM - puppet last run on mw2030 is CRITICAL: CRITICAL: puppet fail
[14:05:12] RECOVERY - puppet last run on es2001 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures
[14:19:32] RECOVERY - puppet last run on mw2030 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[15:39:41] matt_flaschen, thank you for being clear about order in your swat entry for monday
[15:41:01] (CR) Alex Monk: "Krinkle: Ping" [software/puppet-compiler] - https://gerrit.wikimedia.org/r/248634 (owner: Alex Monk)
[15:49:11] operations, Phabricator, Mail: DomainKeys Identified Mail (DKIM) for phabricator.wikimedia.org - https://phabricator.wikimedia.org/T116805#1771324 (greg)
[17:14:43] (CR) Alex Monk: "Ping" [puppet] - https://gerrit.wikimedia.org/r/227327 (https://phabricator.wikimedia.org/T114161) (owner: 
Alex Monk)
[17:15:04] (CR) Alex Monk: "Ping" [puppet] - https://gerrit.wikimedia.org/r/243357 (owner: Alex Monk)
[17:35:58] (PS1) Alex Monk: phabricator: Subscribe myself to the project changes cron [puppet] - https://gerrit.wikimedia.org/r/250205
[18:08:18] (PS1) Ori.livneh: Eliminate some more system calls [mediawiki-config] - https://gerrit.wikimedia.org/r/250208
[18:09:03] (CR) Ori.livneh: [C: 2] Eliminate some more system calls [mediawiki-config] - https://gerrit.wikimedia.org/r/250208 (owner: Ori.livneh)
[18:09:11] (Merged) jenkins-bot: Eliminate some more system calls [mediawiki-config] - https://gerrit.wikimedia.org/r/250208 (owner: Ori.livneh)
[18:09:43] RECOVERY - Restbase endpoints health on cerium is OK: All endpoints are healthy
[18:09:52] RECOVERY - Restbase endpoints health on praseodymium is OK: All endpoints are healthy
[18:09:52] RECOVERY - Restbase endpoints health on xenon is OK: All endpoints are healthy
[18:10:03] RECOVERY - Restbase endpoints health on restbase-test2003 is OK: All endpoints are healthy
[18:11:02] Krenair: are you sure that mail's to-addr parameter can take a comma-separated list of addresses?
[18:11:55] it goes into the 'mail' command
[18:12:04] yes
[18:12:13] hence are you sure that mail's to-addr parameter can take a comma-separated list of addresses? 
[18:12:51] http://www.binarytides.com/linux-mail-command-examples/ #5
[18:13:06] I haven't tried it myself though
[18:13:20] that's good enough
[18:15:03] PROBLEM - Restbase endpoints health on cerium is CRITICAL: /page/mobile-html/{title} is CRITICAL: Could not fetch url http://127.0.0.1:7231/en.wikipedia.org/v1/page/mobile-html/Main_Page: Generic connection error: (Received response with content-encoding: gzip, but failed to decode it., error(Error -3 while decompressing: incorrect header check,))
[18:15:04] PROBLEM - Restbase endpoints health on praseodymium is CRITICAL: /page/mobile-html/{title} is CRITICAL: Could not fetch url http://127.0.0.1:7231/en.wikipedia.org/v1/page/mobile-html/Main_Page: Generic connection error: (Received response with content-encoding: gzip, but failed to decode it., error(Error -3 while decompressing: incorrect header check,))
[18:15:04] PROBLEM - Restbase endpoints health on xenon is CRITICAL: /page/mobile-html/{title} is CRITICAL: Could not fetch url http://127.0.0.1:7231/en.wikipedia.org/v1/page/mobile-html/Main_Page: Generic connection error: (Received response with content-encoding: gzip, but failed to decode it., error(Error -3 while decompressing: incorrect header check,))
[18:15:23] PROBLEM - Restbase endpoints health on restbase-test2003 is CRITICAL: /page/mobile-html/{title} is CRITICAL: Could not fetch url http://127.0.0.1:7231/en.wikipedia.org/v1/page/mobile-html/Main_Page: Generic connection error: (Received response with content-encoding: gzip, but failed to decode it., error(Error -3 while decompressing: incorrect header check,))
[18:16:45] ori, originally I thought maybe to use spaces, but then I found that and changed it to commas
[18:17:16] (PS1) Ori.livneh: phabricator::logmail: support an array of recipient addresses [puppet] - https://gerrit.wikimedia.org/r/250210
[18:18:01] (CR) Ori.livneh: [C: 2 V: 2] phabricator::logmail: support an array of recipient addresses [puppet] - 
https://gerrit.wikimedia.org/r/250210 (owner: Ori.livneh)
[18:18:27] (PS2) Ori.livneh: phabricator: Subscribe myself to the project changes cron [puppet] - https://gerrit.wikimedia.org/r/250205 (owner: Alex Monk)
[18:18:33] RECOVERY - Restbase endpoints health on cerium is OK: All endpoints are healthy
[18:18:42] RECOVERY - Restbase endpoints health on praseodymium is OK: All endpoints are healthy
[18:18:42] RECOVERY - Restbase endpoints health on xenon is OK: All endpoints are healthy
[18:18:53] thanks ori
[18:18:54] RECOVERY - Restbase endpoints health on restbase-test2003 is OK: All endpoints are healthy
[18:20:08] (PS3) Ori.livneh: phabricator: Subscribe myself (=Krenair) to the project changes cron [puppet] - https://gerrit.wikimedia.org/r/250205 (owner: Alex Monk)
[18:20:19] operations, Phabricator, Project-Creators, Triagers: Broaden the group of users that can create projects in Phabricator - https://phabricator.wikimedia.org/T706#1771492 (RobLa-WMF) Could this issue (T706) get renamed, and the description updated? Suggested name: Requests for addition to the #proj...
[18:20:31] (CR) Ori.livneh: [C: 2 V: 2] phabricator: Subscribe myself (=Krenair) to the project changes cron [puppet] - https://gerrit.wikimedia.org/r/250205 (owner: Alex Monk)
[18:58:45] operations, Database, Patch-For-Review: Set up TLS for MariaDB replication - https://phabricator.wikimedia.org/T111654#1771518 (JanZerebecki) >>! In T111654#1770311, @jcrespo wrote: > * We hit a recent bug by which [[ https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=788905 | mysql and mariadb has hardc... 
[19:07:13] PROBLEM - Restbase endpoints health on praseodymium is CRITICAL: /page/mobile-html/{title} is CRITICAL: Could not fetch url http://127.0.0.1:7231/en.wikipedia.org/v1/page/mobile-html/Main_Page: Generic connection error: (Received response with content-encoding: gzip, but failed to decode it., error(Error -3 while decompressing: incorrect header check,))
[19:07:13] PROBLEM - Restbase endpoints health on xenon is CRITICAL: /page/mobile-html/{title} is CRITICAL: Could not fetch url http://127.0.0.1:7231/en.wikipedia.org/v1/page/mobile-html/Main_Page: Generic connection error: (Received response with content-encoding: gzip, but failed to decode it., error(Error -3 while decompressing: incorrect header check,))
[19:07:13] PROBLEM - Restbase endpoints health on cerium is CRITICAL: /page/mobile-html/{title} is CRITICAL: Could not fetch url http://127.0.0.1:7231/en.wikipedia.org/v1/page/mobile-html/Main_Page: Generic connection error: (Received response with content-encoding: gzip, but failed to decode it., error(Error -3 while decompressing: incorrect header check,))
[19:07:32] PROBLEM - Restbase endpoints health on restbase-test2003 is CRITICAL: /page/mobile-html/{title} is CRITICAL: Could not fetch url http://127.0.0.1:7231/en.wikipedia.org/v1/page/mobile-html/Main_Page: Generic connection error: (Received response with content-encoding: gzip, but failed to decode it., error(Error -3 while decompressing: incorrect header check,))
[20:01:02] RECOVERY - Restbase endpoints health on xenon is OK: All endpoints are healthy
[20:01:02] RECOVERY - Restbase endpoints health on praseodymium is OK: All endpoints are healthy
[20:01:22] RECOVERY - Restbase endpoints health on restbase-test2003 is OK: All endpoints are healthy
[20:04:06] ACKNOWLEDGEMENT - Restbase endpoints health on cerium is CRITICAL: /page/mobile-html/{title} is CRITICAL: Could not fetch url http://127.0.0.1:7231/en.wikipedia.org/v1/page/mobile-html/Main_Page: Generic connection error: (Received 
response with content-encoding: gzip, but failed to decode it., error(Error -3 while decompressing: incorrect header check,)) gwicke Related to https://phabricator.wikimedia.org/T116911, and isolated to sta
[20:04:06] ACKNOWLEDGEMENT - Restbase endpoints health on praseodymium is CRITICAL: /page/mobile-html/{title} is CRITICAL: Could not fetch url http://127.0.0.1:7231/en.wikipedia.org/v1/page/mobile-html/Main_Page: Generic connection error: (Received response with content-encoding: gzip, but failed to decode it., error(Error -3 while decompressing: incorrect header check,)) gwicke Related to https://phabricator.wikimedia.org/T116911, and isolated
[20:04:06] ACKNOWLEDGEMENT - Restbase endpoints health on xenon is CRITICAL: /page/mobile-html/{title} is CRITICAL: Could not fetch url http://127.0.0.1:7231/en.wikipedia.org/v1/page/mobile-html/Main_Page: Generic connection error: (Received response with content-encoding: gzip, but failed to decode it., error(Error -3 while decompressing: incorrect header check,)) gwicke Related to https://phabricator.wikimedia.org/T116911, and isolated to stag
[20:06:42] PROBLEM - Restbase endpoints health on restbase-test2003 is CRITICAL: /page/mobile-html/{title} is CRITICAL: Could not fetch url http://127.0.0.1:7231/en.wikipedia.org/v1/page/mobile-html/Main_Page: Generic connection error: (Received response with content-encoding: gzip, but failed to decode it., error(Error -3 while decompressing: incorrect header check,))
[20:11:52] RECOVERY - Restbase endpoints health on cerium is OK: All endpoints are healthy
[20:12:03] RECOVERY - Restbase endpoints health on restbase-test2003 is OK: All endpoints are healthy
[20:18:53] PROBLEM - Restbase endpoints health on praseodymium is CRITICAL: /page/mobile-html/{title} is CRITICAL: Could not fetch url http://127.0.0.1:7231/en.wikipedia.org/v1/page/mobile-html/Main_Page: Generic connection error: (Received response with content-encoding: gzip, but failed to decode it., error(Error -3 while 
decompressing: incorrect header check,))
[20:18:54] PROBLEM - Restbase endpoints health on xenon is CRITICAL: /page/mobile-html/{title} is CRITICAL: Could not fetch url http://127.0.0.1:7231/en.wikipedia.org/v1/page/mobile-html/Main_Page: Generic connection error: (Received response with content-encoding: gzip, but failed to decode it., error(Error -3 while decompressing: incorrect header check,))
[20:18:54] PROBLEM - Restbase endpoints health on cerium is CRITICAL: /page/mobile-html/{title} is CRITICAL: Could not fetch url http://127.0.0.1:7231/en.wikipedia.org/v1/page/mobile-html/Main_Page: Generic connection error: (Received response with content-encoding: gzip, but failed to decode it., error(Error -3 while decompressing: incorrect header check,))
[20:19:13] PROBLEM - Restbase endpoints health on restbase-test2003 is CRITICAL: /page/mobile-html/{title} is CRITICAL: Could not fetch url http://127.0.0.1:7231/en.wikipedia.org/v1/page/mobile-html/Main_Page: Generic connection error: (Received response with content-encoding: gzip, but failed to decode it., error(Error -3 while decompressing: incorrect header check,))
[21:05:53] PROBLEM - puppet last run on nescio is CRITICAL: CRITICAL: puppet fail
[21:32:52] RECOVERY - puppet last run on nescio is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[21:49:25] (CR) BryanDavis: Fix getMWScriptWithArgs() user error message (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/249803 (owner: Aaron Schulz)
[22:29:52] RECOVERY - Restbase endpoints health on xenon is OK: All endpoints are healthy
[22:29:52] RECOVERY - Restbase endpoints health on praseodymium is OK: All endpoints are healthy
[22:29:53] RECOVERY - Restbase endpoints health on cerium is OK: All endpoints are healthy
[22:32:04] RECOVERY - Restbase endpoints health on restbase-test2003 is OK: All endpoints are healthy
[22:43:09] (PS2) Ori.livneh: Eliminate still more system calls [mediawiki-config] - 
https://gerrit.wikimedia.org/r/250173
[22:46:00] (CR) Ori.livneh: [C: 2] Eliminate still more system calls [mediawiki-config] - https://gerrit.wikimedia.org/r/250173 (owner: Ori.livneh)
[22:46:07] (Merged) jenkins-bot: Eliminate still more system calls [mediawiki-config] - https://gerrit.wikimedia.org/r/250173 (owner: Ori.livneh)
[23:14:59] (PS1) Ori.livneh: Remove /etc/wikimedia-site and /etc/wikimedia-realm [puppet] - https://gerrit.wikimedia.org/r/250283
[23:15:41] (PS2) Ori.livneh: Remove /etc/wikimedia-site and /etc/wikimedia-realm [puppet] - https://gerrit.wikimedia.org/r/250283
[23:16:12] (CR) Ori.livneh: [C: 2 V: 2] Remove /etc/wikimedia-site and /etc/wikimedia-realm [puppet] - https://gerrit.wikimedia.org/r/250283 (owner: Ori.livneh)
[23:23:38] (PS1) Ori.livneh: Remove /etc/wikimedia-image-scaler; obsolete as of If792de0112 [puppet] - https://gerrit.wikimedia.org/r/250284
[23:24:20] (CR) Ori.livneh: [C: 2 V: 2] Remove /etc/wikimedia-image-scaler; obsolete as of If792de0112 [puppet] - https://gerrit.wikimedia.org/r/250284 (owner: Ori.livneh)
[23:25:12] PROBLEM - Kafka Broker Replica Max Lag on kafka1013 is CRITICAL: CRITICAL: 16.67% of data above the critical threshold [5000000.0]
[23:30:42] RECOVERY - Kafka Broker Replica Max Lag on kafka1013 is OK: OK: Less than 1.00% above the threshold [1000000.0]
[23:56:57] (CR) Krinkle: Fix getMWScriptWithArgs() user error message (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/249803 (owner: Aaron Schulz)
[23:58:10] (CR) Krinkle: Fix getMWScriptWithArgs() user error message (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/249803 (owner: Aaron Schulz)