[00:25:56] PROBLEM - puppet last run on es2002 is CRITICAL: CRITICAL: puppet fail [00:29:23] (PS2) Reedy: Remove no longer needed hook handlers from Bug54847 [mediawiki-config] - https://gerrit.wikimedia.org/r/184136 (owner: Hoo man) [00:31:54] (PS2) Reedy: Parameter type hints for Bug54847.php [mediawiki-config] - https://gerrit.wikimedia.org/r/184141 [00:43:46] RECOVERY - puppet last run on es2002 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [01:19:01] (CR) Yurik: [C: 1] Remove FlaggedRevs config on metawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/184133 (https://phabricator.wikimedia.org/T86443) (owner: Glaisher) [02:11:09] !log l10nupdate Synchronized php-1.25wmf13/cache/l10n: (no message) (duration: 00m 02s) [02:11:13] !log LocalisationUpdate completed (1.25wmf13) at 2015-01-11 02:11:13+00:00 [02:11:20] Logged the message, Master [02:11:23] Logged the message, Master [02:18:07] !log l10nupdate Synchronized php-1.25wmf14/cache/l10n: (no message) (duration: 00m 04s) [02:18:11] !log LocalisationUpdate completed (1.25wmf14) at 2015-01-11 02:18:10+00:00 [02:18:12] Logged the message, Master [02:18:15] Logged the message, Master [03:10:23] (CR) Alex Monk: "Am guessing this is abandoned?" [wikimedia/bugzilla/modifications] - https://gerrit.wikimedia.org/r/90565 (https://bugzilla.wikimedia.org/53969) (owner: Aklapper) [03:10:37] (CR) Alex Monk: "Am guessing this is abandoned?" [wikimedia/bugzilla/modifications] - https://gerrit.wikimedia.org/r/106864 (https://bugzilla.wikimedia.org/55536) (owner: 01tonythomas) [03:10:54] (CR) Alex Monk: "Am guessing this is abandoned?" 
[wikimedia/bugzilla/modifications] - https://gerrit.wikimedia.org/r/106883 (https://bugzilla.wikimedia.org/50376) (owner: Tinaj1234) [03:21:25] Wikimedia-Stream, operations: stream.wikimedia.org: Uneven distribution of client connections on backends - https://phabricator.wikimedia.org/T69957#968884 (Krenair) [03:23:42] Wikimedia-Stream, operations: stream.wikimedia.org throws websocket.WebSocketException: Handshake Status 502 Bad Gateway - https://phabricator.wikimedia.org/T68989#968886 (Krenair) [03:33:37] PROBLEM - puppet last run on mw1188 is CRITICAL: CRITICAL: Puppet has 1 failures [03:33:37] PROBLEM - puppet last run on mw1248 is CRITICAL: CRITICAL: Puppet has 1 failures [03:34:06] PROBLEM - puppet last run on mw1243 is CRITICAL: CRITICAL: Puppet has 1 failures [03:34:16] PROBLEM - puppet last run on mw1163 is CRITICAL: CRITICAL: Puppet has 1 failures [03:34:18] PROBLEM - puppet last run on mw1239 is CRITICAL: CRITICAL: Puppet has 1 failures [03:34:37] PROBLEM - puppet last run on mw1210 is CRITICAL: CRITICAL: Puppet has 1 failures [03:36:27] PROBLEM - puppet last run on amssq62 is CRITICAL: CRITICAL: Puppet has 1 failures [03:37:37] PROBLEM - puppet last run on mw1121 is CRITICAL: CRITICAL: Puppet has 1 failures [03:42:57] PROBLEM - puppet last run on mw1064 is CRITICAL: CRITICAL: Puppet has 1 failures [03:43:17] PROBLEM - puppet last run on mw1107 is CRITICAL: CRITICAL: Puppet has 1 failures [03:44:57] PROBLEM - puppet last run on amssq31 is CRITICAL: CRITICAL: Puppet has 1 failures [03:46:37] PROBLEM - puppet last run on vanadium is CRITICAL: CRITICAL: Puppet has 1 failures [03:54:26] PROBLEM - puppet last run on dbproxy1001 is CRITICAL: CRITICAL: Puppet has 1 failures [03:55:26] PROBLEM - puppet last run on cp3008 is CRITICAL: CRITICAL: Puppet has 1 failures [03:56:46] PROBLEM - puppet last run on mw1136 is CRITICAL: CRITICAL: Puppet has 1 failures [03:59:07] RECOVERY - puppet last run on mw1243 is OK: OK: Puppet is currently enabled, last run 0 
seconds ago with 0 failures [03:59:17] RECOVERY - puppet last run on amssq31 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [03:59:17] RECOVERY - puppet last run on amssq62 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [03:59:17] RECOVERY - puppet last run on mw1163 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [03:59:26] !log LocalisationUpdate ResourceLoader cache refresh completed at Sun Jan 11 03:59:26 UTC 2015 (duration 59m 25s) [03:59:34] Logged the message, Master [03:59:38] RECOVERY - puppet last run on mw1210 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [03:59:56] RECOVERY - puppet last run on mw1188 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [03:59:57] RECOVERY - puppet last run on mw1248 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [04:00:06] RECOVERY - puppet last run on mw1107 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [04:00:06] PROBLEM - puppet last run on elastic1015 is CRITICAL: CRITICAL: Puppet has 1 failures [04:00:26] RECOVERY - puppet last run on mw1136 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [04:00:27] RECOVERY - puppet last run on mw1121 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [04:00:37] RECOVERY - puppet last run on mw1239 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [04:00:57] RECOVERY - puppet last run on mw1064 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [04:02:07] RECOVERY - puppet last run on vanadium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [04:02:17] PROBLEM - puppet last run on mw1118 is CRITICAL: CRITICAL: Puppet has 1 failures [04:02:27] PROBLEM - puppet last run on mw1170 is CRITICAL: CRITICAL: Puppet has 1 
failures [04:02:47] PROBLEM - puppet last run on mw1055 is CRITICAL: CRITICAL: Puppet has 1 failures [04:03:56] RECOVERY - puppet last run on cp3008 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [04:06:06] RECOVERY - puppet last run on mw1170 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [04:06:58] RECOVERY - puppet last run on mw1118 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [04:07:36] RECOVERY - puppet last run on mw1055 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [04:07:37] RECOVERY - puppet last run on dbproxy1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [04:09:36] RECOVERY - puppet last run on elastic1015 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [04:32:37] (CR) Rillke: [C: 1] "Looks fine to me. Thanks!" [mediawiki-config] - https://gerrit.wikimedia.org/r/184121 (https://phabricator.wikimedia.org/T86313) (owner: Steinsplitter) [04:49:36] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [500.0] [04:50:07] PROBLEM - puppet last run on cp4014 is CRITICAL: CRITICAL: Puppet has 1 failures [04:52:27] PROBLEM - puppet last run on cp4004 is CRITICAL: CRITICAL: Puppet has 1 failures [05:05:07] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [05:07:57] RECOVERY - puppet last run on cp4004 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [05:09:07] RECOVERY - puppet last run on cp4014 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:22:36] PROBLEM - SSH on sodium is CRITICAL: Server answer: [06:23:46] RECOVERY - SSH on sodium is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7.1 (protocol 2.0) [06:28:56] PROBLEM - puppet last run on dbproxy1001 is CRITICAL: CRITICAL: Puppet has 1 failures 
[06:29:56] PROBLEM - puppet last run on db2036 is CRITICAL: CRITICAL: Puppet has 1 failures [06:30:17] PROBLEM - puppet last run on mw1092 is CRITICAL: CRITICAL: Puppet has 1 failures [06:30:17] PROBLEM - puppet last run on mw1025 is CRITICAL: CRITICAL: Puppet has 1 failures [06:33:46] PROBLEM - puppet last run on cp4008 is CRITICAL: CRITICAL: Puppet has 1 failures [06:46:38] RECOVERY - puppet last run on db2036 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [06:46:47] RECOVERY - puppet last run on dbproxy1001 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [06:46:56] RECOVERY - puppet last run on cp4008 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [06:46:57] RECOVERY - puppet last run on mw1092 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [06:46:57] RECOVERY - puppet last run on mw1025 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [09:00:27] PROBLEM - Slow CirrusSearch query rate on fluorine is CRITICAL: CirrusSearch-slow.log_line_rate CRITICAL: 0.00333333333333 [09:10:47] RECOVERY - Slow CirrusSearch query rate on fluorine is OK: CirrusSearch-slow.log_line_rate OKAY: 0.0 [10:53:07] PROBLEM - Varnishkafka Delivery Errors per minute on cp3003 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [20000.0] [10:59:17] RECOVERY - Varnishkafka Delivery Errors per minute on cp3003 is OK: OK: Less than 1.00% above the threshold [0.0] [11:45:46] (PS1) KartikMistry: Use cxserver/deploy in deployment [puppet] - https://gerrit.wikimedia.org/r/184217 [11:58:56] PROBLEM - Varnishkafka Delivery Errors per minute on cp3005 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [20000.0] [12:06:07] RECOVERY - Varnishkafka Delivery Errors per minute on cp3005 is OK: OK: Less than 1.00% above the threshold [0.0] [12:17:47] PROBLEM - HHVM busy threads on mw1126 is CRITICAL: CRITICAL: 
100.00% of data above the critical threshold [86.4] [12:55:37] PROBLEM - puppet last run on mw1077 is CRITICAL: CRITICAL: Puppet has 1 failures [13:01:37] PROBLEM - Varnishkafka Delivery Errors per minute on cp3005 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [20000.0] [13:10:06] RECOVERY - Varnishkafka Delivery Errors per minute on cp3005 is OK: OK: Less than 1.00% above the threshold [0.0] [13:13:36] RECOVERY - puppet last run on mw1077 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:41:17] PROBLEM - Varnishkafka Delivery Errors per minute on cp3008 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [20000.0] [13:48:27] RECOVERY - Varnishkafka Delivery Errors per minute on cp3008 is OK: OK: Less than 1.00% above the threshold [0.0] [14:13:16] PROBLEM - Varnishkafka Delivery Errors per minute on cp3017 is CRITICAL: CRITICAL: 25.00% of data above the critical threshold [20000.0] [14:21:47] RECOVERY - Varnishkafka Delivery Errors per minute on cp3017 is OK: OK: Less than 1.00% above the threshold [0.0] [14:45:00] (PS1) QChris: Sync Hive generated TSVs to stat1002 [puppet] - https://gerrit.wikimedia.org/r/184223 [14:51:53] (CR) QChris: Sync Hive generated TSVs to stat1002 (3 comments) [puppet] - https://gerrit.wikimedia.org/r/184223 (owner: QChris) [14:56:08] (Abandoned) Aklapper: Add support for Trello URLs to Bugzilla's 'See Also' field. [wikimedia/bugzilla/modifications] - https://gerrit.wikimedia.org/r/90565 (https://bugzilla.wikimedia.org/53969) (owner: Aklapper) [14:56:17] (Abandoned) Aklapper: Gave Bugzilla's "Save Changes" button style. [wikimedia/bugzilla/modifications] - https://gerrit.wikimedia.org/r/106864 (https://bugzilla.wikimedia.org/55536) (owner: 01tonythomas) [14:56:42] (Abandoned) Aklapper: Error message for invalid login changed. 
[wikimedia/bugzilla/modifications] - https://gerrit.wikimedia.org/r/106883 (https://bugzilla.wikimedia.org/50376) (owner: Tinaj1234) [15:12:56] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [500.0] [15:26:08] (CR) Vogone: "Just a little correction, 'editinterface' is used globally on every WMF wiki being also the name of the global interface editor group. Thu" [mediawiki-config] - https://gerrit.wikimedia.org/r/183056 (https://phabricator.wikimedia.org/T85713) (owner: Glaisher) [15:28:36] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [15:50:26] PROBLEM - Slow CirrusSearch query rate on fluorine is CRITICAL: CirrusSearch-slow.log_line_rate CRITICAL: 0.00333333333333 [15:55:36] RECOVERY - Slow CirrusSearch query rate on fluorine is OK: CirrusSearch-slow.log_line_rate OKAY: 0.0 [16:05:20] (PS4) Gage: Strongswan: IPsec Puppet module [puppet] - https://gerrit.wikimedia.org/r/181742 [16:07:14] (PS5) Gage: Strongswan: IPsec Puppet module [puppet] - https://gerrit.wikimedia.org/r/181742 [16:08:20] (CR) Alex Monk: "I think we can just do it now..." 
[wikimedia/bugzilla/modifications] - https://gerrit.wikimedia.org/r/107410 (owner: Aklapper) [16:08:46] !log springle Synchronized wmf-config/db-eqiad.php: depool db1050, mysqld got TERM somehow (duration: 00m 05s) [16:08:53] Logged the message, Master [16:12:16] PROBLEM - puppet last run on ms-be2005 is CRITICAL: CRITICAL: puppet fail [16:29:50] !log springle Synchronized wmf-config/db-eqiad.php: repool db1050, warm up (duration: 00m 05s) [16:29:57] Logged the message, Master [16:30:17] RECOVERY - puppet last run on ms-be2005 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [16:42:56] operations: puppet stopped mysqld using orphan pid file from puppet agent - https://phabricator.wikimedia.org/T86482#969256 (Springle) NEW a:Springle [16:44:37] !log db1050 dberror log noise was https://phabricator.wikimedia.org/T86482 [16:44:42] Logged the message, Master [16:55:36] PROBLEM - Varnishkafka Delivery Errors per minute on cp3010 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [20000.0] [17:00:07] PROBLEM - Varnishkafka Delivery Errors per minute on cp3004 is CRITICAL: CRITICAL: 37.50% of data above the critical threshold [20000.0] [17:02:57] RECOVERY - Varnishkafka Delivery Errors per minute on cp3010 is OK: OK: Less than 1.00% above the threshold [0.0] [17:03:17] PROBLEM - Varnishkafka Delivery Errors per minute on cp3018 is CRITICAL: CRITICAL: 25.00% of data above the critical threshold [20000.0] [17:05:46] PROBLEM - Varnishkafka Delivery Errors per minute on cp3006 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [20000.0] [17:07:26] RECOVERY - Varnishkafka Delivery Errors per minute on cp3004 is OK: OK: Less than 1.00% above the threshold [0.0] [17:10:48] (CR) Mark Bergsma: "A few comments inline." 
(6 comments) [puppet] - https://gerrit.wikimedia.org/r/181742 (owner: Gage) [17:17:57] PROBLEM - Varnishkafka Delivery Errors per minute on cp3004 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [20000.0] [17:19:17] PROBLEM - Varnishkafka Delivery Errors per minute on cp3007 is CRITICAL: CRITICAL: 25.00% of data above the critical threshold [20000.0] [17:21:16] RECOVERY - Varnishkafka Delivery Errors per minute on cp3018 is OK: OK: Less than 1.00% above the threshold [0.0] [17:22:27] RECOVERY - Varnishkafka Delivery Errors per minute on cp3006 is OK: OK: Less than 1.00% above the threshold [0.0] [17:23:46] PROBLEM - Varnishkafka Delivery Errors per minute on cp3016 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [20000.0] [17:25:08] RECOVERY - Varnishkafka Delivery Errors per minute on cp3004 is OK: OK: Less than 1.00% above the threshold [0.0] [17:27:07] PROBLEM - Varnishkafka Delivery Errors on cp3016 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 556.266663 [17:29:57] RECOVERY - Varnishkafka Delivery Errors per minute on cp3007 is OK: OK: Less than 1.00% above the threshold [0.0] [17:30:18] RECOVERY - Varnishkafka Delivery Errors on cp3016 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [17:34:17] RECOVERY - Varnishkafka Delivery Errors per minute on cp3016 is OK: OK: Less than 1.00% above the threshold [0.0] [17:37:46] PROBLEM - Varnishkafka Delivery Errors per minute on cp3018 is CRITICAL: CRITICAL: 37.50% of data above the critical threshold [20000.0] [17:40:17] PROBLEM - Varnishkafka Delivery Errors on cp3018 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 712.424988 [17:47:47] PROBLEM - Varnishkafka Delivery Errors per minute on cp3007 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [20000.0] [17:49:47] RECOVERY - Varnishkafka Delivery Errors on cp3018 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [17:53:16] RECOVERY - Varnishkafka 
Delivery Errors per minute on cp3018 is OK: OK: Less than 1.00% above the threshold [0.0] [17:54:56] RECOVERY - Varnishkafka Delivery Errors per minute on cp3007 is OK: OK: Less than 1.00% above the threshold [0.0] [18:39:37] PROBLEM - nutcracker port on mw1229 is CRITICAL: Cannot assign requested address [18:40:56] RECOVERY - nutcracker port on mw1229 is OK: TCP OK - 0.000 second response time on port 11212 [18:50:57] PROBLEM - Varnishkafka Delivery Errors per minute on cp3007 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [20000.0] [18:59:27] RECOVERY - Varnishkafka Delivery Errors per minute on cp3007 is OK: OK: Less than 1.00% above the threshold [0.0] [19:30:17] PROBLEM - Varnishkafka Delivery Errors per minute on cp3003 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [20000.0] [19:36:07] RECOVERY - Varnishkafka Delivery Errors per minute on cp3003 is OK: OK: Less than 1.00% above the threshold [0.0] [19:37:58] PROBLEM - Varnishkafka Delivery Errors per minute on cp3018 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [20000.0] [19:48:38] RECOVERY - Varnishkafka Delivery Errors per minute on cp3018 is OK: OK: Less than 1.00% above the threshold [0.0] [19:49:18] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 13.33% of data above the critical threshold [500.0] [19:51:37] PROBLEM - Varnishkafka Delivery Errors per minute on cp3010 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [20000.0] [19:51:37] PROBLEM - puppet last run on cp4008 is CRITICAL: CRITICAL: Puppet has 1 failures [19:51:37] PROBLEM - puppet last run on cp4004 is CRITICAL: CRITICAL: Puppet has 1 failures [19:52:27] PROBLEM - Varnishkafka Delivery Errors per minute on cp3008 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [20000.0] [19:55:38] PROBLEM - Varnishkafka Delivery Errors per minute on cp3018 is CRITICAL: CRITICAL: 25.00% of data above the critical threshold [20000.0] [19:58:17] 
PROBLEM - Varnishkafka Delivery Errors per minute on cp3005 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [20000.0] [20:00:37] PROBLEM - Varnishkafka Delivery Errors per minute on cp3006 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [20000.0] [20:00:47] RECOVERY - Varnishkafka Delivery Errors per minute on cp3008 is OK: OK: Less than 1.00% above the threshold [0.0] [20:01:07] RECOVERY - Varnishkafka Delivery Errors per minute on cp3010 is OK: OK: Less than 1.00% above the threshold [0.0] [20:03:37] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0] [20:03:47] PROBLEM - Varnishkafka Delivery Errors on cp3006 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 483.25 [20:03:57] PROBLEM - Varnishkafka Delivery Errors per minute on cp3017 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [20000.0] [20:05:17] PROBLEM - Varnishkafka Delivery Errors per minute on cp3009 is CRITICAL: CRITICAL: 37.50% of data above the critical threshold [20000.0] [20:06:27] RECOVERY - Varnishkafka Delivery Errors per minute on cp3018 is OK: OK: Less than 1.00% above the threshold [0.0] [20:06:47] RECOVERY - Varnishkafka Delivery Errors per minute on cp3005 is OK: OK: Less than 1.00% above the threshold [0.0] [20:06:57] RECOVERY - Varnishkafka Delivery Errors on cp3006 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:07:07] RECOVERY - puppet last run on cp4008 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [20:07:26] Hi, is it possible to set throttle exception ASAP for hewiki per https://phabricator.wikimedia.org/T85773?(there are less than 24 hours before the edithon) [20:08:17] RECOVERY - puppet last run on cp4004 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [20:09:01] PROBLEM - Varnishkafka Delivery Errors per minute on cp3016 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold 
[20000.0] [20:09:41] PROBLEM - Varnishkafka Delivery Errors on cp3017 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 656.06665 [20:10:10] greg-g, see above. you want sam to deploy it sometime between now (8pm) and 7am local time? [20:10:40] Krenair: I directed him to somebody else per https://phabricator.wikimedia.org/T85773#966029 [20:10:44] But either way [20:11:11] oh [20:11:23] same person lol but I assume he saw it on phab [20:11:30] Reedy, ^ [20:14:18] PROBLEM - Varnishkafka Delivery Errors per minute on cp3015 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [20000.0] [20:16:21] PROBLEM - Varnishkafka Delivery Errors per minute on cp3004 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [20000.0] [20:16:27] RECOVERY - Varnishkafka Delivery Errors per minute on cp3009 is OK: OK: Less than 1.00% above the threshold [0.0] [20:19:48] PROBLEM - Varnishkafka Delivery Errors per minute on cp3005 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [20000.0] [20:20:52] PROBLEM - Varnishkafka Delivery Errors per minute on cp3016 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [20000.0] [20:20:58] PROBLEM - Varnishkafka Delivery Errors per minute on cp3008 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [20000.0] [20:21:25] RECOVERY - Varnishkafka Delivery Errors per minute on cp3015 is OK: OK: Less than 1.00% above the threshold [0.0] [20:22:17] RECOVERY - Varnishkafka Delivery Errors on cp3017 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:23:06] PROBLEM - Varnishkafka Delivery Errors on cp3003 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 422.475006 [20:23:07] PROBLEM - Varnishkafka Delivery Errors per minute on cp3018 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [20000.0] [20:24:48] PROBLEM - Varnishkafka Delivery Errors per minute on cp3004 is CRITICAL: CRITICAL: 22.22% of data above the critical 
threshold [20000.0] [20:25:37] PROBLEM - Varnishkafka Delivery Errors on cp3018 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1741.849976 [20:26:16] RECOVERY - Varnishkafka Delivery Errors on cp3003 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:27:37] RECOVERY - Varnishkafka Delivery Errors per minute on cp3017 is OK: OK: Less than 1.00% above the threshold [0.0] [20:27:47] RECOVERY - Varnishkafka Delivery Errors per minute on cp3006 is OK: OK: Less than 1.00% above the threshold [0.0] [20:28:49] RECOVERY - Varnishkafka Delivery Errors on cp3018 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [20:29:16] RECOVERY - Varnishkafka Delivery Errors per minute on cp3008 is OK: OK: Less than 1.00% above the threshold [0.0] [20:29:17] RECOVERY - Varnishkafka Delivery Errors per minute on cp3005 is OK: OK: Less than 1.00% above the threshold [0.0] [20:31:27] RECOVERY - Varnishkafka Delivery Errors per minute on cp3016 is OK: OK: Less than 1.00% above the threshold [0.0] [20:34:06] RECOVERY - Varnishkafka Delivery Errors per minute on cp3004 is OK: OK: Less than 1.00% above the threshold [0.0] [20:43:22] RECOVERY - Varnishkafka Delivery Errors per minute on cp3018 is OK: OK: Less than 1.00% above the threshold [0.0] [20:49:48] PROBLEM - Varnishkafka Delivery Errors per minute on cp3003 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [20000.0] [20:51:29] PROBLEM - Varnishkafka Delivery Errors per minute on cp3018 is CRITICAL: CRITICAL: 25.00% of data above the critical threshold [20000.0] [20:55:37] PROBLEM - Graphite Carbon on graphite1002 is CRITICAL: CRITICAL: Not all configured Carbon instances are running. 
[20:55:37] PROBLEM - Varnishkafka Delivery Errors per minute on cp3010 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [20000.0] [20:57:00] RECOVERY - Varnishkafka Delivery Errors per minute on cp3003 is OK: OK: Less than 1.00% above the threshold [0.0] [20:58:58] PROBLEM - Varnishkafka Delivery Errors per minute on cp3005 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [20000.0] [20:59:16] PROBLEM - Varnishkafka Delivery Errors per minute on cp3015 is CRITICAL: CRITICAL: 12.50% of data above the critical threshold [20000.0] [20:59:56] PROBLEM - Varnishkafka Delivery Errors per minute on cp3006 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [20000.0] [21:00:07] PROBLEM - Varnishkafka Delivery Errors per minute on cp3008 is CRITICAL: CRITICAL: 25.00% of data above the critical threshold [20000.0] [21:01:36] PROBLEM - Varnishkafka Delivery Errors per minute on cp3004 is CRITICAL: CRITICAL: 37.50% of data above the critical threshold [20000.0] [21:02:17] RECOVERY - Varnishkafka Delivery Errors per minute on cp3018 is OK: OK: Less than 1.00% above the threshold [0.0] [21:04:07] PROBLEM - Varnishkafka Delivery Errors on cp3006 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 773.233337 [21:04:17] PROBLEM - Varnishkafka Delivery Errors on cp3004 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 660.275024 [21:05:08] RECOVERY - Varnishkafka Delivery Errors per minute on cp3015 is OK: OK: Less than 1.00% above the threshold [0.0] [21:05:08] RECOVERY - Varnishkafka Delivery Errors per minute on cp3010 is OK: OK: Less than 1.00% above the threshold [0.0] [21:06:07] RECOVERY - Varnishkafka Delivery Errors per minute on cp3005 is OK: OK: Less than 1.00% above the threshold [0.0] [21:07:17] RECOVERY - Varnishkafka Delivery Errors on cp3006 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [21:10:37] RECOVERY - Varnishkafka Delivery Errors on cp3004 is OK: 
kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [21:14:07] RECOVERY - Varnishkafka Delivery Errors per minute on cp3006 is OK: OK: Less than 1.00% above the threshold [0.0] [21:20:17] RECOVERY - Varnishkafka Delivery Errors per minute on cp3008 is OK: OK: Less than 1.00% above the threshold [0.0] [21:23:37] PROBLEM - Varnishkafka Delivery Errors per minute on cp3009 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [20000.0] [21:23:39] gods [21:23:52] am i the only person who can never get gerrit to work [21:24:03] i'm trying to clone mediawiki/extensions/ConfirmAccount [21:24:05] it's tiny [21:24:12] it just timed out for the third time [21:24:42] this must be some kind of network issue, how do i debug it? [21:25:02] life is too short to spend half an hour waiting for a repository to clone every fucking time [21:26:27] RECOVERY - Varnishkafka Delivery Errors per minute on cp3004 is OK: OK: Less than 1.00% above the threshold [0.0] [21:26:53] MatmaRex: use ss or mtr to check network [21:30:56] PROBLEM - Varnishkafka Delivery Errors per minute on cp3016 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [20000.0] [21:31:56] RECOVERY - Varnishkafka Delivery Errors per minute on cp3009 is OK: OK: Less than 1.00% above the threshold [0.0] [21:32:17] PROBLEM - Varnishkafka Delivery Errors per minute on cp3004 is CRITICAL: CRITICAL: 25.00% of data above the critical threshold [20000.0] [21:35:57] PROBLEM - Varnishkafka Delivery Errors per minute on cp3015 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [20000.0] [21:37:07] PROBLEM - Varnishkafka Delivery Errors per minute on cp3010 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [20000.0] [21:39:16] RECOVERY - Varnishkafka Delivery Errors per minute on cp3016 is OK: OK: Less than 1.00% above the threshold [0.0] [21:43:06] RECOVERY - Varnishkafka Delivery Errors per minute on cp3004 is OK: OK: Less than 1.00% above the threshold [0.0] [21:43:16] 
RECOVERY - Varnishkafka Delivery Errors per minute on cp3010 is OK: OK: Less than 1.00% above the threshold [0.0] [21:45:37] RECOVERY - Varnishkafka Delivery Errors per minute on cp3015 is OK: OK: Less than 1.00% above the threshold [0.0] [21:48:39] (Abandoned) Aklapper: Some smaller CSS and linebreak cleanup [wikimedia/bugzilla/modifications] - https://gerrit.wikimedia.org/r/107410 (owner: Aklapper) [22:27:07] PROBLEM - Varnishkafka Delivery Errors per minute on cp3004 is CRITICAL: CRITICAL: 12.50% of data above the critical threshold [20000.0] [22:32:57] RECOVERY - Varnishkafka Delivery Errors per minute on cp3004 is OK: OK: Less than 1.00% above the threshold [0.0] [22:54:21] Wikimedia-Stream, operations: stream.wikimedia.org throws websocket.WebSocketException: Handshake Status 502 Bad Gateway - https://phabricator.wikimedia.org/T68989#969475 (ori) Open>Resolved a:ori [22:57:09] PROBLEM - ElasticSearch health check for shards on logstash1002 is CRITICAL: CRITICAL - elasticsearch inactive shards 67 threshold =0.1% breach: {ustatus: uyellow, unumber_of_nodes: 2, uunassigned_shards: 67, utimed_out: False, uactive_primary_shards: 67, ucluster_name: uproduction-logstash-eqiad, urelocating_shards: 0, uactive_shards: 134, uinitializing_shards: 0, unumber_of_data_nodes: 2} [22:57:16] PROBLEM - ElasticSearch health check for shards on logstash1003 is CRITICAL: CRITICAL - elasticsearch inactive shards 67 threshold =0.1% breach: {ustatus: uyellow, unumber_of_nodes: 2, uunassigned_shards: 67, utimed_out: False, uactive_primary_shards: 67, ucluster_name: uproduction-logstash-eqiad, urelocating_shards: 0, uactive_shards: 134, uinitializing_shards: 0, unumber_of_data_nodes: 2} [22:57:16] PROBLEM - ElasticSearch health check for shards on logstash1001 is CRITICAL: CRITICAL - elasticsearch http://10.64.32.138:9200/_cluster/health error while fetching: Request timed out.