[00:08:43] RECOVERY - MySQL Slave Delay on db1018 is OK: OK replication delay 0 seconds
[00:11:33] PROBLEM - MySQL Slave Delay on db1018 is CRITICAL: CRIT replication delay 6346 seconds
[00:21:03] PROBLEM - Puppet freshness on labsdb1005 is CRITICAL: Last successful Puppet run was Thu 31 Jul 2014 16:06:51 UTC
[00:26:33] RECOVERY - MySQL Replication Heartbeat on db1018 is OK: OK replication delay 0 seconds
[00:27:34] RECOVERY - MySQL Slave Delay on db1018 is OK: OK replication delay 0 seconds
[00:30:33] PROBLEM - MySQL Replication Heartbeat on db1018 is CRITICAL: CRIT replication delay 7197 seconds
[00:30:33] PROBLEM - MySQL Slave Delay on db1018 is CRITICAL: CRIT replication delay 7200 seconds
[00:32:53] PROBLEM - HTTP error ratio anomaly detection on tungsten is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 9 below the confidence bounds
[00:33:25] (PS2) Ori.livneh: Use aliasByNode() to clean up metric labels [operations/puppet] - https://gerrit.wikimedia.org/r/147673 (owner: BryanDavis)
[00:33:33] (CR) Ori.livneh: [C: 2 V: 2] Use aliasByNode() to clean up metric labels [operations/puppet] - https://gerrit.wikimedia.org/r/147673 (owner: BryanDavis)
[00:35:19] (PS1) Ori.livneh: HHVM: add explanatory comment re: runtime error level [operations/puppet] - https://gerrit.wikimedia.org/r/151258
[00:35:41] (CR) Ori.livneh: [C: 2 V: 2] HHVM: add explanatory comment re: runtime error level [operations/puppet] - https://gerrit.wikimedia.org/r/151258 (owner: Ori.livneh)
[00:38:03] https://integration.wikimedia.org/zuul/ jenkins stuck?
[00:40:02] Krinkle: ^ ?
[00:40:15] everything is stuck at "queued"
[00:41:46] legoktm: such as?
[00:42:02] pywikibot-core-tox-flake8 (non-voting) queued
[00:42:11] pywikibot-core-tox-flake8-docstrings (non-voting) queued
[00:42:24] a bunch of operations/puppet ones too
[00:43:56] !log Restarting Jenkins on gallium because the pipeline is clogged
[00:44:00] Logged the message, Master
[00:47:00] Krinkle: should I comment "recheck" on those patchsets or will they automatically get picked up?
[00:50:26] Zuul wasn't restarted, queue should be intact.
[00:52:43] ah great, it took care of them
[00:53:16] thanks!
[01:14:03] PROBLEM - Puppet freshness on db1006 is CRITICAL: Last successful Puppet run was Fri 01 Aug 2014 23:12:57 UTC
[01:20:03] PROBLEM - Puppet freshness on dataset1001 is CRITICAL: Last successful Puppet run was Fri 01 Aug 2014 19:17:00 UTC
[01:35:31] legoktm: page_content_model done, ar_content_* started, rev_content_* todo
[01:35:44] :D yay
[01:38:12] springle: also, wiktionary seems to have some major db lag right now... https://en.wiktionary.org/w/index.php?maxlag=1 says 10198 seconds
[01:39:30] that's db1018 i guess. looks like it's suffering from the backlinks namespace jobs
[01:47:10] (PS1) Springle: depool db1018 while lagged due to backlinks namespace jobs. reassign db1036 to vslow and dump load groups for s2. [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/151264
[01:48:06] (CR) Springle: [C: 2] depool db1018 while lagged due to backlinks namespace jobs. reassign db1036 to vslow and dump load groups for s2. [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/151264 (owner: Springle)
[01:48:10] (Merged) jenkins-bot: depool db1018 while lagged due to backlinks namespace jobs. reassign db1036 to vslow and dump load groups for s2. [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/151264 (owner: Springle)
[01:49:31] !log springle Synchronized wmf-config/db-eqiad.php: depool db1018, replag (duration: 00m 06s)
[01:49:41] Logged the message, Master
[01:52:53] RECOVERY - HTTP error ratio anomaly detection on tungsten is OK: OK: No anomaly detected
[01:53:43] RECOVERY - Puppet freshness on db1006 is OK: puppet ran at Sat Aug 2 01:53:34 UTC 2014
[02:21:05] !log LocalisationUpdate completed (1.24wmf15) at 2014-08-02 02:20:02+00:00
[02:21:13] Logged the message, Master
[02:22:03] PROBLEM - Puppet freshness on labsdb1005 is CRITICAL: Last successful Puppet run was Thu 31 Jul 2014 16:06:51 UTC
[02:23:53] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 13.33% of data above the critical threshold [500.0]
[02:33:39] !log LocalisationUpdate completed (1.24wmf16) at 2014-08-02 02:32:36+00:00
[02:33:45] Logged the message, Master
[02:54:53] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0]
[03:21:03] PROBLEM - Puppet freshness on dataset1001 is CRITICAL: Last successful Puppet run was Fri 01 Aug 2014 19:17:00 UTC
[03:57:03] PROBLEM - Puppet freshness on db1009 is CRITICAL: Last successful Puppet run was Sat 02 Aug 2014 01:56:52 UTC
[04:17:23] RECOVERY - Puppet freshness on db1009 is OK: puppet ran at Sat Aug 2 04:17:17 UTC 2014
[04:21:52] !log LocalisationUpdate ResourceLoader cache refresh completed at Sat Aug 2 04:20:45 UTC 2014 (duration 20m 44s)
[04:21:56] Logged the message, Master
[04:23:03] PROBLEM - Puppet freshness on labsdb1005 is CRITICAL: Last successful Puppet run was Thu 31 Jul 2014 16:06:51 UTC
[05:22:03] PROBLEM - Puppet freshness on dataset1001 is CRITICAL: Last successful Puppet run was Fri 01 Aug 2014 19:17:00 UTC
[06:02:33] RECOVERY - MySQL Slave Delay on db1018 is OK: OK replication delay 87 seconds
[06:02:43] RECOVERY - MySQL Replication Heartbeat on db1018 is OK: OK replication delay 73 seconds
[06:24:03] PROBLEM - Puppet freshness on labsdb1005 is CRITICAL: Last successful Puppet run was Thu 31 Jul 2014 16:06:51 UTC
[06:25:44] RECOVERY - Disk space on vanadium is OK: DISK OK
[06:28:03] PROBLEM - puppet last run on iron is CRITICAL: CRITICAL: Puppet has 1 failures
[06:28:44] PROBLEM - puppet last run on db1059 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:28:53] PROBLEM - puppet last run on cp1061 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:28:53] PROBLEM - puppet last run on labsdb1003 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:28:53] PROBLEM - puppet last run on mw1153 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:28:53] PROBLEM - puppet last run on mw1052 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:28:53] PROBLEM - puppet last run on lvs1005 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:03] PROBLEM - puppet last run on mw1046 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:04] PROBLEM - puppet last run on db1018 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:34] PROBLEM - puppet last run on search1001 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:43] PROBLEM - puppet last run on mw1100 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:04] PROBLEM - puppet last run on mw1166 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:54] PROBLEM - puppet last run on cp4008 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:45:44] RECOVERY - puppet last run on mw1100 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures
[06:45:54] RECOVERY - puppet last run on labsdb1003 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures
[06:45:54] RECOVERY - puppet last run on mw1153 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures
[06:45:54] RECOVERY - puppet last run on lvs1005 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures
[06:46:13] RECOVERY - puppet last run on mw1046 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures
[06:46:13] RECOVERY - puppet last run on db1018 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures
[06:46:43] RECOVERY - puppet last run on search1001 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures
[06:46:53] RECOVERY - puppet last run on db1059 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures
[06:46:54] RECOVERY - puppet last run on cp1061 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures
[06:46:54] RECOVERY - puppet last run on mw1052 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures
[06:46:54] RECOVERY - puppet last run on cp4008 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures
[06:50:01] RECOVERY - puppet last run on mw1166 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures
[07:06:13] RECOVERY - puppet last run on iron is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures
[07:20:14] (CR) Giuseppe Lavagetto: "This shouldn't have been merged." (3 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/150873 (owner: Ori.livneh)
[07:23:03] PROBLEM - Puppet freshness on dataset1001 is CRITICAL: Last successful Puppet run was Fri 01 Aug 2014 19:17:00 UTC
[08:25:03] PROBLEM - Puppet freshness on labsdb1005 is CRITICAL: Last successful Puppet run was Thu 31 Jul 2014 16:06:51 UTC
[08:31:04] PROBLEM - Apache HTTP on mw1179 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:32:03] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 7.14% of data above the critical threshold [500.0]
[08:32:54] RECOVERY - Apache HTTP on mw1179 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 400 bytes in 0.079 second response time
[08:44:13] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0]
[09:24:03] PROBLEM - Puppet freshness on dataset1001 is CRITICAL: Last successful Puppet run was Fri 01 Aug 2014 19:17:00 UTC
[10:26:03] PROBLEM - Puppet freshness on labsdb1005 is CRITICAL: Last successful Puppet run was Thu 31 Jul 2014 16:06:51 UTC
[10:54:03] PROBLEM - Puppet freshness on db1006 is CRITICAL: Last successful Puppet run was Sat 02 Aug 2014 08:53:12 UTC
[11:01:03] PROBLEM - Puppet freshness on db1007 is CRITICAL: Last successful Puppet run was Sat 02 Aug 2014 09:00:43 UTC
[11:25:03] PROBLEM - Puppet freshness on dataset1001 is CRITICAL: Last successful Puppet run was Fri 01 Aug 2014 19:17:00 UTC
[11:33:54] RECOVERY - Puppet freshness on db1006 is OK: puppet ran at Sat Aug 2 11:33:49 UTC 2014
[12:00:33] RECOVERY - Puppet freshness on db1007 is OK: puppet ran at Sat Aug 2 12:00:32 UTC 2014
[12:27:03] PROBLEM - Puppet freshness on labsdb1005 is CRITICAL: Last successful Puppet run was Thu 31 Jul 2014 16:06:51 UTC
[13:26:03] PROBLEM - Puppet freshness on dataset1001 is CRITICAL: Last successful Puppet run was Fri 01 Aug 2014 19:17:00 UTC
[14:00:10] !log Restarting Jenkins in attempt to unstuck the clogged Zuul pipeline for gallium
[14:00:16] Logged the message, Master
[14:08:55] !log Jenkins / Zuul stuck {{bug|69045}}
[14:09:00] Logged the message, Master
[14:10:01] hashar: Yikes, now what?
[14:10:08] I've restarted it twice, but no effect
[14:10:14] I'm restarting Zuul now
[14:10:17] Krinkle: will look at it
[14:10:18] hold on
[14:10:19] !log Restarting Zuul
[14:10:29] Logged the message, Master
[14:12:21] bah no traces for me :D
[14:12:44] Sorry.. Been broken for almost 3 hours, didnt' want to wait longer.
[14:12:47] We've got logs though
[14:28:03] PROBLEM - Puppet freshness on labsdb1005 is CRITICAL: Last successful Puppet run was Thu 31 Jul 2014 16:06:51 UTC
[15:09:08] (PS1) Hedonil: exec_environ.pp: Install package python-pygments (syntax highlighting) [operations/puppet] - https://gerrit.wikimedia.org/r/151295 (https://bugzilla.wikimedia.org/69050)
[15:27:03] PROBLEM - Puppet freshness on dataset1001 is CRITICAL: Last successful Puppet run was Fri 01 Aug 2014 19:17:00 UTC
[15:28:06] !log reboot ms-be1008, stuck on xfs errors and most processes in D state
[15:28:10] Logged the message, Master
[15:30:24] PROBLEM - swift-object-replicator on ms-be1008 is CRITICAL: Connection refused by host
[15:31:03] PROBLEM - swift-account-auditor on ms-be1008 is CRITICAL: Connection refused by host
[15:31:03] PROBLEM - swift-object-updater on ms-be1008 is CRITICAL: Connection refused by host
[15:31:04] PROBLEM - swift-account-replicator on ms-be1008 is CRITICAL: Connection refused by host
[15:31:13] PROBLEM - swift-account-reaper on ms-be1008 is CRITICAL: Connection refused by host
[15:42:03] RECOVERY - swift-account-auditor on ms-be1008 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-auditor
[15:42:04] RECOVERY - swift-object-updater on ms-be1008 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-updater
[15:42:04] RECOVERY - swift-account-replicator on ms-be1008 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-replicator
[15:42:13] RECOVERY - swift-account-reaper on ms-be1008 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-reaper
[15:42:24] RECOVERY - swift-object-replicator on ms-be1008 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator
[16:29:03] PROBLEM - Puppet freshness on labsdb1005 is CRITICAL: Last successful Puppet run was Thu 31 Jul 2014 16:06:51 UTC
[17:04:03] PROBLEM - graphite.wikimedia.org on tungsten is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 525 bytes in 0.001 second response time
[17:04:44] PROBLEM - Number of mediawiki jobs running on tungsten is CRITICAL: CRITICAL: Anomaly detected: 28 data above and 0 below the confidence bounds
[17:04:44] PROBLEM - Number of mediawiki jobs queued on tungsten is CRITICAL: CRITICAL: Anomaly detected: 28 data above and 0 below the confidence bounds
[17:05:03] RECOVERY - graphite.wikimedia.org on tungsten is OK: HTTP OK: HTTP/1.1 200 OK - 1607 bytes in 0.012 second response time
[17:28:03] PROBLEM - Puppet freshness on dataset1001 is CRITICAL: Last successful Puppet run was Fri 01 Aug 2014 19:17:00 UTC
[17:54:03] PROBLEM - Puppet freshness on db1006 is CRITICAL: Last successful Puppet run was Sat 02 Aug 2014 15:53:18 UTC
[18:30:03] PROBLEM - Puppet freshness on labsdb1005 is CRITICAL: Last successful Puppet run was Thu 31 Jul 2014 16:06:51 UTC
[18:33:03] RECOVERY - Puppet freshness on db1006 is OK: puppet ran at Sat Aug 2 18:32:57 UTC 2014
[19:29:03] PROBLEM - Puppet freshness on dataset1001 is CRITICAL: Last successful Puppet run was Fri 01 Aug 2014 19:17:00 UTC
[20:21:34] (PS2) Yuvipanda: exec_environ.pp: Install package python-pygments [operations/puppet] - https://gerrit.wikimedia.org/r/151295 (https://bugzilla.wikimedia.org/69050) (owner: Hedonil)
[20:21:52] (CR) Yuvipanda: [C: 1] exec_environ.pp: Install package python-pygments [operations/puppet] - https://gerrit.wikimedia.org/r/151295 (https://bugzilla.wikimedia.org/69050) (owner: Hedonil)
[20:31:03] PROBLEM - Puppet freshness on labsdb1005 is CRITICAL: Last successful Puppet run was Thu 31 Jul 2014 16:06:51 UTC
[20:48:39] (PS1) Nemo bis: [Italian Planet] Update fcvg.it [operations/puppet] - https://gerrit.wikimedia.org/r/151362
[20:50:16] (PS3) Yuvipanda: tools: Install package python-pygments [operations/puppet] - https://gerrit.wikimedia.org/r/151295 (https://bugzilla.wikimedia.org/69050) (owner: Hedonil)
[21:30:03] PROBLEM - Puppet freshness on dataset1001 is CRITICAL: Last successful Puppet run was Fri 01 Aug 2014 19:17:00 UTC
[22:32:03] PROBLEM - Puppet freshness on labsdb1005 is CRITICAL: Last successful Puppet run was Thu 31 Jul 2014 16:06:51 UTC
[23:03:48] _joe_: have you used signal handlers in python before?
[23:04:44] <_joe_> uh, yes, but I would have to use pydoc to remind anything
[23:04:53] <_joe_> or grep through my old code
[23:05:06] hmm, ok. 's fine if it isn't in cache :)
[23:31:03] PROBLEM - Puppet freshness on dataset1001 is CRITICAL: Last successful Puppet run was Fri 01 Aug 2014 19:17:00 UTC