[00:08:43] <icinga-wm>	 RECOVERY - MySQL Slave Delay on db1018 is OK: OK replication delay 0 seconds  
[00:11:33] <icinga-wm>	 PROBLEM - MySQL Slave Delay on db1018 is CRITICAL: CRIT replication delay 6346 seconds  
[00:21:03] <icinga-wm>	 PROBLEM - Puppet freshness on labsdb1005 is CRITICAL: Last successful Puppet run was Thu 31 Jul 2014 16:06:51 UTC  
[00:26:33] <icinga-wm>	 RECOVERY - MySQL Replication Heartbeat on db1018 is OK: OK replication delay 0 seconds  
[00:27:34] <icinga-wm>	 RECOVERY - MySQL Slave Delay on db1018 is OK: OK replication delay 0 seconds  
[00:30:33] <icinga-wm>	 PROBLEM - MySQL Replication Heartbeat on db1018 is CRITICAL: CRIT replication delay 7197 seconds  
[00:30:33] <icinga-wm>	 PROBLEM - MySQL Slave Delay on db1018 is CRITICAL: CRIT replication delay 7200 seconds  
[00:32:53] <icinga-wm>	 PROBLEM - HTTP error ratio anomaly detection on tungsten is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 9 below the confidence bounds  
[00:33:25] <grrrit-wm>	 (PS2) Ori.livneh: Use aliasByNode() to clean up metric labels [operations/puppet] - https://gerrit.wikimedia.org/r/147673 (owner: BryanDavis)
[00:33:33] <grrrit-wm>	 (CR) Ori.livneh: [C: 2 V: 2] Use aliasByNode() to clean up metric labels [operations/puppet] - https://gerrit.wikimedia.org/r/147673 (owner: BryanDavis)
[00:35:19] <grrrit-wm>	 (PS1) Ori.livneh: HHVM: add explanatory comment re: runtime error level [operations/puppet] - https://gerrit.wikimedia.org/r/151258
[00:35:41] <grrrit-wm>	 (CR) Ori.livneh: [C: 2 V: 2] HHVM: add explanatory comment re: runtime error level [operations/puppet] - https://gerrit.wikimedia.org/r/151258 (owner: Ori.livneh)
[00:38:03] <legoktm>	 https://integration.wikimedia.org/zuul/ jenkins stuck?
[00:40:02] <legoktm>	 Krinkle: ^ ?
[00:40:15] <legoktm>	 everything is stuck at "queued"
[00:41:46] <Krinkle>	 legoktm: such as?
[00:42:02] <legoktm>	 pywikibot-core-tox-flake8 (non-voting) queued
[00:42:11] <legoktm>	 pywikibot-core-tox-flake8-docstrings (non-voting) queued
[00:42:24] <legoktm>	 a bunch of operations/puppet ones too
[00:43:56] <Krinkle>	 !log Restarting Jenkins on gallium because the pipeline is clogged
[00:44:00] <morebots>	 Logged the message, Master
[00:47:00] <legoktm>	 Krinkle: should I comment "recheck" on those patchsets or will they automatically get picked up?
[00:50:26] <Krinkle>	 Zuul wasn't restarted, queue should be intact.
[00:52:43] <legoktm>	 ah great, it took care of them
[00:53:16] <legoktm>	 thanks!
[01:14:03] <icinga-wm>	 PROBLEM - Puppet freshness on db1006 is CRITICAL: Last successful Puppet run was Fri 01 Aug 2014 23:12:57 UTC  
[01:20:03] <icinga-wm>	 PROBLEM - Puppet freshness on dataset1001 is CRITICAL: Last successful Puppet run was Fri 01 Aug 2014 19:17:00 UTC  
[01:35:31] <springle>	 legoktm: page_content_model done, ar_content_* started, rev_content_* todo
[01:35:44] <legoktm>	 :D yay
[01:38:12] <legoktm>	 springle: also, wiktionary seems to have some major db lag right now... https://en.wiktionary.org/w/index.php?maxlag=1 says 10198 seconds
[01:39:30] <springle>	 that's db1018 i guess. looks like it's suffering from the backlinks namespace jobs
[01:47:10] <grrrit-wm>	 (PS1) Springle: depool db1018 while lagged due to backlinks namespace jobs. reassign db1036 to vslow and dump load groups for s2. [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/151264
[01:48:06] <grrrit-wm>	 (CR) Springle: [C: 2] depool db1018 while lagged due to backlinks namespace jobs. reassign db1036 to vslow and dump load groups for s2. [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/151264 (owner: Springle)
[01:48:10] <grrrit-wm>	 (Merged) jenkins-bot: depool db1018 while lagged due to backlinks namespace jobs. reassign db1036 to vslow and dump load groups for s2. [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/151264 (owner: Springle)
[01:49:31] <logmsgbot>	 !log springle Synchronized wmf-config/db-eqiad.php: depool db1018, replag (duration: 00m 06s)
[01:49:41] <morebots>	 Logged the message, Master
[01:52:53] <icinga-wm>	 RECOVERY - HTTP error ratio anomaly detection on tungsten is OK: OK: No anomaly detected  
[01:53:43] <icinga-wm>	 RECOVERY - Puppet freshness on db1006 is OK: puppet ran at Sat Aug  2 01:53:34 UTC 2014  
[02:21:05] <logmsgbot>	 !log LocalisationUpdate completed (1.24wmf15) at 2014-08-02 02:20:02+00:00
[02:21:13] <morebots>	 Logged the message, Master
[02:22:03] <icinga-wm>	 PROBLEM - Puppet freshness on labsdb1005 is CRITICAL: Last successful Puppet run was Thu 31 Jul 2014 16:06:51 UTC  
[02:23:53] <icinga-wm>	 PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 13.33% of data above the critical threshold [500.0]  
[02:33:39] <logmsgbot>	 !log LocalisationUpdate completed (1.24wmf16) at 2014-08-02 02:32:36+00:00
[02:33:45] <morebots>	 Logged the message, Master
[02:54:53] <icinga-wm>	 RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0]  
[03:21:03] <icinga-wm>	 PROBLEM - Puppet freshness on dataset1001 is CRITICAL: Last successful Puppet run was Fri 01 Aug 2014 19:17:00 UTC  
[03:57:03] <icinga-wm>	 PROBLEM - Puppet freshness on db1009 is CRITICAL: Last successful Puppet run was Sat 02 Aug 2014 01:56:52 UTC  
[04:17:23] <icinga-wm>	 RECOVERY - Puppet freshness on db1009 is OK: puppet ran at Sat Aug  2 04:17:17 UTC 2014  
[04:21:52] <logmsgbot>	 !log LocalisationUpdate ResourceLoader cache refresh completed at Sat Aug  2 04:20:45 UTC 2014 (duration 20m 44s)
[04:21:56] <morebots>	 Logged the message, Master
[04:23:03] <icinga-wm>	 PROBLEM - Puppet freshness on labsdb1005 is CRITICAL: Last successful Puppet run was Thu 31 Jul 2014 16:06:51 UTC  
[05:22:03] <icinga-wm>	 PROBLEM - Puppet freshness on dataset1001 is CRITICAL: Last successful Puppet run was Fri 01 Aug 2014 19:17:00 UTC  
[06:02:33] <icinga-wm>	 RECOVERY - MySQL Slave Delay on db1018 is OK: OK replication delay 87 seconds  
[06:02:43] <icinga-wm>	 RECOVERY - MySQL Replication Heartbeat on db1018 is OK: OK replication delay 73 seconds  
[06:24:03] <icinga-wm>	 PROBLEM - Puppet freshness on labsdb1005 is CRITICAL: Last successful Puppet run was Thu 31 Jul 2014 16:06:51 UTC  
[06:25:44] <icinga-wm>	 RECOVERY - Disk space on vanadium is OK: DISK OK  
[06:28:03] <icinga-wm>	 PROBLEM - puppet last run on iron is CRITICAL: CRITICAL: Puppet has 1 failures  
[06:28:44] <icinga-wm>	 PROBLEM - puppet last run on db1059 is CRITICAL: CRITICAL: Puppet has 1 failures  
[06:28:53] <icinga-wm>	 PROBLEM - puppet last run on cp1061 is CRITICAL: CRITICAL: Puppet has 1 failures  
[06:28:53] <icinga-wm>	 PROBLEM - puppet last run on labsdb1003 is CRITICAL: CRITICAL: Puppet has 1 failures  
[06:28:53] <icinga-wm>	 PROBLEM - puppet last run on mw1153 is CRITICAL: CRITICAL: Puppet has 1 failures  
[06:28:53] <icinga-wm>	 PROBLEM - puppet last run on mw1052 is CRITICAL: CRITICAL: Puppet has 1 failures  
[06:28:53] <icinga-wm>	 PROBLEM - puppet last run on lvs1005 is CRITICAL: CRITICAL: Puppet has 1 failures  
[06:29:03] <icinga-wm>	 PROBLEM - puppet last run on mw1046 is CRITICAL: CRITICAL: Puppet has 1 failures  
[06:29:04] <icinga-wm>	 PROBLEM - puppet last run on db1018 is CRITICAL: CRITICAL: Puppet has 1 failures  
[06:29:34] <icinga-wm>	 PROBLEM - puppet last run on search1001 is CRITICAL: CRITICAL: Puppet has 1 failures  
[06:29:43] <icinga-wm>	 PROBLEM - puppet last run on mw1100 is CRITICAL: CRITICAL: Puppet has 1 failures  
[06:30:04] <icinga-wm>	 PROBLEM - puppet last run on mw1166 is CRITICAL: CRITICAL: Puppet has 1 failures  
[06:30:54] <icinga-wm>	 PROBLEM - puppet last run on cp4008 is CRITICAL: CRITICAL: Puppet has 1 failures  
[06:45:44] <icinga-wm>	 RECOVERY - puppet last run on mw1100 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures  
[06:45:54] <icinga-wm>	 RECOVERY - puppet last run on labsdb1003 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures  
[06:45:54] <icinga-wm>	 RECOVERY - puppet last run on mw1153 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures  
[06:45:54] <icinga-wm>	 RECOVERY - puppet last run on lvs1005 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures  
[06:46:13] <icinga-wm>	 RECOVERY - puppet last run on mw1046 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures  
[06:46:13] <icinga-wm>	 RECOVERY - puppet last run on db1018 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures  
[06:46:43] <icinga-wm>	 RECOVERY - puppet last run on search1001 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures  
[06:46:53] <icinga-wm>	 RECOVERY - puppet last run on db1059 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures  
[06:46:54] <icinga-wm>	 RECOVERY - puppet last run on cp1061 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures  
[06:46:54] <icinga-wm>	 RECOVERY - puppet last run on mw1052 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures  
[06:46:54] <icinga-wm>	 RECOVERY - puppet last run on cp4008 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures  
[06:50:01] <icinga-wm>	 RECOVERY - puppet last run on mw1166 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures  
[07:06:13] <icinga-wm>	 RECOVERY - puppet last run on iron is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures  
[07:20:14] <grrrit-wm>	 (CR) Giuseppe Lavagetto: "This shouldn't have been merged." (3 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/150873 (owner: Ori.livneh)
[07:23:03] <icinga-wm>	 PROBLEM - Puppet freshness on dataset1001 is CRITICAL: Last successful Puppet run was Fri 01 Aug 2014 19:17:00 UTC  
[08:25:03] <icinga-wm>	 PROBLEM - Puppet freshness on labsdb1005 is CRITICAL: Last successful Puppet run was Thu 31 Jul 2014 16:06:51 UTC  
[08:31:04] <icinga-wm>	 PROBLEM - Apache HTTP on mw1179 is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[08:32:03] <icinga-wm>	 PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 7.14% of data above the critical threshold [500.0]  
[08:32:54] <icinga-wm>	 RECOVERY - Apache HTTP on mw1179 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 400 bytes in 0.079 second response time  
[08:44:13] <icinga-wm>	 RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0]  
[09:24:03] <icinga-wm>	 PROBLEM - Puppet freshness on dataset1001 is CRITICAL: Last successful Puppet run was Fri 01 Aug 2014 19:17:00 UTC  
[10:26:03] <icinga-wm>	 PROBLEM - Puppet freshness on labsdb1005 is CRITICAL: Last successful Puppet run was Thu 31 Jul 2014 16:06:51 UTC  
[10:54:03] <icinga-wm>	 PROBLEM - Puppet freshness on db1006 is CRITICAL: Last successful Puppet run was Sat 02 Aug 2014 08:53:12 UTC  
[11:01:03] <icinga-wm>	 PROBLEM - Puppet freshness on db1007 is CRITICAL: Last successful Puppet run was Sat 02 Aug 2014 09:00:43 UTC  
[11:25:03] <icinga-wm>	 PROBLEM - Puppet freshness on dataset1001 is CRITICAL: Last successful Puppet run was Fri 01 Aug 2014 19:17:00 UTC  
[11:33:54] <icinga-wm>	 RECOVERY - Puppet freshness on db1006 is OK: puppet ran at Sat Aug  2 11:33:49 UTC 2014  
[12:00:33] <icinga-wm>	 RECOVERY - Puppet freshness on db1007 is OK: puppet ran at Sat Aug  2 12:00:32 UTC 2014  
[12:27:03] <icinga-wm>	 PROBLEM - Puppet freshness on labsdb1005 is CRITICAL: Last successful Puppet run was Thu 31 Jul 2014 16:06:51 UTC  
[13:26:03] <icinga-wm>	 PROBLEM - Puppet freshness on dataset1001 is CRITICAL: Last successful Puppet run was Fri 01 Aug 2014 19:17:00 UTC  
[14:00:10] <Krinkle>	 !log Restarting Jenkins in an attempt to unstick the clogged Zuul pipeline for gallium
[14:00:16] <morebots>	 Logged the message, Master
[14:08:55] <hashar>	 !log Jenkins / Zuul stuck {{bug|69045}}
[14:09:00] <morebots>	 Logged the message, Master
[14:10:01] <Krinkle>	 hashar: Yikes, now what?
[14:10:08] <Krinkle>	 I've restarted it twice, but no effect
[14:10:14] <Krinkle>	 I'm restarting Zuul now
[14:10:17] <hashar>	 Krinkle: will look at it
[14:10:18] <hashar>	 hold on
[14:10:19] <Krinkle>	 !log Restarting Zuul
[14:10:29] <morebots>	 Logged the message, Master
[14:12:21] <hashar>	 bah no traces for me :D
[14:12:44] <Krinkle>	 Sorry.. Been broken for almost 3 hours, didn't want to wait longer.
[14:12:47] <Krinkle>	 We've got logs though
[14:28:03] <icinga-wm>	 PROBLEM - Puppet freshness on labsdb1005 is CRITICAL: Last successful Puppet run was Thu 31 Jul 2014 16:06:51 UTC  
[15:09:08] <grrrit-wm>	 (PS1) Hedonil: exec_environ.pp: Install package python-pygments (syntax highlighting) [operations/puppet] - https://gerrit.wikimedia.org/r/151295 (https://bugzilla.wikimedia.org/69050)
[15:27:03] <icinga-wm>	 PROBLEM - Puppet freshness on dataset1001 is CRITICAL: Last successful Puppet run was Fri 01 Aug 2014 19:17:00 UTC  
[15:28:06] <godog>	 !log reboot ms-be1008, stuck on xfs errors and most processes in D state
[15:28:10] <morebots>	 Logged the message, Master
[15:30:24] <icinga-wm>	 PROBLEM - swift-object-replicator on ms-be1008 is CRITICAL: Connection refused by host  
[15:31:03] <icinga-wm>	 PROBLEM - swift-account-auditor on ms-be1008 is CRITICAL: Connection refused by host  
[15:31:03] <icinga-wm>	 PROBLEM - swift-object-updater on ms-be1008 is CRITICAL: Connection refused by host  
[15:31:04] <icinga-wm>	 PROBLEM - swift-account-replicator on ms-be1008 is CRITICAL: Connection refused by host  
[15:31:13] <icinga-wm>	 PROBLEM - swift-account-reaper on ms-be1008 is CRITICAL: Connection refused by host  
[15:42:03] <icinga-wm>	 RECOVERY - swift-account-auditor on ms-be1008 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-auditor  
[15:42:04] <icinga-wm>	 RECOVERY - swift-object-updater on ms-be1008 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-updater  
[15:42:04] <icinga-wm>	 RECOVERY - swift-account-replicator on ms-be1008 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-replicator  
[15:42:13] <icinga-wm>	 RECOVERY - swift-account-reaper on ms-be1008 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-reaper  
[15:42:24] <icinga-wm>	 RECOVERY - swift-object-replicator on ms-be1008 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator  
[16:29:03] <icinga-wm>	 PROBLEM - Puppet freshness on labsdb1005 is CRITICAL: Last successful Puppet run was Thu 31 Jul 2014 16:06:51 UTC  
[17:04:03] <icinga-wm>	 PROBLEM - graphite.wikimedia.org on tungsten is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 525 bytes in 0.001 second response time  
[17:04:44] <icinga-wm>	 PROBLEM - Number of mediawiki jobs running on tungsten is CRITICAL: CRITICAL: Anomaly detected: 28 data above and 0 below the confidence bounds  
[17:04:44] <icinga-wm>	 PROBLEM - Number of mediawiki jobs queued on tungsten is CRITICAL: CRITICAL: Anomaly detected: 28 data above and 0 below the confidence bounds  
[17:05:03] <icinga-wm>	 RECOVERY - graphite.wikimedia.org on tungsten is OK: HTTP OK: HTTP/1.1 200 OK - 1607 bytes in 0.012 second response time  
[17:28:03] <icinga-wm>	 PROBLEM - Puppet freshness on dataset1001 is CRITICAL: Last successful Puppet run was Fri 01 Aug 2014 19:17:00 UTC  
[17:54:03] <icinga-wm>	 PROBLEM - Puppet freshness on db1006 is CRITICAL: Last successful Puppet run was Sat 02 Aug 2014 15:53:18 UTC  
[18:30:03] <icinga-wm>	 PROBLEM - Puppet freshness on labsdb1005 is CRITICAL: Last successful Puppet run was Thu 31 Jul 2014 16:06:51 UTC  
[18:33:03] <icinga-wm>	 RECOVERY - Puppet freshness on db1006 is OK: puppet ran at Sat Aug  2 18:32:57 UTC 2014  
[19:29:03] <icinga-wm>	 PROBLEM - Puppet freshness on dataset1001 is CRITICAL: Last successful Puppet run was Fri 01 Aug 2014 19:17:00 UTC  
[20:21:34] <grrrit-wm>	 (PS2) Yuvipanda: exec_environ.pp: Install package python-pygments [operations/puppet] - https://gerrit.wikimedia.org/r/151295 (https://bugzilla.wikimedia.org/69050) (owner: Hedonil)
[20:21:52] <grrrit-wm>	 (CR) Yuvipanda: [C: 1] exec_environ.pp: Install package python-pygments [operations/puppet] - https://gerrit.wikimedia.org/r/151295 (https://bugzilla.wikimedia.org/69050) (owner: Hedonil)
[20:31:03] <icinga-wm>	 PROBLEM - Puppet freshness on labsdb1005 is CRITICAL: Last successful Puppet run was Thu 31 Jul 2014 16:06:51 UTC  
[20:48:39] <grrrit-wm>	 (PS1) Nemo bis: [Italian Planet] Update fcvg.it [operations/puppet] - https://gerrit.wikimedia.org/r/151362
[20:50:16] <grrrit-wm>	 (PS3) Yuvipanda: tools: Install package python-pygments [operations/puppet] - https://gerrit.wikimedia.org/r/151295 (https://bugzilla.wikimedia.org/69050) (owner: Hedonil)
[21:30:03] <icinga-wm>	 PROBLEM - Puppet freshness on dataset1001 is CRITICAL: Last successful Puppet run was Fri 01 Aug 2014 19:17:00 UTC  
[22:32:03] <icinga-wm>	 PROBLEM - Puppet freshness on labsdb1005 is CRITICAL: Last successful Puppet run was Thu 31 Jul 2014 16:06:51 UTC  
[23:03:48] <YuviPanda>	 _joe_: have you used signal handlers in python before?
[23:04:44] <_joe_>	 uh, yes, but I would have to use pydoc to remember anything
[23:04:53] <_joe_>	 or grep through my old code
[23:05:06] <YuviPanda>	 hmm, ok. 's fine if it isn't in cache :)
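For readers who want the refresher the exchange above alludes to, here is a minimal sketch of a Python signal handler using the standard `signal` module. The flag name `shutdown_requested` and the handler name `handle_sigterm` are hypothetical, chosen for illustration; the log does not say what YuviPanda was actually building. The sketch assumes a POSIX system.

```python
import os
import signal

# Hypothetical flag flipped by the handler; not taken from the conversation.
shutdown_requested = False


def handle_sigterm(signum, frame):
    """Record that a shutdown was requested instead of exiting abruptly."""
    global shutdown_requested
    shutdown_requested = True


# Register the handler; signal.signal() returns the previously installed one.
previous_handler = signal.signal(signal.SIGTERM, handle_sigterm)

# Send SIGTERM to this process to exercise the handler (POSIX only).
os.kill(os.getpid(), signal.SIGTERM)

# Restore whatever handler was installed before the sketch ran.
signal.signal(signal.SIGTERM, previous_handler)
```

Keeping the handler down to setting a flag, and doing the real cleanup in the main loop, is a common choice because Python runs handlers in the main thread between bytecode instructions.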
[23:31:03] <icinga-wm>	 PROBLEM - Puppet freshness on dataset1001 is CRITICAL: Last successful Puppet run was Fri 01 Aug 2014 19:17:00 UTC