[00:07:46] RECOVERY - Slow CirrusSearch query rate on fluorine is OK: CirrusSearch-slow.log_line_rate OKAY: 0.0
[00:27:38] (CR) Yurik: "recheck" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/138049 (owner: Yurik)
[01:00:46] PROBLEM - Puppet freshness on db1007 is CRITICAL: Last successful Puppet run was Fri 06 Jun 2014 22:00:07 UTC
[01:06:28] (PS2) Ori.livneh: role::mediawiki::webserver: set maxclients dynamically, dissolve bits role [operations/puppet] - https://gerrit.wikimedia.org/r/137947
[01:07:46] PROBLEM - Slow CirrusSearch query rate on fluorine is CRITICAL: CirrusSearch-slow.log_line_rate CRITICAL: 0.11
[01:08:16] (CR) Ori.livneh: role::mediawiki::webserver: set maxclients dynamically, dissolve bits role (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/137947 (owner: Ori.livneh)
[01:17:46] RECOVERY - Slow CirrusSearch query rate on fluorine is OK: CirrusSearch-slow.log_line_rate OKAY: 0.0
[01:32:17] (PS1) Yurik: Fixing ZeroPortal labs rollout [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/138126
[01:34:25] (CR) Yurik: [C: 2] "labs-only, prod noop" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/138126 (owner: Yurik)
[01:34:30] (Merged) jenkins-bot: Fixing ZeroPortal labs rollout [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/138126 (owner: Yurik)
[01:44:46] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Tue 03 Jun 2014 16:21:49 UTC
[02:17:34] !log LocalisationUpdate completed (1.24wmf7) at 2014-06-07 02:16:30+00:00
[02:17:41] Logged the message, Master
[02:30:46] RECOVERY - Puppet freshness on db1007 is OK: puppet ran at Sat Jun 7 02:30:45 UTC 2014
[02:31:01] !log LocalisationUpdate completed (1.24wmf8) at 2014-06-07 02:29:57+00:00
[02:31:06] Logged the message, Master
[03:24:39] !log LocalisationUpdate ResourceLoader cache refresh completed at Sat Jun 7 03:23:32 UTC 2014 (duration 23m 31s)
[03:24:43] Logged the message, Master
[04:45:46] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Tue 03 Jun 2014 16:21:49 UTC
[05:22:46] PROBLEM - Slow CirrusSearch query rate on fluorine is CRITICAL: CirrusSearch-slow.log_line_rate CRITICAL: 0.0066889632107
[05:27:46] RECOVERY - Slow CirrusSearch query rate on fluorine is OK: CirrusSearch-slow.log_line_rate OKAY: 0.0
[05:35:10] (CR) Giuseppe Lavagetto: "@gwicke ok, it's a little baffling how we can't find a feature complete repository software :) My -1 is purely on technicalities though." [operations/puppet] - https://gerrit.wikimedia.org/r/136128 (owner: Filippo Giunchedi)
[06:03:36] (CR) GWicke: "I guess full-blown dak could do it ;)" [operations/puppet] - https://gerrit.wikimedia.org/r/136128 (owner: Filippo Giunchedi)
[06:12:37] <_joe_> hey gwicke :)
[06:13:02] <_joe_> my -1 there was just for the enhancements to the puppet manifests, don't get me wrong
[06:13:04] * gwicke unlurks
[06:13:19] yeah, I know - I only responded to the other part
[06:13:33] <_joe_> I just hate we need two different softwares for doing almost exactly the same thing
[06:13:36] currently reading the aptly source ;)
[06:14:09] <_joe_> well, I don't think there is anything simple enough covering all our needs
[06:14:34] <_joe_> so in the end, we can live with this :)
[06:15:31] yeah, I think so too
[06:16:10] my guess is that mini-dinstall wouldn't work so well to handle a repo the size of full debian
[06:16:31] but I'll be happy when we reach two dozen different packages
[06:18:21] adding a feature to aptly would probably also not be too hard: https://github.com/smira/aptly/blob/c72ef05a2a4d65c7d245aa867840d35f5c886144/deb/reflist.go#L309
[06:27:46] PROBLEM - Slow CirrusSearch query rate on fluorine is CRITICAL: CirrusSearch-slow.log_line_rate CRITICAL: 0.0133333333333
[07:20:56] PROBLEM - LighttpdHTTP on dataset1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:21:46] RECOVERY - LighttpdHTTP on dataset1001 is OK: HTTP OK: HTTP/1.1 200 OK - 5122 bytes in 0.002 second response time
[07:46:46] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Tue 03 Jun 2014 16:21:49 UTC
[09:35:16] PROBLEM - graphite.wikimedia.org on tungsten is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:47:46] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Tue 03 Jun 2014 16:21:49 UTC
[12:40:36] PROBLEM - SSH on searchidx1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:41:27] RECOVERY - SSH on searchidx1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.4 (protocol 2.0)
[13:48:46] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Tue 03 Jun 2014 16:21:49 UTC
[16:10:23] (PS2) Ori.livneh: beta: fix scap for videoscalers [operations/puppet] - https://gerrit.wikimedia.org/r/137274 (owner: BryanDavis)
[16:11:13] (CR) Ori.livneh: [C: 1] beta: fix scap for videoscalers [operations/puppet] - https://gerrit.wikimedia.org/r/137274 (owner: BryanDavis)
[16:49:46] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Tue 03 Jun 2014 16:21:49 UTC
[17:01:45] _joe_ or godog , can you give a look to elastic1017.eqiad.wmnet ?
[17:04:33] Nemo_bis: is it misbehaving?
[17:10:03] indeed ES isn't very keen on talking to me
[17:11:20] Yup, came to report same. Timeouts on all non-default-namespace searches.
[17:12:28] It's been like that since this morning, en.wiki and commons alike
[17:13:34] kk, I'm trying to capture a jstack and then restarting
[17:18:43] Any ops around?
[17:18:57] Hi, we have not received any email through OTRS for about 21 hours. Can someone look into that?
[17:18:58] OTRS has not received any mail for 21 (!) hours!
[17:20:06] ou https://ganglia.wikimedia.org/latest/graph_all_periods.php?title=mchenry+mail+delivery&vl=&x=&n=&hreg[]=mchenry&mreg[]=exim.%2B&gtype=line&glegend=show&aggregate=1
[17:20:52] Timing coincides with the light blue line
[17:22:28] Nemo_bis quiddity bah I've captured a jstack and restarted ES
[17:23:07] looking at mr henry
[17:26:01] or iodine rather, it doesn't look like it is happy to talk to mrchenry
[17:27:46] RECOVERY - Slow CirrusSearch query rate on fluorine is OK: CirrusSearch-slow.log_line_rate OKAY: 0.0
[17:42:46] (PS1) Jgreen: re-quote booleans for exim::roled class call because unquoting seems to break the template [operations/puppet] - https://gerrit.wikimedia.org/r/138182
[17:44:07] (CR) Jgreen: [C: 2 V: 2] re-quote booleans for exim::roled class call because unquoting seems to break the template [operations/puppet] - https://gerrit.wikimedia.org/r/138182 (owner: Jgreen)
[17:49:58] looks like Jeff_Green might be on it
[17:50:25] yeah I think I just fixed it
[17:55:06] godog: did you !log the fix to CirrusSearch which icinga-wm just announced? :)
[17:55:26] Nemo_bis: oops no, sorry
[17:56:05] !log restarted ES on elastic1017.eqiad.wmnet (at 17:22 UTC)
[17:56:12] Logged the message, Master
[18:01:06] PROBLEM - Disk space on virt1000 is CRITICAL: DISK CRITICAL - free space: / 1944 MB (3% inode=89%):
[18:21:46] PROBLEM - Puppet freshness on mw1053 is CRITICAL: Last successful Puppet run was Sat 07 Jun 2014 15:20:52 UTC
[18:42:46] PROBLEM - Slow CirrusSearch query rate on fluorine is CRITICAL: CirrusSearch-slow.log_line_rate CRITICAL: 0.746666666667
[18:57:13] ori: Do you think we could change mwgrep to also work on Module: ?
[19:09:16] PROBLEM - SSH on mw1053 is CRITICAL: Server answer:
[19:09:46] PROBLEM - puppet disabled on mw1053 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[19:10:06] PROBLEM - check configured eth on mw1053 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[19:10:56] PROBLEM - RAID on mw1053 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[19:12:06] RECOVERY - check configured eth on mw1053 is OK: NRPE: Unable to read output
[19:12:46] RECOVERY - puppet disabled on mw1053 is OK: OK
[19:13:16] PROBLEM - DPKG on mw1053 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[19:13:26] PROBLEM - Disk space on mw1053 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[19:14:06] PROBLEM - check if dhclient is running on mw1053 is CRITICAL: NRPE: Call to popen() failed
[19:15:06] PROBLEM - check configured eth on mw1053 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[19:15:17] RECOVERY - SSH on mw1053 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.4 (protocol 2.0)
[19:15:26] RECOVERY - Disk space on mw1053 is OK: DISK OK
[19:16:06] RECOVERY - check configured eth on mw1053 is OK: NRPE: Unable to read output
[19:18:06] RECOVERY - check if dhclient is running on mw1053 is OK: PROCS OK: 0 processes with command name dhclient
[19:18:16] RECOVERY - DPKG on mw1053 is OK: All packages OK
[19:18:34] Maybe I'm missing something, but...
[19:18:54] 0 processes with that name, but the check to make sure the process is running succeeds?
[19:18:56] RECOVERY - RAID on mw1053 is OK: OK: no RAID installed
[19:23:26] hoo|away: sure; submit a patch
[19:24:07] Krenair: the executable that implements the check takes an argument specifying what is the desired number of processes, and what the thresholds are for warning / alerting
[19:24:25] Krenair: in the case of dhclient, it makes sense for the check to issue an alert if the number of processes exceeds 0
[19:24:40] Krenair: because the idea is that dhclient ought to run very quickly and terminate; if it lingers it's a sign of some issue
[19:24:48] Ah, I see.
[19:25:10] Admittedly I didn't check to find out what dhclient does :p
[19:50:46] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Tue 03 Jun 2014 16:21:49 UTC
[20:01:56] PROBLEM - RAID on mw1053 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[20:02:16] PROBLEM - DPKG on mw1053 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[20:02:56] RECOVERY - RAID on mw1053 is OK: OK: no RAID installed
[20:03:16] RECOVERY - DPKG on mw1053 is OK: All packages OK
[20:05:56] PROBLEM - RAID on mw1053 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[20:06:06] PROBLEM - check if dhclient is running on mw1053 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[20:06:16] PROBLEM - DPKG on mw1053 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[20:06:16] PROBLEM - SSH on mw1053 is CRITICAL: Server answer:
[20:07:16] RECOVERY - SSH on mw1053 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.4 (protocol 2.0)
[20:07:56] RECOVERY - RAID on mw1053 is OK: OK: no RAID installed
[20:08:06] PROBLEM - check configured eth on mw1053 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[20:10:06] RECOVERY - check configured eth on mw1053 is OK: NRPE: Unable to read output
[20:11:06] PROBLEM - check if dhclient is running on mw1053 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[20:12:06] RECOVERY - check if dhclient is running on mw1053 is OK: PROCS OK: 0 processes with command name dhclient
[20:12:16] RECOVERY - DPKG on mw1053 is OK: All packages OK
[20:13:06] PROBLEM - check configured eth on mw1053 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[20:15:06] PROBLEM - check if dhclient is running on mw1053 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[20:16:06] RECOVERY - check if dhclient is running on mw1053 is OK: PROCS OK: 0 processes with command name dhclient
[20:17:16] PROBLEM - DPKG on mw1053 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[20:17:56] PROBLEM - RAID on mw1053 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[20:18:06] PROBLEM - check configured eth on mw1053 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[20:19:32] cmjohnson1: robh ^^ seems 1053 is berzerk again
[20:20:16] PROBLEM - SSH on mw1053 is CRITICAL: Server answer:
[20:20:46] PROBLEM - puppet disabled on mw1053 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
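The process check ori explains at 19:24 is a count-with-thresholds check: count processes with a given command name and alert if the count falls outside the expected range (for dhclient, the expected count is 0). The sketch below is only an illustration of that logic in Python, not the actual Nagios/NRPE check_procs plugin used in production; the output format mirrors the "PROCS OK: 0 processes with command name dhclient" lines above.

    #!/usr/bin/env python3
    """Illustrative sketch of a process-count check (not the real NRPE plugin)."""
    import subprocess
    import sys

    OK, CRITICAL = 0, 2

    def count_procs(name):
        # pgrep -x lists PIDs whose command name matches exactly, one per line.
        result = subprocess.run(["pgrep", "-x", name], capture_output=True, text=True)
        return len(result.stdout.split())

    def check(name, max_expected=0):
        # Alert when more processes than expected are found; dhclient should
        # obtain a lease and exit, so any lingering process is suspicious.
        n = count_procs(name)
        state = "OK" if n <= max_expected else "CRITICAL"
        print("PROCS %s: %d processes with command name %s" % (state, n, name))
        return OK if state == "OK" else CRITICAL

    if __name__ == "__main__":
        sys.exit(check("dhclient", max_expected=0))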
[20:20:56] RECOVERY - RAID on mw1053 is OK: OK: no RAID installed
[20:20:57] mutante: ^^ 1053 is down, we can deal, but, that stupid machine never wants to stay up
[20:21:06] RECOVERY - check configured eth on mw1053 is OK: NRPE: Unable to read output
[20:21:16] RECOVERY - DPKG on mw1053 is OK: All packages OK
[20:21:16] RECOVERY - SSH on mw1053 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.4 (protocol 2.0)
[20:21:40] greg-g: not sure what's going on with it.
[20:21:46] RECOVERY - puppet disabled on mw1053 is OK: OK
[20:22:13] not sure, don't matter much, but if it continually goes down/up, maybe depool/disable icinga on it?
[20:22:54] I agree, let's depool it
[20:25:41] cmjohnson1, greg-g: was this also causing some errors regarding the search function?
[20:27:36] sjoerddebruin: shouldn't, what errors were you seeing?
[20:28:43] greg-g: There was someone complaining in the nlwiki-village pumb. It was "We could not complete your search due to a temporary problem. Please try again later."
[20:29:07] A error returned by the backend of the cirrussearch.
[20:29:10] maybe? /me shrugs
[20:29:14] oh, odd
[20:29:27] is there any info on the backend error?
[20:29:49] No. I tried myself but didn't get any.
[20:29:53] k
[20:30:05] But in the past weeks there seems to be a higher error count. :/
[20:30:09] hrmmm
[20:30:47] how do you mark a change in gerrit as abandoned
[20:31:07] grr, no nik online
[20:31:11] sjoerddebruin: you're right
[20:31:19] :D
[20:32:09] http://imgur.com/8ZUjyjT
[20:32:20] that's "CirrusSearch-failed" events
[20:32:25] * greg-g emails nik
[20:32:55] Yeah, it's a crucial part of the wiki. :/
[20:33:47] yeah, just emailed, sorry
[20:35:17] cmjohnson1: elastic1017 is down
[20:35:26] not sure if that's known, emailing nik that as well
[20:35:26] PROBLEM - Disk space on analytics1013 is CRITICAL: DISK CRITICAL - free space: /var/lib/hadoop/data/h 100792 MB (5% inode=99%): /var/lib/hadoop/data/j 89442 MB (4% inode=99%): /var/lib/hadoop/data/f 100982 MB (5% inode=99%): /var/lib/hadoop/data/l 72878 MB (3% inode=99%):
[20:38:31] i noticed it early but I assumed nik was working on it
[20:38:34] again elastic1017.eqiad.wmnet acting up
[20:38:36] or ottomata
[20:38:50] Nemo_bis: yeah, just emailed Ops list about it
[20:38:58] hopefully someone's checking email on Saturday
[20:38:58] elastic1017 was added yesterday afternoon
[20:39:20] depool?
[20:39:23] cmjohnson1: it just keeps going up/down
[20:39:26] Oh well, to bed now
[20:39:32] Nemo_bis: not necissarily
[20:39:41] I wouldn't know which ones are masters
[20:39:46] g'night
[20:40:06] elastic isn't like the mw pool, depooling the wrong one (or two) could be bad
[20:40:11] they're databases :)
[20:40:33] I believe the masters are identified
[20:40:39] * cmjohnson1 goes to look
[20:40:43] well, hopefully just one would only cause a short spike as the cluster reshuffles shards, but.. again, I'm wary personally ;)
[20:46:04] $master_eligible = $::hostname ? {
[20:46:05] 'elastic1002' => true,
[20:46:05] 'elastic1007' => true,
[20:46:07] 'elastic1014' => true,
[20:46:37] weird, so just a slave but causing tons of errors
[20:46:43] user facing errors
[20:47:08] what is the offecnding box ?
[20:47:44] 1017 ?
[20:47:46] yeah
[20:49:04] * greg-g just texted nik
[20:50:14] greg-g: hint: https://ganglia.wikimedia.org/latest/graph.php?r=hour&z=xlarge&c=Search+eqiad&h=search1017.eqiad.wmnet&jr=&js=&v=3.362&m=search_threads&vl=threads ?
[20:51:35] on that time line not sure
[20:51:52] https://ganglia.wikimedia.org/latest/graph.php?r=day&z=xlarge&c=Search+eqiad&h=search1017.eqiad.wmnet&jr=&js=&v=3.362&m=search_threads&vl=threads
[20:51:55] that's day
[20:52:08] yeah
[20:52:19] some spikes
[20:52:21] yeah
[20:52:38] vs https://ganglia.wikimedia.org/latest/graph.php?r=day&z=xlarge&c=Search+eqiad&h=search1011.eqiad.wmnet&jr=&js=&v=3.362&m=search_threads&vl=threads
[20:52:41] ok, nik is got it
[20:52:55] *has
[20:53:12] oh man
[21:10:16] PROBLEM - graphite.wikimedia.org on tungsten is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[21:13:38] greg-g: hey, sorry you had to call me
[21:17:06] RECOVERY - graphite.wikimedia.org on tungsten is OK: HTTP OK: HTTP/1.1 200 OK - 1607 bytes in 0.011 second response time
[21:22:09] manybubbles: no worries man, sorry you had to come home from maker faire
[21:24:16] manybubbles: hope it's easy/quick :/
[21:31:28] !log elastic1017 is sick - thrashing to death on io - restarting Elasticsearch to see if it recovers unthrashed
[21:31:33] Logged the message, Master
[21:35:26] PROBLEM - ElasticSearch health check on elastic1017 is CRITICAL: CRITICAL - Could not connect to server 10.64.48.39
[21:35:31] !log after consulting logs - elastic1017 has had high io wait since it was deployed - I'm taking it out of rotation
[21:35:34] Logged the message, Master
[21:36:25] !log that means I turned off puppet and shut down Elasticsearch on elastic1017 - you can expect the cluster to go yellow for half an hour or so while the other nodes take rebuild the redundency that elastic1017 had
[21:36:30] Logged the message, Master
[21:37:38] thanks manybubbles
[21:37:58] greg-g: no problem - it was stupid to try to add elastic1017 tot he cluster yesterday afternoon
[21:38:20] I shouldn't have done it
[21:38:43] bad dev, no biscuit?
[21:39:43] calling it while the cluster rebuilds
[21:42:46] RECOVERY - Slow CirrusSearch query rate on fluorine is OK: CirrusSearch-slow.log_line_rate OKAY: 0.0
[21:43:39] wee
[22:07:15] greg-g: ok - I'm going to take my laptop over to dinner - it everything looks a-o-better
[22:51:46] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Tue 03 Jun 2014 16:21:49 UTC
[23:32:36] PROBLEM - Kafka Broker Under Replicated Partitions on analytics1022 is CRITICAL: kafka.server.ReplicaManager.UnderReplicatedPartitions.Value CRITICAL: 15.0
[23:34:36] PROBLEM - Disk space on analytics1021 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[23:34:36] PROBLEM - Kafka Broker Server on analytics1021 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[23:34:36] PROBLEM - RAID on analytics1021 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[23:34:46] PROBLEM - check if dhclient is running on analytics1021 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[23:34:56] PROBLEM - puppet disabled on analytics1021 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[23:35:06] PROBLEM - jmxtrans on analytics1021 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[23:35:16] PROBLEM - check configured eth on analytics1021 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[23:35:16] PROBLEM - DPKG on analytics1021 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
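For context on the 21:35-21:36 !log entries above: when a node such as elastic1017 is stopped, Elasticsearch reports the cluster as "yellow" until the replica shards it held are rebuilt on the remaining nodes, then returns to "green". The standard _cluster/health API exposes this; the sketch below is only an illustration of polling it from any live node, and the hostname used is an assumption, not a documented WMF tool.

    #!/usr/bin/env python3
    """Illustrative sketch: watch cluster health return to green after a node is pulled."""
    import json
    import time
    import urllib.request

    # Assumed hostname for illustration; any reachable cluster member works.
    HEALTH_URL = "http://elastic1001.eqiad.wmnet:9200/_cluster/health"

    def cluster_health():
        with urllib.request.urlopen(HEALTH_URL, timeout=10) as resp:
            return json.load(resp)

    if __name__ == "__main__":
        while True:
            health = cluster_health()
            # "yellow" while lost replicas are rebuilt elsewhere,
            # "green" once every shard has its full replica count again.
            print("%s  unassigned=%s initializing=%s" % (
                health["status"],
                health["unassigned_shards"],
                health["initializing_shards"]))
            if health["status"] == "green":
                break
            time.sleep(30)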
[23:37:06] PROBLEM - Varnishkafka Delivery Errors on amssq47 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1.233333
[23:37:06] RECOVERY - DPKG on analytics1021 is OK: All packages OK
[23:37:06] RECOVERY - check configured eth on analytics1021 is OK: NRPE: Unable to read output
[23:37:26] RECOVERY - Disk space on analytics1021 is OK: DISK OK
[23:37:26] RECOVERY - RAID on analytics1021 is OK: OK: no disks configured for RAID
[23:37:36] RECOVERY - check if dhclient is running on analytics1021 is OK: PROCS OK: 0 processes with command name dhclient
[23:37:46] RECOVERY - puppet disabled on analytics1021 is OK: OK
[23:37:56] RECOVERY - jmxtrans on analytics1021 is OK: PROCS OK: 1 process with command name java, args -jar jmxtrans-all.jar
[23:38:16] PROBLEM - Varnishkafka Delivery Errors on cp1037 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 0.016667
[23:38:36] PROBLEM - Varnishkafka Delivery Errors on cp1038 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 0.016667
[23:39:06] RECOVERY - Varnishkafka Delivery Errors on amssq47 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:39:16] RECOVERY - Varnishkafka Delivery Errors on cp1037 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:39:36] RECOVERY - Varnishkafka Delivery Errors on cp1038 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:41:06] PROBLEM - Varnishkafka Delivery Errors on amssq48 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 95.800003
[23:41:36] PROBLEM - Varnishkafka Delivery Errors on cp1046 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 75.666664
[23:41:46] PROBLEM - Varnishkafka Delivery Errors on amssq55 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 88.366669
[23:42:06] PROBLEM - Varnishkafka Delivery Errors on amssq51 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 201.600006
[23:42:16] PROBLEM - Varnishkafka Delivery Errors on amssq57 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 131.300003
[23:42:16] RECOVERY - Varnishkafka Delivery Errors on amssq48 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:42:16] PROBLEM - Varnishkafka Delivery Errors on amssq49 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 458.033325
[23:42:36] RECOVERY - Varnishkafka Delivery Errors on cp1046 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:42:36] PROBLEM - Varnishkafka Delivery Errors on cp4008 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 524.299988
[23:42:36] PROBLEM - Varnishkafka Delivery Errors on cp1059 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 532.299988
[23:42:36] PROBLEM - Varnishkafka Delivery Errors on cp4018 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 484.733337
[23:42:36] PROBLEM - Varnishkafka Delivery Errors on cp4001 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 579.599976
[23:42:37] PROBLEM - Varnishkafka Delivery Errors on cp4010 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 518.56665
[23:42:37] PROBLEM - Varnishkafka Delivery Errors on cp4017 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 206.166672
[23:42:38] PROBLEM - Varnishkafka Delivery Errors on cp4009 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 655.200012
[23:42:46] RECOVERY - Varnishkafka Delivery Errors on amssq55 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:43:06] RECOVERY - Varnishkafka Delivery Errors on amssq51 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:43:06] RECOVERY - Varnishkafka Delivery Errors on amssq57 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:43:16] RECOVERY - Varnishkafka Delivery Errors on amssq49 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:43:27] PROBLEM - Varnishkafka Delivery Errors on cp3012 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 638.400024
[23:43:36] PROBLEM - Varnishkafka Delivery Errors on cp4004 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 39.433334
[23:43:36] RECOVERY - Varnishkafka Delivery Errors on cp1059 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:43:36] RECOVERY - Varnishkafka Delivery Errors on cp4008 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:43:36] RECOVERY - Varnishkafka Delivery Errors on cp4018 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:43:36] RECOVERY - Varnishkafka Delivery Errors on cp4001 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:43:37] PROBLEM - Varnishkafka Delivery Errors on cp4016 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 259.333344
[23:43:37] PROBLEM - Varnishkafka Delivery Errors on cp4003 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 688.799988
[23:43:38] RECOVERY - Varnishkafka Delivery Errors on cp4010 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:43:38] RECOVERY - Varnishkafka Delivery Errors on cp4009 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:43:39] RECOVERY - Varnishkafka Delivery Errors on cp4017 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:44:26] RECOVERY - Varnishkafka Delivery Errors on cp3012 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:44:36] RECOVERY - Varnishkafka Delivery Errors on cp4004 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:44:36] PROBLEM - Varnishkafka Delivery Errors on cp1067 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 571.200012
[23:44:36] PROBLEM - Varnishkafka Delivery Errors on cp1055 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 503.066681
[23:44:36] RECOVERY - Varnishkafka Delivery Errors on cp4016 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:44:36] RECOVERY - Varnishkafka Delivery Errors on cp4003 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:44:37] PROBLEM - Varnishkafka Delivery Errors on cp1053 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 554.400024
[23:44:37] PROBLEM - Varnishkafka Delivery Errors on cp1065 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 493.166656
[23:44:38] PROBLEM - Varnishkafka Delivery Errors on cp3020 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 562.06665
[23:44:56] PROBLEM - Varnishkafka Delivery Errors on cp1068 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 571.200012
[23:44:56] PROBLEM - Varnishkafka Delivery Errors on cp1066 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 571.200012
[23:44:56] PROBLEM - Varnishkafka Delivery Errors on cp1054 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 596.400024
[23:45:06] PROBLEM - Varnishkafka Delivery Errors on cp3021 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 529.200012
[23:45:06] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 571.200012
[23:45:06] PROBLEM - Varnishkafka Delivery Errors on cp3022 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 436.799988
[23:45:16] PROBLEM - Varnishkafka Delivery Errors on cp1052 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 596.400024
[23:46:56] RECOVERY - Varnishkafka Delivery Errors on cp1066 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:47:06] PROBLEM - Varnishkafka Delivery Errors on cp1069 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 483.0
[23:47:16] PROBLEM - Varnishkafka Delivery Errors on cp1057 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 579.599976
[23:47:16] PROBLEM - Varnishkafka Delivery Errors on cp1070 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 475.166656
[23:47:16] RECOVERY - Varnishkafka Delivery Errors on cp1052 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:47:36] PROBLEM - Varnishkafka Delivery Errors on cp1056 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 510.799988
[23:47:36] RECOVERY - Varnishkafka Delivery Errors on cp1067 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:47:36] RECOVERY - Varnishkafka Delivery Errors on cp1055 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:47:36] RECOVERY - Varnishkafka Delivery Errors on cp1065 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:47:36] RECOVERY - Varnishkafka Delivery Errors on cp1053 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:47:37] RECOVERY - Varnishkafka Delivery Errors on cp3020 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:47:56] RECOVERY - Varnishkafka Delivery Errors on cp1068 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:47:56] RECOVERY - Varnishkafka Delivery Errors on cp1054 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:47:56] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:48:06] RECOVERY - Varnishkafka Delivery Errors on cp3021 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:48:14] !log Fixed four CentralAuth log entries on meta which were logged for WikiSets/0
[23:48:20] Logged the message, Master
[23:48:48] out of interest what were they, hoo?
[23:49:06] RECOVERY - Varnishkafka Delivery Errors on cp3022 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:49:21] oh, it were five actually, doh
[23:49:44] UPDATE logging SET page_title = 'WikiSets/11' WHERE log_id = 1934956;
[23:49:44] UPDATE logging SET log_title = 'WikiSets/11' WHERE log_id = 1934956;
[23:49:44] UPDATE logging SET log_title = 'WikiSets/12' WHERE log_id = 2263003;
[23:49:44] UPDATE logging SET log_title = 'WikiSets/13' WHERE log_id = 3947997;
[23:49:44] UPDATE logging SET log_title = 'WikiSets/14' WHERE log_id = 3948158;
[23:50:06] oh no, it were 4... screwed the first one :P
[23:50:20] Krenair: ^
[23:51:06] RECOVERY - Varnishkafka Delivery Errors on cp1069 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:51:16] RECOVERY - Varnishkafka Delivery Errors on cp1070 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:51:16] RECOVERY - Varnishkafka Delivery Errors on cp1057 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:51:36] RECOVERY - Varnishkafka Delivery Errors on cp1056 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[23:52:55] | log_id | log_timestamp | log_comment | log_user_text |
[23:52:56] | 1934956 | 20110702185233 | testing | Ruslik0 |
[23:52:56] | 2263003 | 20111108101428 | For New wiki importer group | Ruslik0 |
[23:52:57] | 3947997 | 20121213154417 | | Vituzzu |
[23:52:57] | 3948158 | 20121213160615 | | Vituzzu |
[23:52:58] ok
[23:53:18] interesting. was there a bug about this hoo? any idea why it happened?
[23:53:37] Krenair: Not sure we had a bug, but it has been fixed in LA
[23:53:39] * CA
[23:55:55] ah, ok
[23:56:54] hoo, bug 27031 right?
[23:57:20] yep :)