[00:01:42] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Server Error - 1703 bytes in 6.607 second response time
[00:03:37] Yup
[00:08:42] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 195878 bytes in 7.352 second response time
[01:53:12] PROBLEM - Puppet freshness on dysprosium is CRITICAL: Last successful Puppet run was Fri 14 Feb 2014 07:45:00 PM UTC
[02:11:05] !log LocalisationUpdate completed (1.23wmf13) at 2014-02-16 02:11:05+00:00
[02:11:16] Logged the message, Master
[02:20:57] !log LocalisationUpdate completed (1.23wmf14) at 2014-02-16 02:20:57+00:00
[02:21:05] Logged the message, Master
[02:42:25] !log LocalisationUpdate ResourceLoader cache refresh completed at 2014-02-16 02:42:25+00:00
[02:42:32] Logged the message, Master
[04:08:07] * Gloria tickles Reedy.
[04:23:54] Gloria: Keep that for the hotel room! /Reedy
[04:43:33] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
[04:54:12] PROBLEM - Puppet freshness on dysprosium is CRITICAL: Last successful Puppet run was Fri 14 Feb 2014 07:45:00 PM UTC
[05:39:42] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 73.833336
[05:40:42] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[05:43:33] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000
[06:13:07] (PS1) Gerrit Patch Uploader: Remove C: namespace alias (for categories) from hiwiki config [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113656
[06:13:13] (CR) Gerrit Patch Uploader: "This commit was uploaded using the Gerrit Patch Uploader [1]." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113656 (owner: Gerrit Patch Uploader)
[07:55:12] PROBLEM - Puppet freshness on dysprosium is CRITICAL: Last successful Puppet run was Fri 14 Feb 2014 07:45:00 PM UTC
[08:03:31] hoi there are problems with labs again
[08:03:52] yes
[08:03:55] nfs issues
[08:03:58] Your browser sent a request that this server could not understand. for http://tools.wmflabs.org/reasonator/test/?q=Q7057025&live
[08:04:14] ... :(
[08:04:41] what is the problem, not enough space or a broken implementation?
[08:04:57] seems broken
[08:05:12] PROBLEM - Disk space on labstore4 is CRITICAL: DISK CRITICAL - /srv is not accessible: Input/output error
[08:05:19] it was the same way yesterday ...
[08:05:36] see the icinga-wm errors
[08:07:02] apergos ^^
[08:07:15] !log Labs NFS Issues: cannot open directory .: Stale NFS file handle; XFS seems broken again
[08:07:23] Logged the message, Master
[08:16:22] ori: can you reboot labstore4?
[08:29:24] whole lot is kaput
[08:30:15] ping coren, your baby is a problem
[08:31:02] I don't get why XFS goes kaboom every so often
[08:32:41] is labstore4 in use?
[08:34:06] iirc it is ori
[08:34:14] it is the main NFS mount
[08:34:21] the MOTD says 'labstore4 is a Wikimedia DECOMMISSIONED server (base::decommissioned).'
[08:34:46] hmm
[08:35:01] I see in admin log: 23:31 Coren: tools Rebooted labstore4 -- XFS done got broken agun
[08:35:15] is anything actually affected?
[08:35:41] if it is, I can reboot, on the basis of Coren's precedent
[08:36:05] no one can write any file in labs ori
[08:36:35] or even list
[08:36:44] when doing ls:
[08:36:46] ls: cannot open directory .: Stale NFS file handle
[08:37:12] !log labstore4. dmesg: XFS (dm-0): xfs_log_force: error 5 returned. Rebooting.
[08:37:45] thank you
[08:38:00] morebots: ?
[08:38:37] sunday for him
[08:38:52] PROBLEM - Host labstore4 is DOWN: PING CRITICAL - Packet loss = 100%
[08:42:30] well, it's not coming back up
[08:44:03] great :/
[08:45:17] no, there is no login; seems like it's because getty can't land the user @home
[09:08:12] !log labstore4 failed to boot. I don't have access to mgmt and can't troubleshoot further.
[09:09:15] leslie carr left WMF? yes?
[09:09:54] sDrewthedoff: Yes, on January 17.
[09:10:12] okay, still in WMF host records
[09:10:23] whois
[09:16:26] And not announced on https://identi.ca/wikimediaatwork either FWIW
[09:51:57] !log powercycling labstore4: [BUG: soft lockup - CPU#6 stuck for 22s! [xfsaild/dm-0:5320] on mgmt console
[09:53:43] thanks apergos
[09:54:04] thanks!!!
[09:54:13] RECOVERY - Disk space on labstore4 is OK: DISK OK
[09:54:22] RECOVERY - Host labstore4 is UP: PING OK - Packet loss = 0%, RTA = 35.51 ms
[09:54:26] don't thank me yet, we might not have those volumes
[09:55:58] ssh tools : ssh_exchange_identification: Connection closed by remote host
[09:57:03] apergos: thank you
[10:03:57] Reasonator.info (relies on labs) is still down
[10:08:47] apergos: did you look into the ssh issue?
[10:11:12] http://tools.wmflabs.org/reasonator no joy either
[10:11:55] (no https)
[10:15:46] it's not an ssh issue per se
[10:15:53] I am working on bringing the filesystem back up
[10:16:17] I'll be able to tell you in a few minutes if it's going to be ok or whether it will take coren's expertise
[10:37:15] ok we are going to reboot again (now that it's probably in a state to reboot ok) and see if the normal scripts run on startup get it (or if I can do it in the two commands I'm supposed to be able to run)
[10:37:20] here we go again
[10:37:38] * matanya is crossing fingers
[10:38:22] matanya: Still the labs troubles?
[10:38:27] working on it
[10:38:31] yes hoo
[10:38:32] :)
[10:39:22] PROBLEM - Host labstore4 is DOWN: PING CRITICAL - Packet loss = 100%
[10:40:22] RECOVERY - Host labstore4 is UP: PING OK - Packet loss = 0%, RTA = 35.85 ms
[10:41:52] YAY
[10:42:02] works, thanks apergos
[10:42:47] or not :/
[10:42:48] still can't ssh in
[10:43:06] debug1: Exit status 254
[10:43:16] can ssh, but thrown out
[10:43:35] Unable to create and initialize directory '/home/matanya'.
[10:44:09] Our bot is working again, I think...
[10:44:16] Jup. :)
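For reference: the symptoms above ("Stale NFS file handle" on clients, I/O errors on /srv, xfs_log_force errors in dmesg) point at an XFS shutdown on the NFS server. A minimal diagnostic sketch with standard commands; the paths are the ones named in the log, and nothing here is taken from the actual WMF runbook:

    # on the NFS server: look for XFS log-force/shutdown messages (error 5 = EIO)
    dmesg | grep -i 'xfs'
    # is the exported filesystem still readable locally?
    ls /srv || echo '/srv unreadable: filesystem likely shut down'
    # on a client: ESTALE on any access means the server-side fs was remounted/rebuilt
    ls /home 2>&1 | grep -q 'Stale' && echo 'stale NFS handle; clients need a remount'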
[10:47:42] PROBLEM - Varnishkafka Delivery Errors on cp3022 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1963.466675
[10:48:32] !log for the record, after the reboot I added back the 10.0.0.45 and ran start-nfs, still not happy
[10:48:40] Logged the message, Master
[10:51:58] going to give one more try at the commands one at a time from the script
[10:56:12] PROBLEM - Puppet freshness on dysprosium is CRITICAL: Last successful Puppet run was Fri 14 Feb 2014 07:45:00 PM UTC
[10:56:42] RECOVERY - Varnishkafka Delivery Errors on cp3022 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[10:56:52] PROBLEM - Varnishkafka Delivery Errors on cp3021 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1591.099976
[10:58:42] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1372.366699
[11:04:42] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[11:05:42] PROBLEM - Varnishkafka Delivery Errors on cp3022 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 939.466675
[11:07:42] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 2883.0
[11:08:32] I'm pretty sure part of it did not go correctly (the /public directories are missing)
[11:08:42] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[11:08:43] I can see everything else on /srv
[11:08:52] RECOVERY - Varnishkafka Delivery Errors on cp3021 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[11:08:55] I will keep looking for a little while longer
[11:10:42] RECOVERY - Varnishkafka Delivery Errors on cp3022 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[11:11:42] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 325.866669
[11:12:42] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[11:14:42] PROBLEM - Varnishkafka Delivery Errors on cp3022 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1214.199951
[11:16:42] RECOVERY - Varnishkafka Delivery Errors on cp3022 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[11:19:42] PROBLEM - Varnishkafka Delivery Errors on cp3022 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 469.566681
[11:23:42] RECOVERY - Varnishkafka Delivery Errors on cp3022 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[11:24:52] PROBLEM - Varnishkafka Delivery Errors on cp3021 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 284.066681
[11:26:42] PROBLEM - Varnishkafka Delivery Errors on cp3022 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 3022.06665
[11:28:52] RECOVERY - Varnishkafka Delivery Errors on cp3021 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[11:29:42] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 400.833344
[11:30:42] RECOVERY - Varnishkafka Delivery Errors on cp3022 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[11:31:42] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[11:33:53] PROBLEM - Varnishkafka Delivery Errors on cp3021 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 2816.800049
[11:34:37] ok the /public dirs are a red herring, they come from labstore1, and seem to be available
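The !log at 10:48 names two server-side recovery steps: re-adding the service IP (10.0.0.45) and running start-nfs. start-nfs is a WMF-local script whose contents aren't shown here, so this sketch substitutes generic equivalents; the interface name eth0 is an assumption:

    # confirm the floating service IP came back after the reboot
    ip addr show dev eth0 | grep '10.0.0.45'
    # re-read /etc/exports and re-export everything
    exportfs -ra
    # list what the kernel is actually exporting right now
    exportfs -v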
[11:36:42] PROBLEM - Varnishkafka Delivery Errors on cp3022 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1522.599976
[11:37:42] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 2618.333252
[11:38:52] RECOVERY - Varnishkafka Delivery Errors on cp3021 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[11:39:42] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[11:41:42] RECOVERY - Varnishkafka Delivery Errors on cp3022 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[11:44:42] PROBLEM - Varnishkafka Delivery Errors on cp3022 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 920.666687
[11:45:42] RECOVERY - Varnishkafka Delivery Errors on cp3022 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[11:47:29] any news on the availability of labs functionality ... as it is I am about to cancel a demo
[11:47:52] PROBLEM - Varnishkafka Delivery Errors on cp3021 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 217.833328
[11:48:05] GerardM-: Don't think so
[11:48:54] :(
[11:49:52] RECOVERY - Varnishkafka Delivery Errors on cp3021 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[11:51:42] PROBLEM - Varnishkafka Delivery Errors on cp3022 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1506.366699
[11:52:40] no, I am sorry but at this point I don't know why home directories aren't showing up on the projects
[11:52:49] since I do indeed see them from labstore4
[11:53:42] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 133.566666
[11:53:57] apergos: mh... just for the sake of it... try rebooting tools-dev maybe?
[11:54:20] no point in it until the home directory issue is sorted out
[11:54:41] apergos: where do they reside?
[11:54:42] RECOVERY - Varnishkafka Delivery Errors on cp3022 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[11:54:46] I tried eg rebooting a labs instance in beta (apache32) and it doesn't see the various home dirs
[11:54:46] mh... but the NFS is working?
[11:54:50] well they come from labstore4
[11:55:30] no, some part of it is not working and it is beyond my ability at this point to figure out what is wrong
[11:56:31] yikes
[11:56:56] well I don't know the labs setup particularly well
[11:59:42] PROBLEM - Varnishkafka Delivery Errors on cp3022 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 91.23333
[12:00:42] RECOVERY - Varnishkafka Delivery Errors on cp3022 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[12:00:42] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[12:03:42] PROBLEM - Varnishkafka Delivery Errors on cp3022 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 3406.800049
[12:03:42] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 742.366638
[12:05:42] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[12:10:42] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 158.766663
[12:12:42] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[12:15:52] PROBLEM - Varnishkafka Delivery Errors on cp3021 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 198.733337
[12:18:42] RECOVERY - Varnishkafka Delivery Errors on cp3022 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[12:22:42] PROBLEM - Varnishkafka Delivery Errors on cp3022 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 3288.866699
[12:23:52] RECOVERY - Varnishkafka Delivery Errors on cp3021 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[12:24:42] RECOVERY - Varnishkafka Delivery Errors on cp3022 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[12:25:42] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 2325.100098
[12:26:42] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[12:27:28] so I'm now officially giving up; exportfs shows the directories being exported, when I look in the dirs on labstore4 being exported they have home dirs and files... I tried rebooting labs instance deployment-prep-apache32 just to see if it would find the home dirs and it failed to see them so I dunno what's wrong; labstore4 does have the right secondary ip too
[12:28:36] Has someone pinged Coren yet? It's about 8:00L on the east coast.
[12:29:18] (Well, 7:30L, but maybe he's an early bird.)
[12:29:42] PROBLEM - Varnishkafka Delivery Errors on cp3022 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 542.400024
[12:31:11] apergos: ok... thanks for trying then
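apergos's 12:27 summary is the server-side view (exportfs looks right, the data is on disk, the service IP is bound) while clients still fail, so the missing piece is the client-side view. A sketch of that check, assuming the service name labnfs.pmtpa.wmnet mentioned at 13:19 below; the export path /exp/home is a made-up placeholder, not the real one:

    # from an affected instance: does the server advertise its exports at all?
    showmount -e labnfs.pmtpa.wmnet
    # trial-mount one export somewhere disposable, to separate NFS itself from automounter issues
    mkdir -p /tmp/nfstest
    mount -t nfs -o ro,vers=3 labnfs.pmtpa.wmnet:/exp/home /tmp/nfstest && ls /tmp/nfstest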
[12:32:33] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
[12:33:42] RECOVERY - Varnishkafka Delivery Errors on cp3022 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[12:35:52] PROBLEM - Varnishkafka Delivery Errors on cp3021 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 79.5
[12:36:52] RECOVERY - Varnishkafka Delivery Errors on cp3021 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[12:36:56] scfc_de: he is on this channel, and has been msgd, so probably not an early riser, or not in channel
[12:37:42] PROBLEM - Varnishkafka Delivery Errors on cp3022 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1200.599976
[12:37:48] thx for the attempts apergos
[12:38:47] he would have to be a very very early riser
[12:39:21] going to see if I can get on the instance and figure out anything from over there ...
[12:39:37] but I do expect to give up shortly :-D
[12:39:52] PROBLEM - Varnishkafka Delivery Errors on cp3021 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1615.033325
[12:40:10] heh :D Try to manually remount maybe :P
[12:40:26] well I gotta first see if I can get on
[12:40:39] then maybe find out what the state of things is
[12:40:41] sDrewth: I was thinking about something out of band, à la SMS :-).
[12:40:55] heh
[12:41:02] I can't, I can't get on a host I have root on (as the bastion already kicks me)
[12:41:26] well it's a matter of not using user keys but the root keys
[12:41:32] or root auth keys
[12:42:42] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 298.399994
[12:42:49] which I don't have...
[12:43:52] RECOVERY - Varnishkafka Delivery Errors on cp3021 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[12:46:42] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[12:48:33] RECOVERY - Varnishkafka Delivery Errors on cp3022 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[12:49:42] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 106.900002
[12:52:42] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[12:53:50] yep. ok I'm on, let's see if there's anything I can gain from here
[12:54:52] PROBLEM - Varnishkafka Delivery Errors on cp3021 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1513.099976
[12:56:42] PROBLEM - Varnishkafka Delivery Errors on cp3022 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 2595.333252
[12:57:52] RECOVERY - Varnishkafka Delivery Errors on cp3021 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[13:00:52] PROBLEM - Varnishkafka Delivery Errors on cp3021 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 273.899994
[13:02:42] RECOVERY - Varnishkafka Delivery Errors on cp3022 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[13:03:52] RECOVERY - Varnishkafka Delivery Errors on cp3021 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[13:04:42] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 35.433334
[13:05:42] PROBLEM - Varnishkafka Delivery Errors on cp3022 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 69.033333
[13:06:42] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[13:11:52] PROBLEM - Varnishkafka Delivery Errors on cp3021 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1013.200012
[13:12:52] RECOVERY - Varnishkafka Delivery Errors on cp3021 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[13:15:42] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 483.0
[13:16:42] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[13:17:42] RECOVERY - Varnishkafka Delivery Errors on cp3022 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[13:18:52] PROBLEM - Varnishkafka Delivery Errors on cp3021 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 199.233337
[13:19:03] apergos: Any news? Looking at Puppet, is "labnfs.pmtpa.wmnet" a CNAME for the active NFS server?
[13:20:52] RECOVERY - Varnishkafka Delivery Errors on cp3021 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[13:21:41] the ip is already set on labstore4 for labnfs, I can showmount labnfs and see the mounts as available from the instance I'm working on
[13:21:59] sorry, I'm just having to slog through all this one script and conf file at a time
[13:24:13] So we either need a 24/7 Coren or better documentation/puppetization :-).
[13:24:42] PROBLEM - Varnishkafka Delivery Errors on cp3022 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 444.533325
[13:25:03] coren clones :-)
[13:26:42] RECOVERY - Varnishkafka Delivery Errors on cp3022 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[13:26:53] PROBLEM - Varnishkafka Delivery Errors on cp3021 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 57.433334
[13:28:18] documentator clones
[13:29:42] PROBLEM - Varnishkafka Delivery Errors on cp3022 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 210.399994
[13:29:42] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 284.0
[13:29:52] RECOVERY - Varnishkafka Delivery Errors on cp3021 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[13:30:42] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[13:34:42] RECOVERY - Varnishkafka Delivery Errors on cp3022 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[13:37:42] PROBLEM - Varnishkafka Delivery Errors on cp3022 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1111.400024
[13:38:42] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 178.46666
[13:40:42] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[13:40:52] PROBLEM - Varnishkafka Delivery Errors on cp3021 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1781.233276
[13:44:42] RECOVERY - Varnishkafka Delivery Errors on cp3022 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[13:46:52] RECOVERY - Varnishkafka Delivery Errors on cp3021 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[13:48:42] PROBLEM - Varnishkafka Delivery Errors on cp3022 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 767.06665
[13:51:42] RECOVERY - Varnishkafka Delivery Errors on cp3022 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[13:54:33] PROBLEM - Varnishkafka Delivery Errors on cp3022 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 742.93335
[13:55:42] RECOVERY - Varnishkafka Delivery Errors on cp3022 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[13:55:52] PROBLEM - Varnishkafka Delivery Errors on cp3021 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 183.633331
[13:56:52] RECOVERY - Varnishkafka Delivery Errors on cp3021 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[13:57:12] PROBLEM - Puppet freshness on dysprosium is CRITICAL: Last successful Puppet run was Fri 14 Feb 2014 07:45:00 PM UTC
[13:58:42] PROBLEM - Varnishkafka Delivery Errors on cp3022 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 212.666672
[13:59:42] RECOVERY - Varnishkafka Delivery Errors on cp3022 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[14:00:42] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 144.066666
[14:01:42] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[14:03:52] PROBLEM - Varnishkafka Delivery Errors on cp3021 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 671.93335
[14:07:42] PROBLEM - Varnishkafka Delivery Errors on cp3022 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 2777.133301
[14:07:52] RECOVERY - Varnishkafka Delivery Errors on cp3021 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[14:11:33] RECOVERY - Varnishkafka Delivery Errors on cp3022 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[14:12:42] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 57.400002
[14:12:52] PROBLEM - Varnishkafka Delivery Errors on cp3021 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 633.666687
[14:13:52] RECOVERY - Varnishkafka Delivery Errors on cp3021 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[14:15:42] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[14:15:45] spam
[14:17:42] PROBLEM - Varnishkafka Delivery Errors on cp3022 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 193.933334
[14:18:42] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 266.933319
[14:18:42] RECOVERY - Varnishkafka Delivery Errors on cp3022 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[14:20:42] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[14:23:42] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 319.633331
[14:24:42] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[14:25:42] PROBLEM - Varnishkafka Delivery Errors on cp3022 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 373.933319
[14:28:33] RECOVERY - Varnishkafka Delivery Errors on cp3022 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[14:28:42] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1594.866699
[14:28:52] PROBLEM - Varnishkafka Delivery Errors on cp3021 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 70.76667
[14:29:42] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[14:29:52] RECOVERY - Varnishkafka Delivery Errors on cp3021 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[14:32:42] PROBLEM - Varnishkafka Delivery Errors on cp3022 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 183.5
[14:32:52] PROBLEM - Varnishkafka Delivery Errors on cp3021 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 69.73333
[14:33:52] RECOVERY - Varnishkafka Delivery Errors on cp3021 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[14:36:33] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 255.266663
[14:36:42] RECOVERY - Varnishkafka Delivery Errors on cp3022 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[14:37:42] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[14:39:42] PROBLEM - Varnishkafka Delivery Errors on cp3022 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 282.0
[14:40:42] RECOVERY - Varnishkafka Delivery Errors on cp3022 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[14:43:42] PROBLEM - Varnishkafka Delivery Errors on cp3022 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 357.200012
[14:44:42] RECOVERY - Varnishkafka Delivery Errors on cp3022 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[14:47:42] PROBLEM - Varnishkafka Delivery Errors on cp3022 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 120.800003
[14:49:33] RECOVERY - Varnishkafka Delivery Errors on cp3022 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[14:51:42] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 163.066666
[14:52:42] PROBLEM - Varnishkafka Delivery Errors on cp3022 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 83.5
[14:53:42] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[14:54:33] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000
[14:55:42] RECOVERY - Varnishkafka Delivery Errors on cp3022 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[14:58:52] PROBLEM - Varnishkafka Delivery Errors on cp3021 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 144.899994
[14:59:52] RECOVERY - Varnishkafka Delivery Errors on cp3021 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[15:04:42] PROBLEM - Varnishkafka Delivery Errors on cp3022 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 2051.166748
[15:06:42] RECOVERY - Varnishkafka Delivery Errors on cp3022 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[15:09:52] PROBLEM - Varnishkafka Delivery Errors on cp3021 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 364.100006
[15:10:12] PROBLEM - Host labstore4 is DOWN: PING CRITICAL - Packet loss = 100%
[15:10:42] RECOVERY - Host labstore4 is UP: PING OK - Packet loss = 0%, RTA = 36.29 ms
[15:10:52] RECOVERY - Varnishkafka Delivery Errors on cp3021 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[15:12:42] PROBLEM - Varnishkafka Delivery Errors on cp3022 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 599.666687
[15:13:07] woo paravoid, lets seeee!
[15:13:07] heya :)
[15:13:08] :)
[15:13:10] I hope I didn't wake you up
[15:13:17] naw
[15:13:24] just up, eating breakfast, drinking coffee, playing battleship :)
[15:13:27] it's been flapping for 5 hours, but I waited for some sensible hour
[15:13:35] aw, cool, thanks!
[15:13:42] RECOVERY - Varnishkafka Delivery Errors on cp3022 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[15:14:48] Snaps, you around?
[15:14:58] btw hi paravoid, as tomorrow will be a holiday - shall we move WAP deprecation to Tuesday?
[15:15:31] MaxSem: oh yeah, you're right
[15:15:38] yes, let's do that, why not
[15:15:44] :)
[15:15:49] good idea :)
[15:15:53] PROBLEM - Varnishkafka Delivery Errors on cp3021 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 392.266663
[15:16:39] (I'll be working, though)
[15:17:03] brb
[15:17:42] PROBLEM - Varnishkafka Delivery Errors on cp3022 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 922.233337
[15:18:42] RECOVERY - Varnishkafka Delivery Errors on cp3022 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[15:19:01] no XFS in eqiad?
[15:20:23] hm, cp3020 is the only esams bits varnish that is not having these vk errors
[15:20:42] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 184.800003
[15:22:33] vk is using twice as much memory on cp3019 as on cp3020
[15:23:52] RECOVERY - Varnishkafka Delivery Errors on cp3021 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[15:25:14] hm they aren't getting disproportionate amounts of traffic
[15:25:42] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[15:31:52] PROBLEM - Varnishkafka Delivery Errors on cp3021 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 276.733337
[15:32:42] PROBLEM - Varnishkafka Delivery Errors on cp3022 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 3174.333252
[15:32:52] RECOVERY - Varnishkafka Delivery Errors on cp3021 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[15:33:42] RECOVERY - Varnishkafka Delivery Errors on cp3022 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[15:34:15] huh! only problems to analytics1022!
[15:34:17] analytics1021 is fine
[15:34:33] tx_queue to analytics1022 is very full, but empty to analytics1011
[15:36:42] PROBLEM - Varnishkafka Delivery Errors on cp3022 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1029.966675
[15:37:52] PROBLEM - Varnishkafka Delivery Errors on cp3021 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 26.0
[15:38:52] RECOVERY - Varnishkafka Delivery Errors on cp3021 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[15:40:42] RECOVERY - Varnishkafka Delivery Errors on cp3022 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[15:41:52] PROBLEM - Varnishkafka Delivery Errors on cp3021 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 501.666656
[15:44:30] hmm, well, i mean, analytics1022 is the leader for all topics
[15:44:30] hmmm
[15:44:33] how did that happen!?
[15:44:34] hm
[15:44:44] i don't see anything in logs, at least not for the time when this started happening
[15:45:42] PROBLEM - Varnishkafka Delivery Errors on cp3022 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 43.533333
[15:47:53] !log starting kafka leader replica election to balance production load across both brokers evenly. Not yet sure why analytics1022 was the leader for all toppars…
[15:48:01] Logged the message, Master
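The fix being applied in that !log is Kafka's stock rebalancing tool: a preferred-replica election asks each partition to move leadership back to its preferred (first-listed) replica, which spreads leaders across brokers again. A sketch using the standard Kafka 0.8-era CLI; the ZooKeeper address zk1:2181 is a placeholder, not the real analytics ensemble:

    # trigger a preferred-replica election for all topic-partitions
    kafka-preferred-replica-election.sh --zookeeper zk1:2181
    # then verify the Leader column is no longer a single broker id
    kafka-topics.sh --describe --zookeeper zk1:2181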
[15:48:42] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1957.633301
[15:48:42] RECOVERY - Kafka Broker Messages In on analytics1021 is OK: kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate OKAY: 2056.45129269
[15:58:31] !log restarted varnishkafka on cp3019
[15:58:38] Logged the message, Master
[15:59:33] ok paravoid, here is what I know at the moment
[15:59:39] during both times when this has happened
[15:59:48] there has only been a single leader for all topics
[16:00:37] at around 21:54 yesterday (02-15), analytics1022 became the leader for all topics
[16:00:42] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[16:00:49] which is fine most of the time, except when traffic gets high
[16:01:16] but i am not yet sure why analytics1022 became the leader
[16:01:24] i'm going to add alerts for leader elections and leader counts
[16:01:32] and will keep my eye on that
[16:01:39] but at least I have something to go on now,
[16:01:42] mystery is not yet 100% solved
[16:02:23] gonna restart vk on the remaining bits varnishes. this is just to clear out their production queues, they are a bit behind now, might as well start fresh.
[16:03:07] !log restarted varnishkafka on esams bits varnishes
[16:03:15] Logged the message, Master
[16:03:52] RECOVERY - Varnishkafka Delivery Errors on cp3021 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[16:04:33] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
[16:04:42] RECOVERY - Varnishkafka Delivery Errors on cp3022 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[16:05:15] it took about an hour between when the election happened and when txerrs started happening
[16:05:15] so
[16:05:27] i'm going to let this be for now, and continue with my morning, and then check on this again in an hour
[16:05:31] be back lata
[16:05:39] text me again if you need to
[16:58:12] PROBLEM - Puppet freshness on dysprosium is CRITICAL: Last successful Puppet run was Fri 14 Feb 2014 07:45:00 PM UTC
[17:04:33] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000
[18:03:12] PROBLEM - Varnishkafka Delivery Errors on cp3012 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 227.800003
[18:10:12] PROBLEM - Varnishkafka Delivery Errors on cp4001 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 604.266663
[18:10:42] PROBLEM - Varnishkafka Delivery Errors on cp4003 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 12.3
[18:11:12] RECOVERY - Varnishkafka Delivery Errors on cp4001 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[18:11:42] RECOVERY - Varnishkafka Delivery Errors on cp4003 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[18:12:12] RECOVERY - Varnishkafka Delivery Errors on cp3012 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
[18:55:42] PROBLEM - Varnishkafka Delivery Errors on cp3022 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 370.5
[18:59:41] (PS1) Brion VIBBER: Work in progress: set .svg and .ico files to be compressed on bits.wikimedia.org [operations/puppet] - https://gerrit.wikimedia.org/r/113687
[19:19:08] (CR) MaxSem: "Shouldn't you also check Accept-Encoding and vary on it to avoid screwing up clients that don't support gzipping?" [operations/puppet] - https://gerrit.wikimedia.org/r/113687 (owner: Brion VIBBER)
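MaxSem's review point is the standard caching concern: if bits starts gzipping .svg/.ico, responses must carry Vary: Accept-Encoding so a cache never hands a gzipped body to a client that didn't ask for one. A quick check from the shell, sketched; the URL path is a made-up example:

    # with gzip advertised: expect Content-Encoding: gzip plus Vary: Accept-Encoding
    curl -sI -H 'Accept-Encoding: gzip' http://bits.wikimedia.org/static/example.svg | grep -iE '^(content-encoding|vary)'
    # without it: the same URL must come back uncompressed
    curl -sI http://bits.wikimedia.org/static/example.svg | grep -iE '^(content-encoding|vary)'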
gzipping?" [operations/puppet] - 10https://gerrit.wikimedia.org/r/113687 (owner: 10Brion VIBBER) [19:23:42] RECOVERY - Varnishkafka Delivery Errors on cp3022 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [19:32:27] !log aaron synchronized php-1.23wmf13/includes/filebackend/SwiftFileBackend.php 'e14a87489d9f65fec85347c8e4a7825576f15be6' [19:32:35] Logged the message, Master [19:49:07] paravoid: [19:59:12] PROBLEM - Puppet freshness on dysprosium is CRITICAL: Last successful Puppet run was Fri 14 Feb 2014 07:45:00 PM UTC [20:16:59] hashar: what is up with gerrit? [20:17:12] matanya: why do you ask? [20:17:21] error: RPC failed; result=56, HTTP code = 200 [20:17:21] fatal: The remote end hung up unexpectedly [20:17:21] fatal: early EOF [20:17:21] fatal: index-pack failed [20:17:40] try again [20:18:13] and for mw/core I think we had some issues with it at one point [20:18:27] tried 3 times [20:18:32] on core? [20:18:34] every time same error [20:18:38] on pywikibot [20:18:41] ah [20:18:58] raised the buffer size too [20:19:57] matanya: works for me ( ssh://gerrit.wikimedia.org:29418/pywikibot/core.git ) [20:19:59] and after two more times, now it works [20:20:07] weird [20:20:24] it might had some issue [20:21:27] nvm, pushing a fix. thanks hashar [20:21:35] :-] [20:21:39] I am off, see you tomorrow [20:21:46] night :) [21:47:01] (03PS1) 10Hoo man: Make labs' sql command work with -v and remove cruft [operations/puppet] - 10https://gerrit.wikimedia.org/r/113755 [21:48:23] (03PS2) 10Hoo man: Make labs' sql command work with -v and remove cruft [operations/puppet] - 10https://gerrit.wikimedia.org/r/113755 [21:54:30] (03PS2) 10Hoo man: Don't use a hard coded -D in the sql utility script [operations/puppet] - 10https://gerrit.wikimedia.org/r/113661 [22:28:00] ugb [22:46:14] (03PS1) 10Odder: Raise account creation throttle for a SMA session [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/113757 [22:46:40] (03CR) 10Odder: "This needs to be merged & deployed ASAP." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/113757 (owner: 10Odder) [23:00:12] PROBLEM - Puppet freshness on dysprosium is CRITICAL: Last successful Puppet run was Fri 14 Feb 2014 07:45:00 PM UTC [23:01:09] (03CR) 10MaxSem: [C: 04-2] "10.*.*.* is an internal IP:)" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/113757 (owner: 10Odder) [23:01:24] twkozlowski, ^ :) [23:01:31] I'll ask on the bug [23:02:48] (03CR) 10Odder: "Hah! That's why it WHOIS-ed so weird! I knew it!" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/113757 (owner: 10Odder) [23:03:07] I see [23:03:13] * twkozlowski no good with IPs [23:06:46] twkozlowski, 10.0.0.0 - 10.255.255.255, 172.16.0.0 - 172.31.255.255, and 192.168.0.0 - 192.168.255.255 are all private IPv4s [23:07:40] Thanks Krenair [23:07:49] I only knew about the last bit [23:08:14] Basically, 10.*.*.*, 172.[16-31].*.*, 192.168.*.* [23:09:43] Also I hope they don't actually mean 2013-02-17 [23:10:31] Yes, tomorrow [23:10:35] no [23:10:39] that's last year :) [23:10:50] Oh, we're in 2014 [23:10:56] :D [23:11:40] This is why things like this need more planning :p [23:11:53] Well ACC kinda exists for this purpose. [23:12:14] Yes, people continue to think it's okay to do this night before the event. [23:12:50] Krenair: And some just use ACC regardless for this purpose anyway. [23:31:13] jeremyb, "known non-vandals":P [23:38:40] (03CR) 10Jeremyb: "-2 is inappropriate in this case." 
[21:47:01] (PS1) Hoo man: Make labs' sql command work with -v and remove cruft [operations/puppet] - https://gerrit.wikimedia.org/r/113755
[21:48:23] (PS2) Hoo man: Make labs' sql command work with -v and remove cruft [operations/puppet] - https://gerrit.wikimedia.org/r/113755
[21:54:30] (PS2) Hoo man: Don't use a hard coded -D in the sql utility script [operations/puppet] - https://gerrit.wikimedia.org/r/113661
[22:28:00] ugb
[22:46:14] (PS1) Odder: Raise account creation throttle for a SMA session [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113757
[22:46:40] (CR) Odder: "This needs to be merged & deployed ASAP." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113757 (owner: Odder)
[23:00:12] PROBLEM - Puppet freshness on dysprosium is CRITICAL: Last successful Puppet run was Fri 14 Feb 2014 07:45:00 PM UTC
[23:01:09] (CR) MaxSem: [C: -2] "10.*.*.* is an internal IP:)" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113757 (owner: Odder)
[23:01:24] twkozlowski, ^ :)
[23:01:31] I'll ask on the bug
[23:02:48] (CR) Odder: "Hah! That's why it WHOIS-ed so weird! I knew it!" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113757 (owner: Odder)
[23:03:07] I see
[23:03:13] * twkozlowski no good with IPs
[23:06:46] twkozlowski, 10.0.0.0 - 10.255.255.255, 172.16.0.0 - 172.31.255.255, and 192.168.0.0 - 192.168.255.255 are all private IPv4s
[23:07:40] Thanks Krenair
[23:07:49] I only knew about the last bit
[23:08:14] Basically, 10.*.*.*, 172.[16-31].*.*, 192.168.*.*
[23:09:43] Also I hope they don't actually mean 2013-02-17
[23:10:31] Yes, tomorrow
[23:10:35] no
[23:10:39] that's last year :)
[23:10:50] Oh, we're in 2014
[23:10:56] :D
[23:11:40] This is why things like this need more planning :p
[23:11:53] Well ACC kinda exists for this purpose.
[23:12:14] Yes, people continue to think it's okay to do this the night before the event.
[23:12:50] Krenair: And some just use ACC regardless for this purpose anyway.
[23:31:13] jeremyb, "known non-vandals" :P
[23:38:40] (CR) Jeremyb: "-2 is inappropriate in this case." (1 comment) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113757 (owner: Odder)
[23:38:48] (CR) Jeremyb: [C: -1] Raise account creation throttle for a SMA session [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113757 (owner: Odder)
[23:39:12] OuKB: yes? :)
[23:39:22] OuKB: who art thou anyway?
[23:40:25] hi spartacus!
[23:42:38] (CR) Odder: [C: -1] Raise account creation throttle for a SMA session (1 comment) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/113757 (owner: Odder)
[23:43:45] * jeremyb is still getting used to odder's new nick :)
[23:45:00] yes, i remembered, i just had to think about it for a few secs first :)
[23:45:40] sorry :)
[23:47:33] apergos: Are files still being uploaded to nas-1?
[23:47:45] OuKB: i wanted to set a low bar! :)
[23:58:02] PROBLEM - Varnish traffic logger on cp1053 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[23:58:22] PROBLEM - Varnish HTTP text-backend on cp1053 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:58:31] uh-oh
[23:58:32] PROBLEM - Varnish HTCP daemon on cp1053 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
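On the private-address exchange at 23:06: Krenair's ranges are the three RFC 1918 blocks, which is why MaxSem rejected the throttle patch at 23:01 (the IP it whitelisted is internal, not the venue's public address). A small illustrative helper, not taken from the log, that pattern-matches a dotted-quad IPv4 against those blocks:

    # return 0 (true) if the address is in 10/8, 172.16/12, or 192.168/16
    is_private() {
        case "$1" in
            10.*|192.168.*)                        return 0 ;;
            172.1[6-9].*|172.2[0-9].*|172.3[01].*) return 0 ;;
            *)                                     return 1 ;;
        esac
    }
    is_private 10.4.1.8 && echo private    # matches the 10.*.*.* range flagged above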