Wikimedia IRC logs browser - #wikimedia-operations

Displaying 1398 items:

2014-04-08 00:00:35 <RoanKattouw> Sure thing
2014-04-08 00:00:42 <RoanKattouw> I think that's the SWAT all done
2014-04-08 00:00:44 <RoanKattouw> Sorry for the slowness everyone
2014-04-08 00:01:16 <bd808> RoanKattouw: If it makes my mailbox less full of debate about font faces...
2014-04-08 00:01:36 <bd808> is sure that muting those threads will continue
2014-04-08 00:02:28 <icinga-wm> PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Server Error - 1703 bytes in 7.426 second response time
2014-04-08 00:08:52 <bd808> looks for a python reviewer for: https://gerrit.wikimedia.org/r/#/c/124500/
2014-04-08 00:09:10 <bd808> I think that will fix the 1.23wmf21 l10n problems
2014-04-08 00:09:30 <bd808> Because … mystery action at a distance!
2014-04-08 00:12:27 <icinga-wm> RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 219118 bytes in 8.455 second response time
2014-04-08 00:24:56 <logmsgbot> !log catrope synchronized php-1.23wmf20/extensions/VisualEditor 'it helps if you run git submodule update first'
2014-04-08 00:25:02 <morebots> Logged the message, Master
2014-04-08 00:25:05 <logmsgbot> !log catrope synchronized php-1.23wmf21/extensions/VisualEditor 'it helps if you run git submodule update first'
2014-04-08 00:25:11 <morebots> Logged the message, Master
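
The submodule tip above amounts to a short sequence on the deploy host; a minimal sketch, assuming the /a/common checkout referenced later in this log and the sync-dir helper (paths and message illustrative):

    cd /a/common/php-1.23wmf21
    git submodule update extensions/VisualEditor    # check out the commit the core repo pins
    sync-dir php-1.23wmf21/extensions/VisualEditor 'update VisualEditor submodule'
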
2014-04-08 00:27:34 <grrrit-wm> ('PS1') 'BryanDavis': test2wiki to 1.23wmf21 [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124505'
2014-04-08 00:28:54 <bd808> RoanKattouw_away: Are you {{done}} done now? I'd like to run some more scap tests
2014-04-08 00:38:27 <grrrit-wm> ('Abandoned') 'BryanDavis': l10nupdate: Add temporary debugging captures [operations/puppet] - 'https://gerrit.wikimedia.org/r/124467' (owner: 'BryanDavis')
2014-04-08 00:38:40 <grrrit-wm> ('PS2') 'BryanDavis': test2wiki to 1.23wmf21 [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124505'
2014-04-08 00:39:44 <grrrit-wm> ('Abandoned') 'BryanDavis': test2wiki to 1.23wmf21 [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124505' (owner: 'BryanDavis')
2014-04-08 00:41:34 <grrrit-wm> ('PS1') 'BryanDavis': Group0 wikis to 1.23wmf21 [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124506'
2014-04-08 00:43:55 <bd808> greg-g: Are you still on a bus? I'd like to scap group0 to 1.23wmf21 to test my band aid fix. I would be on the hook to revert immediately following if ExtensionMessages looks like it will cause a problem for l10nupdate.
2014-04-08 00:44:03 <RoanKattouw_away> bd808: Yes, sorry
2014-04-08 00:44:43 <bd808> RoanKattouw_away: :) thanks. I watched your idle time on tin climb until I felt safe.
2014-04-08 00:45:28 <icinga-wm> PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
2014-04-08 00:46:57 <bd808> decides that greg-g won't have changed his mind in the last 1:30 and proceeds
2014-04-08 00:48:38 <grrrit-wm> ('CR') 'BryanDavis': [C: '2'] "Approving to test band aid fix for ExtensionMessages generation problem. Will revert if ExtensionMessages doesn't look right after scap." [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124506' (owner: 'BryanDavis')
2014-04-08 00:48:45 <grrrit-wm> ('Merged') 'jenkins-bot': Group0 wikis to 1.23wmf21 [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124506' (owner: 'BryanDavis')
2014-04-08 00:50:53 <logmsgbot> !log bd808 Started scap: group0 to 1.23wmf21 (testing python change for mwversionsinuse)
2014-04-08 00:50:58 <morebots> Logged the message, Master
2014-04-08 00:53:12 <bd808> sees l10n cache updating yet again for 1.23wmf21 and loses all confidence in his "fix"
2014-04-08 00:53:51 <logmsgbot> !log bd808 scap aborted: group0 to 1.23wmf21 (testing python change for mwversionsinuse) (duration: 02m 57s)
2014-04-08 00:53:56 <morebots> Logged the message, Master
2014-04-08 00:54:30 <logmsgbot> !log bd808 Started scap: group0 to 1.23wmf21 (testing python change for mwversionsinuse) (again)
2014-04-08 00:54:35 <morebots> Logged the message, Master
2014-04-08 00:54:56 <logmsgbot> !log bd808 scap aborted: group0 to 1.23wmf21 (testing python change for mwversionsinuse) (again) (duration: 00m 25s)
2014-04-08 00:55:01 <morebots> Logged the message, Master
2014-04-08 00:55:12 <grrrit-wm> ('PS1') 'BryanDavis': Revert "Group0 wikis to 1.23wmf21" [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124507'
2014-04-08 00:55:34 <grrrit-wm> ('CR') 'BryanDavis': [C: '2'] Revert "Group0 wikis to 1.23wmf21" [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124507' (owner: 'BryanDavis')
2014-04-08 00:55:42 <grrrit-wm> ('Merged') 'jenkins-bot': Revert "Group0 wikis to 1.23wmf21" [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124507' (owner: 'BryanDavis')
2014-04-08 00:56:51 <logmsgbot> !log bd808 Started scap: revert group0 to 1.23wmf21 (testwiki still on 1.23wmf21)
2014-04-08 00:56:55 <morebots> Logged the message, Master
2014-04-08 01:01:33 <grrrit-wm> ('PS3') 'Ori.livneh': Add EventLogging Kafka writer plug-in [operations/puppet] - 'https://gerrit.wikimedia.org/r/85337'
2014-04-08 01:06:45 <logmsgbot> !log bd808 Finished scap: revert group0 to 1.23wmf21 (testwiki still on 1.23wmf21) (duration: 09m 54s)
2014-04-08 01:06:53 <morebots> Logged the message, Master
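
bd808's revert follows the usual config-deploy loop; roughly, on tin (a sketch, paths and messages illustrative):

    cd /a/common
    git pull        # picks up the merged "Revert Group0 wikis to 1.23wmf21" change
    scap 'revert group0 to 1.23wmf21 (testwiki still on 1.23wmf21)'
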
2014-04-08 01:22:25 <StevenW> ori: working now
2014-04-08 01:22:29 <StevenW> \o/
2014-04-08 02:07:07 <icinga-wm> PROBLEM - Puppet freshness on lvs3001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
2014-04-08 02:07:07 <icinga-wm> PROBLEM - Puppet freshness on lvs3002 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
2014-04-08 02:07:08 <icinga-wm> PROBLEM - Puppet freshness on lvs3003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
2014-04-08 02:07:08 <icinga-wm> PROBLEM - Puppet freshness on lvs3004 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
2014-04-08 02:15:58 <logmsgbot> !log LocalisationUpdate completed (1.23wmf20) at 2014-04-08 02:15:58+00:00
2014-04-08 02:16:06 <morebots> Logged the message, Master
2014-04-08 02:34:57 <logmsgbot> !log LocalisationUpdate completed (1.23wmf21) at 2014-04-08 02:34:56+00:00
2014-04-08 02:35:02 <morebots> Logged the message, Master
2014-04-08 02:45:57 <icinga-wm> PROBLEM - MySQL Idle Transactions on db1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
2014-04-08 02:48:37 <icinga-wm> PROBLEM - MySQL InnoDB on db1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
2014-04-08 02:48:57 <icinga-wm> RECOVERY - MySQL Idle Transactions on db1047 is OK: OK longest blocking idle transaction sleeps for 0 seconds
2014-04-08 02:49:06 <ori> springle_: db1047 has been very sad lately
2014-04-08 02:49:27 <icinga-wm> RECOVERY - MySQL InnoDB on db1047 is OK: OK longest blocking idle transaction sleeps for 0 seconds
2014-04-08 03:00:17 <icinga-wm> PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: No output from Graphite for target(s): reqstats.5xx
2014-04-08 03:08:06 <bawolff> With 1.23wmf21 not getting deployed to mediawiki.org last thursday, does that mean the deployment schedule for 1.23wmf22 will be off by a week?
2014-04-08 03:11:07 <logmsgbot> !log LocalisationUpdate ResourceLoader cache refresh completed at Tue Apr 8 03:11:04 UTC 2014 (duration 11m 3s)
2014-04-08 03:11:11 <morebots> Logged the message, Master
2014-04-08 03:31:47 <icinga-wm> PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
2014-04-08 03:38:12 <aude> greg-g: still around?
2014-04-08 03:53:36 <aude> greg-g: check your mail
2014-04-08 04:03:35 <TimStarling> !log upgrading libssl on ssl1001,ssl1002,ssl1003,ssl1004,ssl1005,ssl1006,ssl1007,ssl1008,ssl1009,ssl3001.esams.wikimedia.org,ssl3002.esams.wikimedia.org,ssl3003.esams.wikimedia.org
2014-04-08 04:03:41 <morebots> Logged the message, Master
2014-04-08 04:03:57 <Jasper_Deng> TimStarling: is this the heartbleed.com thing?
2014-04-08 04:04:07 <Jasper_Deng> didn't know we used openssl
2014-04-08 04:15:22 <TimStarling> Jasper_Deng: yes
2014-04-08 04:15:47 <TimStarling> !log also upgraded libssl on cp4001-4019. Restarted nginx on these servers and also the previous list.
2014-04-08 04:15:51 <morebots> Logged the message, Master
2014-04-08 04:37:40 <Ryan_Lane> !log upgrading libssl on virt1000
2014-04-08 04:37:44 <morebots> Logged the message, Master
2014-04-08 04:38:21 <Ryan_Lane> !log upgrading libssl on virt0
2014-04-08 04:38:26 <morebots> Logged the message, Master
2014-04-08 04:41:03 <TimStarling> !log upgraded libssl on zirconium.wikimedia.org,neon.wikimedia.org,netmon1001.wikimedia.org,iodine.wikimedia.org,ytterbium.wikimedia.org,gerrit.wikimedia.org,virt1000.wikimedia.org,labs-ns1.wikimedia.org,stat1001.wikimedia.org
2014-04-08 04:43:13 <TimStarling> !log restarted apache on the above list, failed on labs-ns1, virt1000, ytterbium
2014-04-08 04:43:18 <morebots> Logged the message, Master
2014-04-08 04:43:47 <^d> TimStarling: I'll poke ytterbium
2014-04-08 04:44:00 <^d> Keep moving on to other boxes if you need.
2014-04-08 04:44:35 <^d> Seems up now.
2014-04-08 04:45:04 <TimStarling> yeah, labs-ns1 and virt1000 are actually the same server
2014-04-08 04:45:19 <TimStarling> and apache is running there with stime after the upgrade
2014-04-08 04:46:30 <TimStarling> !log on dataset1001: upgraded libssl and restarted lighttpd
2014-04-08 04:46:34 <morebots> Logged the message, Master
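
The libssl upgrades being logged here are the Heartbleed (CVE-2014-0160) remediation: install the patched OpenSSL packages and restart anything still linked against the old library. A per-host sketch, assuming the Ubuntu package and service names of that era:

    apt-get update
    apt-get install -y openssl libssl1.0.0       # patched build
    service nginx restart                        # SSL terminators (ssl*, cp4*)
    service apache2 restart                      # app / misc web servers
    # confirm nothing still maps the deleted library:
    lsof -n 2>/dev/null | grep -i 'libssl.*deleted' || echo 'no stale libssl mappings'
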
2014-04-08 04:53:47 <icinga-wm> PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
2014-04-08 05:08:07 <icinga-wm> PROBLEM - Puppet freshness on lvs3001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
2014-04-08 05:08:07 <icinga-wm> PROBLEM - Puppet freshness on lvs3002 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
2014-04-08 05:08:07 <icinga-wm> PROBLEM - Puppet freshness on lvs3003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
2014-04-08 05:08:07 <icinga-wm> PROBLEM - Puppet freshness on lvs3004 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
2014-04-08 05:25:10 <grrrit-wm> ('PS1') 'Aude': Enable Wikibase on Wikiquote [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124516'
2014-04-08 05:26:24 <grrrit-wm> ('CR') 'Aude': [C: '-2'] "requires sites and site_identifiers tables to be added and populated on wikiquote" [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124516' (owner: 'Aude')
2014-04-08 05:31:00 <_joe_> !log upgraded openssl on cp10* and cp30* servers as well
2014-04-08 05:31:06 <morebots> Logged the message, Master
2014-04-08 05:39:29 <apergos> !log restarted apache on fenari magnesium yterrbium antimony
2014-04-08 05:39:33 <morebots> Logged the message, Master
2014-04-08 05:39:51 <apergos> with some misspellings but people will get the point
2014-04-08 05:47:01 <apergos> !log shot many old apache processes running as stats user from 2013, on stat1001 (restarting apache runs it as www-data user)
2014-04-08 05:47:06 <morebots> Logged the message, Master
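
A sketch of that cleanup (the user and process names come from the log line; the flags are standard procps ones):

    pgrep -l -u stats apache2      # list the stale 2013-era workers
    pkill -u stats apache2         # kill them; a restarted apache runs as www-data
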
2014-04-08 06:34:37 <grrrit-wm> ('PS3') 'Matanya': dataset: fix module path [operations/puppet] - 'https://gerrit.wikimedia.org/r/119212'
2014-04-08 06:37:44 <grrrit-wm> ('PS3') 'Matanya': exim: fix scoping [operations/puppet] - 'https://gerrit.wikimedia.org/r/119496'
2014-04-08 06:43:48 <matanya> springle: did you hear from otto regarding https://gerrit.wikimedia.org/r/#/c/122406/ ?
2014-04-08 06:45:27 <springle> matanya: no
2014-04-08 06:45:41 <matanya> :/ i need to chase him down, thanks
2014-04-08 06:46:04 <springle> not sure otto knows about it? i emailed analytics lists directly
2014-04-08 06:46:29 <springle> so far the answer is: probably fine to decom db67, but let's wait for everyone to chime in
2014-04-08 06:46:43 <springle> i'll bump it this week
2014-04-08 06:47:05 <matanya> thank you
2014-04-08 07:30:44 <grrrit-wm> ('PS1') 'Faidon Liambotis': base: add debian-goodies [operations/puppet] - 'https://gerrit.wikimedia.org/r/124524'
2014-04-08 07:47:07 <_joe|away> !log restarted nginx on cp1044 and cp1043
2014-04-08 07:47:12 <morebots> Logged the message, Master
2014-04-08 07:53:07 <icinga-wm> PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: No output from Graphite for target(s): reqstats.5xx
2014-04-08 07:53:07 <grrrit-wm> ('CR') 'coren': [C: '2'] base: add debian-goodies [operations/puppet] - 'https://gerrit.wikimedia.org/r/124524' (owner: 'Faidon Liambotis')
2014-04-08 08:02:57 <icinga-wm> PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: No output from Graphite for target(s): reqstats.5xx
2014-04-08 08:09:07 <icinga-wm> PROBLEM - Puppet freshness on lvs3001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
2014-04-08 08:09:07 <icinga-wm> PROBLEM - Puppet freshness on lvs3002 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
2014-04-08 08:09:07 <icinga-wm> PROBLEM - Puppet freshness on lvs3003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
2014-04-08 08:09:07 <icinga-wm> PROBLEM - Puppet freshness on lvs3004 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
2014-04-08 08:11:47 <icinga-wm> PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: No output from Graphite for target(s): reqstats.5xx
2014-04-08 08:15:17 <icinga-wm> PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
2014-04-08 08:36:30 <siebrand> ori: still working?
2014-04-08 09:03:47 <icinga-wm> PROBLEM - RAID on labstore3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
2014-04-08 09:04:07 <YuviPanda> hashar: help with setting up zuul for the apps? https://gerrit.wikimedia.org/r/#/c/124539/
2014-04-08 09:08:37 <icinga-wm> PROBLEM - Disk space on labstore3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
2014-04-08 09:08:47 <icinga-wm> RECOVERY - RAID on labstore3 is OK: OK: optimal, 12 logical, 12 physical
2014-04-08 09:08:57 <icinga-wm> PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
2014-04-08 09:11:47 <icinga-wm> PROBLEM - RAID on labstore3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
2014-04-08 09:16:55 <grrrit-wm> ('PS1') 'RobH': Replacing the unified certificate [operations/puppet] - 'https://gerrit.wikimedia.org/r/124542'
2014-04-08 09:24:34 <grrrit-wm> ('CR') 'RobH': [C: '2'] Replacing the unified certificate [operations/puppet] - 'https://gerrit.wikimedia.org/r/124542' (owner: 'RobH')
2014-04-08 09:29:47 <icinga-wm> RECOVERY - RAID on labstore3 is OK: OK: optimal, 12 logical, 12 physical
2014-04-08 09:33:47 <icinga-wm> PROBLEM - RAID on labstore3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
2014-04-08 09:36:37 <icinga-wm> RECOVERY - RAID on labstore3 is OK: OK: optimal, 12 logical, 12 physical
2014-04-08 09:37:37 <icinga-wm> RECOVERY - Disk space on labstore3 is OK: DISK OK
2014-04-08 09:39:19 <hashar> YuviPanda: hello
2014-04-08 09:39:25 <YuviPanda> hashar: hello!
2014-04-08 09:40:00 <icinga-wm> PROBLEM - RAID on labstore3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
2014-04-08 09:40:37 <icinga-wm> PROBLEM - Disk space on labstore3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
2014-04-08 09:40:57 <grrrit-wm> ('PS1') 'Andrew Bogott': Add eth1 checks to nova compute hosts. [operations/puppet] - 'https://gerrit.wikimedia.org/r/124560'
2014-04-08 09:44:12 <hashar> and we lost YuviPanda
2014-04-08 09:45:10 <sjoerddebruin> Noooo not our panda. :(
2014-04-08 09:46:25 <Steinsplitter> panda \O/
2014-04-08 09:46:28 <icinga-wm> PROBLEM - SSH on labstore3 is CRITICAL: Connection refused
2014-04-08 09:46:28 <icinga-wm> PROBLEM - DPKG on labstore3 is CRITICAL: Connection refused by host
2014-04-08 09:46:47 <icinga-wm> PROBLEM - puppet disabled on labstore3 is CRITICAL: Connection refused by host
2014-04-08 09:47:00 <andrewbogott> mutante: https://gerrit.wikimedia.org/r/#/c/124560/
2014-04-08 09:47:43 <icinga-wm> ACKNOWLEDGEMENT - DPKG on labstore3 is CRITICAL: Connection refused by host daniel_zahn will be decomed - The acknowledgement expires at: 2014-04-09 09:46:44.
2014-04-08 09:47:44 <icinga-wm> ACKNOWLEDGEMENT - Disk space on labstore3 is CRITICAL: Connection refused by host daniel_zahn will be decomed - The acknowledgement expires at: 2014-04-09 09:46:44.
2014-04-08 09:47:44 <icinga-wm> ACKNOWLEDGEMENT - RAID on labstore3 is CRITICAL: Connection refused by host daniel_zahn will be decomed - The acknowledgement expires at: 2014-04-09 09:46:44.
2014-04-08 09:47:44 <icinga-wm> ACKNOWLEDGEMENT - SSH on labstore3 is CRITICAL: Connection refused daniel_zahn will be decomed - The acknowledgement expires at: 2014-04-09 09:46:44.
2014-04-08 09:47:44 <icinga-wm> ACKNOWLEDGEMENT - puppet disabled on labstore3 is CRITICAL: Connection refused by host daniel_zahn will be decomed - The acknowledgement expires at: 2014-04-09 09:46:44.
2014-04-08 09:49:57 <matanya> so nice to see all ops in a European time zone :)
2014-04-08 09:50:37 <icinga-wm> PROBLEM - Host labstore3 is DOWN: PING CRITICAL - Packet loss = 100%
2014-04-08 09:57:12 <grrrit-wm> ('CR') 'Dzahn': [C: '-1'] Add eth1 checks to nova compute hosts. ('3' comments) [operations/puppet] - 'https://gerrit.wikimedia.org/r/124560' (owner: 'Andrew Bogott')
2014-04-08 10:00:49 <springle> ori: what is udpprofile::collector, and can i move it from db1014 to... somewhere else?
2014-04-08 10:02:47 <ori> springle: oh, wow. is there any indication that it continues to see activity? mediawiki's profiler class can be configured to write to a database, but i didn't know anyone was using it in production. is it not ancient?
2014-04-08 10:04:56 <andrewbogott> mutante, cmjohnson: https://wikitech.wikimedia.org/wiki/Help:Git_rebase#Don.27t_panic
2014-04-08 10:05:21 <thedj> andrewbogott: 42
2014-04-08 10:05:57 <ori> springle: it can go away
2014-04-08 10:06:34 <ori> springle: it was added in this commit: <https://gerrit.wikimedia.org/r/#/c/83953/>;. the message reads: "testing graphite 0.910 on db1014".
2014-04-08 10:07:04 <springle> yeah, asher stole db1014 for graphite
2014-04-08 10:07:12 <springle> trying to steal it back :)
2014-04-08 10:07:20 <springle> ori: thanks
2014-04-08 10:07:46 <ori> springle: it's not in any way implicated in our current graphite setup, which exists solely on tungsten.eqiad.wmnet (and labs)
2014-04-08 10:08:13 <grrrit-wm> ('PS2') 'Andrew Bogott': Add eth1 checks to nova compute hosts. [operations/puppet] - 'https://gerrit.wikimedia.org/r/124560'
2014-04-08 10:08:18 <andrewbogott> mutante: ^
2014-04-08 10:09:24 <grrrit-wm> ('PS1') 'Cmjohnson': adding ethtool to standard-packages.pp to be able to monitor interface speed [operations/puppet] - 'https://gerrit.wikimedia.org/r/124572'
2014-04-08 10:11:07 <grrrit-wm> ('CR') 'jenkins-bot': [V: '-1'] adding ethtool to standard-packages.pp to be able to monitor interface speed [operations/puppet] - 'https://gerrit.wikimedia.org/r/124572' (owner: 'Cmjohnson')
2014-04-08 10:12:49 <grrrit-wm> ('CR') 'Dzahn': [C: ''] Add eth1 checks to nova compute hosts. [operations/puppet] - 'https://gerrit.wikimedia.org/r/124560' (owner: 'Andrew Bogott')
2014-04-08 10:15:34 <Jeff_Green> !log update & reboot samarium
2014-04-08 10:15:38 <morebots> Logged the message, Master
2014-04-08 10:15:48 <grrrit-wm> ('CR') 'Andrew Bogott': [C: '2'] Add eth1 checks to nova compute hosts. [operations/puppet] - 'https://gerrit.wikimedia.org/r/124560' (owner: 'Andrew Bogott')
2014-04-08 10:16:26 <grrrit-wm> ('PS1') 'Springle': Remove unused db1014 block. db1014 was renamed tungsten rt5871. [operations/puppet] - 'https://gerrit.wikimedia.org/r/124575'
2014-04-08 10:18:19 <grrrit-wm> ('CR') 'Springle': [C: '2'] Remove unused db1014 block. db1014 was renamed tungsten rt5871. [operations/puppet] - 'https://gerrit.wikimedia.org/r/124575' (owner: 'Springle')
2014-04-08 10:21:04 <Jeff_Green> !log update & reboot barium
2014-04-08 10:21:09 <morebots> Logged the message, Master
2014-04-08 10:23:09 <grrrit-wm> ('PS1') 'Dzahn': add nrpe to base [operations/puppet] - 'https://gerrit.wikimedia.org/r/124576'
2014-04-08 10:24:10 <grrrit-wm> ('CR') 'jenkins-bot': [V: '-1'] add nrpe to base [operations/puppet] - 'https://gerrit.wikimedia.org/r/124576' (owner: 'Dzahn')
2014-04-08 11:09:28 <icinga-wm> PROBLEM - Puppet freshness on lvs3002 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
2014-04-08 11:09:28 <icinga-wm> PROBLEM - Puppet freshness on lvs3001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
2014-04-08 11:09:28 <icinga-wm> PROBLEM - Puppet freshness on lvs3003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
2014-04-08 11:09:28 <icinga-wm> PROBLEM - Puppet freshness on lvs3004 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
2014-04-08 11:32:05 <grrrit-wm> ('PS20') 'Matanya': etherpad: convert into a module [operations/puppet] - 'https://gerrit.wikimedia.org/r/107567'
2014-04-08 11:32:32 <matanya> akosiaris: in a meeting or this ^ can be handled ?
2014-04-08 11:39:18 <icinga-wm> PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
2014-04-08 12:32:58 <grrrit-wm> ('PS2') 'Dzahn': add nrpe to base [operations/puppet] - 'https://gerrit.wikimedia.org/r/124576'
2014-04-08 12:39:13 <akosiaris> matanya: in ops meeting
2014-04-08 12:39:19 <matanya> sorry
2014-04-08 12:39:27 <akosiaris> and please tell me you did not resubmit from your local repo
2014-04-08 12:39:48 <akosiaris> rebase* sorry
2014-04-08 12:39:50 <grrrit-wm> ('PS2') 'Cmjohnson': adding ethtool to standard-packages.pp to be able to monitor interface speed [operations/puppet] - 'https://gerrit.wikimedia.org/r/124572'
2014-04-08 12:40:26 <grrrit-wm> ('CR') 'Andrew Bogott': [V: ''] "This looks good -- we'll see if it makes new alarms go off :)" [operations/puppet] - 'https://gerrit.wikimedia.org/r/124576' (owner: 'Dzahn')
2014-04-08 12:46:38 <grrrit-wm> ('PS3') 'Cmjohnson': adding ethtool to standard-packages.pp to be able to monitor interface speed [operations/puppet] - 'https://gerrit.wikimedia.org/r/124572'
2014-04-08 12:48:28 <icinga-wm> PROBLEM - DPKG on strontium is CRITICAL: DPKG CRITICAL dpkg reports broken packages
2014-04-08 12:49:28 <icinga-wm> RECOVERY - DPKG on strontium is OK: All packages OK
2014-04-08 12:49:35 <grrrit-wm> ('CR') 'Matanya': [C: ''] add nrpe to base [operations/puppet] - 'https://gerrit.wikimedia.org/r/124576' (owner: 'Dzahn')
2014-04-08 12:50:21 <cmjohnson1> paravoid: can you review please https://gerrit.wikimedia.org/r/124572
2014-04-08 12:50:38 <andrewbogott> mutante: https://rt.wikimedia.org/Ticket/Display.html?id=5064
2014-04-08 12:51:29 <grrrit-wm> ('CR') 'Dzahn': [C: ''] "yep, if we want to monitor this on everything, then standard-packages sounds good to me" [operations/puppet] - 'https://gerrit.wikimedia.org/r/124572' (owner: 'Cmjohnson')
2014-04-08 12:52:38 <icinga-wm> PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
2014-04-08 12:53:10 <grrrit-wm> ('CR') 'Alexandros Kosiaris': [C: '2'] adding ethtool to standard-packages.pp to be able to monitor interface speed [operations/puppet] - 'https://gerrit.wikimedia.org/r/124572' (owner: 'Cmjohnson')
2014-04-08 12:55:34 <manybubbles> can anyone around update Elasticsearch in apt?
2014-04-08 12:55:55 <manybubbles> and ack nagios errors (so they don't spam to irc) for a couple hours?
2014-04-08 12:56:39 <logmsgbot> !log reedy updated /a/common to {{Gerrit|Id15ddc665}}: Revert "Group0 wikis to 1.23wmf21"
2014-04-08 12:56:44 <morebots> Logged the message, Master
2014-04-08 12:57:23 <grrrit-wm> ('PS1') 'Reedy': Non wikipedias to 1.23wmf21 [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124591'
2014-04-08 12:59:03 <Reedy> pokes qchris_away and ^d
2014-04-08 13:01:42 <Reedy> Any idea why https://gerrit.wikimedia.org/changes/?q=status:merged+age%3A0d&o=DETAILED_ACCOUNTS&n=100 doesn't work?
2014-04-08 13:02:00 <grrrit-wm> ('CR') 'Cmjohnson': [C: '2'] adding ethtool to standard-packages.pp to be able to monitor interface speed [operations/puppet] - 'https://gerrit.wikimedia.org/r/124572' (owner: 'Cmjohnson')
2014-04-08 13:03:24 <Reedy> versus
2014-04-08 13:03:24 <Reedy> http://review.cyanogenmod.org/changes/?q=status:open+age%3A0d&o=DETAILED_ACCOUNTS&n=100
2014-04-08 13:07:41 <grrrit-wm> ('PS3') 'Dzahn': add nrpe to base [operations/puppet] - 'https://gerrit.wikimedia.org/r/124576'
2014-04-08 13:12:48 <grrrit-wm> ('PS4') 'Dzahn': add nrpe to base [operations/puppet] - 'https://gerrit.wikimedia.org/r/124576'
2014-04-08 13:15:18 <apergos> test
2014-04-08 13:15:42 <apergos> test akosiaris
2014-04-08 13:15:43 <akosiaris> apergos: :-)
2014-04-08 13:15:51 <apergos> manybubbles:
2014-04-08 13:16:54 <mutante> already pinged
2014-04-08 13:17:06 <grrrit-wm> ('PS1') 'coren': Tool Labs: forcibly upgrade libssl [operations/puppet] - 'https://gerrit.wikimedia.org/r/124594'
2014-04-08 13:19:25 <grrrit-wm> ('CR') 'Dzahn': [C: '2'] "RT #80 :)" [operations/puppet] - 'https://gerrit.wikimedia.org/r/124576' (owner: 'Dzahn')
2014-04-08 13:21:58 <_joe_> ori: If you're here, please let me know :)
2014-04-08 13:26:57 <Reedy> _joe_: Couple of hours from now
2014-04-08 13:27:05 <Reedy> Though, he is around early sometimes
2014-04-08 13:27:31 <_joe_> Reedy: thanks
2014-04-08 13:30:38 <grrrit-wm> ('CR') 'RobH': [C: ''] Tool Labs: forcibly upgrade libssl [operations/puppet] - 'https://gerrit.wikimedia.org/r/124594' (owner: 'coren')
2014-04-08 13:31:20 <manybubbles> ottomata: welcome!
2014-04-08 13:31:34 <manybubbles> can you help me get started today?
2014-04-08 13:31:42 <grrrit-wm> ('CR') 'coren': [C: '2'] Tool Labs: forcibly upgrade libssl [operations/puppet] - 'https://gerrit.wikimedia.org/r/124594' (owner: 'coren')
2014-04-08 13:31:50 <Reedy> manybubbles: We have an extension for that
2014-04-08 13:31:51 <Reedy> grins
2014-04-08 13:31:57 <manybubbles> Reedy: thanks!
2014-04-08 13:32:01 <manybubbles> I totally used it a while ago
2014-04-08 13:32:27 <qchris_away> Reedy: Because we're using /r/ to mark the reverse proxy ...
2014-04-08 13:32:33 <qchris_away> Reedy: https://gerrit.wikimedia.org/r/changes/?q=status:merged+age%3A0d&o=DETAILED_ACCOUNTS&n=100
2014-04-08 13:32:37 <qchris_away> Reedy: ^ should work
2014-04-08 13:32:47 <Reedy> Aha, sweet!
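
In other words, the REST endpoint lives under the /r/ path that the reverse proxy fronts; a quick shell check (Gerrit prefixes JSON responses with a )]}' guard line, so skip the first line when parsing):

    curl -s 'https://gerrit.wikimedia.org/r/changes/?q=status:merged+age%3A0d&o=DETAILED_ACCOUNTS&n=100' | tail -n +2 | head -c 400
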
2014-04-08 13:33:43 <grrrit-wm> ('PS1') 'RobH': replace blog.wikimedia.org certificate [operations/puppet] - 'https://gerrit.wikimedia.org/r/124595'
2014-04-08 13:35:07 <manybubbles> ottomata: I need Elasticsearch 1.1.0 shoved into apt
2014-04-08 13:35:37 <grrrit-wm> ('PS2') 'RobH': replace blog.wikimedia.org certificate [operations/puppet] - 'https://gerrit.wikimedia.org/r/124595'
2014-04-08 13:36:15 <Reedy> qchris: thanks
2014-04-08 13:36:22 <qchris> yw
2014-04-08 13:37:04 <icinga-wm> PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds
2014-04-08 13:37:33 <mutante> !log restarting gitblit
2014-04-08 13:37:33 <grrrit-wm> ('CR') 'RobH': [C: '2'] replace blog.wikimedia.org certificate [operations/puppet] - 'https://gerrit.wikimedia.org/r/124595' (owner: 'RobH')
2014-04-08 13:37:37 <morebots> Logged the message, Master
2014-04-08 13:39:00 <RobH> !log replacing the blog cert, if holmium crashes I didn't do it correctly.
2014-04-08 13:39:01 <grrrit-wm> ('PS1') 'Faidon Liambotis': Revert "Giving Nik shell access to analytics1004 to do some elasticsearch load testing" [operations/puppet] - 'https://gerrit.wikimedia.org/r/124597'
2014-04-08 13:39:03 <ottomata> manybubbles: ok!
2014-04-08 13:39:03 <morebots> Logged the message, RobH
2014-04-08 13:39:04 <icinga-wm> RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 305803 bytes in 9.337 second response time
2014-04-08 13:39:08 <manybubbles> thanks!
2014-04-08 13:39:28 <Jeff_Green> !log update & reboot tellurium
2014-04-08 13:39:33 <morebots> Logged the message, Master
2014-04-08 13:39:47 <grrrit-wm> ('CR') 'jenkins-bot': [V: '-1'] Revert "Giving Nik shell access to analytics1004 to do some elasticsearch load testing" [operations/puppet] - 'https://gerrit.wikimedia.org/r/124597' (owner: 'Faidon Liambotis')
2014-04-08 13:41:14 <icinga-wm> PROBLEM - Host tellurium is DOWN: PING CRITICAL - Packet loss = 100%
2014-04-08 13:42:38 <grrrit-wm> ('PS2') 'Faidon Liambotis': Revert "Giving Nik shell access to analytics1004 to do some elasticsearch load testing" [operations/puppet] - 'https://gerrit.wikimedia.org/r/124597'
2014-04-08 13:43:27 <grrrit-wm> ('CR') 'Faidon Liambotis': [C: '2' V: '2'] Revert "Giving Nik shell access to analytics1004 to do some elasticsearch load testing" [operations/puppet] - 'https://gerrit.wikimedia.org/r/124597' (owner: 'Faidon Liambotis')
2014-04-08 13:44:28 <grrrit-wm> ('CR') 'Manybubbles': "Is there a better place to run this?" [operations/puppet] - 'https://gerrit.wikimedia.org/r/124597' (owner: 'Faidon Liambotis')
2014-04-08 13:45:14 <icinga-wm> RECOVERY - Host tellurium is UP: PING OK - Packet loss = 0%, RTA = 1.11 ms
2014-04-08 13:46:13 <RobH> !log upgraded libssl on holmium
2014-04-08 13:46:18 <morebots> Logged the message, RobH
2014-04-08 13:48:49 <paravoid> ottomata: kafka upgrade doesn't work on an1004
2014-04-08 13:49:41 <ottomata> paravoid, analytics1004 (and analytics1003) were kafka test brokers, and were never productionized or puppetized
2014-04-08 13:49:50 <ottomata> i thought I had removed kafka from analytics1004, actually
2014-04-08 13:50:38 <manybubbles> ottomata: can you install git fat on tin?
2014-04-08 13:50:42 <manybubbles> I cannot
2014-04-08 13:50:46 <ottomata> hm, sure, why do you need git-fat there?
2014-04-08 13:50:55 <manybubbles> to git deploy
2014-04-08 13:50:58 <manybubbles> to Elasticsearch
2014-04-08 13:51:07 <manybubbles> the plugins
2014-04-08 13:51:14 <manybubbles> or is there another server
2014-04-08 13:51:17 <ottomata> you don't need git-fat on tin though
2014-04-08 13:51:23 <ottomata> the git-fat commands are run on deploy hosts
2014-04-08 13:51:27 <ottomata> on the targets
2014-04-08 13:51:46 <manybubbles> huh, I'm used to running it on the server to check the jars got there. I'll just do it without and see
2014-04-08 13:53:21 <manybubbles> ottomata: that worked as you said it would
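
A rough sketch of the flow ottomata describes, using the Trebuchet-style git deploy commands in use at the time; the repo path and commit message are illustrative assumptions:

    # on tin, inside the deployment repo:
    git deploy start
    git commit -am 'add Elasticsearch plugin jars'
    git deploy sync                 # targets fetch the tag; git-fat pull runs there, not on tin
    # on an elastic10xx target, if you want to eyeball the jars:
    ls -l /srv/deployment/elasticsearch/plugins/    # path is a guess
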
2014-04-08 13:53:35 <manybubbles> !log synced first Elasticsearch plugin to production Elasticsearch servers
2014-04-08 13:53:39 <morebots> Logged the message, Master
2014-04-08 13:54:01 <manybubbles> !log they'll pick it up during the rolling restart today to upgrade to 1.1.0
2014-04-08 13:54:05 <morebots> Logged the message, Master
2014-04-08 13:54:08 <ottomata> cool
2014-04-08 13:54:18 <ottomata> manybubbles: i was going to start reinstalling an elasticsearch server today
2014-04-08 13:54:33 <manybubbles> ottomata: not a _great_ day for it
2014-04-08 13:54:37 <manybubbles> because I'm upgrading to 1.1.0
2014-04-08 13:54:43 <ottomata> ok
2014-04-08 13:54:45 <manybubbles> that is on the deployment calendar and everything
2014-04-08 13:55:05 <manybubbles> maybe tomorrow?
2014-04-08 13:57:09 <ottomata> sure
2014-04-08 14:04:07 <manybubbles> ottomata: please ping me when you get a chance to update apt
2014-04-08 14:04:35 <ottomata> i was about to do it, but am in standup now
2014-04-08 14:04:36 <ottomata> um
2014-04-08 14:04:41 <ottomata> q for akosiaris, if you are around
2014-04-08 14:04:54 <ottomata> I should change VerifyRelease, right?
2014-04-08 14:04:54 <icinga-wm> PROBLEM - DPKG on labstore4 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
2014-04-08 14:04:59 <ottomata> i'm trying to find the right thing to change it to
2014-04-08 14:05:14 <ottomata> i downloaded 1.1's Release.gpg and am doing what the reprepro man page says to do
2014-04-08 14:05:17 <ottomata> but am not sure
2014-04-08 14:05:23 <ottomata> the output doesn't look like what you have
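
What ottomata is adjusting here is the conf/updates stanza for the upstream repo; a hedged sketch of the manpage procedure (the upstream URL and key id are assumptions, not taken from the log):

    gpg --verify Release.gpg Release                 # shows which key signed the 1.1 repo
    gpg --fingerprint <keyid> | tr -d ' '            # unspaced fingerprint goes into VerifyRelease
    # conf/updates entry, roughly:
    #   Name: elasticsearch
    #   Method: http://packages.elasticsearch.org/elasticsearch/1.1/debian
    #   VerifyRelease: <fingerprint>
    reprepro -v update
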
2014-04-08 14:05:54 <icinga-wm> RECOVERY - DPKG on labstore4 is OK: All packages OK
2014-04-08 14:09:44 <icinga-wm> PROBLEM - Puppet freshness on lvs3003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
2014-04-08 14:09:44 <icinga-wm> PROBLEM - Puppet freshness on lvs3001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
2014-04-08 14:09:44 <icinga-wm> PROBLEM - Puppet freshness on lvs3002 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
2014-04-08 14:09:44 <icinga-wm> PROBLEM - Puppet freshness on lvs3004 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
2014-04-08 14:11:17 <grrrit-wm> ('PS1') 'Andrew Bogott': Install and use check_ssl_cert tool to validate certs. [operations/puppet] - 'https://gerrit.wikimedia.org/r/124601'
2014-04-08 14:18:13 <grrrit-wm> ('PS2') 'Andrew Bogott': Install and use check_ssl_cert tool to validate certs. [operations/puppet] - 'https://gerrit.wikimedia.org/r/124601'
2014-04-08 14:19:21 <grrrit-wm> ('PS1') 'Ottomata': reprepro/updates - upgrading elasticsearch to 1.1 [operations/puppet] - 'https://gerrit.wikimedia.org/r/124603'
2014-04-08 14:20:08 <grrrit-wm> ('CR') 'Ottomata': [C: '2' V: '2'] reprepro/updates - upgrading elasticsearch to 1.1 [operations/puppet] - 'https://gerrit.wikimedia.org/r/124603' (owner: 'Ottomata')
2014-04-08 14:23:54 <icinga-wm> PROBLEM - HTTPS on ssl1002 is CRITICAL: Connection refused
2014-04-08 14:24:06 <ottomata> manybubbles: http://apt.wikimedia.org/wikimedia/pool/main/e/elasticsearch/
2014-04-08 14:24:09 <ottomata> look ok?
2014-04-08 14:28:54 <icinga-wm> RECOVERY - HTTPS on ssl1002 is OK: OK - Certificate will expire on 01/20/2016 12:00.
2014-04-08 14:29:45 <manybubbles> ottomata: looks good - let me try elastic1001
2014-04-08 14:30:35 <grrrit-wm> ('PS3') 'Andrew Bogott': Install and use check_ssl_cert tool to validate certs. [operations/puppet] - 'https://gerrit.wikimedia.org/r/124601'
2014-04-08 14:30:57 <andrewbogott> mutante, ^ pls?
2014-04-08 14:31:37 <manybubbles> !log upgrading elastic1001
2014-04-08 14:31:42 <morebots> Logged the message, Master
2014-04-08 14:32:38 <manybubbles> !log woops, just restarted elastic1002. silly me
2014-04-08 14:32:42 <morebots> Logged the message, Master
2014-04-08 14:32:46 <manybubbles> !log no harm done, just lost time
2014-04-08 14:32:50 <morebots> Logged the message, Master
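
The per-node loop for a rolling upgrade like this is usually: disable shard allocation, upgrade, restart, re-enable, wait for green. A sketch against one node, using the Elasticsearch 1.x settings API (package and service names assumed):

    curl -s -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.enable":"none"}}'
    apt-get install elasticsearch                       # 1.1.0 from apt.wikimedia.org
    service elasticsearch restart
    curl -s -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.enable":"all"}}'
    curl -s 'localhost:9200/_cluster/health?wait_for_status=green&timeout=10m'
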
2014-04-08 14:33:53 <manybubbles> ottomata: can you make nagios not bother us about Elasticsearch warning over the next few hours?
2014-04-08 14:33:56 <manybubbles> I'm paying attention
2014-04-08 14:34:25 <ottomata> uh hm
2014-04-08 14:35:43 <ottomata> i think so, how long manybubbles
2014-04-08 14:35:45 <ottomata> 4 hours?
2014-04-08 14:35:48 <manybubbles> sure!
2014-04-08 14:36:14 <icinga-wm> PROBLEM - NTP peers on linne is CRITICAL: NTP CRITICAL: Server not synchronized, Offset unknown
2014-04-08 14:38:14 <icinga-wm> RECOVERY - NTP peers on linne is OK: NTP OK: Offset 0.016747 secs
2014-04-08 14:44:43 <mutante> andrewbogott: https://gerrit.wikimedia.org/r/#/c/77332/7/modules/base/manifests/monitoring/host.pp
2014-04-08 14:44:51 <grrrit-wm> ('PS4') 'Andrew Bogott': Install and use check_ssl_cert tool to validate certs. [operations/puppet] - 'https://gerrit.wikimedia.org/r/124601'
2014-04-08 14:54:18 <grrrit-wm> ('PS5') 'Andrew Bogott': Install and use check_ssl_cert tool to validate certs. [operations/puppet] - 'https://gerrit.wikimedia.org/r/124601'
2014-04-08 14:54:59 <grrrit-wm> ('PS3') 'Cmjohnson': add interface speed check for all hosts [operations/puppet] - 'https://gerrit.wikimedia.org/r/124606'
2014-04-08 15:01:42 <cmjohnson> mutante: can you review https://gerrit.wikimedia.org/r/124606
2014-04-08 15:02:06 <grrrit-wm> ('CR') 'Alexandros Kosiaris': [C: '-1'] "Great idea. Minor stuff here and there like making it parameterizable but looks nice." ('6' comments) [operations/puppet] - 'https://gerrit.wikimedia.org/r/124606' (owner: 'Cmjohnson')
2014-04-08 15:03:10 <ottomata> manybubbles: i think I just scheduled downtime in icinga for elastic search for the next ~4 hours
2014-04-08 15:03:19 <ottomata> never done that before, so not sure what it will do
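
Doing that from the shell goes through Icinga's external command file; a sketch for one host's services, assuming the usual Debian command-file path on the monitoring server:

    now=$(date +%s); end=$((now + 4*3600))
    printf '[%s] SCHEDULE_HOST_SVC_DOWNTIME;elastic1001;%s;%s;1;0;14400;ottomata;ES 1.1.0 rolling upgrade\n' \
        "$now" "$now" "$end" > /var/lib/icinga/rw/icinga.cmd
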
2014-04-08 15:03:47 <grrrit-wm> ('PS1') 'Rush': module to manage new python-diamond package [operations/puppet] - 'https://gerrit.wikimedia.org/r/124608'
2014-04-08 15:04:54 <manybubbles> ottomata: its cool!
2014-04-08 15:04:56 <manybubbles> thanks
2014-04-08 15:07:45 <grrrit-wm> ('CR') 'Ottomata': module to manage new python-diamond package ('5' comments) [operations/puppet] - 'https://gerrit.wikimedia.org/r/124608' (owner: 'Rush')
2014-04-08 15:08:18 <grrrit-wm> ('CR') 'Dzahn': [C: ''] Install and use check_ssl_cert tool to validate certs. [operations/puppet] - 'https://gerrit.wikimedia.org/r/124601' (owner: 'Andrew Bogott')
2014-04-08 15:12:34 <grrrit-wm> ('PS2') 'Rush': module to manage new python-diamond package [operations/puppet] - 'https://gerrit.wikimedia.org/r/124608'
2014-04-08 15:13:35 <grrrit-wm> ('CR') 'jenkins-bot': [V: '-1'] module to manage new python-diamond package [operations/puppet] - 'https://gerrit.wikimedia.org/r/124608' (owner: 'Rush')
2014-04-08 15:15:36 <grrrit-wm> ('PS3') 'Rush': module to manage new python-diamond package [operations/puppet] - 'https://gerrit.wikimedia.org/r/124608'
2014-04-08 15:16:34 <icinga-wm> PROBLEM - Host virt1000 is DOWN: CRITICAL - Host Unreachable (208.80.154.18)
2014-04-08 15:16:42 <RobH> !log all ssl servers in eqiad have been updated with new cert and restarted
2014-04-08 15:16:51 <RobH> !log rolling updates on ssl3001-3003 presently
2014-04-08 15:17:10 <grrrit-wm> ('PS1') 'Dzahn': enable base monitoring for ALL hosts [operations/puppet] - 'https://gerrit.wikimedia.org/r/124609'
2014-04-08 15:17:24 <icinga-wm> PROBLEM - Host labs-ns1.wikimedia.org is DOWN: CRITICAL - Host Unreachable (208.80.154.19)
2014-04-08 15:18:04 <icinga-wm> RECOVERY - Host virt1000 is UP: PING OK - Packet loss = 0%, RTA = 0.55 ms
2014-04-08 15:19:03 <grrrit-wm> ('CR') 'Andrew Bogott': [C: '2'] Install and use check_ssl_cert tool to validate certs. [operations/puppet] - 'https://gerrit.wikimedia.org/r/124601' (owner: 'Andrew Bogott')
2014-04-08 15:19:04 <icinga-wm> RECOVERY - Host labs-ns1.wikimedia.org is UP: PING OK - Packet loss = 0%, RTA = 0.98 ms
2014-04-08 15:19:07 <mutante> apergos: https://gerrit.wikimedia.org/r/#/c/124609/1
2014-04-08 15:19:46 <mutante> ugly, eh.. since i have to change all those lines because of indentation :p
2014-04-08 15:22:25 <grrrit-wm> ('CR') 'ArielGlenn': [C: ''] enable base monitoring for ALL hosts [operations/puppet] - 'https://gerrit.wikimedia.org/r/124609' (owner: 'Dzahn')
2014-04-08 15:22:39 <grrrit-wm> ('CR') 'Dzahn': [C: '2'] enable base monitoring for ALL hosts [operations/puppet] - 'https://gerrit.wikimedia.org/r/124609' (owner: 'Dzahn')
2014-04-08 15:23:46 <grrrit-wm> ('CR') 'Ottomata': module to manage new python-diamond package ('2' comments) [operations/puppet] - 'https://gerrit.wikimedia.org/r/124608' (owner: 'Rush')
2014-04-08 15:27:31 <icinga-wm> PROBLEM - HTTPS on cp4009 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
2014-04-08 15:27:41 <icinga-wm> PROBLEM - HTTPS on ssl3003 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
2014-04-08 15:27:41 <icinga-wm> PROBLEM - HTTPS on ssl1006 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
2014-04-08 15:27:41 <icinga-wm> PROBLEM - HTTPS on cp4014 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
2014-04-08 15:27:51 <icinga-wm> PROBLEM - HTTPS on ssl1004 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
2014-04-08 15:27:51 <icinga-wm> PROBLEM - HTTPS on ssl1005 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
2014-04-08 15:27:51 <icinga-wm> PROBLEM - HTTPS on cp4008 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
2014-04-08 15:27:51 <icinga-wm> PROBLEM - HTTPS on cp4004 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
2014-04-08 15:27:51 <icinga-wm> PROBLEM - HTTPS on cp4015 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
2014-04-08 15:27:52 <icinga-wm> PROBLEM - HTTPS on cp4001 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
2014-04-08 15:27:52 <icinga-wm> PROBLEM - HTTPS on cp4017 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
2014-04-08 15:27:53 <icinga-wm> PROBLEM - HTTPS on amssq47 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
2014-04-08 15:27:53 <icinga-wm> PROBLEM - HTTPS on ssl1002 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
2014-04-08 15:27:54 <icinga-wm> PROBLEM - HTTPS on ssl1001 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
2014-04-08 15:27:54 <icinga-wm> PROBLEM - HTTPS on cp4005 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
2014-04-08 15:27:55 <icinga-wm> PROBLEM - HTTPS on cp4012 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
2014-04-08 15:28:01 <icinga-wm> PROBLEM - HTTPS on cp4016 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
2014-04-08 15:28:01 <icinga-wm> PROBLEM - HTTPS on sodium is CRITICAL: SSL_CERT CRITICAL lists.wikimedia.org: invalid CN (lists.wikimedia.org does not match *.wikimedia.org)
2014-04-08 15:28:11 <icinga-wm> PROBLEM - HTTPS on ssl1007 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
2014-04-08 15:28:11 <icinga-wm> PROBLEM - HTTPS on iodine is CRITICAL: SSL_CERT CRITICAL ticket.wikimedia.org: invalid CN (ticket.wikimedia.org does not match *.wikimedia.org)
2014-04-08 15:28:11 <icinga-wm> PROBLEM - HTTPS on ssl3002 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
2014-04-08 15:28:11 <icinga-wm> PROBLEM - HTTPS on ssl3001 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
2014-04-08 15:28:11 <icinga-wm> PROBLEM - HTTPS on cp4018 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
2014-04-08 15:28:12 <icinga-wm> PROBLEM - HTTPS on ssl1008 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
2014-04-08 15:28:12 <icinga-wm> PROBLEM - HTTPS on ssl1009 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
2014-04-08 15:28:13 <icinga-wm> PROBLEM - HTTPS on ssl1003 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
2014-04-08 15:28:13 <icinga-wm> PROBLEM - HTTPS on cp4013 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
2014-04-08 15:28:14 <icinga-wm> PROBLEM - HTTPS on cp4003 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
2014-04-08 15:28:14 <icinga-wm> PROBLEM - HTTPS on cp4007 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
2014-04-08 15:28:15 <icinga-wm> PROBLEM - HTTPS on cp4011 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
2014-04-08 15:28:15 <icinga-wm> PROBLEM - HTTPS on cp4010 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
2014-04-08 15:28:21 <icinga-wm> PROBLEM - HTTPS on cp4020 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
2014-04-08 15:28:21 <icinga-wm> PROBLEM - HTTPS on cp4006 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
2014-04-08 15:28:31 <icinga-wm> PROBLEM - HTTPS on cp4002 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
2014-04-08 15:28:31 <icinga-wm> PROBLEM - HTTPS on cp4019 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
2014-04-08 15:30:02 <greg-g> holy fun :)
2014-04-08 15:30:37 <aude> :o
2014-04-08 15:32:08 <greg-g> aude: getting to your email :)
2014-04-08 15:32:13 <aude> ok
2014-04-08 15:32:25 <aude> want to see if it's ok to do today
2014-04-08 15:32:35 <aude> anytime works for us, i suppose
2014-04-08 15:34:45 <greg-g> aude: tl;dr of email: yep, looks good
2014-04-08 15:34:50 <aude> ok
2014-04-08 15:35:07 <aude> we were smart to put i18n stuff a while ago :)
2014-04-08 15:35:42 <icinga-wm> PROBLEM - RAID on holmium is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded)
2014-04-08 15:35:52 <icinga-wm> PROBLEM - DPKG on fenari is CRITICAL: NRPE: Command check_dpkg not defined
2014-04-08 15:36:01 <andrewbogott> the https failures are me mucking with monitoring, nothing to worry about
2014-04-08 15:36:02 <icinga-wm> PROBLEM - Disk space on fenari is CRITICAL: NRPE: Command check_disk_space not defined
2014-04-08 15:36:12 <icinga-wm> PROBLEM - RAID on fenari is CRITICAL: NRPE: Command check_raid not defined
2014-04-08 15:36:22 <icinga-wm> PROBLEM - puppet disabled on fenari is CRITICAL: NRPE: Command check_puppet_disabled not defined
2014-04-08 15:36:57 <hashar> mutante: fenari is not happy :-D
2014-04-08 15:38:21 <mutante> hashar: thanks, that's cause we just added more monitoring
2014-04-08 15:38:33 <mutante> RT #80 :)
2014-04-08 15:38:48 <hashar> mutante: yeah I noticed your puppet change. Guess fenari is missing some bits
2014-04-08 15:41:12 <mutante> hashar: wasn't running nagios-nrpe-server
2014-04-08 15:41:52 <mutante> greg-g: re: SSL certs, andrewbogott is on that one
2014-04-08 15:41:57 <mutante> ops monitoring sprint over here
2014-04-08 15:42:11 <greg-g> mutante: ahh, good to know who's on point for that, thanks
2014-04-08 15:42:23 <greg-g> wasn't sure if it'd be a opsen party thing or not
2014-04-08 15:42:44 <mutante> it is. ops in Athens
2014-04-08 15:43:05 <mutante> that check is new, in that it checks for validity of cert, not just expiry
2014-04-08 15:43:18 <mutante> and wikimedia vs. wikipedia thing
2014-04-08 15:43:30 <greg-g> nods
2014-04-08 15:44:52 <icinga-wm> PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 438.266663
2014-04-08 15:45:02 <grrrit-wm> ('PS1') 'Andrew Bogott': When checking unified certs, check for *.wikipedia.org [operations/puppet] - 'https://gerrit.wikimedia.org/r/124616'
2014-04-08 15:45:32 <icinga-wm> PROBLEM - Varnishkafka Delivery Errors on cp3020 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 434.533325
2014-04-08 15:46:21 <grrrit-wm> ('CR') 'Andrew Bogott': [C: '2'] When checking unified certs, check for *.wikipedia.org [operations/puppet] - 'https://gerrit.wikimedia.org/r/124616' (owner: 'Andrew Bogott')
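
The wikipedia-vs-wikimedia mismatch those alerts report can be reproduced by hand, which is also roughly what the new check compares against; the hostname below is illustrative:

    echo | openssl s_client -connect ssl1002.wikimedia.org:443 -servername en.wikipedia.org 2>/dev/null \
        | openssl x509 -noout -subject -enddate
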
2014-04-08 15:46:22 <icinga-wm> PROBLEM - Puppet freshness on lvs1005 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 12:45:20 PM UTC
2014-04-08 15:53:10 <icinga-wm> RECOVERY - RAID on fenari is OK: OK: Active: 2, Working: 2, Failed: 0, Spare: 0
2014-04-08 15:53:17 <mutante> hashar: ^ :)
2014-04-08 15:53:20 <icinga-wm> RECOVERY - puppet disabled on fenari is OK: OK
2014-04-08 15:53:26 <hashar> nice
2014-04-08 15:53:40 <icinga-wm> RECOVERY - Disk space on fenari is OK: DISK OK
2014-04-08 15:53:41 <mutante> RT #80 ftw
2014-04-08 15:53:48 <andrewbogott> With any luck there'll be another flood of OKs in a minute...
2014-04-08 15:53:50 <icinga-wm> RECOVERY - DPKG on fenari is OK: All packages OK
2014-04-08 15:54:10 <icinga-wm> PROBLEM - puppet disabled on bast1001 is CRITICAL: NRPE: Command check_puppet_disabled not defined
2014-04-08 15:54:10 <icinga-wm> PROBLEM - Disk space on cp3003 is CRITICAL: NRPE: Command check_disk_space not defined
2014-04-08 15:54:10 <icinga-wm> PROBLEM - Disk space on dobson is CRITICAL: Connection refused by host
2014-04-08 15:54:10 <icinga-wm> PROBLEM - DPKG on pdf2 is CRITICAL: Connection refused by host
2014-04-08 15:54:20 <icinga-wm> PROBLEM - puppet disabled on iron is CRITICAL: NRPE: Command check_puppet_disabled not defined
2014-04-08 15:54:20 <icinga-wm> PROBLEM - RAID on dobson is CRITICAL: Connection refused by host
2014-04-08 15:54:20 <icinga-wm> PROBLEM - RAID on cp3003 is CRITICAL: NRPE: Command check_raid not defined
2014-04-08 15:54:20 <icinga-wm> PROBLEM - Disk space on pdf2 is CRITICAL: Connection refused by host
2014-04-08 15:54:30 <icinga-wm> PROBLEM - puppet disabled on dobson is CRITICAL: Connection refused by host
2014-04-08 15:54:30 <icinga-wm> PROBLEM - RAID on pdf2 is CRITICAL: Connection refused by host
2014-04-08 15:54:30 <icinga-wm> PROBLEM - DPKG on iodine is CRITICAL: NRPE: Command check_dpkg not defined
2014-04-08 15:54:30 <icinga-wm> PROBLEM - puppet disabled on pdf2 is CRITICAL: Connection refused by host
2014-04-08 15:54:40 <icinga-wm> PROBLEM - Disk space on iodine is CRITICAL: NRPE: Command check_disk_space not defined
2014-04-08 15:54:40 <icinga-wm> PROBLEM - puppet disabled on cp3003 is CRITICAL: NRPE: Command check_puppet_disabled not defined
2014-04-08 15:54:40 <icinga-wm> PROBLEM - DPKG on pdf3 is CRITICAL: Connection refused by host
2014-04-08 15:54:48 <andrewbogott> that's not what I meant
2014-04-08 15:54:50 <icinga-wm> PROBLEM - RAID on iodine is CRITICAL: NRPE: Command check_raid not defined
2014-04-08 15:54:50 <icinga-wm> PROBLEM - Disk space on pdf3 is CRITICAL: Connection refused by host
2014-04-08 15:54:50 <icinga-wm> PROBLEM - DPKG on tridge is CRITICAL: NRPE: Command check_dpkg not defined
2014-04-08 15:54:50 <icinga-wm> PROBLEM - DPKG on bast1001 is CRITICAL: NRPE: Command check_dpkg not defined
2014-04-08 15:54:51 <icinga-wm> PROBLEM - puppet disabled on iodine is CRITICAL: NRPE: Command check_puppet_disabled not defined
2014-04-08 15:54:51 <icinga-wm> PROBLEM - RAID on pdf3 is CRITICAL: Connection refused by host
2014-04-08 15:54:51 <icinga-wm> PROBLEM - Disk space on tridge is CRITICAL: NRPE: Command check_disk_space not defined
2014-04-08 15:55:00 <icinga-wm> PROBLEM - Disk space on bast1001 is CRITICAL: NRPE: Command check_disk_space not defined
2014-04-08 15:55:00 <icinga-wm> PROBLEM - puppet disabled on pdf3 is CRITICAL: Connection refused by host
2014-04-08 15:55:10 <icinga-wm> PROBLEM - Disk space on iron is CRITICAL: NRPE: Command check_disk_space not defined
2014-04-08 15:55:10 <icinga-wm> PROBLEM - RAID on bast1001 is CRITICAL: NRPE: Command check_raid not defined
2014-04-08 15:55:10 <icinga-wm> PROBLEM - DPKG on dobson is CRITICAL: Connection refused by host
2014-04-08 15:55:10 <icinga-wm> PROBLEM - DPKG on cp3003 is CRITICAL: NRPE: Command check_dpkg not defined
2014-04-08 15:55:10 <icinga-wm> PROBLEM - DPKG on virt1000 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
2014-04-08 15:55:10 <icinga-wm> PROBLEM - puppet disabled on tridge is CRITICAL: NRPE: Command check_puppet_disabled not defined
2014-04-08 15:55:41 <greg-g> ahhh, so today is going to be a worthless -operations channel day, more than normal, due to the sprint? :)
2014-04-08 15:56:03 <andrewbogott> We're about to all go to dinner though.
2014-04-08 15:56:09 <andrewbogott> So things should quiet down shortly.
2014-04-08 15:56:10 <icinga-wm> PROBLEM - Puppet freshness on amslvs2 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 12:55:50 PM UTC
2014-04-08 15:56:19 <andrewbogott> But the channel will still be useless if you want to talk to ops :)
2014-04-08 15:56:50 <icinga-wm> PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: No output from Graphite for target(s): reqstats.5xx
2014-04-08 15:57:03 <mutante> will start nagios-nrpe-server on those
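
Those "Command check_X not defined" / "Connection refused" criticals are hosts where the NRPE daemon isn't yet running the newly puppetized config; per host, roughly:

    service nagios-nrpe-server restart     # pick up the new check_* command definitions
    # and from the icinga host, confirm a check now answers:
    /usr/lib/nagios/plugins/check_nrpe -H fenari.wikimedia.org -c check_dpkg
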
2014-04-08 15:57:10 <icinga-wm> PROBLEM - Puppet freshness on lvs1003 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 12:56:15 PM UTC
2014-04-08 15:58:42 <icinga-wm> RECOVERY - HTTPS on ssl3001 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
2014-04-08 15:58:42 <icinga-wm> RECOVERY - HTTPS on ssl1006 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
2014-04-08 15:58:52 <icinga-wm> RECOVERY - HTTPS on ssl1007 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
2014-04-08 15:58:52 <icinga-wm> RECOVERY - HTTPS on ssl1002 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
2014-04-08 15:59:32 <icinga-wm> RECOVERY - Varnishkafka Delivery Errors on cp3020 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
2014-04-08 15:59:52 <icinga-wm> RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
2014-04-08 16:00:04 <aude> back in 5 min or so
2014-04-08 16:00:06 <grrrit-wm> ('Abandoned') 'Physikerwelt': WIP: Enable orthogonal MathJax config [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/110240' (owner: 'Physikerwelt')
2014-04-08 16:00:42 <icinga-wm> PROBLEM - DPKG on mchenry is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
2014-04-08 16:00:42 <icinga-wm> PROBLEM - Disk space on mchenry is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
2014-04-08 16:00:52 <icinga-wm> PROBLEM - RAID on mchenry is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
2014-04-08 16:01:02 <icinga-wm> PROBLEM - puppet disabled on mchenry is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
2014-04-08 16:02:22 <icinga-wm> PROBLEM - Puppet freshness on ms6 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 01:02:03 PM UTC
2014-04-08 16:04:37 <aude> back
2014-04-08 16:08:22 <icinga-wm> PROBLEM - Puppet freshness on amslvs3 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 01:07:31 PM UTC
2014-04-08 16:09:27 <icinga-wm> PROBLEM - Puppet freshness on lvs1006 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 01:09:07 PM UTC
2014-04-08 16:09:27 <icinga-wm> PROBLEM - Puppet freshness on lvs4003 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 01:08:32 PM UTC
2014-04-08 16:09:27 <icinga-wm> RECOVERY - HTTPS on cp4020 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
2014-04-08 16:09:27 <icinga-wm> RECOVERY - HTTPS on cp4006 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
2014-04-08 16:09:27 <icinga-wm> RECOVERY - HTTPS on cp4013 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
2014-04-08 16:09:37 <icinga-wm> RECOVERY - HTTPS on cp4009 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
2014-04-08 16:09:37 <icinga-wm> RECOVERY - HTTPS on cp4010 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
2014-04-08 16:09:37 <icinga-wm> RECOVERY - HTTPS on ssl3003 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
2014-04-08 16:09:47 <icinga-wm> RECOVERY - HTTPS on ssl3002 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
2014-04-08 16:09:47 <icinga-wm> RECOVERY - HTTPS on ssl1004 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
2014-04-08 16:09:56 <paravoid> ottomata: ping
2014-04-08 16:09:57 <icinga-wm> RECOVERY - HTTPS on cp4012 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
2014-04-08 16:10:07 <icinga-wm> RECOVERY - HTTPS on cp4016 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
2014-04-08 16:10:07 <icinga-wm> RECOVERY - HTTPS on ssl1008 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
2014-04-08 16:10:07 <icinga-wm> PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: No output from Graphite for target(s): reqstats.5xx
2014-04-08 16:10:07 <icinga-wm> RECOVERY - HTTPS on cp4018 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
2014-04-08 16:10:17 <icinga-wm> RECOVERY - HTTPS on ssl1009 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
2014-04-08 16:11:23 <paravoid> ottomata: ping ping
2014-04-08 16:12:47 <icinga-wm> PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: No output from Graphite for target(s): reqstats.5xx
2014-04-08 16:12:49 <ottomata> pong pong
2014-04-08 16:13:05 <ottomata> paravoid
2014-04-08 16:13:08 <ottomata> wassupp
2014-04-08 16:13:14 <paravoid> what's with stat1's puppet?
2014-04-08 16:13:18 <paravoid> why is it admin disabled?
2014-04-08 16:13:47 <ottomata> because it is going to be decomed very soon
2014-04-08 16:13:56 <ottomata> and i wanted to make puppet changes that would apply to stat1003 but not mess with what was on stat1
2014-04-08 16:14:05 <ottomata> and I didn't want to re-write a bunch of statistics.pp stuff :/
2014-04-08 16:14:07 <_joe_> ori: are you around? seems like graphite is *not* working
2014-04-08 16:14:24 <paravoid> ottomata: that's bad
2014-04-08 16:14:27 <icinga-wm> PROBLEM - Puppet freshness on lvs1002 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 01:13:54 PM UTC
2014-04-08 16:14:35 <ottomata> paravoid: even if we are going to decom it soon?
2014-04-08 16:14:36 <paravoid> ottomata: can you remove the "include statistics*" stuff and enable it again?
2014-04-08 16:14:40 <paravoid> yes
2014-04-08 16:14:42 <ottomata> yeah probably can
2014-04-08 16:14:47 <paravoid> because it's messing with monitoring and all that
2014-04-08 16:15:06 <ottomata> ah i see it
2014-04-08 16:15:20 <ottomata> paravoid, what is the difference between the 3 numbers in each severity category in icinga?
2014-04-08 16:15:25 <mark> ottomata: disabling puppet for more than a few hours max is almost always a really bad idea
2014-04-08 16:15:31 <ottomata> mark, ok, noted.
2014-04-08 16:15:36 <mark> thanks
2014-04-08 16:16:27 <icinga-wm> PROBLEM - Puppet freshness on lvs1004 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 01:16:04 PM UTC
2014-04-08 16:16:27 <icinga-wm> PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: No output from Graphite for target(s): reqstats.5xx
2014-04-08 16:17:07 <_joe_> :/
2014-04-08 16:17:27 <icinga-wm> PROBLEM - Puppet freshness on dataset1001 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 01:16:39 PM UTC
2014-04-08 16:18:10 <ottomata> mark, can you help with the current network ACL problems?
2014-04-08 16:18:22 <mark> sorry, what's that?
2014-04-08 16:18:25 <ottomata> analytics nodes can't talk to apt
2014-04-08 16:18:27 <icinga-wm> PROBLEM - Puppet freshness on lvs4001 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 01:17:50 PM UTC
2014-04-08 16:18:30 <ottomata> nor statsd.eqiad.wmnet
2014-04-08 16:18:32 <ottomata> https://rt.wikimedia.org/Ticket/Display.html?id=4433
2014-04-08 16:18:37 <ottomata> I added to the bottom of that ticket
2014-04-08 16:18:51 <mark> ok
2014-04-08 16:18:59 <ottomata> i think vanadium was having the same trouble, is it on the vlan too?
2014-04-08 16:19:27 <icinga-wm> PROBLEM - Puppet freshness on amslvs1 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 01:19:10 PM UTC
2014-04-08 16:19:31 <aude> still working on wikiquote
2014-04-08 16:19:35 <mark> we can look at getting rid of those ACLs perhaps
2014-04-08 16:19:41 <mark> but we'll need to discuss what you're doing with firewalling
2014-04-08 16:20:18 <grrrit-wm> ('PS1') 'Ottomata': Disabling statistics roles on stat1 [operations/puppet] - 'https://gerrit.wikimedia.org/r/124621'
2014-04-08 16:20:18 <se4598> the fingerprint of the wikis' SSL cert apparently changed, but it is not a newly issued cert; it has the same dates as the previous one that i saved. Is it okay that the fingerprint changed?
2014-04-08 16:20:34 <ottomata> mark, yeah, hm, not sure, i kind of like them
2014-04-08 16:20:35 <paravoid> se4598: yes
2014-04-08 16:20:45 <ottomata> especially since anyone with hadoop access can launch whatever mapreduce jobs they want
2014-04-08 16:21:37 <grrrit-wm> ('CR') 'Ottomata': [C: '2' V: '2'] Disabling statistics roles on stat1 [operations/puppet] - 'https://gerrit.wikimedia.org/r/124621' (owner: 'Ottomata')
2014-04-08 16:21:37 <icinga-wm> PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: No output from Graphite for target(s): reqstats.5xx
2014-04-08 16:21:44 <ottomata> hmmmm
2014-04-08 16:21:48 <ottomata> that's weird
2014-04-08 16:21:59 <ottomata> checking on that 5xx thing in a sec
2014-04-08 16:22:05 <ottomata> that's surely my fault...
2014-04-08 16:22:27 <icinga-wm> PROBLEM - Puppet freshness on amslvs4 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 01:21:21 PM UTC
2014-04-08 16:22:27 <icinga-wm> PROBLEM - Puppet freshness on lvs1001 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 01:21:26 PM UTC
2014-04-08 16:22:27 <icinga-wm> PROBLEM - Puppet freshness on lvs4002 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 01:22:07 PM UTC
2014-04-08 16:22:53 <ottomata> hmm, graphite down?
2014-04-08 16:23:04 <mark> ottomata: statsd access for analytics seems already there
2014-04-08 16:23:07 <ottomata> maybe that 5xx thing is not my fault!
2014-04-08 16:23:26 <ottomata> yeah, mark, i think we already had these set up too
2014-04-08 16:23:27 <icinga-wm> PROBLEM - Puppet freshness on virt2 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 01:22:28 PM UTC
2014-04-08 16:23:37 <icinga-wm> RECOVERY - Puppet freshness on stat1 is OK: puppet ran at Tue Apr 8 16:23:30 UTC 2014
2014-04-08 16:23:43 <ottomata> but it seems that they aren't working right now, starting yesterday when I tried
2014-04-08 16:24:02 <grrrit-wm> ('PS1') 'Hashar': beta: reenable fatalmonitor script on eqiad [operations/puppet] - 'https://gerrit.wikimedia.org/r/124624'
2014-04-08 16:24:13 <mark> and carbon is in there already too
2014-04-08 16:24:15 <ottomata> mark, unless pings just aren't allowed and i'm checking wrong?
2014-04-08 16:24:24 <mark> pings may not be allowed no
2014-04-08 16:24:27 <ottomata> ori and I both had trouble running apt-get update because we couldn't talk to carbon
2014-04-08 16:24:31 <mark> check again?
2014-04-08 16:24:35 <ottomata> yeah checking
2014-04-08 16:24:48 <ottomata> and i was trying to run sqstat on analytics1003
2014-04-08 16:24:52 <ottomata> so we can decom emery
2014-04-08 16:24:59 <ottomata> but it couldn't talk to statsd
2014-04-08 16:25:38 <ottomata> hm.
2014-04-08 16:25:44 <ottomata> yeah totally working now
2014-04-08 16:25:57 <ottomata> ooooook.
2014-04-08 16:25:59 <ottomata> weird.
2014-04-08 16:26:00 <_joe_> ottomata: graphite is borked
2014-04-08 16:26:04 <mark> i think faidon did it earlier
2014-04-08 16:26:05 <grrrit-wm> ('CR') 'Hashar': "puppet is broken on deployment-bastion.eqiad.wmflabs, can't deploy the change right now :-/" [operations/puppet] - 'https://gerrit.wikimedia.org/r/124624' (owner: 'Hashar')
2014-04-08 16:26:21 <ottomata> oh, fixed the acl problem?
2014-04-08 16:26:33 <ottomata> maybe something else was just not working, and I assumed because I couldn't ping it was an ACL thing?
2014-04-08 16:26:55 <mark> ping is not a good way to test that
2014-04-08 16:27:10 <ottomata> yeah, i just saw the packets being filtered from ping
2014-04-08 16:27:11 <mark> we allow specific protocols/ports, ping uses different ones
2014-04-08 16:27:14 <ottomata> aye
2014-04-08 16:27:30 <ottomata> yeah, just figured if i couldn't at least ping then probably other stuff was blocked too, but ja
2014-04-08 16:27:57 <ottomata> but yeah, ori couldn't use apt on vanadium either, so dunno...
2014-04-08 16:28:10 <ottomata> and sqstat couldn't talk to tungsten, so hm
2014-04-08 16:28:12 <ottomata> but ok!
2014-04-08 16:28:16 <mark> :)
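For context on the exchange above: because the router ACLs only allow specific protocols and ports, a port-level connectivity test says more than ICMP ping. A minimal sketch in Python, with the hostname and port purely illustrative assumptions (carbon's apt repository over HTTP on TCP 80); UDP services such as statsd on 8125 cannot be probed this way because UDP is connectionless:

    import socket

    def tcp_reachable(host, port, timeout=3):
        """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
        try:
            sock = socket.create_connection((host, port), timeout)
            sock.close()
            return True
        except (socket.error, socket.timeout):
            return False

    # Illustrative check: can this analytics node reach the apt repository on carbon?
    print(tcp_reachable('carbon.wikimedia.org', 80))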
2014-04-08 16:28:22 <mark> we're going for dinner in a bit
2014-04-08 16:28:44 <ottomata> mark
2014-04-08 16:28:45 <ottomata> hm
2014-04-08 16:28:53 <ottomata> so sqstat is trying to talk to tungsten on 2003
2014-04-08 16:28:56 <hashar> !log Jenkins: killed jenkins-slave java process on gallium and repooled gallium slave. It was no more registered in Zuul :-/
2014-04-08 16:28:57 <icinga-wm> RECOVERY - puppet disabled on iron is OK: OK
2014-04-08 16:28:57 <ottomata> is that open?
2014-04-08 16:29:01 <morebots> Logged the message, Master
2014-04-08 16:29:07 <icinga-wm> RECOVERY - Disk space on iron is OK: DISK OK
2014-04-08 16:29:09 <ottomata> can't seem to reach it from an03
2014-04-08 16:29:34 <manybubbles> ganglia seems upset
2014-04-08 16:29:40 <mark> protocol udp;
2014-04-08 16:29:40 <mark> destination-port 8125;
2014-04-08 16:29:45 <aude> tables added
2014-04-08 16:29:51 <mark> so port 2003 isn't
2014-04-08 16:29:54 <ottomata> ah ok
2014-04-08 16:30:03 <ottomata> that's why then, could you add?
2014-04-08 16:30:13 <mark> ok
2014-04-08 16:30:40 <ottomata> i'm going to see if reqstats gets flaky when we move it to analytics1003
2014-04-08 16:30:51 <ottomata> it was either flaky because erbium is busy
2014-04-08 16:30:57 <ottomata> or because the multicast firehose is just too lossy
2014-04-08 16:31:37 <aude> !log added sites and site_identifiers core tables on wikiquote
2014-04-08 16:31:41 <morebots> Logged the message, Master
2014-04-08 16:32:22 <mark> 2003 should work now
2014-04-08 16:33:36 <icinga-wm> RECOVERY - DPKG on iodine is OK: All packages OK
2014-04-08 16:33:36 <icinga-wm> RECOVERY - Disk space on iodine is OK: DISK OK
2014-04-08 16:33:36 <icinga-wm> RECOVERY - puppet disabled on cp3003 is OK: OK
2014-04-08 16:33:39 <ottomata> ah just noticed it is udp, mark, will that work still?
2014-04-08 16:33:46 <icinga-wm> RECOVERY - HTTPS on cp4014 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
2014-04-08 16:33:46 <icinga-wm> RECOVERY - RAID on cp3003 is OK: OK: optimal, 2 logical, 2 physical
2014-04-08 16:33:46 <icinga-wm> RECOVERY - RAID on iodine is OK: OK: no disks configured for RAID
2014-04-08 16:33:46 <icinga-wm> RECOVERY - HTTPS on ssl1005 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
2014-04-08 16:33:46 <icinga-wm> RECOVERY - HTTPS on cp4003 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
2014-04-08 16:33:47 <mark> yes
2014-04-08 16:33:51 <ottomata> ok cool
2014-04-08 16:33:52 <ottomata> thanks
2014-04-08 16:33:53 <ottomata> ok go eat
2014-04-08 16:33:55 <ottomata> thank you!
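Since the ACL term quoted above only covered statsd (UDP 8125), graphite's plaintext port 2003 had to be opened separately for sqstat to push its counters to tungsten. A minimal sketch of that plaintext protocol, assuming the metric name, value, and a UDP listener on 2003 purely for illustration:

    import socket
    import time

    def send_metric(path, value, host='tungsten.eqiad.wmnet', port=2003):
        """Send one datapoint using graphite's plaintext protocol: path, value, timestamp on one line."""
        line = '%s %s %d\n' % (path, value, int(time.time()))
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        try:
            sock.sendto(line.encode('utf-8'), (host, port))
        finally:
            sock.close()

    send_metric('reqstats.5xx', 42)  # illustrative metric name and value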
2014-04-08 16:33:56 <icinga-wm> RECOVERY - DPKG on bast1001 is OK: All packages OK
2014-04-08 16:33:56 <icinga-wm> RECOVERY - puppet disabled on iodine is OK: OK
2014-04-08 16:33:56 <icinga-wm> RECOVERY - HTTPS on cp4002 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
2014-04-08 16:33:56 <icinga-wm> RECOVERY - HTTPS on amssq47 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
2014-04-08 16:33:56 <icinga-wm> RECOVERY - HTTPS on cp4004 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
2014-04-08 16:33:57 <icinga-wm> RECOVERY - HTTPS on cp4001 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
2014-04-08 16:33:57 <icinga-wm> RECOVERY - HTTPS on cp4017 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
2014-04-08 16:33:58 <icinga-wm> RECOVERY - HTTPS on cp4015 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
2014-04-08 16:33:58 <icinga-wm> RECOVERY - HTTPS on cp4008 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
2014-04-08 16:33:59 <icinga-wm> RECOVERY - HTTPS on ssl1001 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
2014-04-08 16:33:59 <icinga-wm> RECOVERY - HTTPS on cp4005 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
2014-04-08 16:34:00 <icinga-wm> RECOVERY - Disk space on bast1001 is OK: DISK OK
2014-04-08 16:34:00 <icinga-wm> RECOVERY - HTTPS on cp4019 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
2014-04-08 16:34:06 <icinga-wm> RECOVERY - RAID on bast1001 is OK: OK: no RAID installed
2014-04-08 16:34:06 <icinga-wm> RECOVERY - DPKG on cp3003 is OK: All packages OK
2014-04-08 16:34:06 <icinga-wm> RECOVERY - HTTPS on ssl1003 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
2014-04-08 16:34:06 <icinga-wm> RECOVERY - HTTPS on cp4007 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
2014-04-08 16:34:16 <icinga-wm> RECOVERY - puppet disabled on bast1001 is OK: OK
2014-04-08 16:34:16 <icinga-wm> RECOVERY - Disk space on cp3003 is OK: DISK OK
2014-04-08 16:34:16 <icinga-wm> RECOVERY - HTTPS on cp4011 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
2014-04-08 16:35:36 <icinga-wm> PROBLEM - Puppet freshness on lvs4004 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 01:35:09 PM UTC
2014-04-08 16:35:46 <icinga-wm> PROBLEM - HTTPS on cp1044 is CRITICAL: SSL_CERT CRITICAL *.wikimedia.org: invalid CN (*.wikimedia.org does not match *.wikipedia.org)
2014-04-08 16:35:56 <icinga-wm> PROBLEM - HTTPS on cp1043 is CRITICAL: SSL_CERT CRITICAL *.wikimedia.org: invalid CN (*.wikimedia.org does not match *.wikipedia.org)
2014-04-08 16:36:48 <grrrit-wm> ('PS1') 'Ottomata': Putting sqstat back on analytics1003 [operations/puppet] - 'https://gerrit.wikimedia.org/r/124630'
2014-04-08 16:37:16 <grrrit-wm> ('CR') 'Ottomata': [C: '2' V: '2'] Putting sqstat back on analytics1003 [operations/puppet] - 'https://gerrit.wikimedia.org/r/124630' (owner: 'Ottomata')
2014-04-08 16:38:30 <grrrit-wm> ('PS1') 'Springle': invalid MariaDB variable name: user_stat [operations/puppet] - 'https://gerrit.wikimedia.org/r/124632'
2014-04-08 16:40:40 <grrrit-wm> ('CR') 'Springle': [C: '2'] invalid MariaDB variable name: user_stat [operations/puppet] - 'https://gerrit.wikimedia.org/r/124632' (owner: 'Springle')
2014-04-08 16:46:50 <grrrit-wm> ('PS1') 'RobH': replace misc-web-lb cert [operations/puppet] - 'https://gerrit.wikimedia.org/r/124634'
2014-04-08 16:48:11 <grrrit-wm> ('CR') 'RobH': [C: '2' V: '2'] replace misc-web-lb cert [operations/puppet] - 'https://gerrit.wikimedia.org/r/124634' (owner: 'RobH')
2014-04-08 16:49:09 <aude> sorry, being slow... populating sites table
2014-04-08 16:49:20 <grrrit-wm> ('PS1') 'Alexandros Kosiaris': Removing ethtool package from other places [operations/puppet] - 'https://gerrit.wikimedia.org/r/124637'
2014-04-08 16:49:22 <aude> suppose no hurry
2014-04-08 16:50:08 <grrrit-wm> ('CR') 'Dzahn': [C: ''] Removing ethtool package from other places [operations/puppet] - 'https://gerrit.wikimedia.org/r/124637' (owner: 'Alexandros Kosiaris')
2014-04-08 16:52:03 <grrrit-wm> ('CR') 'Dzahn': [C: '2'] "now included in base" [operations/puppet] - 'https://gerrit.wikimedia.org/r/124637' (owner: 'Alexandros Kosiaris')
2014-04-08 16:53:08 <grrrit-wm> ('CR') 'Cmcmahon': [C: ''] "Thanks for putting this back." [operations/puppet] - 'https://gerrit.wikimedia.org/r/124624' (owner: 'Hashar')
2014-04-08 16:53:36 <icinga-wm> RECOVERY - Puppet freshness on virt2 is OK: puppet ran at Tue Apr 8 16:53:29 UTC 2014
2014-04-08 16:53:46 <icinga-wm> RECOVERY - Puppet freshness on dataset1001 is OK: puppet ran at Tue Apr 8 16:53:39 UTC 2014
2014-04-08 16:55:06 <icinga-wm> PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
2014-04-08 16:55:28 <ottomata> rats
2014-04-08 16:56:36 <icinga-wm> RECOVERY - Puppet freshness on amslvs2 is OK: puppet ran at Tue Apr 8 16:56:30 UTC 2014
2014-04-08 16:56:46 <icinga-wm> RECOVERY - Puppet freshness on lvs1003 is OK: puppet ran at Tue Apr 8 16:56:45 UTC 2014
2014-04-08 16:59:04 <aude> waiting for jenkins
2014-04-08 17:01:46 <icinga-wm> RECOVERY - Puppet freshness on ms6 is OK: puppet ran at Tue Apr 8 17:01:37 UTC 2014
2014-04-08 17:01:48 <grrrit-wm> ('PS2') 'Manybubbles': Turn on experimental highlighting in beta [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124003'
2014-04-08 17:03:06 <logmsgbot> !log aude synchronized php-1.23wmf20/extensions/Wikidata 'Update Wikidata build, to allow populating sites table on wikiquote'
2014-04-08 17:03:10 <morebots> Logged the message, Master
2014-04-08 17:05:20 <icinga-wm> RECOVERY - Puppet freshness on lvs4004 is OK: puppet ran at Tue Apr 8 17:05:14 UTC 2014
2014-04-08 17:05:30 <icinga-wm> PROBLEM - RAID on dataset1001 is CRITICAL: CRITICAL: 1 failed LD(s) (Partially Degraded)
2014-04-08 17:06:40 <icinga-wm> PROBLEM - LVS HTTPS IPv6 on misc-web-lb.eqiad.wikimedia.org_ipv6 is CRITICAL: Connection refused
2014-04-08 17:07:40 <icinga-wm> RECOVERY - LVS HTTPS IPv6 on misc-web-lb.eqiad.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 226 bytes in 0.012 second response time
2014-04-08 17:08:20 <icinga-wm> RECOVERY - Puppet freshness on amslvs3 is OK: puppet ran at Tue Apr 8 17:08:15 UTC 2014
2014-04-08 17:08:30 <icinga-wm> RECOVERY - Puppet freshness on lvs4003 is OK: puppet ran at Tue Apr 8 17:08:25 UTC 2014
2014-04-08 17:08:44 <grrrit-wm> ('CR') 'Chad': [C: '2'] Turn on experimental highlighting in beta [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124003' (owner: 'Manybubbles')
2014-04-08 17:08:53 <grrrit-wm> ('Merged') 'jenkins-bot': Turn on experimental highlighting in beta [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124003' (owner: 'Manybubbles')
2014-04-08 17:09:40 <icinga-wm> RECOVERY - Puppet freshness on lvs1006 is OK: puppet ran at Tue Apr 8 17:09:30 UTC 2014
2014-04-08 17:10:10 <icinga-wm> PROBLEM - Puppet freshness on lvs3001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
2014-04-08 17:10:10 <icinga-wm> PROBLEM - Puppet freshness on lvs3003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
2014-04-08 17:10:10 <icinga-wm> PROBLEM - Puppet freshness on lvs3002 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
2014-04-08 17:10:10 <icinga-wm> PROBLEM - Puppet freshness on lvs3004 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
2014-04-08 17:10:19 <grrrit-wm> ('CR') 'QChris': "Prerequisite got merged." [operations/puppet] - 'https://gerrit.wikimedia.org/r/121546' (owner: 'Ottomata')
2014-04-08 17:10:52 <aude> ^demon|away: are you deploying stuff?
2014-04-08 17:11:14 <aude> i'll need to sneak in at some point for a config change, but not yet
2014-04-08 17:11:29 <grrrit-wm> ('PS1') 'Ottomata': Moving sqstat back to emery :/ [operations/puppet] - 'https://gerrit.wikimedia.org/r/124641'
2014-04-08 17:11:38 <grrrit-wm> ('PS2') 'Ottomata': Moving sqstat back to emery :/ [operations/puppet] - 'https://gerrit.wikimedia.org/r/124641'
2014-04-08 17:11:40 <grrrit-wm> ('CR') 'jenkins-bot': [V: '-1'] Moving sqstat back to emery :/ [operations/puppet] - 'https://gerrit.wikimedia.org/r/124641' (owner: 'Ottomata')
2014-04-08 17:11:50 <grrrit-wm> ('CR') 'Ottomata': [C: '2' V: '2'] Moving sqstat back to emery :/ [operations/puppet] - 'https://gerrit.wikimedia.org/r/124641' (owner: 'Ottomata')
2014-04-08 17:12:28 <manybubbles> aude: no, he just merged something for beta
2014-04-08 17:12:34 <aude> ok
2014-04-08 17:12:41 <aude> probably need 10 more minutes
2014-04-08 17:12:50 <aude> done populating tables, now checking they are ok
2014-04-08 17:13:00 <aude> then can do the config change and then done :)
2014-04-08 17:13:19 <^demon|away> aude: Nope, just merged that for Nik for beta.
2014-04-08 17:13:21 <^demon|away> Like he said :)
2014-04-08 17:13:22 <aude> going slow and careful since i'm still newish
2014-04-08 17:13:25 <aude> doing this stuff
2014-04-08 17:13:32 <^demon|away> Someone should sync it eventually for consistency, but no biggie.
2014-04-08 17:13:53 <aude> i can do
2014-04-08 17:14:04 <hoo> so can I
2014-04-08 17:14:29 <aude> hoo: want to check the sites tables and site_identifiers for wikiquote?
2014-04-08 17:14:30 <icinga-wm> RECOVERY - Puppet freshness on lvs1002 is OK: puppet ran at Tue Apr 8 17:14:22 UTC 2014
2014-04-08 17:14:36 <aude> they look ok to me
2014-04-08 17:15:30 <icinga-wm> RECOVERY - Puppet freshness on lvs1005 is OK: puppet ran at Tue Apr 8 17:15:22 UTC 2014
2014-04-08 17:16:02 <grrrit-wm> ('CR') 'Aude': "sites table and site_identifiers are added and populated" [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124516' (owner: 'Aude')
2014-04-08 17:16:10 <icinga-wm> RECOVERY - Puppet freshness on lvs1004 is OK: puppet ran at Tue Apr 8 17:16:02 UTC 2014
2014-04-08 17:16:28 <manybubbles> !log finished upgrading elastic1001-1006. starting on 1007. yay progress.
2014-04-08 17:16:32 <morebots> Logged the message, Master
2014-04-08 17:16:34 <hoo> enwikiquote looks good to me
2014-04-08 17:16:39 <aude> alright
2014-04-08 17:16:40 <hoo> sites and site_identifiers
2014-04-08 17:16:44 <aude> strip protocols and all
2014-04-08 17:16:52 <hoo> yep
2014-04-08 17:16:58 <aude> https://gerrit.wikimedia.org/r/#/c/124516/ want to merge
2014-04-08 17:17:07 <aude> i can deploy it and sync the cirrus thing
2014-04-08 17:17:19 <manybubbles> thanks!
2014-04-08 17:17:22 <hoo> ok, also looks good on WD
2014-04-08 17:17:30 <aude> ok
2014-04-08 17:17:45 <aude> let me sync cirrus
2014-04-08 17:17:52 <hoo> go ahead
2014-04-08 17:17:53 <Nemo_bis> Oh, today is the day
2014-04-08 17:18:06 <aude> it's *the* day :)
2014-04-08 17:18:10 <icinga-wm> RECOVERY - Puppet freshness on lvs4001 is OK: puppet ran at Tue Apr 8 17:18:03 UTC 2014
2014-04-08 17:19:18 <hoo> aude: You also sorted the wikidataclient dblist? :P
2014-04-08 17:19:53 <aude> yes
2014-04-08 17:20:04 <hoo> Ok, looks good to me, can approve whenever you want
2014-04-08 17:20:05 <aude> they will get sorted eventually
2014-04-08 17:20:13 <aude> doing chad's thing
2014-04-08 17:20:30 <icinga-wm> RECOVERY - Puppet freshness on amslvs1 is OK: puppet ran at Tue Apr 8 17:20:23 UTC 2014
2014-04-08 17:21:30 <icinga-wm> RECOVERY - Puppet freshness on lvs1001 is OK: puppet ran at Tue Apr 8 17:21:24 UTC 2014
2014-04-08 17:21:50 <icinga-wm> RECOVERY - Puppet freshness on amslvs4 is OK: puppet ran at Tue Apr 8 17:21:45 UTC 2014
2014-04-08 17:22:30 <icinga-wm> RECOVERY - Puppet freshness on lvs4002 is OK: puppet ran at Tue Apr 8 17:22:21 UTC 2014
2014-04-08 17:22:43 <logmsgbot> !log aude synchronized wmf-config/CirrusSearch-labs.php 'config change for beta, to enable highlighting'
2014-04-08 17:22:47 <morebots> Logged the message, Master
2014-04-08 17:23:06 <aude> hoo: ready
2014-04-08 17:23:45 <grrrit-wm> ('CR') 'Hoo man': [C: '2'] "Preparation finished, so do this! \o/" [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124516' (owner: 'Aude')
2014-04-08 17:23:49 <aude> yay!
2014-04-08 17:23:51 <hoo> there you go ;)
2014-04-08 17:23:53 <grrrit-wm> ('Merged') 'jenkins-bot': Enable Wikibase on Wikiquote [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124516' (owner: 'Aude')
2014-04-08 17:27:20 <hoo> aude: About to sync or shall I take it?
2014-04-08 17:27:21 <aude> sync dblist then wmf-config?
2014-04-08 17:27:31 <Nemo_bis> waiting
2014-04-08 17:27:43 <aude> no other way
2014-04-08 17:27:52 <hoo> other way round sounds sane
2014-04-08 17:28:02 <aude> wmf-config then dblist is good
2014-04-08 17:28:06 <hoo> wmf-config changes will work w/o the rest
2014-04-08 17:28:10 <aude> right
2014-04-08 17:28:20 <aude> that's what ree-dy did for wikisource
2014-04-08 17:28:52 <aude> doing
2014-04-08 17:28:55 <hoo> :)
2014-04-08 17:28:59 <logmsgbot> !log aude synchronized wmf-config 'config changes to enable Wikibase on Wikiquote'
2014-04-08 17:29:04 <morebots> Logged the message, Master
2014-04-08 17:29:12 <grrrit-wm> ('PS1') 'Matthias Mullie': Increase Flow cache version [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124646'
2014-04-08 17:29:52 <logmsgbot> !log aude synchronized wikidataclient.dblist 'Enable Wikibase on Wikiquote'
2014-04-08 17:29:57 <morebots> Logged the message, Master
2014-04-08 17:30:01 <hoo> oO
2014-04-08 17:30:02 <hoo> :)
2014-04-08 17:30:12 <aude> alright time to check it's all good
2014-04-08 17:30:17 <hoo> on that
2014-04-08 17:31:13 <hoo> oh well... I think we have to bump wgCacheEpoch once again
2014-04-08 17:31:14 <hoo> aude: ^
2014-04-08 17:31:36 <aude> huh
2014-04-08 17:31:45 <aude> ah, yes
2014-04-08 17:32:00 <hoo> shall I patch or will you?
2014-04-08 17:32:26 <Nemo_bis> https://www.wikidata.org/wiki/Q189119#sitelinks-wikiquote
2014-04-08 17:32:34 <hoo> Nemo_bis: Yes, the usual stuff
2014-04-08 17:32:34 <aude> go ahead
2014-04-08 17:33:06 <aude> it says list of values is complete
2014-04-08 17:33:09 <aude> i assume caching
2014-04-08 17:33:16 <aude> on Q60
2014-04-08 17:33:57 <aude> debug=true, i can add wikiquote
2014-04-08 17:34:23 <Nemo_bis> yep, I did action=purge
2014-04-08 17:34:23 <grrrit-wm> ('PS1') 'Hoo man': Bump wgCacheEpoch for Wikidata after enabling Wikiquote langlinks [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124648'
2014-04-08 17:34:24 <hoo> yep
2014-04-08 17:34:31 <hoo> aude: ^
2014-04-08 17:34:35 <aude> ok
2014-04-08 17:35:21 <ottomata> !log restarted gmetad on nickel to fix ganglia
2014-04-08 17:35:26 <morebots> Logged the message, Master
2014-04-08 17:35:33 <grrrit-wm> ('CR') 'Aude': [C: '2'] Bump wgCacheEpoch for Wikidata after enabling Wikiquote langlinks [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124648' (owner: 'Hoo man')
2014-04-08 17:35:40 <grrrit-wm> ('Merged') 'jenkins-bot': Bump wgCacheEpoch for Wikidata after enabling Wikiquote langlinks [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124648' (owner: 'Hoo man')
2014-04-08 17:37:00 <hoo> aude: Syncing? I have to sync a touch out
2014-04-08 17:37:10 <aude> doing
2014-04-08 17:37:12 <hoo> ok
2014-04-08 17:37:18 <logmsgbot> !log aude synchronized wmf-config/Wikibase.php 'bump wgCacheEpoch for wikidata after enabling wikiquote site links'
2014-04-08 17:37:19 <aude> just being careful
2014-04-08 17:37:22 <morebots> Logged the message, Master
2014-04-08 17:37:28 <logmsgbot> !log hoo synchronized php-1.23wmf20/extensions/Wikidata/extensions/Wikibase/lib/resources/wikibase.Site.js 'touch'
2014-04-08 17:37:32 <morebots> Logged the message, Master
2014-04-08 17:37:34 <hoo> that should purge the sites cache
2014-04-08 17:37:43 <greg-g> "13:37 < aude> just being careful" +1 ;)
2014-04-08 17:37:44 <hoo> in resource loader
2014-04-08 17:37:47 <aude> :)
2014-04-08 17:38:25 <aude> still says complete
2014-04-08 17:38:30 <hoo> mh :/
2014-04-08 17:38:45 <aude> sites module has always been a pain
2014-04-08 17:40:24 <aude> maybe php-1.23wmf20/extensions/Wikidata/extensions/Wikibase/lib/includes/modules/SitesModule.php ?
2014-04-08 17:40:43 <hoo> aude: Won't help, RL does timestamps based on the JS scripts
2014-04-08 17:40:50 <aude> hmmm, ok
2014-04-08 17:41:13 <hoo> works for me
2014-04-08 17:41:16 <hoo> now at least
2014-04-08 17:41:35 <aude> trying in firefox
2014-04-08 17:41:39 <aude> might be my caching
2014-04-08 17:41:42 <hoo> \o/ Just added the first link
2014-04-08 17:41:46 <hoo> https://www.wikidata.org/wiki/Q40904#sitelinks-wikiquote
2014-04-08 17:41:48 <aude> already did one :)
2014-04-08 17:41:54 <aude> with debug=true
2014-04-08 17:41:59 <hoo> Cheating :D
2014-04-08 17:42:11 <aude> heh
2014-04-08 17:42:23 <aude> looks good in firefox
2014-04-08 17:42:30 <aude> i have to assume it's my cache
2014-04-08 17:42:31 <Nemo_bis> I did one ten minutes ago already :P
2014-04-08 17:42:35 <hoo> :P
2014-04-08 17:42:36 <aude> yay
2014-04-08 17:42:45 <hoo> Nemo_bis: with debug true, I guess?!
2014-04-08 17:42:50 <Nemo_bis> lol Heisenberg
2014-04-08 17:42:55 <Nemo_bis> 19.34 < Nemo_bis> yep, I did action=purge
2014-04-08 17:43:01 <hoo> :P
2014-04-08 17:43:01 <aude> ah
2014-04-08 17:43:50 <Guest75555> Is there a procedure to delete gerrit repositories?
2014-04-08 17:45:00 <aude> i can add links in wikidata now in chrome
2014-04-08 17:45:09 <hoo> aude: https://en.wikiquote.org/w/index.php?title=Werner_Heisenberg&action=info mh
2014-04-08 17:45:14 <hoo> why is it not showing up?
2014-04-08 17:45:34 <Nemo_bis> Guest64226 / krinkle : probably you can ask on the same gerrit queue page as usual
2014-04-08 17:45:53 <hoo> ah, I see
2014-04-08 17:45:57 <Nemo_bis> unless it's not "your" repository, in which case maybe a bug is better
2014-04-08 17:46:11 <hoo> dispatching is ... :S
2014-04-08 17:47:21 <aude> hmmm
2014-04-08 17:47:28 <hoo> https://www.wikidata.org/wiki/Special:DispatchStats
2014-04-08 17:47:44 <aude> i did action=purge on https://en.wikiquote.org/wiki/New_York_City
2014-04-08 17:47:46 <hoo> aude: Can we safely skip these changes? If not, just waiting is also fine
2014-04-08 17:47:54 <hoo> it's catching up rather quickly AFAIS
2014-04-08 17:47:55 <aude> removed dewikiquote
2014-04-08 17:48:08 <aude> we can wait
2014-04-08 17:48:16 <bd808|deploy> waits in line to do a group0 to 1.23wmf21 scap
2014-04-08 17:48:28 <aude> give us 5 more minutes to poke
2014-04-08 17:48:43 <bd808|deploy> aude: Sounds good
2014-04-08 17:48:59 <aude> i think we're ok though...
2014-04-08 17:49:32 <aude> or at least nothing we'll solve in 5 min, but we didn't break anything
2014-04-08 17:50:51 <hoo> aude: I can bump the chd_seen fields
2014-04-08 17:51:12 <aude> ok
2014-04-08 17:52:05 <hoo> Just looking for the right change id
2014-04-08 17:53:43 <hoo> got that
2014-04-08 17:54:37 <aude> something is weird with wikiquote... like it's not actually enabled now
2014-04-08 17:54:45 <aude> but i'm sure i saw it was
2014-04-08 17:55:29 <aude> thinks this happened with wikisource
2014-04-08 17:56:19 <hoo> !log changed the Wikidata wb_changes_dispatch position of all wikiquote wikis to 118158153
2014-04-08 17:56:23 <morebots> Logged the message, Master
2014-04-08 17:56:39 <aude> enwikiquote is in wikidataclient.dblist
2014-04-08 17:56:42 <hoo> 20140408172900
2014-04-08 17:57:03 <hoo> that was the timestamp, should be a few moments before anything happened regarding wikiquote
2014-04-08 17:57:12 <aude> ok
2014-04-08 17:57:39 <icinga-wm> PROBLEM - Varnishkafka Delivery Errors on cp3020 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 540.333313
2014-04-08 17:58:28 <hoo> still https://en.wikiquote.org/w/index.php?title=Werner_Heisenberg&action=info
2014-04-08 17:58:56 <hoo> Wikidata is not even loaded there... wtf
2014-04-08 17:58:59 <icinga-wm> PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 645.666687
2014-04-08 17:59:03 <aude> right,
2014-04-08 17:59:05 <aude> i'm sure it was
2014-04-08 17:59:25 <aude> do i have to sync dblist again?
2014-04-08 17:59:37 <aude> did we somehow undo it?
2014-04-08 18:00:58 <hoo> no, looks good on a random mw* machine
2014-04-08 18:01:09 <icinga-wm> PROBLEM - Disk space on virt1000 is CRITICAL: DISK CRITICAL - free space: / 1694 MB (2% inode=86%):
2014-04-08 18:01:14 <hoo> ah
2014-04-08 18:01:50 <logmsgbot> !log hoo synchronized wmf-config/InitialiseSettings.php 'Touch to clear config. cache'
2014-04-08 18:01:54 <morebots> Logged the message, Master
2014-04-08 18:01:55 <aude> ok
2014-04-08 18:02:09 <aude> it's back!
2014-04-08 18:02:11 <hoo> Sorry, I forgot about that
2014-04-08 18:02:33 <aude> was about to try that
2014-04-08 18:02:37 <hoo> :)
2014-04-08 18:02:41 <aude> touch all the wikidata things :)
2014-04-08 18:02:43 <bd808|deploy> wants to fix https://bugzilla.wikimedia.org/show_bug.cgi?id=58618 so that's automatic
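For context on why the extra touch was needed: the per-wiki configuration is cached, and the cache is only rebuilt when the settings file looks newer than the cached copy, so a sync that only changes other files can leave stale config behind. A rough sketch of that mtime-keyed pattern, with the paths and the expand() helper as purely illustrative assumptions rather than the actual wmf-config code:

    import json
    import os

    SOURCE = 'wmf-config/InitialiseSettings.php'   # illustrative path
    CACHE = '/tmp/mw-cache/conf-enwikiquote.json'  # illustrative path

    def load_settings(expand):
        """Return cached settings unless SOURCE is newer than CACHE; then rebuild the cache."""
        try:
            if os.path.getmtime(CACHE) >= os.path.getmtime(SOURCE):
                with open(CACHE) as f:
                    return json.load(f)
        except OSError:
            pass  # no cache yet (or unreadable): fall through and rebuild
        settings = expand(SOURCE)  # expensive expansion of the raw settings
        with open(CACHE, 'w') as f:
            json.dump(settings, f)
        return settings

    # Touching InitialiseSettings.php bumps its mtime past the cached copy, so the next
    # request rebuilds the cache -- which is what the synchronized touch above accomplished.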
2014-04-08 18:02:56 <aude> i think we are done!
2014-04-08 18:03:19 <aude> i am sure this happened on wikisource or previously where it was enabled and then not
2014-04-08 18:03:38 <aude> puzzled but we're good now
2014-04-08 18:04:13 <hoo> Yep, looks good to me
2014-04-08 18:04:23 <bd808|deploy> aude, hoo: All clear for me to mess with /a/common on tin and then scap?
2014-04-08 18:04:37 <hoo> Yep, go ahead... we're done for now :)
2014-04-08 18:04:47 <bd808|deploy> Cool
2014-04-08 18:05:08 <aude> done
2014-04-08 18:06:11 <grrrit-wm> ('PS1') 'BryanDavis': Group0 wikis to 1.23wmf21 [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124655'
2014-04-08 18:06:50 <greg-g> crosses fingers and knocks on wood
2014-04-08 18:07:03 <grrrit-wm> ('CR') 'BryanDavis': [C: '2'] Group0 wikis to 1.23wmf21 [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124655' (owner: 'BryanDavis')
2014-04-08 18:07:05 <aude> too!
2014-04-08 18:07:46 <bd808|deploy> greg-g: Aaron merged my fix so in theory I should only need one scap. I'll verify the file after the first scap to be certain
2014-04-08 18:08:21 <greg-g> nods
2014-04-08 18:08:28 <grrrit-wm> ('Merged') 'jenkins-bot': Group0 wikis to 1.23wmf21 [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124655' (owner: 'BryanDavis')
2014-04-08 18:10:36 <logmsgbot> !log bd808 Started scap: group0 wikis to 1.23wmf21 (with patch for bug 63659)
2014-04-08 18:10:39 <icinga-wm> RECOVERY - Varnishkafka Delivery Errors on cp3020 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
2014-04-08 18:10:41 <morebots> Logged the message, Master
2014-04-08 18:11:25 <bd808|deploy> l10n cache did not rebuild which is a great sign
2014-04-08 18:11:58 <jackmcbarn> Unable to open /usr/local/apache/common-local/wikiversions.cdb.
2014-04-08 18:11:58 <MatmaRex> https://pl.wikipedia.org/w/index.php?title=Dyskusja_wikiprojektu:%C5%9Ar%C3%B3dziemie&oldid=prev&diff=39218000
2014-04-08 18:12:01 <MatmaRex> i get a "Unable to open /usr/local/apache/common-local/wikiversions.cdb."
2014-04-08 18:12:10 <andre__> ...and same here.
2014-04-08 18:12:12 <manybubbles> [2014-04-08 18:11:37] Fatal error: Unable to open /usr/local/apache/common-local/wikiversions.cdb.
2014-04-08 18:12:15 <rschen7754> uh-oh
2014-04-08 18:12:19 <bd808|deploy> Yeah. fuck
2014-04-08 18:12:21 <manybubbles> yeah, you got it
2014-04-08 18:12:22 <Steinsplitter> here the same
2014-04-08 18:12:26 <bd808|deploy> It will be fixed in a few moments
2014-04-08 18:12:30 <manybubbles> that's everything
2014-04-08 18:12:31 <greg-g> well shit
2014-04-08 18:12:45 <bd808|deploy> fuuuuck
2014-04-08 18:12:49 <icinga-wm> PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
2014-04-08 18:12:57 <bd808|deploy> There's my first crash of all of the wikis
2014-04-08 18:12:59 <icinga-wm> RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
2014-04-08 18:13:00 <MaxSem> SNAFU?
2014-04-08 18:13:05 <aude> wtf
2014-04-08 18:13:13 <Amgine> down on wm
2014-04-08 18:13:21 <manybubbles> damn it, I was actually reading an article and I reloaded it to test
2014-04-08 18:13:23 <bd808|deploy> It was my "fix" for the scap problem
2014-04-08 18:13:25 <manybubbles> now I can't read it while I wait
2014-04-08 18:13:29 <icinga-wm> PROBLEM - Apache HTTP on mw1190 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal server error - 50485 bytes in 0.007 second response time
2014-04-08 18:13:29 <icinga-wm> PROBLEM - Apache HTTP on mw1055 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal server error - 50485 bytes in 0.013 second response time
2014-04-08 18:13:29 <icinga-wm> PROBLEM - Apache HTTP on mw1150 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal server error - 50485 bytes in 0.004 second response time
2014-04-08 18:13:29 <icinga-wm> PROBLEM - Apache HTTP on mw1101 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal server error - 50485 bytes in 0.005 second response time
2014-04-08 18:13:29 <icinga-wm> PROBLEM - Apache HTTP on mw1177 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal server error - 50485 bytes in 0.009 second response time
2014-04-08 18:13:29 <icinga-wm> PROBLEM - Apache HTTP on mw1138 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal server error - 50485 bytes in 0.003 second response time
2014-04-08 18:13:30 <icinga-wm> PROBLEM - Apache HTTP on mw1187 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal server error - 50485 bytes in 0.006 second response time
2014-04-08 18:13:30 <icinga-wm> PROBLEM - Apache HTTP on mw1220 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal server error - 50485 bytes in 0.006 second response time
2014-04-08 18:13:31 <icinga-wm> PROBLEM - Apache HTTP on mw1197 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal server error - 50485 bytes in 0.013 second response time
2014-04-08 18:13:31 <icinga-wm> PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - check plugin (check_job_queue) or PHP errors -
2014-04-08 18:13:33 <marktraceur> Whoa
2014-04-08 18:13:34 <aude> cries
2014-04-08 18:13:39 <icinga-wm> PROBLEM - Apache HTTP on mw1213 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal server error - 50485 bytes in 0.018 second response time
2014-04-08 18:13:39 <icinga-wm> PROBLEM - Apache HTTP on mw1113 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal server error - 50485 bytes in 0.012 second response time
2014-04-08 18:13:39 <icinga-wm> PROBLEM - LVS HTTP IPv4 on rendering.svc.eqiad.wmnet is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal server error - 50485 bytes in 0.008 second response time
2014-04-08 18:13:42 <icinga-wm> PROBLEM - Apache HTTP on mw1200 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal server error - 50485 bytes in 0.006 second response time
2014-04-08 18:13:42 <icinga-wm> PROBLEM - Apache HTTP on mw1035 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal server error - 50485 bytes in 0.022 second response time
2014-04-08 18:13:42 <icinga-wm> PROBLEM - Apache HTTP on mw1031 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal server error - 50485 bytes in 0.011 second response time
2014-04-08 18:13:42 <icinga-wm> PROBLEM - Apache HTTP on mw1090 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal server error - 50485 bytes in 0.010 second response time
2014-04-08 18:13:42 <icinga-wm> PROBLEM - Apache HTTP on mw1154 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal server error - 50485 bytes in 0.007 second response time
2014-04-08 18:13:52 <bd808|deploy> It will be fixed soon… scap will fix it at the end
2014-04-08 18:13:54 <logmsgbot> !log bd808 Finished scap: group0 wikis to 1.23wmf21 (with patch for bug 63659) (duration: 03m 18s)
2014-04-08 18:13:59 <morebots> Logged the message, Master
2014-04-08 18:14:00 <aude> alright
2014-04-08 18:14:01 <bd808|deploy> Should be fixed now
2014-04-08 18:14:04 <manybubbles> fixed
2014-04-08 18:14:15 <greg-g> breathes again
2014-04-08 18:14:22 <jackmcbarn> can whoever's in charge of icinga-wm bring it back to life?
2014-04-08 18:14:35 <sjoerddebruin> Damn it. :P
2014-04-08 18:14:37 <greg-g> jackmcbarn: it'll again automatically, I *believe*
2014-04-08 18:14:38 <PiRCarre> Someone
2014-04-08 18:14:39 <MaxSem> so what happened?
2014-04-08 18:14:47 <PiRCarre> Oh, you know about it?
2014-04-08 18:14:48 <Marybelle> greg-g: You accidentally a verb.
2014-04-08 18:14:49 <PiRCarre> ok
2014-04-08 18:14:50 <icinga-wm> RECOVERY - Apache HTTP on mw1027 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.066 second response time
2014-04-08 18:14:50 <icinga-wm> RECOVERY - Apache HTTP on mw1092 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.073 second response time
2014-04-08 18:14:51 <icinga-wm> RECOVERY - Apache HTTP on mw1073 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.084 second response time
2014-04-08 18:14:51 <icinga-wm> RECOVERY - Apache HTTP on mw1018 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.111 second response time
2014-04-08 18:14:51 <bd808|deploy> Patch https://gerrit.wikimedia.org/r/#/c/124627/
2014-04-08 18:14:52 <icinga-wm> RECOVERY - Apache HTTP on mw1163 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.062 second response time
2014-04-08 18:14:52 <icinga-wm> RECOVERY - Apache HTTP on mw1217 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.059 second response time
2014-04-08 18:15:07 <greg-g> Marybelle: :)
2014-04-08 18:15:16 <bd808|deploy> I'll write up the email. I know exactly what I fucked up
2014-04-08 18:15:21 <PiRCarre> bd808|deploy: thanks, I was just about to report "Unable to open /usr/local/apache/common-local/wikiversions.cdb." - glad to see it's under control
2014-04-08 18:15:29 <aude> breathes
2014-04-08 18:15:54 <paravoid> what's going on?
2014-04-08 18:16:08 <paravoid> we are all at dinner
2014-04-08 18:16:23 <manybubbles> fixed now
2014-04-08 18:16:24 <aude> it's ok
2014-04-08 18:16:25 <bd808|deploy> paravoid: My fault. Should be fixed now
2014-04-08 18:16:31 <paravoid> okay
2014-04-08 18:16:35 <greg-g> paravoid: go back to dinner, all's ok again :)
2014-04-08 18:16:36 <aude> scap temporarily broke everything though
2014-04-08 18:16:36 <paravoid> do you need anything?
2014-04-08 18:16:39 <icinga-wm> PROBLEM - Varnishkafka Delivery Errors on cp3012 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 183.266663
2014-04-08 18:16:39 <paravoid> ok
2014-04-08 18:16:44 <paravoid> manually page us if something happens
2014-04-08 18:16:52 <greg-g> paravoid: nope, known ef up
2014-04-08 18:16:57 <greg-g> paravoid: will do, enjoy!
2014-04-08 18:17:05 <paravoid> ciao
2014-04-08 18:18:17 <grrrit-wm> ('PS2') 'Gergő Tisza': Add setting to show a survey for MediaViewer users on some sites [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124036'
2014-04-08 18:18:56 <grrrit-wm> ('CR') 'Gergő Tisza': "Updated to display feedback survey on beta enwiki." [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124036' (owner: 'Gergő Tisza')
2014-04-08 18:19:29 <bd808|deploy> greg-g: I just reverted my patch to scap that caused that cascade of horribleness
2014-04-08 18:19:36 <greg-g> :)
2014-04-08 18:19:44 <bd808|deploy> On the plus side, group0 is on wmf21 now
2014-04-08 18:19:50 <greg-g> lol
2014-04-08 18:19:58 <greg-g> literal-lol
2014-04-08 18:20:09 <aude> scared to change it back
2014-04-08 18:20:20 <greg-g> "Don't. Touch. Any. Thing."
2014-04-08 18:20:25 <aude> i suppose if bd808|deploy 's patch is reverted then ok
2014-04-08 18:20:39 <greg-g> well, we still have the previous issue which it was trying to fix ;)
2014-04-08 18:20:59 <greg-g> 1 step forward, 1 step back
2014-04-08 18:21:23 <bd808|deploy> So yes we are temporarily back to needing to double-scap, but I'll make a patch that doesn't melt the world after lunch
2014-04-08 18:22:25 <greg-g> bd808|deploy: :)
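The fatals above came from a window during scap in which wikiversions.cdb was briefly unreadable on the apaches. One common way to close that kind of window is to build the new file beside the old one and rename it into place, since a same-filesystem rename is atomic; a minimal sketch of the pattern, with the build step left as an illustrative assumption and not presented as the actual scap fix:

    import os
    import tempfile

    def replace_atomically(path, data):
        """Write data to a temp file in the same directory, then rename it over path.

        Readers always see either the old file or the complete new one, never a
        missing or half-written file.
        """
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path), prefix='.wikiversions.')
        try:
            with os.fdopen(fd, 'wb') as f:
                f.write(data)
            os.chmod(tmp, 0o644)
            os.rename(tmp, path)
        except Exception:
            os.unlink(tmp)
            raise

    # replace_atomically('/usr/local/apache/common-local/wikiversions.cdb', new_cdb_bytes)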
2014-04-08 18:23:15 <aude> wikiquote etc all looks fine, so i'm going home / eating
2014-04-08 18:23:20 <aude> back in hour
2014-04-08 18:23:26 <greg-g> k, I'll do the same
2014-04-08 18:23:33 <Nemo_bis> quite late dinner for berlin
2014-04-08 18:23:47 <manybubbles> so I told my wife we broke the internet. she told me facebook was working....
2014-04-08 18:24:18 <hoo> Nemo_bis: It's never too late for food :P
2014-04-08 18:24:41 <Jamesofur> ^
2014-04-08 18:28:38 <Nemo_bis> hoo: well, I'd call death from starvation, pellagra etc. "too late" :P
2014-04-08 18:29:07 <hoo> Nemo_bis: :P Too late as in time of the day...
2014-04-08 18:29:08 <hoo> :D
2014-04-08 18:30:17 <ori> hoo: http://p.defau.lt/?md_cbLJuORDNsGkhY6_NAg :P
2014-04-08 18:30:55 <hoo> at least the other errors are gone now, I guess
2014-04-08 18:31:28 <greg-g> manybubbles: :(
2014-04-08 18:31:42 <greg-g> goes to lunch for real
2014-04-08 18:32:34 <ori> hoo: yeah, i submitted a patch for hhvm to fix that other issue btw
2014-04-08 18:32:49 <icinga-wm> PROBLEM - ElasticSearch health check on elastic1012 is CRITICAL: CRITICAL - Could not connect to server 10.64.32.144
2014-04-08 18:34:15 <hoo> ori: Oh... nice that it's actually done in PHP :)
2014-04-08 18:35:34 <manybubbles> yeah yeah yeah, elasticsearch 1012 is being upgraded
2014-04-08 18:37:56 <ori> hoo: which component should that be filed under?
2014-04-08 18:39:25 <hoo> ori: already done https://bugzilla.wikimedia.org/show_bug.cgi?id=63691
2014-04-08 18:39:39 <icinga-wm> PROBLEM - Varnishkafka Delivery Errors on cp3020 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 639.299988
2014-04-08 18:39:40 <ori> oh cool, thanks!
2014-04-08 18:42:09 <icinga-wm> PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 530.733337
2014-04-08 18:42:20 <hoo> ori: Any idea who to poke about https://gerrit.wikimedia.org/r/121709 ?
2014-04-08 18:43:46 <grrrit-wm> ('CR') 'Matanya': add interface speed check for all hosts ('2' comments) [operations/puppet] - 'https://gerrit.wikimedia.org/r/124606' (owner: 'Cmjohnson')
2014-04-08 18:44:08 <grrrit-wm> ('PS2') 'Ori.livneh': Change wgServer and wgCanonicalServer for arbcom wikis [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/121709' (owner: 'Hoo man')
2014-04-08 18:44:53 <grrrit-wm> ('CR') 'Ori.livneh': [C: '2'] Change wgServer and wgCanonicalServer for arbcom wikis [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/121709' (owner: 'Hoo man')
2014-04-08 18:45:06 <logmsgbot> !log ori updated /a/common to {{Gerrit|I4b18e4ce8}}: Change wgServer and wgCanonicalServer for arbcom wikis
2014-04-08 18:45:11 <morebots> Logged the message, Master
2014-04-08 18:45:28 <hoo> heh :)
2014-04-08 18:45:39 <icinga-wm> RECOVERY - Varnishkafka Delivery Errors on cp3012 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
2014-04-08 18:45:50 <logmsgbot> !log ori synchronized wmf-config/InitialiseSettings.php 'I4b18e4ce8: Change wgServer and wgCanonicalServer for arbcom wikis'
2014-04-08 18:45:55 <morebots> Logged the message, Master
2014-04-08 18:53:40 <icinga-wm> RECOVERY - Varnishkafka Delivery Errors on cp3020 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
2014-04-08 18:56:09 <icinga-wm> RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
2014-04-08 18:57:39 <icinga-wm> PROBLEM - Varnishkafka Delivery Errors on cp3012 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 172.800003
2014-04-08 18:58:59 <icinga-wm> RECOVERY - ElasticSearch health check on elastic1012 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
2014-04-08 18:59:00 <icinga-wm> PROBLEM - ElasticSearch health check on elastic1001 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1895: active_shards: 5202: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 409
2014-04-08 18:59:00 <icinga-wm> PROBLEM - ElasticSearch health check on elastic1009 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1895: active_shards: 5202: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 409
2014-04-08 18:59:00 <icinga-wm> PROBLEM - ElasticSearch health check on elastic1004 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1895: active_shards: 5202: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 409
2014-04-08 18:59:00 <icinga-wm> PROBLEM - ElasticSearch health check on elastic1010 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1895: active_shards: 5202: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 409
2014-04-08 18:59:09 <icinga-wm> PROBLEM - ElasticSearch health check on elastic1013 is CRITICAL: CRITICAL - Could not connect to server 10.64.48.10
2014-04-08 18:59:09 <icinga-wm> PROBLEM - ElasticSearch health check on elastic1003 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1895: active_shards: 5202: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 409
2014-04-08 18:59:09 <icinga-wm> PROBLEM - ElasticSearch health check on elastic1006 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1895: active_shards: 5202: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 409
2014-04-08 18:59:10 <icinga-wm> PROBLEM - ElasticSearch health check on elastic1016 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1895: active_shards: 5202: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 409
2014-04-08 18:59:29 <icinga-wm> PROBLEM - ElasticSearch health check on elastic1015 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1895: active_shards: 5202: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 409
2014-04-08 19:00:03 <manybubbles> bleh
2014-04-08 19:00:11 <manybubbles> it recovered in a few seconds
2014-04-08 19:00:16 <manybubbles> not sure why it did that
2014-04-08 19:07:39 <icinga-wm> PROBLEM - Varnishkafka Delivery Errors on cp3011 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 341.200012
2014-04-08 19:12:00 <icinga-wm> RECOVERY - ElasticSearch health check on elastic1001 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
2014-04-08 19:12:00 <icinga-wm> RECOVERY - ElasticSearch health check on elastic1009 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
2014-04-08 19:12:00 <icinga-wm> RECOVERY - ElasticSearch health check on elastic1004 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
2014-04-08 19:12:00 <icinga-wm> RECOVERY - ElasticSearch health check on elastic1010 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
2014-04-08 19:12:10 <icinga-wm> RECOVERY - ElasticSearch health check on elastic1003 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
2014-04-08 19:12:11 <icinga-wm> RECOVERY - ElasticSearch health check on elastic1013 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
2014-04-08 19:12:11 <icinga-wm> RECOVERY - ElasticSearch health check on elastic1016 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
2014-04-08 19:12:11 <icinga-wm> RECOVERY - ElasticSearch health check on elastic1006 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
2014-04-08 19:13:16 <manybubbles> that's right
2014-04-08 19:13:18 <manybubbles> horrible check
2014-04-08 19:13:36 <manybubbles> no errors in the logs associated with those warnings
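The alerts above come from polling Elasticsearch's cluster health API on each node during the rolling upgrade; a "red" status means at least one primary shard is not yet allocated while a node rejoins, which clears on its own. A minimal sketch of such a poll, with the hostname as an illustrative assumption and not the actual Icinga check plugin:

    import json
    try:
        from urllib.request import urlopen   # Python 3
    except ImportError:
        from urllib2 import urlopen          # Python 2

    def cluster_health(host='elastic1001.eqiad.wmnet', port=9200, timeout=5):
        """Fetch /_cluster/health; status is green, yellow, or red."""
        resp = urlopen('http://%s:%d/_cluster/health' % (host, port), timeout=timeout)
        return json.loads(resp.read().decode('utf-8'))

    health = cluster_health()
    print(health['status'], health['number_of_nodes'], health['unassigned_shards'])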
2014-04-08 19:18:49 <icinga-wm> RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000
2014-04-08 19:20:55 <huh> https://en.wikipedia.org/wiki/Wikipedia:VPT#Heartbleed_bug.3F
2014-04-08 19:23:39 <icinga-wm> PROBLEM - Varnishkafka Delivery Errors on cp3020 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 531.166687
2014-04-08 19:24:29 <icinga-wm> PROBLEM - ElasticSearch health check on elastic1015 is CRITICAL: CRITICAL - Could not connect to server 10.64.48.12
2014-04-08 19:24:49 <icinga-wm> PROBLEM - ElasticSearch health check on elastic1007 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1894: active_shards: 5197: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 414
2014-04-08 19:24:50 <icinga-wm> PROBLEM - ElasticSearch health check on elastic1014 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1894: active_shards: 5197: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 414
2014-04-08 19:24:50 <icinga-wm> PROBLEM - ElasticSearch health check on elastic1008 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1894: active_shards: 5197: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 414
2014-04-08 19:24:59 <icinga-wm> PROBLEM - ElasticSearch health check on elastic1011 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1894: active_shards: 5197: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 414
2014-04-08 19:24:59 <icinga-wm> PROBLEM - ElasticSearch health check on elastic1005 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1894: active_shards: 5197: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 414
2014-04-08 19:24:59 <icinga-wm> PROBLEM - ElasticSearch health check on elastic1012 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1894: active_shards: 5197: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 414
2014-04-08 19:24:59 <icinga-wm> PROBLEM - ElasticSearch health check on elastic1009 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1894: active_shards: 5197: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 414
2014-04-08 19:24:59 <icinga-wm> PROBLEM - ElasticSearch health check on elastic1001 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1894: active_shards: 5197: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 414
2014-04-08 19:24:59 <icinga-wm> PROBLEM - ElasticSearch health check on elastic1004 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1894: active_shards: 5197: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 414
2014-04-08 19:25:09 <icinga-wm> PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 635.799988
2014-04-08 19:25:11 <Jamesofur> kicks icinga-wm
2014-04-08 19:26:39 <icinga-wm> PROBLEM - DPKG on elastic1015 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
2014-04-08 19:28:39 <icinga-wm> RECOVERY - Varnishkafka Delivery Errors on cp3012 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
2014-04-08 19:29:38 <matanya> huh: it is being fixed by ops
2014-04-08 19:31:39 <icinga-wm> RECOVERY - Varnishkafka Delivery Errors on cp3011 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
2014-04-08 19:36:39 <icinga-wm> RECOVERY - Varnishkafka Delivery Errors on cp3020 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
2014-04-08 19:37:49 <icinga-wm> PROBLEM - ElasticSearch health check on elastic1014 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1894: active_shards: 5204: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 407
2014-04-08 19:37:49 <icinga-wm> PROBLEM - ElasticSearch health check on elastic1007 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1894: active_shards: 5204: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 407
2014-04-08 19:37:50 <icinga-wm> PROBLEM - ElasticSearch health check on elastic1008 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1894: active_shards: 5204: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 407
2014-04-08 19:37:59 <icinga-wm> PROBLEM - ElasticSearch health check on elastic1011 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1894: active_shards: 5204: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 407
2014-04-08 19:37:59 <icinga-wm> PROBLEM - ElasticSearch health check on elastic1005 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1894: active_shards: 5204: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 407
2014-04-08 19:37:59 <icinga-wm> PROBLEM - ElasticSearch health check on elastic1012 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1894: active_shards: 5204: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 407
2014-04-08 19:37:59 <icinga-wm> PROBLEM - ElasticSearch health check on elastic1004 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1894: active_shards: 5204: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 407
2014-04-08 19:37:59 <icinga-wm> PROBLEM - ElasticSearch health check on elastic1009 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1894: active_shards: 5204: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 407
2014-04-08 19:37:59 <icinga-wm> PROBLEM - ElasticSearch health check on elastic1001 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1894: active_shards: 5204: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 407
2014-04-08 19:38:00 <icinga-wm> PROBLEM - ElasticSearch health check on elastic1010 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1894: active_shards: 5204: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 407
2014-04-08 19:38:07 <huh> again?
2014-04-08 19:38:09 <icinga-wm> RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
2014-04-08 19:38:10 <icinga-wm> PROBLEM - ElasticSearch health check on elastic1013 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1894: active_shards: 5204: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 407
2014-04-08 19:38:10 <icinga-wm> PROBLEM - ElasticSearch health check on elastic1003 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1894: active_shards: 5204: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 407
2014-04-08 19:38:10 <icinga-wm> PROBLEM - ElasticSearch health check on elastic1016 is CRITICAL: CRITICAL - Could not connect to server 10.64.48.13
2014-04-08 19:38:10 <icinga-wm> PROBLEM - ElasticSearch health check on elastic1006 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1894: active_shards: 5204: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 407
2014-04-08 19:38:29 <icinga-wm> PROBLEM - ElasticSearch health check on elastic1015 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1894: active_shards: 5204: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 407
2014-04-08 19:38:39 <icinga-wm> PROBLEM - Varnishkafka Delivery Errors on cp3012 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 224.199997
2014-04-08 19:39:39 <icinga-wm> RECOVERY - DPKG on elastic1015 is OK: All packages OK
2014-04-08 19:40:19 <manybubbles> oh shut up
2014-04-08 19:40:52 <manybubbles> I'm doing rolling restarts
2014-04-08 19:41:47 <manybubbles> got it: labswiki_content_1394813391
2014-04-08 19:41:53 <manybubbles> that thing is configured without replicas
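(An index with number_of_replicas set to 0 has only primary shards, so the cluster drops to red for the ~30 seconds the node holding them is restarting. A minimal sketch of checking and fixing that over the Elasticsearch 1.x REST API; the host name and replica count here are illustrative, not the exact commands used:)

    curl -s 'http://elastic1001:9200/_cat/indices/labswiki_content_*?v&h=index,pri,rep,health'
    # rep=0 means no replicas; add one so a single restarting node cannot take the index down
    curl -s -XPUT 'http://elastic1001:9200/labswiki_content_1394813391/_settings' \
         -d '{"index": {"number_of_replicas": 1}}'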
2014-04-08 19:46:40 <icinga-wm> PROBLEM - Varnishkafka Delivery Errors on cp3011 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 341.066681
2014-04-08 19:48:00 <icinga-wm> RECOVERY - ElasticSearch health check on elastic1004 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
2014-04-08 19:48:01 <icinga-wm> RECOVERY - ElasticSearch health check on elastic1009 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
2014-04-08 19:48:01 <icinga-wm> RECOVERY - ElasticSearch health check on elastic1001 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
2014-04-08 19:48:01 <icinga-wm> RECOVERY - ElasticSearch health check on elastic1010 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
2014-04-08 19:48:10 <icinga-wm> RECOVERY - ElasticSearch health check on elastic1003 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
2014-04-08 19:48:10 <icinga-wm> RECOVERY - ElasticSearch health check on elastic1013 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
2014-04-08 19:48:10 <icinga-wm> RECOVERY - ElasticSearch health check on elastic1006 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
2014-04-08 19:48:10 <icinga-wm> RECOVERY - ElasticSearch health check on elastic1016 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
2014-04-08 19:48:30 <icinga-wm> RECOVERY - ElasticSearch health check on elastic1015 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
2014-04-08 19:48:43 <manybubbles> and, more noise!
2014-04-08 19:48:49 <icinga-wm> RECOVERY - ElasticSearch health check on elastic1007 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
2014-04-08 19:48:49 <icinga-wm> RECOVERY - ElasticSearch health check on elastic1014 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
2014-04-08 19:48:49 <icinga-wm> RECOVERY - ElasticSearch health check on elastic1008 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
2014-04-08 19:48:59 <icinga-wm> PROBLEM - ElasticSearch health check on elastic1005 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1894: active_shards: 5308: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 303
2014-04-08 19:48:59 <icinga-wm> PROBLEM - ElasticSearch health check on elastic1011 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1894: active_shards: 5308: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 303
2014-04-08 19:48:59 <icinga-wm> PROBLEM - ElasticSearch health check on elastic1012 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1894: active_shards: 5308: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 303
2014-04-08 19:49:22 <manybubbles> bit me labswiki!
2014-04-08 19:52:34 <bd808|LUNCH> cheers manybubbles on
2014-04-08 19:52:53 <manybubbles> it'll spam us again in a few minutes
2014-04-08 19:52:59 <manybubbles> labswiki recovered a long time ago
2014-04-08 19:53:05 <manybubbles> it was only out for ~30 seconds each time
2014-04-08 19:53:20 <manybubbles> but ganglia wants all the shards on all the wikis to be recovered before it is happy
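(The health check reads the cluster-wide health API, so one unassigned shard anywhere keeps every node's check out of green. An illustrative query against the same endpoint:)

    curl -s 'http://elastic1001:9200/_cluster/health?pretty'
    # status "red"    -> at least one primary shard is unassigned
    # status "yellow" -> all primaries assigned, some replicas are not
    # status "green"  -> every shard assigned; only then does the check recover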
2014-04-08 19:53:59 <icinga-wm> RECOVERY - ElasticSearch health check on elastic1005 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
2014-04-08 19:53:59 <icinga-wm> RECOVERY - ElasticSearch health check on elastic1011 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
2014-04-08 19:53:59 <icinga-wm> RECOVERY - ElasticSearch health check on elastic1012 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
2014-04-08 19:56:15 <manybubbles> !log upgraded all elasticsearch servers except elastic1008. that is coming now.
2014-04-08 19:56:20 <morebots> Logged the message, Master
2014-04-08 19:58:20 <manybubbles> !log finished upgrading to Elasticsearch 1.1.0. The process went well with no issues other than knocking out search in labs 3 times for 30 seconds apiece, and logging lots of nasty warnings to irc. I've started the process to fix search in labs so it won't happen again.
2014-04-08 19:58:25 <morebots> Logged the message, Master
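(The usual rolling-upgrade pattern behind a !log like that, sketched against the 1.x API; package and service names are illustrative and this is not a transcript of the exact commands:)

    # on each node in turn:
    curl -s -XPUT 'http://localhost:9200/_cluster/settings' \
         -d '{"transient": {"cluster.routing.allocation.enable": "none"}}'   # stop shard shuffling
    sudo apt-get install elasticsearch      # pull in the 1.1.0 package
    sudo service elasticsearch restart
    curl -s -XPUT 'http://localhost:9200/_cluster/settings' \
         -d '{"transient": {"cluster.routing.allocation.enable": "all"}}'    # re-enable allocation
    # wait for _cluster/health to report green before moving to the next node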
2014-04-08 20:05:39 <icinga-wm> PROBLEM - Varnishkafka Delivery Errors on cp3020 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 420.066681
2014-04-08 20:08:09 <icinga-wm> PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 539.900024
2014-04-08 20:10:29 <icinga-wm> PROBLEM - Puppet freshness on lvs3004 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
2014-04-08 20:10:29 <icinga-wm> PROBLEM - Puppet freshness on lvs3001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
2014-04-08 20:10:29 <icinga-wm> PROBLEM - Puppet freshness on lvs3003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
2014-04-08 20:10:29 <icinga-wm> PROBLEM - Puppet freshness on lvs3002 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
2014-04-08 20:10:39 <icinga-wm> RECOVERY - Varnishkafka Delivery Errors on cp3012 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
2014-04-08 20:12:39 <icinga-wm> RECOVERY - Varnishkafka Delivery Errors on cp3011 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
2014-04-08 20:16:56 <se4598> Does someone here know about dns issues with wmflabs-domains or related stuff that happened recently?
2014-04-08 20:19:39 <icinga-wm> RECOVERY - Varnishkafka Delivery Errors on cp3020 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
2014-04-08 20:20:41 <icinga-wm> PROBLEM - Varnishkafka Delivery Errors on cp3012 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 176.399994
2014-04-08 20:22:09 <icinga-wm> RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
2014-04-08 20:26:39 <icinga-wm> PROBLEM - Varnishkafka Delivery Errors on cp3011 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 368.466675
2014-04-08 20:28:02 <cajoel> re:heartbleed, I think we'll be wanting a new corp certificate... do you guys have a favorite vendor for star certs these days?
2014-04-08 20:28:21 <cajoel> it's almost due for a re-up anyway, so it's worth the effort
2014-04-08 20:29:53 <ebernhardson> r
2014-04-08 20:48:39 <icinga-wm> PROBLEM - Varnishkafka Delivery Errors on cp3020 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 642.700012
2014-04-08 20:51:39 <icinga-wm> RECOVERY - Varnishkafka Delivery Errors on cp3012 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
2014-04-08 20:51:39 <icinga-wm> RECOVERY - Varnishkafka Delivery Errors on cp3011 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
2014-04-08 20:52:09 <icinga-wm> PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 537.099976
2014-04-08 20:59:46 <odder> greg-g: don't believe you
2014-04-08 20:59:58 <odder> http://lists.wikimedia.org/pipermail/wikitech-ambassadors/2014-April/000666.html
2014-04-08 21:00:04 <odder> This is the work of the Beast
2014-04-08 21:00:11 <bd808> greg-g: Do you still want to try group1 to 1.23wmf21 today or have we had enough excitement?
2014-04-08 21:00:53 <apergos> reminds folks that all ops are out at a bar except for those who are about to go to sleep :-D
2014-04-08 21:01:06 <greg-g> bd808: we're back to "if you run scap, run it twice" world, right?
2014-04-08 21:01:10 <greg-g> apergos: :)
2014-04-08 21:01:23 <greg-g> odder: which part? :)
2014-04-08 21:01:36 <bd808> greg-g: Yes, but for group1 to 1.23wmf21 we only need to run sync-wikiversions
2014-04-08 21:01:49 <greg-g> right
2014-04-08 21:02:09 <greg-g> the world looks sane on phase0?
2014-04-08 21:02:11 <greg-g> looks
2014-04-08 21:02:34 <odder> greg-g: all of it - notice the number immediately preceding .html
2014-04-08 21:02:39 <icinga-wm> PROBLEM - Varnishkafka Delivery Errors on cp3012 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 232.46666
2014-04-08 21:02:48 <greg-g> odder: haha
2014-04-08 21:03:39 <icinga-wm> RECOVERY - Varnishkafka Delivery Errors on cp3020 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
2014-04-08 21:03:54 <greg-g> this is neat: https://graphite.wikimedia.org/render/…
2014-04-08 21:04:36 <greg-g> I think that's what ori told me yesterday to not worry about
2014-04-08 21:05:09 <icinga-wm> RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
2014-04-08 21:05:25 <greg-g> bd808: if we do, we do it now, so we have 2 hours of settle-bug-report time before SWAT. May I take your whole day?
2014-04-08 21:06:36 <bd808> greg-g: I'm yours to command. :)
2014-04-08 21:06:39 <icinga-wm> PROBLEM - Varnishkafka Delivery Errors on cp3011 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 269.866669
2014-04-08 21:06:42 <odder> http://heartbleed.com/
2014-04-08 21:06:48 <odder> Q&A
2014-04-08 21:06:55 <odder> :-P
2014-04-08 21:07:09 <greg-g> bd808: go forth, please
2014-04-08 21:09:36 <grrrit-wm> ('PS1') 'BryanDavis': Group1 wikis to 1.23wmf21 [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124744'
2014-04-08 21:11:12 <grrrit-wm> ('CR') 'BryanDavis': [C: '2'] Group1 wikis to 1.23wmf21 [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124744' (owner: 'BryanDavis')
2014-04-08 21:11:20 <grrrit-wm> ('Merged') 'jenkins-bot': Group1 wikis to 1.23wmf21 [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124744' (owner: 'BryanDavis')
2014-04-08 21:12:17 <logmsgbot> !log bd808 rebuilt wikiversions.cdb and synchronized wikiversions files: group1 to 1.23wmf21
2014-04-08 21:12:23 <morebots> Logged the message, Master
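(For anyone following along: moving a wiki group to a new branch only changes which version each wiki maps to, so a full scap is not needed. A sketch of the step from the deployment host; the path is illustrative:)

    cd /srv/mediawiki-staging        # staging checkout on tin (path illustrative)
    git pull                         # pick up the merged wikiversions change
    sync-wikiversions "group1 to 1.23wmf21"   # rebuild wikiversions.cdb and push it to the cluster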
2014-04-08 21:12:47 <hoo> greg-g: Have you guys already killed all user sessions?
2014-04-08 21:12:52 <hoo> Can't see a server admin log entry
2014-04-08 21:15:44 <odder> greg-g: I did a https://commons.wikimedia.org/wiki/Commons:Village_pump#Users_are_being_forced_to_log_out
2014-04-08 21:18:21 <Jamesofur> Thanks odder, I left a note about it on en VPT since I saw a question about the bug in general
2014-04-08 21:18:48 <odder> Maybe I'll cross-post that to Meta too
2014-04-08 21:19:59 <icinga-wm> PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: No output from Graphite for target(s): reqstats.5xx
2014-04-08 21:20:14 <logmsgbot> !log bd808 Purged l10n cache for 1.23wmf18
2014-04-08 21:20:19 <morebots> Logged the message, Master
2014-04-08 21:21:46 <logmsgbot> !log bd808 Purged l10n cache for 1.23wmf19
2014-04-08 21:21:50 <morebots> Logged the message, Master
2014-04-08 21:21:54 <greg-g> hoo: in process
2014-04-08 21:22:55 <hoo> :)
2014-04-08 21:23:09 <greg-g> hoo: it takes longer than you'd imagine, maybe :)
2014-04-08 21:23:37 <bd808|deploy> greg-g: group1 to 1.23wmf21 is {{done}}
2014-04-08 21:23:40 <se4598> greg-g: just change the cookie name? (like last time)
2014-04-08 21:24:09 <greg-g> se4598: I'm deferring to chris on it (not sure what his exact process is, honestly)
2014-04-08 21:24:14 <greg-g> bd808|deploy: ty
2014-04-08 21:24:53 <se4598> mh, the tokens would still be valid I think, so that wasn't a good idea
2014-04-08 21:25:14 <bd808> se4598: Yeah I think that's why it takes a while
2014-04-08 21:26:45 <hoo> greg-g: Well given how many users we have and that we probably don't want to hammer the DBs too much, I can imagine this taking some time
2014-04-08 21:26:52 <greg-g> nods
2014-04-08 21:28:16 <hoo> csteipp: Why not run one process per shard?
2014-04-08 21:29:24 <odder> Jamesofur: if you're keeping track of things, I alerted Commons and Meta; perhaps someone would need to alert the other big Wikipedias
2014-04-08 21:29:35 <odder> Dunno if the message to tech-ambassadors will be enough; maybe.
2014-04-08 21:30:35 <grrrit-wm> ('PS2') 'MaxSem': Put a safeguard on GeoData's usage of CirrusSearch [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/121874'
2014-04-08 21:30:37 <grrrit-wm> ('PS1') 'MaxSem': Enable $wgGeoDataDebug on labs [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124747'
2014-04-08 21:30:39 <icinga-wm> RECOVERY - Varnishkafka Delivery Errors on cp3011 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
2014-04-08 21:30:39 <icinga-wm> PROBLEM - Varnishkafka Delivery Errors on cp3020 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 535.0
2014-04-08 21:30:54 <grrrit-wm> ('CR') 'jenkins-bot': [V: '-1'] Enable $wgGeoDataDebug on labs [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124747' (owner: 'MaxSem')
2014-04-08 21:31:32 <csteipp> se4598: Assuming attacker has the login token, they could use the new name and again spoof the user
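(In other words, the session lives server-side as a stored token; renaming the cookie only moves the client-side pointer, so invalidation means regenerating the stored tokens themselves. A purely illustrative sketch of that idea; the real reset was done by a CentralAuth maintenance script synced later this evening, and only the column names here come from the MediaWiki core / CentralAuth schemas:)

    # per local wiki: give every user a fresh token so captured session cookies stop validating
    mysql somewiki -e "UPDATE user SET user_token = MD5(RAND());"
    # and the CentralAuth equivalent for global sessions
    mysql centralauth -e "UPDATE globaluser SET gu_auth_token = MD5(RAND());"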
2014-04-08 21:31:39 <icinga-wm> RECOVERY - Varnishkafka Delivery Errors on cp3012 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
2014-04-08 21:31:46 <grrrit-wm> ('PS2') 'MaxSem': Enable $wgGeoDataDebug on labs [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124747'
2014-04-08 21:32:09 <Jamesofur> odder: yeah, I'll see if we can poke people, we're going to send out SM messages as well in a couple minutes
2014-04-08 21:32:19 <Jamesofur> with a recommendation to password reset
2014-04-08 21:33:09 <odder> SM?
2014-04-08 21:33:22 <Jamesofur> sorry, Social Media (Twitter/Facebook/G+ etc)
2014-04-08 21:33:42 <odder> TMA, Too Many Abbreviations
2014-04-08 21:33:45 <odder> :)
2014-04-08 21:33:59 <Jamesofur> yup lol
2014-04-08 21:34:09 <icinga-wm> PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 539.133362
2014-04-08 21:34:10 <Jamesofur> I abuse them, I even make up my own and forget that they are just in my head
2014-04-08 21:34:23 <HaeB> https://twitter.com/Wikimedia/status/453646877397757953
2014-04-08 21:34:49 <JohnLewis> Jamesofur: EUS IAA. TA IANAL.
2014-04-08 21:34:58 <JohnLewis> *EYS :p
2014-04-08 21:35:42 <odder> thanks HaeB, retweeted
2014-04-08 21:40:46 <aude> woah, new code on wikidata?
2014-04-08 21:40:46 <matanya> Jamesofur: using mass-message might be a good idea
2014-04-08 21:41:15 <greg-g> aude: yep, all ok?
2014-04-08 21:41:26 <Jamesofur> HaeB: ^ what do you think? (about MM)
2014-04-08 21:41:48 <greg-g> wdyt?
2014-04-08 21:42:08 <JohnLewis> greg-g: itjdi
2014-04-08 21:42:12 <aude> so we're confident?
2014-04-08 21:42:39 <icinga-wm> PROBLEM - Varnishkafka Delivery Errors on cp3012 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 187.866669
2014-04-08 21:42:53 <greg-g> aude: in that it won't break at 2:00 utc? yeah
2014-04-08 21:43:06 <greg-g> aude: the only thing we're still not confident about is scap on thursday
2014-04-08 21:44:19 <aude> alright
2014-04-08 21:44:39 <icinga-wm> RECOVERY - Varnishkafka Delivery Errors on cp3020 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
2014-04-08 21:44:40 <icinga-wm> PROBLEM - Varnishkafka Delivery Errors on cp3011 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 320.200012
2014-04-08 21:44:55 <HaeB> Jamesofur, matanya: i think for the session ending, massmessage would be overkill. regarding the password reset, it's a judgment call (how high one estimates the risk for users who don't change it)
2014-04-08 21:45:24 <matanya> HaeB: it depends on user rights as well
2014-04-08 21:45:27 <bd808> aude: The bug that caused all the 1.23wmf21 l10n issues is https://bugzilla.wikimedia.org/show_bug.cgi?id=63659
2014-04-08 21:46:31 <HaeB> are there any other major sites who notified all users?
2014-04-08 21:46:54 <Jamesofur> not that I've seen yet, but I have a feeling some are still going through the fixing process
2014-04-08 21:46:55 <aude> interesting
2014-04-08 21:46:59 <HaeB> (to recommend a password change)
2014-04-08 21:47:10 <hoo> eg. just got stuff from CloudBees
2014-04-08 21:47:15 <hoo> github also logged me out
2014-04-08 21:47:37 <HaeB> would also be interesting to know how quickly the wikis were fixed after the news broke yesterday
2014-04-08 21:47:40 <Jamesofur> latimes has an article about resetting your password, but that's different
2014-04-08 21:48:09 <icinga-wm> RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
2014-04-08 21:48:13 <HaeB> last night (PT) i filed an RT ticket for the blog, which was vulnerable at the time, but at that point the wikis tested ok already
2014-04-08 21:48:36 <hoo> The wikis auto update OpenSSL via puppet
2014-04-08 21:49:00 <Jamesofur> hoo: well ya ;) the question is when we updated puppet ;)
2014-04-08 21:49:24 <hoo> Jamesofur: The servers do that themselves
2014-04-08 21:49:39 <HaeB> per https://wikitech.wikimedia.org/wiki/Server_admin_log , the blog (holmium) was pretty late in the game
2014-04-08 21:49:50 <bd808> The timeline is all in SAL from last night
2014-04-08 21:49:51 <hoo> Yesterday I posted about that to the internal ops list, but forgot to poke a root to do an apt-cache clean and force a puppet run
2014-04-08 21:50:08 <HaeB> "04:03 Tim: upgrading libssl on ssl1001,ssl1002,ssl1003,ssl1004,ssl1005,ssl1006,ssl1007,ssl1008,ssl1009,ssl3001.esams.wikimedia.org,ssl3002.esams.wikimedia.org,ssl3003.esams.wikimedia.org" - is that the entry for the wikis?
2014-04-08 21:50:37 <bd808> Mostly yes
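(The per-host fix for Heartbleed (CVE-2014-0160) is small once the patched package is in apt; a rough sketch of what each SSL terminator needed, with package and service names that are illustrative rather than taken from the SAL entry:)

    sudo apt-get update
    sudo apt-get install --only-upgrade openssl libssl1.0.0
    sudo service nginx restart      # anything still holding the old libssl in memory must restart
    openssl version -a              # confirm the patched build is in place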
2014-04-08 21:53:39 <icinga-wm> RECOVERY - Varnishkafka Delivery Errors on cp3012 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
2014-04-08 21:53:39 <icinga-wm> RECOVERY - Varnishkafka Delivery Errors on cp3011 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
2014-04-08 21:53:59 <icinga-wm> RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000
2014-04-08 21:54:55 <grrrit-wm> ('PS1') 'Jean-Frédéric': Add Musées de la Haute-Saône to wgCopyUploadsDomains [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124754'
2014-04-08 22:01:11 <mwalker> greg-g, poking you because I'm not sure who's on point for the i18n / scap stuff -- but I recall getting pinged a couple of days ago (on a centralnotice keyword) saying that the i18n update was failing due to exceptions on CN (and others). I'm wondering if CN's fail was due to being on a deployment branch that did not have the JSON updates (until just now).
2014-04-08 22:01:46 <greg-g> shouldn't be
2014-04-08 22:01:57 <greg-g> there's backward compat in l10nupdate
2014-04-08 22:02:17 <greg-g> mwalker: see https://bugzilla.wikimedia.org/show_bug.cgi?id=63659 for all the gory details
2014-04-08 22:02:33 <mwalker> puts on tyvek suit
2014-04-08 22:02:38 <greg-g> :)
2014-04-08 22:30:59 <icinga-wm> PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: No output from Graphite for target(s): reqstats.5xx
2014-04-08 22:33:06 <csteipp> greg-g: Could I push a small centralauth update soon?
2014-04-08 22:33:44 <greg-g> yeah, now is fine, 30 minutes until swat
2014-04-08 22:34:24 <icinga-wm> PROBLEM - Puppet freshness on mw1109 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 10:30:08 PM UTC
2014-04-08 22:36:24 <icinga-wm> PROBLEM - Puppet freshness on mw1109 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 10:30:08 PM UTC
2014-04-08 22:37:04 <icinga-wm> PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 34.533333
2014-04-08 22:37:34 <icinga-wm> PROBLEM - Varnishkafka Delivery Errors on cp3020 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 260.733337
2014-04-08 22:38:24 <icinga-wm> PROBLEM - Puppet freshness on mw1109 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 10:30:08 PM UTC
2014-04-08 22:40:24 <icinga-wm> PROBLEM - Puppet freshness on mw1109 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 10:30:08 PM UTC
2014-04-08 22:42:24 <icinga-wm> PROBLEM - Puppet freshness on mw1109 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 10:30:08 PM UTC
2014-04-08 22:44:14 <icinga-wm> PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 625.166687
2014-04-08 22:44:24 <icinga-wm> PROBLEM - Puppet freshness on mw1109 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 10:30:08 PM UTC
2014-04-08 22:45:36 <se4598> marktraceur: I see in the deploy calendar that you have a changeset which specifically activates MediaViewer on en-beta. You(r pc) may get hit by https://bugzilla.wikimedia.org/show_bug.cgi?id=63709
2014-04-08 22:46:24 <icinga-wm> PROBLEM - Puppet freshness on mw1109 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 10:30:08 PM UTC
2014-04-08 22:47:22 <marktraceur> se4598: Is there a fix?
2014-04-08 22:47:50 <marktraceur> I'm guessing it's an SSL problem
2014-04-08 22:48:24 <icinga-wm> PROBLEM - Puppet freshness on mw1109 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 10:30:08 PM UTC
2014-04-08 22:48:43 <marktraceur> se4598: Replied on bug
2014-04-08 22:49:09 <grrrit-wm> ('PS1') 'BryanDavis': Create symlink for compile-wikiversions in /usr/local/bin [operations/puppet] - 'https://gerrit.wikimedia.org/r/124763'
2014-04-08 22:49:23 <se4598> marktraceur: We in #wikimedia-labs don't have one. And that's not about https but DNS resolution, so I don't understand what you mean by https?
2014-04-08 22:49:35 <marktraceur> Oh, hm
2014-04-08 22:49:37 <marktraceur> Never mind, sorry
2014-04-08 22:50:24 <icinga-wm> PROBLEM - Puppet freshness on mw1109 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 10:30:08 PM UTC
2014-04-08 22:52:04 <icinga-wm> RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
2014-04-08 22:52:24 <icinga-wm> PROBLEM - Puppet freshness on mw1109 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 10:30:08 PM UTC
2014-04-08 22:52:34 <icinga-wm> RECOVERY - Varnishkafka Delivery Errors on cp3020 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
2014-04-08 22:52:56 <se4598> marktraceur: currently the fix is.....: it may work if you try multiple times or wait some time (minutes, hours) ;P
2014-04-08 22:54:24 <icinga-wm> PROBLEM - Puppet freshness on mw1109 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 10:30:08 PM UTC
2014-04-08 22:56:24 <icinga-wm> PROBLEM - Puppet freshness on mw1109 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 10:30:08 PM UTC
2014-04-08 22:56:54 <icinga-wm> RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000
2014-04-08 22:58:24 <icinga-wm> PROBLEM - Puppet freshness on mw1109 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 10:30:08 PM UTC
2014-04-08 22:58:41 <hoo> greg-g: csteipp: got both core changes ready
2014-04-08 22:58:53 <hoo> I mean changes to the deploy branch
2014-04-08 22:59:52 <csteipp> hoo: Cool.. one sec and I'll merge and deploy it
2014-04-08 23:00:12 <hoo> I can also jump in, am on tin still anyway
2014-04-08 23:00:14 <icinga-wm> RECOVERY - Puppet freshness on mw1109 is OK: puppet ran at Tue Apr 8 23:00:04 UTC 2014
2014-04-08 23:02:24 <icinga-wm> PROBLEM - Puppet freshness on mw1109 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 11:00:04 PM UTC
2014-04-08 23:05:24 <greg-g> stupid puppet
2014-04-08 23:06:33 <Jasper_Deng> always wondered what Puppet does anyways
2014-04-08 23:07:09 <Jamesofur> pulls the strings ;)
2014-04-08 23:07:20 <Jamesofur> (or, probably better 'is the strings' )
2014-04-08 23:07:26 <hoo> Jasper_Deng: Playing with the servers :D
2014-04-08 23:08:20 <JohnLewis> Technically, the sysadmins are puppets in the WMF's plans, right? :p
2014-04-08 23:08:37 <logmsgbot> !log csteipp synchronized php-1.23wmf21/extensions/CentralAuth/maintenance 'Push maintenance script for token reset'
2014-04-08 23:08:39 <Jamesofur> or we're all just puppets in their plans, duh
2014-04-08 23:08:41 <morebots> Logged the message, Master
2014-04-08 23:09:04 <JohnLewis> Jamesofur: You're the past of the puppets :p
2014-04-08 23:09:09 <JohnLewis> *master of the
2014-04-08 23:09:57 <csteipp> greg-g: CentralAuth updates are out, so swat can go ahead if they were waiting on me
2014-04-08 23:10:01 <Jamesofur> ;) the user with said name may dislike me claiming the title
2014-04-08 23:10:40 <greg-g> mwalker: ori ebernhardson ^
2014-04-08 23:10:46 <greg-g> also, what the heck, oit_display ?
2014-04-08 23:10:54 <greg-g> :)
2014-04-08 23:11:10 <icinga-wm> PROBLEM - Puppet freshness on lvs3001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
2014-04-08 23:11:10 <icinga-wm> PROBLEM - Puppet freshness on lvs3002 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
2014-04-08 23:11:10 <icinga-wm> PROBLEM - Puppet freshness on lvs3003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
2014-04-08 23:11:10 <icinga-wm> PROBLEM - Puppet freshness on lvs3004 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
2014-04-08 23:11:51 <mwalker> oh
2014-04-08 23:11:54 <mwalker> yes; it's 4!
2014-04-08 23:13:25 <Danny_B> SUL doesn't work?
2014-04-08 23:14:02 <mwalker> csteipp, ^
2014-04-08 23:14:03 <hoo> Danny_B: We are logging out all users
2014-04-08 23:14:10 <hoo> see http://lists.wikimedia.org/pipermail/wikitech-ambassadors/2014-April/000666.html
2014-04-08 23:14:32 <MaxSem> csteipp, warn ppl with a site notice?
2014-04-08 23:14:35 <se4598> hoo: you know that this isn't merged? https://gerrit.wikimedia.org/r/124756
2014-04-08 23:15:00 <hoo> se4598: not this important at the very moments
2014-04-08 23:15:03 <hoo> * moment
2014-04-08 23:15:23 <csteipp> Danny_B: SUL should work... You should just be logged out. If you can't login, let me know
2014-04-08 23:15:53 <Jamesofur> csteipp: will we get logged out each time we hit a wiki we've visited recently? or just the once per user in theory
2014-04-08 23:16:15 <csteipp> If you're a global user, just once (right now as I logout all the centralauth users)
2014-04-08 23:16:32 <csteipp> If you have multiple ununified local accounts, each will get logged out
2014-04-08 23:16:51 <Danny_B> csteipp: i have to log in on every single project although i have a central username
2014-04-08 23:16:54 <Amgine> <grumbles about that><waves fist impotently at it.wp>
2014-04-08 23:17:30 <icinga-wm> PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 135.300003
2014-04-08 23:17:55 <mwalker> marktraceur, MaxSem I'm going to +2 and confirm https://gerrit.wikimedia.org/r/#/c/124036/2 , https://gerrit.wikimedia.org/r/#/c/121874/2 , https://gerrit.wikimedia.org/r/#/c/124747/
2014-04-08 23:18:30 <icinga-wm> PROBLEM - Varnishkafka Delivery Errors on cp3020 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 173.666672
2014-04-08 23:18:32 <mwalker> it would be wonderful if you all could +1 that so that I know you've looked and said this is good to me
2014-04-08 23:18:35 <marktraceur> 'kay
2014-04-08 23:18:53 <Danny_B> csteipp: +1 to notice ppl with central notice
2014-04-08 23:18:57 <grrrit-wm> ('CR') 'MarkTraceur': [C: ''] Add setting to show a survey for MediaViewer users on some sites [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124036' (owner: 'Gergő Tisza')
2014-04-08 23:19:00 <MaxSem> +1 ourselves?
2014-04-08 23:19:16 <MaxSem> doesn't sound very reassuring :)
2014-04-08 23:19:21 <mwalker> nah; you're probably OK MaxSem :p
2014-04-08 23:19:27 <mwalker> but I don't know who Gergo is
2014-04-08 23:19:44 <mwalker> but mark was sponsoring the patch
2014-04-08 23:19:53 <MaxSem> he's tgr :P
2014-04-08 23:20:00 <grrrit-wm> ('CR') 'Mwalker': [C: '2'] Put a safeguard on GeoData's usage of CirrusSearch [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/121874' (owner: 'MaxSem')
2014-04-08 23:20:08 <grrrit-wm> ('CR') 'Mwalker': [C: '2'] Enable $wgGeoDataDebug on labs [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124747' (owner: 'MaxSem')
2014-04-08 23:20:21 <grrrit-wm> ('CR') 'Mwalker': [C: '2'] Add setting to show a survey for MediaViewer users on some sites [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124036' (owner: 'Gergő Tisza')
2014-04-08 23:20:27 <ori> greg-g: missed your ping; still need me?
2014-04-08 23:21:00 <greg-g> dont think so
2014-04-08 23:23:33 <mwalker> interesting; sync-common doesn't log to IRC?
2014-04-08 23:23:34 <csteipp> Danny_B: That doesn't sound right.. At the risk of sounding cliche, can you log out and log back in, and see if that helps?
2014-04-08 23:23:55 <mwalker> marktraceur, MaxSem can you tell if your configuration stuff got pushed?
2014-04-08 23:24:15 <MaxSem> mwalker, mine's noop on prod
2014-04-08 23:24:25 <marktraceur> Ditto, but will check on beta
2014-04-08 23:24:26 <MaxSem> checking if prod still works...
2014-04-08 23:24:35 <mwalker> also; marktraceur I presume you want https://gerrit.wikimedia.org/r/#/c/124510/ to go to wmf20 and wmf21?
2014-04-08 23:24:38 <HaeB> Danny_B, hoo : we're still thinking about massmessage instead (more for the password changing advice)
2014-04-08 23:24:43 <marktraceur> mwalker: Sorry, only 21
2014-04-08 23:25:24 <marktraceur> mwalker: Confirmed, beta has the configuration we wanted
2014-04-08 23:26:36 <MaxSem> mwalker, lgtm
2014-04-08 23:27:40 <icinga-wm> PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: No output from Graphite for target(s): reqstats.5xx
2014-04-08 23:28:34 <Danny_B> csteipp: log out from any currently logged-in project, log back in to it, and then see if SUL works on the others?
2014-04-08 23:29:14 <csteipp> Danny_B: Yeah
2014-04-08 23:29:22 <Danny_B> csteipp: ok, sec
2014-04-08 23:29:38 <csteipp> Hmm... Danny_B What's your wiki username?
2014-04-08 23:30:51 <icinga-wm> RECOVERY - Puppet freshness on mw1109 is OK: puppet ran at Tue Apr 8 23:30:43 UTC 2014
2014-04-08 23:30:55 <Danny_B> csteipp: Danny B.
2014-04-08 23:31:17 <Danny_B> csteipp: seems to work now, will let you know if i'll spot another disconnection
2014-04-08 23:31:27 <csteipp> Danny_B: Cool, thanks
2014-04-08 23:32:03 <Danny_B> yw
2014-04-08 23:32:15 <Danny_B> thanks for care
2014-04-08 23:33:30 <icinga-wm> RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
2014-04-08 23:34:31 <logmsgbot> !log mwalker synchronized php-1.23wmf21/extensions/MultimediaViewer/ 'Updating MultimediaViewer for {{gerrit|124510}}'
2014-04-08 23:34:35 <morebots> Logged the message, Master
2014-04-08 23:35:16 <mwalker> marktraceur, ^ if you would test what you need to test for that
2014-04-08 23:35:26 <mwalker> I'm not seeing any fatals or exceptions which is good :)
2014-04-08 23:35:31 <icinga-wm> RECOVERY - Varnishkafka Delivery Errors on cp3020 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
2014-04-08 23:35:32 <marktraceur> mwalker: Works
2014-04-08 23:35:32 <marktraceur> Ta
2014-04-08 23:35:39 <mwalker> cool; greg-g SWAT done
2014-04-08 23:58:30 <icinga-wm> PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 179.666672
2014-04-08 23:59:04 <jackmcbarn> "Firefox can't find the server at en.wikipedia.beta.wmflabs.org."
2014-04-08 23:59:08 <jackmcbarn> why?
2014-04-08 23:59:14 <grrrit-wm> ('CR') 'Aaron Schulz': [C: ''] Create symlink for compile-wikiversions in /usr/local/bin [operations/puppet] - 'https://gerrit.wikimedia.org/r/124763' (owner: 'BryanDavis')
2014-04-08 23:59:31 <marktraceur> jackmcbarn: https://bugzilla.wikimedia.org/show_bug.cgi?id=63709 probably
