2014-04-08 00:00:35
|
<RoanKattouw>
|
Sure thing
|
2014-04-08 00:00:42
|
<RoanKattouw>
|
I think that's the SWAT all done
|
2014-04-08 00:00:44
|
<RoanKattouw>
|
Sorry for the slowness everyone
|
2014-04-08 00:01:16
|
<bd808>
|
RoanKattouw: If it makes my mailbox less full of debate about font faces...
|
2014-04-08 00:01:36
|
<bd808>
|
is sure that muting those threads will continue
|
2014-04-08 00:02:28
|
<icinga-wm>
|
PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Server Error - 1703 bytes in 7.426 second response time
|
2014-04-08 00:08:52
|
<bd808>
|
looks for a python reviewer for: https://gerrit.wikimedia.org/r/#/c/124500/
|
2014-04-08 00:09:10
|
<bd808>
|
I think that will fix the 1.23wmf21 l10n problems
|
2014-04-08 00:09:30
|
<bd808>
|
Because … mystery action at a distance!
|
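For context: the python change being tested here (per the later scap log entries, it concerns mwversionsinuse) is about reporting which MediaWiki versions are currently live so that steps like ExtensionMessages and l10n generation cover each of them. A minimal sketch of that idea, assuming a wikiversions.dat-style file of "dbname php-1.23wmfNN" lines; the file name, line format, and function are illustrative assumptions, not the actual scap code.

```python
# Hedged sketch: report the distinct MediaWiki versions that a wikiversions
# file maps wikis to. The "dbname php-1.23wmfNN" line format is an assumption
# for illustration, not the real scap implementation.
import sys


def versions_in_use(path):
    versions = set()
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith('#'):
                continue
            _dbname, directory = line.split(None, 1)
            if directory.startswith('php-'):
                directory = directory[len('php-'):]
            versions.add(directory)
    return sorted(versions)


if __name__ == '__main__':
    path = sys.argv[1] if len(sys.argv) > 1 else 'wikiversions.dat'
    print(' '.join(versions_in_use(path)))
```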
2014-04-08 00:12:27
|
<icinga-wm>
|
RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 219118 bytes in 8.455 second response time
|
2014-04-08 00:24:56
|
<logmsgbot>
|
!log catrope synchronized php-1.23wmf20/extensions/VisualEditor 'it helps if you run git submodule update first'
|
2014-04-08 00:25:02
|
<morebots>
|
Logged the message, Master
|
2014-04-08 00:25:05
|
<logmsgbot>
|
!log catrope synchronized php-1.23wmf21/extensions/VisualEditor 'it helps if you run git submodule update first'
|
2014-04-08 00:25:11
|
<morebots>
|
Logged the message, Master
|
2014-04-08 00:27:34
|
<grrrit-wm>
|
('PS1') 'BryanDavis': test2wiki to 1.23wmf21 [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124505'
|
2014-04-08 00:28:54
|
<bd808>
|
RoanKattouw_away: Are you {{done}} done now? I'd like to run some more scap tests
|
2014-04-08 00:38:27
|
<grrrit-wm>
|
('Abandoned') 'BryanDavis': l10nupdate: Add temporary debugging captures [operations/puppet] - 'https://gerrit.wikimedia.org/r/124467' (owner: 'BryanDavis')
|
2014-04-08 00:38:40
|
<grrrit-wm>
|
('PS2') 'BryanDavis': test2wiki to 1.23wmf21 [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124505'
|
2014-04-08 00:39:44
|
<grrrit-wm>
|
('Abandoned') 'BryanDavis': test2wiki to 1.23wmf21 [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124505' (owner: 'BryanDavis')
|
2014-04-08 00:41:34
|
<grrrit-wm>
|
('PS1') 'BryanDavis': Group0 wikis to 1.23wmf21 [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124506'
|
2014-04-08 00:43:55
|
<bd808>
|
greg-g: Are you still on a bus? I'd like to scap group0 to 1.23wmf21 to test my band aid fix. I would be on the hook to revert immediately afterward if ExtensionMessages looks like it will cause a problem for l10nupdate.
|
2014-04-08 00:44:03
|
<RoanKattouw_away>
|
bd808: Yes, sorry
|
2014-04-08 00:44:43
|
<bd808>
|
RoanKattouw_away: :) thanks. I watched your idle time on tin climb until I felt safe.
|
2014-04-08 00:45:28
|
<icinga-wm>
|
PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
|
2014-04-08 00:46:57
|
<bd808>
|
decides that greg-g won't have changed his mind in the last 1:30 and proceeds
|
2014-04-08 00:48:38
|
<grrrit-wm>
|
('CR') 'BryanDavis': [C: '2'] "Approving to test band aid fix for ExtensionMessages generation problem. Will revert if ExtensionMessages doesn't look right after scap." [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124506' (owner: 'BryanDavis')
|
2014-04-08 00:48:45
|
<grrrit-wm>
|
('Merged') 'jenkins-bot': Group0 wikis to 1.23wmf21 [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124506' (owner: 'BryanDavis')
|
2014-04-08 00:50:53
|
<logmsgbot>
|
!log bd808 Started scap: group0 to 1.23wmf21 (testing python change for mwversionsinuse)
|
2014-04-08 00:50:58
|
<morebots>
|
Logged the message, Master
|
2014-04-08 00:53:12
|
<bd808>
|
sees l10n cache updating yet again for 1.23wmf21 and loses all confidence in his "fix"
|
2014-04-08 00:53:51
|
<logmsgbot>
|
!log bd808 scap aborted: group0 to 1.23wmf21 (testing python change for mwversionsinuse) (duration: 02m 57s)
|
2014-04-08 00:53:56
|
<morebots>
|
Logged the message, Master
|
2014-04-08 00:54:30
|
<logmsgbot>
|
!log bd808 Started scap: group0 to 1.23wmf21 (testing python change for mwversionsinuse) (again)
|
2014-04-08 00:54:35
|
<morebots>
|
Logged the message, Master
|
2014-04-08 00:54:56
|
<logmsgbot>
|
!log bd808 scap aborted: group0 to 1.23wmf21 (testing python change for mwversionsinuse) (again) (duration: 00m 25s)
|
2014-04-08 00:55:01
|
<morebots>
|
Logged the message, Master
|
2014-04-08 00:55:12
|
<grrrit-wm>
|
('PS1') 'BryanDavis': Revert "Group0 wikis to 1.23wmf21" [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124507'
|
2014-04-08 00:55:34
|
<grrrit-wm>
|
('CR') 'BryanDavis': [C: '2'] Revert "Group0 wikis to 1.23wmf21" [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124507' (owner: 'BryanDavis')
|
2014-04-08 00:55:42
|
<grrrit-wm>
|
('Merged') 'jenkins-bot': Revert "Group0 wikis to 1.23wmf21" [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124507' (owner: 'BryanDavis')
|
2014-04-08 00:56:51
|
<logmsgbot>
|
!log bd808 Started scap: revert group0 to 1.23wmf21 (testwiki still on 1.23wmf21)
|
2014-04-08 00:56:55
|
<morebots>
|
Logged the message, Master
|
2014-04-08 01:01:33
|
<grrrit-wm>
|
('PS3') 'Ori.livneh': Add EventLogging Kafka writer plug-in [operations/puppet] - 'https://gerrit.wikimedia.org/r/85337'
|
2014-04-08 01:06:45
|
<logmsgbot>
|
!log bd808 Finished scap: revert group0 to 1.23wmf21 (testwiki still on 1.23wmf21) (duration: 09m 54s)
|
2014-04-08 01:06:53
|
<morebots>
|
Logged the message, Master
|
2014-04-08 01:22:25
|
<StevenW>
|
ori: working now
|
2014-04-08 01:22:29
|
<StevenW>
|
\o/
|
2014-04-08 02:07:07
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs3001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
|
2014-04-08 02:07:07
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs3002 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
|
2014-04-08 02:07:08
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs3003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
|
2014-04-08 02:07:08
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs3004 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
|
2014-04-08 02:15:58
|
<logmsgbot>
|
!log LocalisationUpdate completed (1.23wmf20) at 2014-04-08 02:15:58+00:00
|
2014-04-08 02:16:06
|
<morebots>
|
Logged the message, Master
|
2014-04-08 02:34:57
|
<logmsgbot>
|
!log LocalisationUpdate completed (1.23wmf21) at 2014-04-08 02:34:56+00:00
|
2014-04-08 02:35:02
|
<morebots>
|
Logged the message, Master
|
2014-04-08 02:45:57
|
<icinga-wm>
|
PROBLEM - MySQL Idle Transactions on db1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
|
2014-04-08 02:48:37
|
<icinga-wm>
|
PROBLEM - MySQL InnoDB on db1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
|
2014-04-08 02:48:57
|
<icinga-wm>
|
RECOVERY - MySQL Idle Transactions on db1047 is OK: OK longest blocking idle transaction sleeps for 0 seconds
|
2014-04-08 02:49:06
|
<ori>
|
springle_: db1047 has been very sad lately
|
2014-04-08 02:49:27
|
<icinga-wm>
|
RECOVERY - MySQL InnoDB on db1047 is OK: OK longest blocking idle transaction sleeps for 0 seconds
|
2014-04-08 03:00:17
|
<icinga-wm>
|
PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: No output from Graphite for target(s): reqstats.5xx
|
2014-04-08 03:08:06
|
<bawolff>
|
With 1.23wmf21 not getting deployed to mediawiki.org last thursday, does that mean the deployment schedule for 1.23wmf22 will be off by a week?
|
2014-04-08 03:11:07
|
<logmsgbot>
|
!log LocalisationUpdate ResourceLoader cache refresh completed at Tue Apr 8 03:11:04 UTC 2014 (duration 11m 3s)
|
2014-04-08 03:11:11
|
<morebots>
|
Logged the message, Master
|
2014-04-08 03:31:47
|
<icinga-wm>
|
PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
|
2014-04-08 03:38:12
|
<aude>
|
greg-g: still around?
|
2014-04-08 03:53:36
|
<aude>
|
greg-g: check your mail
|
2014-04-08 04:03:35
|
<TimStarling>
|
!log upgrading libssl on ssl1001,ssl1002,ssl1003,ssl1004,ssl1005,ssl1006,ssl1007,ssl1008,ssl1009,ssl3001.esams.wikimedia.org,ssl3002.esams.wikimedia.org,ssl3003.esams.wikimedia.org
|
2014-04-08 04:03:41
|
<morebots>
|
Logged the message, Master
|
2014-04-08 04:03:57
|
<Jasper_Deng>
|
TimStarling: is this the heartbleed.com thing?
|
2014-04-08 04:04:07
|
<Jasper_Deng>
|
didn't know we used openssl
|
2014-04-08 04:15:22
|
<TimStarling>
|
Jasper_Deng: yes
|
2014-04-08 04:15:47
|
<TimStarling>
|
!log also upgraded libssl on cp4001-4019. Restarted nginx on these servers and also the previous list.
|
2014-04-08 04:15:51
|
<morebots>
|
Logged the message, Master
|
2014-04-08 04:37:40
|
<Ryan_Lane>
|
!log upgrading libssl on virt1000
|
2014-04-08 04:37:44
|
<morebots>
|
Logged the message, Master
|
2014-04-08 04:38:21
|
<Ryan_Lane>
|
!log upgrading libssl on virt0
|
2014-04-08 04:38:26
|
<morebots>
|
Logged the message, Master
|
2014-04-08 04:41:03
|
<TimStarling>
|
!log upgraded libssl on zirconium.wikimedia.org,neon.wikimedia.org,netmon1001.wikimedia.org,iodine.wikimedia.org,ytterbium.wikimedia.org,gerrit.wikimedia.org,virt1000.wikimedia.org,labs-ns1.wikimedia.org,stat1001.wikimedia.org
|
2014-04-08 04:43:13
|
<TimStarling>
|
!log restarted apache on the above list, failed on labs-ns1, virt1000, ytterbium
|
2014-04-08 04:43:18
|
<morebots>
|
Logged the message, Master
|
2014-04-08 04:43:47
|
<^d>
|
TimStarling: I'll poke ytterbium
|
2014-04-08 04:44:00
|
<^d>
|
Keep moving on to other boxes if you need.
|
2014-04-08 04:44:35
|
<^d>
|
Seems up now.
|
2014-04-08 04:45:04
|
<TimStarling>
|
yeah, labs-ns1 and virt1000 are actually the same server
|
2014-04-08 04:45:19
|
<TimStarling>
|
and apache is running there with a start time (stime) after the upgrade
|
2014-04-08 04:46:30
|
<TimStarling>
|
!log on dataset1001: upgraded libssl and restarted lighttpd
|
2014-04-08 04:46:34
|
<morebots>
|
Logged the message, Master
|
2014-04-08 04:53:47
|
<icinga-wm>
|
PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
|
2014-04-08 05:08:07
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs3001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
|
2014-04-08 05:08:07
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs3002 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
|
2014-04-08 05:08:07
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs3003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
|
2014-04-08 05:08:07
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs3004 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
|
2014-04-08 05:25:10
|
<grrrit-wm>
|
('PS1') 'Aude': Enable Wikibase on Wikiquote [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124516'
|
2014-04-08 05:26:24
|
<grrrit-wm>
|
('CR') 'Aude': [C: '-2'] "requires sites and site_identifiers tables to be added and populated on wikiquote" [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124516' (owner: 'Aude')
|
2014-04-08 05:31:00
|
<_joe_>
|
!log upgraded openssl on cp10* and cp30* servers as well
|
2014-04-08 05:31:06
|
<morebots>
|
Logged the message, Master
|
2014-04-08 05:39:29
|
<apergos>
|
!log restarted apache on fenari magnesium yterrbium antimony
|
2014-04-08 05:39:33
|
<morebots>
|
Logged the message, Master
|
2014-04-08 05:39:51
|
<apergos>
|
with some misspellings but people will get the point
|
2014-04-08 05:47:01
|
<apergos>
|
!log shot many old apache processes running as stats user from 2013, on stat1001 (restarting apache runs it as www-data user)
|
2014-04-08 05:47:06
|
<morebots>
|
Logged the message, Master
|
2014-04-08 06:34:37
|
<grrrit-wm>
|
('PS3') 'Matanya': dataset: fix module path [operations/puppet] - 'https://gerrit.wikimedia.org/r/119212'
|
2014-04-08 06:37:44
|
<grrrit-wm>
|
('PS3') 'Matanya': exim: fix scoping [operations/puppet] - 'https://gerrit.wikimedia.org/r/119496'
|
2014-04-08 06:43:48
|
<matanya>
|
springle: did you hear from otto regarding https://gerrit.wikimedia.org/r/#/c/122406/ ?
|
2014-04-08 06:45:27
|
<springle>
|
matanya: no
|
2014-04-08 06:45:41
|
<matanya>
|
:/ i need to chase him down, thanks
|
2014-04-08 06:46:04
|
<springle>
|
not sure otto knows about it? i emailed analytics lists directly
|
2014-04-08 06:46:29
|
<springle>
|
so far the answer is: probably fine to decom db67, but let's wait for everyone to chime in
|
2014-04-08 06:46:43
|
<springle>
|
i'll bump it this week
|
2014-04-08 06:47:05
|
<matanya>
|
thank you
|
2014-04-08 07:30:44
|
<grrrit-wm>
|
('PS1') 'Faidon Liambotis': base: add debian-goodies [operations/puppet] - 'https://gerrit.wikimedia.org/r/124524'
|
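debian-goodies ships checkrestart, which finds processes still mapping deleted (i.e. upgraded) libraries; that is exactly the gap the manual apache/nginx restarts earlier in the log are closing after the libssl upgrades. A rough Python sketch of the same idea on Linux; it is not a reimplementation of checkrestart, just an illustration of the /proc-based check.

```python
# Hedged sketch of a checkrestart-style scan: list processes that still map a
# deleted libssl, i.e. ones started before the upgrade that need a restart to
# pick up the fixed library. Linux-only; reads /proc/<pid>/maps.
import os


def stale_processes(library='libssl'):
    stale = {}
    for pid in filter(str.isdigit, os.listdir('/proc')):
        try:
            with open('/proc/%s/maps' % pid) as maps:
                needs_restart = any(
                    library in line and '(deleted)' in line for line in maps)
            if needs_restart:
                with open('/proc/%s/comm' % pid) as comm:
                    stale[int(pid)] = comm.read().strip()
        except (IOError, OSError):
            continue  # process exited or permission denied; skip it
    return stale


if __name__ == '__main__':
    for pid, name in sorted(stale_processes().items()):
        print('%d\t%s still maps a deleted libssl' % (pid, name))
```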
2014-04-08 07:47:07
|
<_joe|away>
|
!log restarted nginx on cp1044 and cp1043
|
2014-04-08 07:47:12
|
<morebots>
|
Logged the message, Master
|
2014-04-08 07:53:07
|
<icinga-wm>
|
PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: No output from Graphite for target(s): reqstats.5xx
|
2014-04-08 07:53:07
|
<grrrit-wm>
|
('CR') 'coren': [C: '2'] base: add debian-goodies [operations/puppet] - 'https://gerrit.wikimedia.org/r/124524' (owner: 'Faidon Liambotis')
|
2014-04-08 08:02:57
|
<icinga-wm>
|
PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: No output from Graphite for target(s): reqstats.5xx
|
2014-04-08 08:09:07
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs3001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
|
2014-04-08 08:09:07
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs3002 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
|
2014-04-08 08:09:07
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs3003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
|
2014-04-08 08:09:07
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs3004 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
|
2014-04-08 08:11:47
|
<icinga-wm>
|
PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: No output from Graphite for target(s): reqstats.5xx
|
2014-04-08 08:15:17
|
<icinga-wm>
|
PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
|
2014-04-08 08:36:30
|
<siebrand>
|
ori: still working?
|
2014-04-08 09:03:47
|
<icinga-wm>
|
PROBLEM - RAID on labstore3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
|
2014-04-08 09:04:07
|
<YuviPanda>
|
hashar: help with setting up zuul for the apps? https://gerrit.wikimedia.org/r/#/c/124539/
|
2014-04-08 09:08:37
|
<icinga-wm>
|
PROBLEM - Disk space on labstore3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
|
2014-04-08 09:08:47
|
<icinga-wm>
|
RECOVERY - RAID on labstore3 is OK: OK: optimal, 12 logical, 12 physical
|
2014-04-08 09:08:57
|
<icinga-wm>
|
PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
|
2014-04-08 09:11:47
|
<icinga-wm>
|
PROBLEM - RAID on labstore3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
|
2014-04-08 09:16:55
|
<grrrit-wm>
|
('PS1') 'RobH': Replacing the unified certificate [operations/puppet] - 'https://gerrit.wikimedia.org/r/124542'
|
2014-04-08 09:24:34
|
<grrrit-wm>
|
('CR') 'RobH': [C: '2'] Replacing the unified certificate [operations/puppet] - 'https://gerrit.wikimedia.org/r/124542' (owner: 'RobH')
|
2014-04-08 09:29:47
|
<icinga-wm>
|
RECOVERY - RAID on labstore3 is OK: OK: optimal, 12 logical, 12 physical
|
2014-04-08 09:33:47
|
<icinga-wm>
|
PROBLEM - RAID on labstore3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
|
2014-04-08 09:36:37
|
<icinga-wm>
|
RECOVERY - RAID on labstore3 is OK: OK: optimal, 12 logical, 12 physical
|
2014-04-08 09:37:37
|
<icinga-wm>
|
RECOVERY - Disk space on labstore3 is OK: DISK OK
|
2014-04-08 09:39:19
|
<hashar>
|
YuviPanda: hello
|
2014-04-08 09:39:25
|
<YuviPanda>
|
hashar: hello!
|
2014-04-08 09:40:00
|
<icinga-wm>
|
PROBLEM - RAID on labstore3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
|
2014-04-08 09:40:37
|
<icinga-wm>
|
PROBLEM - Disk space on labstore3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
|
2014-04-08 09:40:57
|
<grrrit-wm>
|
('PS1') 'Andrew Bogott': Add eth1 checks to nova compute hosts. [operations/puppet] - 'https://gerrit.wikimedia.org/r/124560'
|
2014-04-08 09:44:12
|
<hashar>
|
and we lost YuviPanda
|
2014-04-08 09:45:10
|
<sjoerddebruin>
|
Noooo not our panda. :(
|
2014-04-08 09:46:25
|
<Steinsplitter>
|
panda \O/
|
2014-04-08 09:46:28
|
<icinga-wm>
|
PROBLEM - SSH on labstore3 is CRITICAL: Connection refused
|
2014-04-08 09:46:28
|
<icinga-wm>
|
PROBLEM - DPKG on labstore3 is CRITICAL: Connection refused by host
|
2014-04-08 09:46:47
|
<icinga-wm>
|
PROBLEM - puppet disabled on labstore3 is CRITICAL: Connection refused by host
|
2014-04-08 09:47:00
|
<andrewbogott>
|
mutante: https://gerrit.wikimedia.org/r/#/c/124560/
|
2014-04-08 09:47:43
|
<icinga-wm>
|
ACKNOWLEDGEMENT - DPKG on labstore3 is CRITICAL: Connection refused by host daniel_zahn will be decomed - The acknowledgement expires at: 2014-04-09 09:46:44.
|
2014-04-08 09:47:44
|
<icinga-wm>
|
ACKNOWLEDGEMENT - Disk space on labstore3 is CRITICAL: Connection refused by host daniel_zahn will be decomed - The acknowledgement expires at: 2014-04-09 09:46:44.
|
2014-04-08 09:47:44
|
<icinga-wm>
|
ACKNOWLEDGEMENT - RAID on labstore3 is CRITICAL: Connection refused by host daniel_zahn will be decomed - The acknowledgement expires at: 2014-04-09 09:46:44.
|
2014-04-08 09:47:44
|
<icinga-wm>
|
ACKNOWLEDGEMENT - SSH on labstore3 is CRITICAL: Connection refused daniel_zahn will be decomed - The acknowledgement expires at: 2014-04-09 09:46:44.
|
2014-04-08 09:47:44
|
<icinga-wm>
|
ACKNOWLEDGEMENT - puppet disabled on labstore3 is CRITICAL: Connection refused by host daniel_zahn will be decomed - The acknowledgement expires at: 2014-04-09 09:46:44.
|
2014-04-08 09:49:57
|
<matanya>
|
so nice to see all ops in a European time zone :)
|
2014-04-08 09:50:37
|
<icinga-wm>
|
PROBLEM - Host labstore3 is DOWN: PING CRITICAL - Packet loss = 100%
|
2014-04-08 09:57:12
|
<grrrit-wm>
|
('CR') 'Dzahn': [C: '-1'] Add eth1 checks to nova compute hosts. ('3' comments) [operations/puppet] - 'https://gerrit.wikimedia.org/r/124560' (owner: 'Andrew Bogott')
|
2014-04-08 10:00:49
|
<springle>
|
ori: what is udpprofile::collector, and can i move it from db1014 to... somewhere else?
|
2014-04-08 10:02:47
|
<ori>
|
springle: oh, wow. is there any indication that it continues to see activity? mediawiki's profiler class can be configured to write to a database, but i didn't know anyone was using it in production. is it not ancient?
|
2014-04-08 10:04:56
|
<andrewbogott>
|
mutante, cmjohnson: https://wikitech.wikimedia.org/wiki/Help:Git_rebase#Don.27t_panic
|
2014-04-08 10:05:21
|
<thedj>
|
andrewbogott: 42
|
2014-04-08 10:05:57
|
<ori>
|
springle: it can go away
|
2014-04-08 10:06:34
|
<ori>
|
springle: it was added in this commit: <https://gerrit.wikimedia.org/r/#/c/83953/>. the message reads: "testing graphite 0.910 on db1014".
|
2014-04-08 10:07:04
|
<springle>
|
yeah, asher stole db1014 for graphite
|
2014-04-08 10:07:12
|
<springle>
|
trying to steal it back :)
|
2014-04-08 10:07:20
|
<springle>
|
ori: thanks
|
2014-04-08 10:07:46
|
<ori>
|
springle: it's not in any way implicated in our current graphite setup, which exists solely on tungsten.eqiad.wmnet (and labs)
|
2014-04-08 10:08:13
|
<grrrit-wm>
|
('PS2') 'Andrew Bogott': Add eth1 checks to nova compute hosts. [operations/puppet] - 'https://gerrit.wikimedia.org/r/124560'
|
2014-04-08 10:08:18
|
<andrewbogott>
|
mutante: ^
|
2014-04-08 10:09:24
|
<grrrit-wm>
|
('PS1') 'Cmjohnson': adding ethtool to standard-packages.pp to be able to monitor interface speed [operations/puppet] - 'https://gerrit.wikimedia.org/r/124572'
|
2014-04-08 10:11:07
|
<grrrit-wm>
|
('CR') 'jenkins-bot': [V: '-1'] adding ethtool to standard-packages.pp to be able to monitor interface speed [operations/puppet] - 'https://gerrit.wikimedia.org/r/124572' (owner: 'Cmjohnson')
|
2014-04-08 10:12:49
|
<grrrit-wm>
|
('CR') 'Dzahn': [C: ''] Add eth1 checks to nova compute hosts. [operations/puppet] - 'https://gerrit.wikimedia.org/r/124560' (owner: 'Andrew Bogott')
|
2014-04-08 10:15:34
|
<Jeff_Green>
|
!log update & reboot samarium
|
2014-04-08 10:15:38
|
<morebots>
|
Logged the message, Master
|
2014-04-08 10:15:48
|
<grrrit-wm>
|
('CR') 'Andrew Bogott': [C: '2'] Add eth1 checks to nova compute hosts. [operations/puppet] - 'https://gerrit.wikimedia.org/r/124560' (owner: 'Andrew Bogott')
|
2014-04-08 10:16:26
|
<grrrit-wm>
|
('PS1') 'Springle': Remove unused db1014 block. db1014 was renamed tungsten rt5871. [operations/puppet] - 'https://gerrit.wikimedia.org/r/124575'
|
2014-04-08 10:18:19
|
<grrrit-wm>
|
('CR') 'Springle': [C: '2'] Remove unused db1014 block. db1014 was renamed tungsten rt5871. [operations/puppet] - 'https://gerrit.wikimedia.org/r/124575' (owner: 'Springle')
|
2014-04-08 10:21:04
|
<Jeff_Green>
|
!log update & reboot barium
|
2014-04-08 10:21:09
|
<morebots>
|
Logged the message, Master
|
2014-04-08 10:23:09
|
<grrrit-wm>
|
('PS1') 'Dzahn': add nrpe to base [operations/puppet] - 'https://gerrit.wikimedia.org/r/124576'
|
2014-04-08 10:24:10
|
<grrrit-wm>
|
('CR') 'jenkins-bot': [V: '-1'] add nrpe to base [operations/puppet] - 'https://gerrit.wikimedia.org/r/124576' (owner: 'Dzahn')
|
2014-04-08 11:09:28
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs3002 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
|
2014-04-08 11:09:28
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs3001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
|
2014-04-08 11:09:28
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs3003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
|
2014-04-08 11:09:28
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs3004 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
|
2014-04-08 11:32:05
|
<grrrit-wm>
|
('PS20') 'Matanya': etherpad: convert into a module [operations/puppet] - 'https://gerrit.wikimedia.org/r/107567'
|
2014-04-08 11:32:32
|
<matanya>
|
akosiaris: in a meeting, or can this ^ be handled?
|
2014-04-08 11:39:18
|
<icinga-wm>
|
PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
|
2014-04-08 12:32:58
|
<grrrit-wm>
|
('PS2') 'Dzahn': add nrpe to base [operations/puppet] - 'https://gerrit.wikimedia.org/r/124576'
|
2014-04-08 12:39:13
|
<akosiaris>
|
matanya: in ops meeting
|
2014-04-08 12:39:19
|
<matanya>
|
sorry
|
2014-04-08 12:39:27
|
<akosiaris>
|
and please tell me you did not resubmit from your local repo
|
2014-04-08 12:39:48
|
<akosiaris>
|
rebase* sorry
|
2014-04-08 12:39:50
|
<grrrit-wm>
|
('PS2') 'Cmjohnson': adding ethtool to standard-packages.pp to be able to monitor interface speed [operations/puppet] - 'https://gerrit.wikimedia.org/r/124572'
|
2014-04-08 12:40:26
|
<grrrit-wm>
|
('CR') 'Andrew Bogott': [V: ''] "This looks good -- we'll see if it makes new alarms go off :)" [operations/puppet] - 'https://gerrit.wikimedia.org/r/124576' (owner: 'Dzahn')
|
2014-04-08 12:46:38
|
<grrrit-wm>
|
('PS3') 'Cmjohnson': adding ethtool to standard-packages.pp to be able to monitor interface speed [operations/puppet] - 'https://gerrit.wikimedia.org/r/124572'
|
2014-04-08 12:48:28
|
<icinga-wm>
|
PROBLEM - DPKG on strontium is CRITICAL: DPKG CRITICAL dpkg reports broken packages
|
2014-04-08 12:49:28
|
<icinga-wm>
|
RECOVERY - DPKG on strontium is OK: All packages OK
|
2014-04-08 12:49:35
|
<grrrit-wm>
|
('CR') 'Matanya': [C: ''] add nrpe to base [operations/puppet] - 'https://gerrit.wikimedia.org/r/124576' (owner: 'Dzahn')
|
2014-04-08 12:50:21
|
<cmjohnson1>
|
paravoid: can you review please https://gerrit.wikimedia.org/r/124572
|
2014-04-08 12:50:38
|
<andrewbogott>
|
mutante: https://rt.wikimedia.org/Ticket/Display.html?id=5064
|
2014-04-08 12:51:29
|
<grrrit-wm>
|
('CR') 'Dzahn': [C: ''] "yep, if we want to monitor this on everything, then standard-packages sounds good to me" [operations/puppet] - 'https://gerrit.wikimedia.org/r/124572' (owner: 'Cmjohnson')
|
2014-04-08 12:52:38
|
<icinga-wm>
|
PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
|
2014-04-08 12:53:10
|
<grrrit-wm>
|
('CR') 'Alexandros Kosiaris': [C: '2'] adding ethtool to standard-packages.pp to be able to monitor interface speed [operations/puppet] - 'https://gerrit.wikimedia.org/r/124572' (owner: 'Cmjohnson')
|
2014-04-08 12:55:34
|
<manybubbles>
|
can anyone around update Elasticsearch in apt?
|
2014-04-08 12:55:55
|
<manybubbles>
|
and ack nagios errors (so they don't spam to irc) for a couple hours?
|
2014-04-08 12:56:39
|
<logmsgbot>
|
!log reedy updated /a/common to {{Gerrit|Id15ddc665}}: Revert "Group0 wikis to 1.23wmf21"
|
2014-04-08 12:56:44
|
<morebots>
|
Logged the message, Master
|
2014-04-08 12:57:23
|
<grrrit-wm>
|
('PS1') 'Reedy': Non wikipedias to 1.23wmf21 [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124591'
|
2014-04-08 12:59:03
|
<Reedy>
|
pokes qchris_away and ^d
|
2014-04-08 13:01:42
|
<Reedy>
|
Any idea why https://gerrit.wikimedia.org/changes/?q=status:merged+age%3A0d&o=DETAILED_ACCOUNTS&n=100 doesn't work?
|
2014-04-08 13:02:00
|
<grrrit-wm>
|
('CR') 'Cmjohnson': [C: '2'] adding ethtool to standard-packages.pp to be able to monitor interface speed [operations/puppet] - 'https://gerrit.wikimedia.org/r/124572' (owner: 'Cmjohnson')
|
2014-04-08 13:03:24
|
<Reedy>
|
versus
|
2014-04-08 13:03:24
|
<Reedy>
|
http://review.cyanogenmod.org/changes/?q=status:open+age%3A0d&o=DETAILED_ACCOUNTS&n=100
|
2014-04-08 13:07:41
|
<grrrit-wm>
|
('PS3') 'Dzahn': add nrpe to base [operations/puppet] - 'https://gerrit.wikimedia.org/r/124576'
|
2014-04-08 13:12:48
|
<grrrit-wm>
|
('PS4') 'Dzahn': add nrpe to base [operations/puppet] - 'https://gerrit.wikimedia.org/r/124576'
|
2014-04-08 13:15:18
|
<apergos>
|
test
|
2014-04-08 13:15:42
|
<apergos>
|
test akosiaris
|
2014-04-08 13:15:43
|
<akosiaris>
|
apergos: :-)
|
2014-04-08 13:15:51
|
<apergos>
|
manybubbles:
|
2014-04-08 13:16:54
|
<mutante>
|
already pinged
|
2014-04-08 13:17:06
|
<grrrit-wm>
|
('PS1') 'coren': Tool Labs: forcibly upgrade libssl [operations/puppet] - 'https://gerrit.wikimedia.org/r/124594'
|
2014-04-08 13:19:25
|
<grrrit-wm>
|
('CR') 'Dzahn': [C: '2'] "RT #80 :)" [operations/puppet] - 'https://gerrit.wikimedia.org/r/124576' (owner: 'Dzahn')
|
2014-04-08 13:21:58
|
<_joe_>
|
ori: If you're here, please let me know :)
|
2014-04-08 13:26:57
|
<Reedy>
|
_joe_: Couple of hours from now
|
2014-04-08 13:27:05
|
<Reedy>
|
Though, he is around early sometimes
|
2014-04-08 13:27:31
|
<_joe_>
|
Reedy: thanks
|
2014-04-08 13:30:38
|
<grrrit-wm>
|
('CR') 'RobH': [C: ''] Tool Labs: forcibly upgrade libssl [operations/puppet] - 'https://gerrit.wikimedia.org/r/124594' (owner: 'coren')
|
2014-04-08 13:31:20
|
<manybubbles>
|
ottomata: welcome!
|
2014-04-08 13:31:34
|
<manybubbles>
|
can you help me get started today?
|
2014-04-08 13:31:42
|
<grrrit-wm>
|
('CR') 'coren': [C: '2'] Tool Labs: forcibly upgrade libssl [operations/puppet] - 'https://gerrit.wikimedia.org/r/124594' (owner: 'coren')
|
2014-04-08 13:31:50
|
<Reedy>
|
manybubbles: We have an extension for that
|
2014-04-08 13:31:51
|
<Reedy>
|
grins
|
2014-04-08 13:31:57
|
<manybubbles>
|
Reedy: thanks!
|
2014-04-08 13:32:01
|
<manybubbles>
|
I totally used it a while ago
|
2014-04-08 13:32:27
|
<qchris_away>
|
Reedy: Because we're using /r/ to mark the reverse proxy ...
|
2014-04-08 13:32:33
|
<qchris_away>
|
Reedy: https://gerrit.wikimedia.org/r/changes/?q=status:merged+age%3A0d&o=DETAILED_ACCOUNTS&n=100
|
2014-04-08 13:32:37
|
<qchris_away>
|
Reedy: ^ should work
|
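For anyone hitting the same question later: the REST endpoint has to be reached under the /r/ reverse-proxy prefix qchris points out, and Gerrit prepends a `)]}'` guard line (an anti-XSSI measure) to its JSON that must be stripped before parsing. A small standard-library sketch using the query from the log; everything beyond the guard-line detail is just illustration.

```python
# Hedged sketch: hit Gerrit's REST API through the /r/ prefix and strip the
# ")]}'" anti-XSSI guard line Gerrit prepends to its JSON responses.
import json
import urllib.request

URL = ('https://gerrit.wikimedia.org/r/changes/'
       '?q=status:merged+age%3A0d&o=DETAILED_ACCOUNTS&n=100')


def fetch_changes(url=URL):
    with urllib.request.urlopen(url) as resp:
        body = resp.read().decode('utf-8')
    if body.startswith(")]}'"):
        body = body.split('\n', 1)[1]  # drop the guard line before parsing
    return json.loads(body)


if __name__ == '__main__':
    for change in fetch_changes():
        print(change.get('_number'), change.get('subject'))
```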
2014-04-08 13:32:47
|
<Reedy>
|
Aha, sweet!
|
2014-04-08 13:33:43
|
<grrrit-wm>
|
('PS1') 'RobH': replace blog.wikimedia.org certificate [operations/puppet] - 'https://gerrit.wikimedia.org/r/124595'
|
2014-04-08 13:35:07
|
<manybubbles>
|
ottomata: I need Elasticsearch 1.1.0 shoved into apt
|
2014-04-08 13:35:37
|
<grrrit-wm>
|
('PS2') 'RobH': replace blog.wikimedia.org certificate [operations/puppet] - 'https://gerrit.wikimedia.org/r/124595'
|
2014-04-08 13:36:15
|
<Reedy>
|
qchris: thanks
|
2014-04-08 13:36:22
|
<qchris>
|
yw
|
2014-04-08 13:37:04
|
<icinga-wm>
|
PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds
|
2014-04-08 13:37:33
|
<mutante>
|
!log restarting gitblit
|
2014-04-08 13:37:33
|
<grrrit-wm>
|
('CR') 'RobH': [C: '2'] replace blog.wikimedia.org certificate [operations/puppet] - 'https://gerrit.wikimedia.org/r/124595' (owner: 'RobH')
|
2014-04-08 13:37:37
|
<morebots>
|
Logged the message, Master
|
2014-04-08 13:39:00
|
<RobH>
|
!log replacing the blog cert, if holmium crashes I didn't do it correctly.
|
2014-04-08 13:39:01
|
<grrrit-wm>
|
('PS1') 'Faidon Liambotis': Revert "Giving Nik shell access to analytics1004 to do some elasticsearch load testing" [operations/puppet] - 'https://gerrit.wikimedia.org/r/124597'
|
2014-04-08 13:39:03
|
<ottomata>
|
manybubbles: ok!
|
2014-04-08 13:39:03
|
<morebots>
|
Logged the message, RobH
|
2014-04-08 13:39:04
|
<icinga-wm>
|
RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 305803 bytes in 9.337 second response time
|
2014-04-08 13:39:08
|
<manybubbles>
|
thanks!
|
2014-04-08 13:39:28
|
<Jeff_Green>
|
!log update & reboot tellurium
|
2014-04-08 13:39:33
|
<morebots>
|
Logged the message, Master
|
2014-04-08 13:39:47
|
<grrrit-wm>
|
('CR') 'jenkins-bot': [V: '-1'] Revert "Giving Nik shell access to analytics1004 to do some elasticsearch load testing" [operations/puppet] - 'https://gerrit.wikimedia.org/r/124597' (owner: 'Faidon Liambotis')
|
2014-04-08 13:41:14
|
<icinga-wm>
|
PROBLEM - Host tellurium is DOWN: PING CRITICAL - Packet loss = 100%
|
2014-04-08 13:42:38
|
<grrrit-wm>
|
('PS2') 'Faidon Liambotis': Revert "Giving Nik shell access to analytics1004 to do some elasticsearch load testing" [operations/puppet] - 'https://gerrit.wikimedia.org/r/124597'
|
2014-04-08 13:43:27
|
<grrrit-wm>
|
('CR') 'Faidon Liambotis': [C: '2' V: '2'] Revert "Giving Nik shell access to analytics1004 to do some elasticsearch load testing" [operations/puppet] - 'https://gerrit.wikimedia.org/r/124597' (owner: 'Faidon Liambotis')
|
2014-04-08 13:44:28
|
<grrrit-wm>
|
('CR') 'Manybubbles': "Is there a better place to run this?" [operations/puppet] - 'https://gerrit.wikimedia.org/r/124597' (owner: 'Faidon Liambotis')
|
2014-04-08 13:45:14
|
<icinga-wm>
|
RECOVERY - Host tellurium is UP: PING OK - Packet loss = 0%, RTA = 1.11 ms
|
2014-04-08 13:46:13
|
<RobH>
|
!log upgraded libssl on holmium
|
2014-04-08 13:46:18
|
<morebots>
|
Logged the message, RobH
|
2014-04-08 13:48:49
|
<paravoid>
|
ottomata: kafka upgrade doesn't work on an1004
|
2014-04-08 13:49:41
|
<ottomata>
|
paravoid, analytics1004 (and analytics1003) were kafka test brokers, and were never productionized or puppetized
|
2014-04-08 13:49:50
|
<ottomata>
|
i thought I had removed kafka from analytics1004, actually
|
2014-04-08 13:50:38
|
<manybubbles>
|
ottomata: can you install git fat on tin?
|
2014-04-08 13:50:42
|
<manybubbles>
|
I cannot
|
2014-04-08 13:50:46
|
<ottomata>
|
hm, sure, why do you need git-fat there?
|
2014-04-08 13:50:55
|
<manybubbles>
|
to git deploy
|
2014-04-08 13:50:58
|
<manybubbles>
|
to Elasticsearch
|
2014-04-08 13:51:07
|
<manybubbles>
|
the plugins
|
2014-04-08 13:51:14
|
<manybubbles>
|
or is there another server
|
2014-04-08 13:51:17
|
<ottomata>
|
you don't need git-fat on tin though
|
2014-04-08 13:51:23
|
<ottomata>
|
the git-fat commands are run on deploy hosts
|
2014-04-08 13:51:27
|
<ottomata>
|
on the targets
|
2014-04-08 13:51:46
|
<manybubbles>
|
huh, I'm used to running it on the server to check the jars got there. I'll just do it without and see
|
2014-04-08 13:53:21
|
<manybubbles>
|
ottomata: that worked as you said it would
|
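ottomata's point is that git-fat pulls the real binaries on the deploy targets, so the useful check there is whether any file is still a git-fat placeholder stub rather than the jar itself. A hedged sketch of that check; the "#$# git-fat" magic string and the directory walk are illustrative assumptions, not part of the deployment tooling.

```python
# Hedged sketch: walk a deployed tree and flag files that are still git-fat
# placeholder stubs rather than the pulled binaries. The "#$# git-fat" magic
# string is assumed from git-fat's stub format; verify against your version.
import os
import sys

MAGIC = b'#$# git-fat'


def unpulled_stubs(root):
    stubs = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, 'rb') as f:
                    if f.read(len(MAGIC)) == MAGIC:
                        stubs.append(path)
            except OSError:
                continue  # unreadable file; skip it
    return stubs


if __name__ == '__main__':
    for path in unpulled_stubs(sys.argv[1] if len(sys.argv) > 1 else '.'):
        print('still a git-fat stub:', path)
```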
2014-04-08 13:53:35
|
<manybubbles>
|
!log synced first Elasticsearch plugin to production Elasticsearch servers
|
2014-04-08 13:53:39
|
<morebots>
|
Logged the message, Master
|
2014-04-08 13:54:01
|
<manybubbles>
|
!log they'll pick it up during the rolling restart today to upgrade to 1.1.0
|
2014-04-08 13:54:05
|
<morebots>
|
Logged the message, Master
|
2014-04-08 13:54:08
|
<ottomata>
|
cool
|
2014-04-08 13:54:18
|
<ottomata>
|
manybubbles: i was going to start reinstalling an elasticsearch server today
|
2014-04-08 13:54:33
|
<manybubbles>
|
ottomata: not a _great_ day for it
|
2014-04-08 13:54:37
|
<manybubbles>
|
because I'm upgrading to 1.1.0
|
2014-04-08 13:54:43
|
<ottomata>
|
ok
|
2014-04-08 13:54:45
|
<manybubbles>
|
that is on the deployment calendar and everything
|
2014-04-08 13:55:05
|
<manybubbles>
|
maybe tomorrow?
|
2014-04-08 13:57:09
|
<ottomata>
|
sure
|
2014-04-08 14:04:07
|
<manybubbles>
|
ottomata: please ping me when you get a chance to update apt
|
2014-04-08 14:04:35
|
<ottomata>
|
i was about to do it, but am in standup now
|
2014-04-08 14:04:36
|
<ottomata>
|
um
|
2014-04-08 14:04:41
|
<ottomata>
|
q for akosiaris, if you are around
|
2014-04-08 14:04:54
|
<ottomata>
|
I should change VerifyRelease, right?
|
2014-04-08 14:04:54
|
<icinga-wm>
|
PROBLEM - DPKG on labstore4 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
|
2014-04-08 14:04:59
|
<ottomata>
|
i'm trying to find the right thing to change it to
|
2014-04-08 14:05:14
|
<ottomata>
|
i downloaded 1.1's Release.gpg and am doing what the reprepro man page says to do
|
2014-04-08 14:05:17
|
<ottomata>
|
but am not sure
|
2014-04-08 14:05:23
|
<ottomata>
|
the output doesn't look like what you have
|
2014-04-08 14:05:54
|
<icinga-wm>
|
RECOVERY - DPKG on labstore4 is OK: All packages OK
|
2014-04-08 14:09:44
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs3003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
|
2014-04-08 14:09:44
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs3001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
|
2014-04-08 14:09:44
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs3002 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
|
2014-04-08 14:09:44
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs3004 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
|
2014-04-08 14:11:17
|
<grrrit-wm>
|
('PS1') 'Andrew Bogott': Install and use check_ssl_cert tool to validate certs. [operations/puppet] - 'https://gerrit.wikimedia.org/r/124601'
|
2014-04-08 14:18:13
|
<grrrit-wm>
|
('PS2') 'Andrew Bogott': Install and use check_ssl_cert tool to validate certs. [operations/puppet] - 'https://gerrit.wikimedia.org/r/124601'
|
2014-04-08 14:19:21
|
<grrrit-wm>
|
('PS1') 'Ottomata': reprepro/updates - upgrading elasticsearch to 1.1 [operations/puppet] - 'https://gerrit.wikimedia.org/r/124603'
|
2014-04-08 14:20:08
|
<grrrit-wm>
|
('CR') 'Ottomata': [C: '2' V: '2'] reprepro/updates - upgrading elasticsearch to 1.1 [operations/puppet] - 'https://gerrit.wikimedia.org/r/124603' (owner: 'Ottomata')
|
2014-04-08 14:23:54
|
<icinga-wm>
|
PROBLEM - HTTPS on ssl1002 is CRITICAL: Connection refused
|
2014-04-08 14:24:06
|
<ottomata>
|
manybubbles: http://apt.wikimedia.org/wikimedia/pool/main/e/elasticsearch/
|
2014-04-08 14:24:09
|
<ottomata>
|
look ok?
|
2014-04-08 14:28:54
|
<icinga-wm>
|
RECOVERY - HTTPS on ssl1002 is OK: OK - Certificate will expire on 01/20/2016 12:00.
|
2014-04-08 14:29:45
|
<manybubbles>
|
ottomata: looks good - let me try elastic1001
|
2014-04-08 14:30:35
|
<grrrit-wm>
|
('PS3') 'Andrew Bogott': Install and use check_ssl_cert tool to validate certs. [operations/puppet] - 'https://gerrit.wikimedia.org/r/124601'
|
2014-04-08 14:30:57
|
<andrewbogott>
|
mutante, ^ pls?
|
2014-04-08 14:31:37
|
<manybubbles>
|
!log upgrading elastic1001
|
2014-04-08 14:31:42
|
<morebots>
|
Logged the message, Master
|
2014-04-08 14:32:38
|
<manybubbles>
|
!log woops, just restarted elastic1002. silly me
|
2014-04-08 14:32:42
|
<morebots>
|
Logged the message, Master
|
2014-04-08 14:32:46
|
<manybubbles>
|
!log no harm done, just lost time
|
2014-04-08 14:32:50
|
<morebots>
|
Logged the message, Master
|
2014-04-08 14:33:53
|
<manybubbles>
|
ottomata: can you make nagios not bother us about Elasticsearch warnings over the next few hours?
|
2014-04-08 14:33:56
|
<manybubbles>
|
I'm paying attention
|
2014-04-08 14:34:25
|
<ottomata>
|
uh hm
|
2014-04-08 14:35:43
|
<ottomata>
|
i think so, how long manybubbles
|
2014-04-08 14:35:45
|
<ottomata>
|
4 hours?
|
2014-04-08 14:35:48
|
<manybubbles>
|
sure!
|
2014-04-08 14:36:14
|
<icinga-wm>
|
PROBLEM - NTP peers on linne is CRITICAL: NTP CRITICAL: Server not synchronized, Offset unknown
|
2014-04-08 14:38:14
|
<icinga-wm>
|
RECOVERY - NTP peers on linne is OK: NTP OK: Offset 0.016747 secs
|
2014-04-08 14:44:43
|
<mutante>
|
andrewbogott: https://gerrit.wikimedia.org/r/#/c/77332/7/modules/base/manifests/monitoring/host.pp
|
2014-04-08 14:44:51
|
<grrrit-wm>
|
('PS4') 'Andrew Bogott': Install and use check_ssl_cert tool to validate certs. [operations/puppet] - 'https://gerrit.wikimedia.org/r/124601'
|
2014-04-08 14:54:18
|
<grrrit-wm>
|
('PS5') 'Andrew Bogott': Install and use check_ssl_cert tool to validate certs. [operations/puppet] - 'https://gerrit.wikimedia.org/r/124601'
|
2014-04-08 14:54:59
|
<grrrit-wm>
|
('PS3') 'Cmjohnson': add interface speed check for all hosts [operations/puppet] - 'https://gerrit.wikimedia.org/r/124606'
|
2014-04-08 15:01:42
|
<cmjohnson>
|
mutante: can you review https://gerrit.wikimedia.org/r/124606
|
2014-04-08 15:02:06
|
<grrrit-wm>
|
('CR') 'Alexandros Kosiaris': [C: '-1'] "Great idea. Minor stuff here and there like making it parameterizable but looks nice." ('6' comments) [operations/puppet] - 'https://gerrit.wikimedia.org/r/124606' (owner: 'Cmjohnson')
|
2014-04-08 15:03:10
|
<ottomata>
|
manybubbles: i think I just scheduled downtime in icinga for elastic search for the next ~4 hours
|
2014-04-08 15:03:19
|
<ottomata>
|
never done that before, so not sure what it will do
|
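Scheduling downtime like this comes down to writing a SCHEDULE_HOST_SVC_DOWNTIME external command per host into Icinga's command pipe, which is what the web UI does behind the scenes. A minimal Python sketch under that assumption; the command-file path and the example host are guesses for illustration, not taken from this cluster's configuration.

```python
# Hedged sketch: schedule ~4 hours of downtime for every service on a host by
# writing a SCHEDULE_HOST_SVC_DOWNTIME external command into Icinga's command
# pipe. The pipe path and host name below are illustrative assumptions.
import time

CMD_FILE = '/var/lib/icinga/rw/icinga.cmd'  # assumed stock Icinga 1.x path


def schedule_host_svc_downtime(host, hours, author, comment,
                               cmd_file=CMD_FILE):
    start = int(time.time())
    end = start + int(hours * 3600)
    line = ('[{start}] SCHEDULE_HOST_SVC_DOWNTIME;{host};{start};{end};'
            '1;0;{dur};{author};{comment}\n').format(
                start=start, host=host, end=end, dur=end - start,
                author=author, comment=comment)
    with open(cmd_file, 'w') as pipe:
        pipe.write(line)


if __name__ == '__main__':
    schedule_host_svc_downtime('elastic1001', 4, 'ottomata',
                               'elasticsearch 1.1.0 rolling upgrade')
```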
2014-04-08 15:03:47
|
<grrrit-wm>
|
('PS1') 'Rush': module to manage new python-diamond package [operations/puppet] - 'https://gerrit.wikimedia.org/r/124608'
|
2014-04-08 15:04:54
|
<manybubbles>
|
ottomata: it's cool!
|
2014-04-08 15:04:56
|
<manybubbles>
|
thanks
|
2014-04-08 15:07:45
|
<grrrit-wm>
|
('CR') 'Ottomata': module to manage new python-diamond package ('5' comments) [operations/puppet] - 'https://gerrit.wikimedia.org/r/124608' (owner: 'Rush')
|
2014-04-08 15:08:18
|
<grrrit-wm>
|
('CR') 'Dzahn': [C: ''] Install and use check_ssl_cert tool to validate certs. [operations/puppet] - 'https://gerrit.wikimedia.org/r/124601' (owner: 'Andrew Bogott')
|
2014-04-08 15:12:34
|
<grrrit-wm>
|
('PS2') 'Rush': module to manage new python-diamond package [operations/puppet] - 'https://gerrit.wikimedia.org/r/124608'
|
2014-04-08 15:13:35
|
<grrrit-wm>
|
('CR') 'jenkins-bot': [V: '-1'] module to manage new python-diamond package [operations/puppet] - 'https://gerrit.wikimedia.org/r/124608' (owner: 'Rush')
|
2014-04-08 15:15:36
|
<grrrit-wm>
|
('PS3') 'Rush': module to manage new python-diamond package [operations/puppet] - 'https://gerrit.wikimedia.org/r/124608'
|
2014-04-08 15:16:34
|
<icinga-wm>
|
PROBLEM - Host virt1000 is DOWN: CRITICAL - Host Unreachable (208.80.154.18)
|
2014-04-08 15:16:42
|
<RobH>
|
!log all ssl servers in eqiad have been updated with new cert and restarted
|
2014-04-08 15:16:51
|
<RobH>
|
!log rolling updates on ssl3001-3003 presently
|
2014-04-08 15:17:10
|
<grrrit-wm>
|
('PS1') 'Dzahn': enable base monitoring for ALL hosts [operations/puppet] - 'https://gerrit.wikimedia.org/r/124609'
|
2014-04-08 15:17:24
|
<icinga-wm>
|
PROBLEM - Host labs-ns1.wikimedia.org is DOWN: CRITICAL - Host Unreachable (208.80.154.19)
|
2014-04-08 15:18:04
|
<icinga-wm>
|
RECOVERY - Host virt1000 is UP: PING OK - Packet loss = 0%, RTA = 0.55 ms
|
2014-04-08 15:19:03
|
<grrrit-wm>
|
('CR') 'Andrew Bogott': [C: '2'] Install and use check_ssl_cert tool to validate certs. [operations/puppet] - 'https://gerrit.wikimedia.org/r/124601' (owner: 'Andrew Bogott')
|
2014-04-08 15:19:04
|
<icinga-wm>
|
RECOVERY - Host labs-ns1.wikimedia.org is UP: PING OK - Packet loss = 0%, RTA = 0.98 ms
|
2014-04-08 15:19:07
|
<mutante>
|
apergos: https://gerrit.wikimedia.org/r/#/c/124609/1
|
2014-04-08 15:19:46
|
<mutante>
|
ugly, eh.. since i have to change all those lines because of indentation :p
|
2014-04-08 15:22:25
|
<grrrit-wm>
|
('CR') 'ArielGlenn': [C: ''] enable base monitoring for ALL hosts [operations/puppet] - 'https://gerrit.wikimedia.org/r/124609' (owner: 'Dzahn')
|
2014-04-08 15:22:39
|
<grrrit-wm>
|
('CR') 'Dzahn': [C: '2'] enable base monitoring for ALL hosts [operations/puppet] - 'https://gerrit.wikimedia.org/r/124609' (owner: 'Dzahn')
|
2014-04-08 15:23:46
|
<grrrit-wm>
|
('CR') 'Ottomata': module to manage new python-diamond package ('2' comments) [operations/puppet] - 'https://gerrit.wikimedia.org/r/124608' (owner: 'Rush')
|
2014-04-08 15:27:31
|
<icinga-wm>
|
PROBLEM - HTTPS on cp4009 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
|
2014-04-08 15:27:41
|
<icinga-wm>
|
PROBLEM - HTTPS on ssl3003 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
|
2014-04-08 15:27:41
|
<icinga-wm>
|
PROBLEM - HTTPS on ssl1006 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
|
2014-04-08 15:27:41
|
<icinga-wm>
|
PROBLEM - HTTPS on cp4014 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
|
2014-04-08 15:27:51
|
<icinga-wm>
|
PROBLEM - HTTPS on ssl1004 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
|
2014-04-08 15:27:51
|
<icinga-wm>
|
PROBLEM - HTTPS on ssl1005 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
|
2014-04-08 15:27:51
|
<icinga-wm>
|
PROBLEM - HTTPS on cp4008 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
|
2014-04-08 15:27:51
|
<icinga-wm>
|
PROBLEM - HTTPS on cp4004 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
|
2014-04-08 15:27:51
|
<icinga-wm>
|
PROBLEM - HTTPS on cp4015 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
|
2014-04-08 15:27:52
|
<icinga-wm>
|
PROBLEM - HTTPS on cp4001 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
|
2014-04-08 15:27:52
|
<icinga-wm>
|
PROBLEM - HTTPS on cp4017 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
|
2014-04-08 15:27:53
|
<icinga-wm>
|
PROBLEM - HTTPS on amssq47 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
|
2014-04-08 15:27:53
|
<icinga-wm>
|
PROBLEM - HTTPS on ssl1002 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
|
2014-04-08 15:27:54
|
<icinga-wm>
|
PROBLEM - HTTPS on ssl1001 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
|
2014-04-08 15:27:54
|
<icinga-wm>
|
PROBLEM - HTTPS on cp4005 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
|
2014-04-08 15:27:55
|
<icinga-wm>
|
PROBLEM - HTTPS on cp4012 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
|
2014-04-08 15:28:01
|
<icinga-wm>
|
PROBLEM - HTTPS on cp4016 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
|
2014-04-08 15:28:01
|
<icinga-wm>
|
PROBLEM - HTTPS on sodium is CRITICAL: SSL_CERT CRITICAL lists.wikimedia.org: invalid CN (lists.wikimedia.org does not match *.wikimedia.org)
|
2014-04-08 15:28:11
|
<icinga-wm>
|
PROBLEM - HTTPS on ssl1007 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
|
2014-04-08 15:28:11
|
<icinga-wm>
|
PROBLEM - HTTPS on iodine is CRITICAL: SSL_CERT CRITICAL ticket.wikimedia.org: invalid CN (ticket.wikimedia.org does not match *.wikimedia.org)
|
2014-04-08 15:28:11
|
<icinga-wm>
|
PROBLEM - HTTPS on ssl3002 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
|
2014-04-08 15:28:11
|
<icinga-wm>
|
PROBLEM - HTTPS on ssl3001 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
|
2014-04-08 15:28:11
|
<icinga-wm>
|
PROBLEM - HTTPS on cp4018 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
|
2014-04-08 15:28:12
|
<icinga-wm>
|
PROBLEM - HTTPS on ssl1008 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
|
2014-04-08 15:28:12
|
<icinga-wm>
|
PROBLEM - HTTPS on ssl1009 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
|
2014-04-08 15:28:13
|
<icinga-wm>
|
PROBLEM - HTTPS on ssl1003 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
|
2014-04-08 15:28:13
|
<icinga-wm>
|
PROBLEM - HTTPS on cp4013 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
|
2014-04-08 15:28:14
|
<icinga-wm>
|
PROBLEM - HTTPS on cp4003 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
|
2014-04-08 15:28:14
|
<icinga-wm>
|
PROBLEM - HTTPS on cp4007 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
|
2014-04-08 15:28:15
|
<icinga-wm>
|
PROBLEM - HTTPS on cp4011 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
|
2014-04-08 15:28:15
|
<icinga-wm>
|
PROBLEM - HTTPS on cp4010 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
|
2014-04-08 15:28:21
|
<icinga-wm>
|
PROBLEM - HTTPS on cp4020 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
|
2014-04-08 15:28:21
|
<icinga-wm>
|
PROBLEM - HTTPS on cp4006 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
|
2014-04-08 15:28:31
|
<icinga-wm>
|
PROBLEM - HTTPS on cp4002 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
|
2014-04-08 15:28:31
|
<icinga-wm>
|
PROBLEM - HTTPS on cp4019 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
|
2014-04-08 15:30:02
|
<greg-g>
|
holy fun :)
|
2014-04-08 15:30:37
|
<aude>
|
:o
|
2014-04-08 15:32:08
|
<greg-g>
|
aude: getting to your email :)
|
2014-04-08 15:32:13
|
<aude>
|
ok
|
2014-04-08 15:32:25
|
<aude>
|
want to see if it's ok to do today
|
2014-04-08 15:32:35
|
<aude>
|
anytime works for us, i suppose
|
2014-04-08 15:34:45
|
<greg-g>
|
aude: tl;dr of email: yep, looks good
|
2014-04-08 15:34:50
|
<aude>
|
ok
|
2014-04-08 15:35:07
|
<aude>
|
we were smart to put the i18n stuff in a while ago :)
|
2014-04-08 15:35:42
|
<icinga-wm>
|
PROBLEM - RAID on holmium is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded)
|
2014-04-08 15:35:52
|
<icinga-wm>
|
PROBLEM - DPKG on fenari is CRITICAL: NRPE: Command check_dpkg not defined
|
2014-04-08 15:36:01
|
<andrewbogott>
|
the https failures are me mucking with monitoring, nothing to worry about
|
2014-04-08 15:36:02
|
<icinga-wm>
|
PROBLEM - Disk space on fenari is CRITICAL: NRPE: Command check_disk_space not defined
|
2014-04-08 15:36:12
|
<icinga-wm>
|
PROBLEM - RAID on fenari is CRITICAL: NRPE: Command check_raid not defined
|
2014-04-08 15:36:22
|
<icinga-wm>
|
PROBLEM - puppet disabled on fenari is CRITICAL: NRPE: Command check_puppet_disabled not defined
|
2014-04-08 15:36:57
|
<hashar>
|
mutante: fenari is not happy :-D
|
2014-04-08 15:38:21
|
<mutante>
|
hashar: thanks, that's cause we just added more monitoring
|
2014-04-08 15:38:33
|
<mutante>
|
RT #80 :)
|
2014-04-08 15:38:48
|
<hashar>
|
mutante: yeah I noticed your puppet change. Guess fenari is missing some bits
|
2014-04-08 15:41:12
|
<mutante>
|
hashar: wasn't running nagios-nrpe-server
|
2014-04-08 15:41:52
|
<mutante>
|
greg-g: re: SSL certs, andrewbogott is on that one
|
2014-04-08 15:41:57
|
<mutante>
|
ops monitoring sprint over here
|
2014-04-08 15:42:11
|
<greg-g>
|
mutante: ahh, good to know who's on point for that, thanks
|
2014-04-08 15:42:23
|
<greg-g>
|
wasn't sure if it'd be an opsen party thing or not
|
2014-04-08 15:42:44
|
<mutante>
|
it is. ops in Athens
|
2014-04-08 15:43:05
|
<mutante>
|
that check is new, in that it checks for validity of cert, not just expiry
|
2014-04-08 15:43:18
|
<mutante>
|
and wikimedia vs. wikipedia thing
|
2014-04-08 15:43:30
|
<greg-g>
|
nods
|
2014-04-08 15:44:52
|
<icinga-wm>
|
PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 438.266663
|
2014-04-08 15:45:02
|
<grrrit-wm>
|
('PS1') 'Andrew Bogott': When checking unified certs, check for *.wikipedia.org [operations/puppet] - 'https://gerrit.wikimedia.org/r/124616'
|
2014-04-08 15:45:32
|
<icinga-wm>
|
PROBLEM - Varnishkafka Delivery Errors on cp3020 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 434.533325
|
2014-04-08 15:46:21
|
<grrrit-wm>
|
('CR') 'Andrew Bogott': [C: '2'] When checking unified certs, check for *.wikipedia.org [operations/puppet] - 'https://gerrit.wikimedia.org/r/124616' (owner: 'Andrew Bogott')
|
2014-04-08 15:46:22
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs1005 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 12:45:20 PM UTC
|
2014-04-08 15:53:10
|
<icinga-wm>
|
RECOVERY - RAID on fenari is OK: OK: Active: 2, Working: 2, Failed: 0, Spare: 0
|
2014-04-08 15:53:17
|
<mutante>
|
hashar: ^ :)
|
2014-04-08 15:53:20
|
<icinga-wm>
|
RECOVERY - puppet disabled on fenari is OK: OK
|
2014-04-08 15:53:26
|
<hashar>
|
nice
|
2014-04-08 15:53:40
|
<icinga-wm>
|
RECOVERY - Disk space on fenari is OK: DISK OK
|
2014-04-08 15:53:41
|
<mutante>
|
RT #80 ftw
|
2014-04-08 15:53:48
|
<andrewbogott>
|
With any luck there'll be another flood of OKs in a minute...
|
2014-04-08 15:53:50
|
<icinga-wm>
|
RECOVERY - DPKG on fenari is OK: All packages OK
|
2014-04-08 15:54:10
|
<icinga-wm>
|
PROBLEM - puppet disabled on bast1001 is CRITICAL: NRPE: Command check_puppet_disabled not defined
|
2014-04-08 15:54:10
|
<icinga-wm>
|
PROBLEM - Disk space on cp3003 is CRITICAL: NRPE: Command check_disk_space not defined
|
2014-04-08 15:54:10
|
<icinga-wm>
|
PROBLEM - Disk space on dobson is CRITICAL: Connection refused by host
|
2014-04-08 15:54:10
|
<icinga-wm>
|
PROBLEM - DPKG on pdf2 is CRITICAL: Connection refused by host
|
2014-04-08 15:54:20
|
<icinga-wm>
|
PROBLEM - puppet disabled on iron is CRITICAL: NRPE: Command check_puppet_disabled not defined
|
2014-04-08 15:54:20
|
<icinga-wm>
|
PROBLEM - RAID on dobson is CRITICAL: Connection refused by host
|
2014-04-08 15:54:20
|
<icinga-wm>
|
PROBLEM - RAID on cp3003 is CRITICAL: NRPE: Command check_raid not defined
|
2014-04-08 15:54:20
|
<icinga-wm>
|
PROBLEM - Disk space on pdf2 is CRITICAL: Connection refused by host
|
2014-04-08 15:54:30
|
<icinga-wm>
|
PROBLEM - puppet disabled on dobson is CRITICAL: Connection refused by host
|
2014-04-08 15:54:30
|
<icinga-wm>
|
PROBLEM - RAID on pdf2 is CRITICAL: Connection refused by host
|
2014-04-08 15:54:30
|
<icinga-wm>
|
PROBLEM - DPKG on iodine is CRITICAL: NRPE: Command check_dpkg not defined
|
2014-04-08 15:54:30
|
<icinga-wm>
|
PROBLEM - puppet disabled on pdf2 is CRITICAL: Connection refused by host
|
2014-04-08 15:54:40
|
<icinga-wm>
|
PROBLEM - Disk space on iodine is CRITICAL: NRPE: Command check_disk_space not defined
|
2014-04-08 15:54:40
|
<icinga-wm>
|
PROBLEM - puppet disabled on cp3003 is CRITICAL: NRPE: Command check_puppet_disabled not defined
|
2014-04-08 15:54:40
|
<icinga-wm>
|
PROBLEM - DPKG on pdf3 is CRITICAL: Connection refused by host
|
2014-04-08 15:54:48
|
<andrewbogott>
|
that's not what I meant
|
2014-04-08 15:54:50
|
<icinga-wm>
|
PROBLEM - RAID on iodine is CRITICAL: NRPE: Command check_raid not defined
|
2014-04-08 15:54:50
|
<icinga-wm>
|
PROBLEM - Disk space on pdf3 is CRITICAL: Connection refused by host
|
2014-04-08 15:54:50
|
<icinga-wm>
|
PROBLEM - DPKG on tridge is CRITICAL: NRPE: Command check_dpkg not defined
|
2014-04-08 15:54:50
|
<icinga-wm>
|
PROBLEM - DPKG on bast1001 is CRITICAL: NRPE: Command check_dpkg not defined
|
2014-04-08 15:54:51
|
<icinga-wm>
|
PROBLEM - puppet disabled on iodine is CRITICAL: NRPE: Command check_puppet_disabled not defined
|
2014-04-08 15:54:51
|
<icinga-wm>
|
PROBLEM - RAID on pdf3 is CRITICAL: Connection refused by host
|
2014-04-08 15:54:51
|
<icinga-wm>
|
PROBLEM - Disk space on tridge is CRITICAL: NRPE: Command check_disk_space not defined
|
2014-04-08 15:55:00
|
<icinga-wm>
|
PROBLEM - Disk space on bast1001 is CRITICAL: NRPE: Command check_disk_space not defined
|
2014-04-08 15:55:00
|
<icinga-wm>
|
PROBLEM - puppet disabled on pdf3 is CRITICAL: Connection refused by host
|
2014-04-08 15:55:10
|
<icinga-wm>
|
PROBLEM - Disk space on iron is CRITICAL: NRPE: Command check_disk_space not defined
|
2014-04-08 15:55:10
|
<icinga-wm>
|
PROBLEM - RAID on bast1001 is CRITICAL: NRPE: Command check_raid not defined
|
2014-04-08 15:55:10
|
<icinga-wm>
|
PROBLEM - DPKG on dobson is CRITICAL: Connection refused by host
|
2014-04-08 15:55:10
|
<icinga-wm>
|
PROBLEM - DPKG on cp3003 is CRITICAL: NRPE: Command check_dpkg not defined
|
2014-04-08 15:55:10
|
<icinga-wm>
|
PROBLEM - DPKG on virt1000 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
|
2014-04-08 15:55:10
|
<icinga-wm>
|
PROBLEM - puppet disabled on tridge is CRITICAL: NRPE: Command check_puppet_disabled not defined
|
2014-04-08 15:55:41
|
<greg-g>
|
ahhh, so today is going to be a worthless -operations channel day, more than normal, due to the sprint? :)
|
2014-04-08 15:56:03
|
<andrewbogott>
|
We're about to all go to dinner though.
|
2014-04-08 15:56:09
|
<andrewbogott>
|
So things should quiet down shortly.
|
2014-04-08 15:56:10
|
<icinga-wm>
|
PROBLEM - Puppet freshness on amslvs2 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 12:55:50 PM UTC
|
2014-04-08 15:56:19
|
<andrewbogott>
|
But the channel will still be useless if you want to talk to ops :)
|
2014-04-08 15:56:50
|
<icinga-wm>
|
PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: No output from Graphite for target(s): reqstats.5xx
|
2014-04-08 15:57:03
|
<mutante>
|
will start nagios-nrpe-server on those
|
2014-04-08 15:57:10
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs1003 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 12:56:15 PM UTC
|
2014-04-08 15:58:42
|
<icinga-wm>
|
RECOVERY - HTTPS on ssl3001 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
|
2014-04-08 15:58:42
|
<icinga-wm>
|
RECOVERY - HTTPS on ssl1006 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
|
2014-04-08 15:58:52
|
<icinga-wm>
|
RECOVERY - HTTPS on ssl1007 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
|
2014-04-08 15:58:52
|
<icinga-wm>
|
RECOVERY - HTTPS on ssl1002 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
|
2014-04-08 15:59:32
|
<icinga-wm>
|
RECOVERY - Varnishkafka Delivery Errors on cp3020 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
|
2014-04-08 15:59:52
|
<icinga-wm>
|
RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
|
2014-04-08 16:00:04
|
<aude>
|
back in 5 min or so
|
2014-04-08 16:00:06
|
<grrrit-wm>
|
('Abandoned') 'Physikerwelt': WIP: Enable orthogonal MathJax config [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/110240' (owner: 'Physikerwelt')
|
2014-04-08 16:00:42
|
<icinga-wm>
|
PROBLEM - DPKG on mchenry is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
|
2014-04-08 16:00:42
|
<icinga-wm>
|
PROBLEM - Disk space on mchenry is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
|
2014-04-08 16:00:52
|
<icinga-wm>
|
PROBLEM - RAID on mchenry is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
|
2014-04-08 16:01:02
|
<icinga-wm>
|
PROBLEM - puppet disabled on mchenry is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
|
2014-04-08 16:02:22
|
<icinga-wm>
|
PROBLEM - Puppet freshness on ms6 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 01:02:03 PM UTC
|
2014-04-08 16:04:37
|
<aude>
|
back
|
2014-04-08 16:08:22
|
<icinga-wm>
|
PROBLEM - Puppet freshness on amslvs3 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 01:07:31 PM UTC
|
2014-04-08 16:09:27
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs1006 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 01:09:07 PM UTC
|
2014-04-08 16:09:27
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs4003 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 01:08:32 PM UTC
|
2014-04-08 16:09:27
|
<icinga-wm>
|
RECOVERY - HTTPS on cp4020 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
|
2014-04-08 16:09:27
|
<icinga-wm>
|
RECOVERY - HTTPS on cp4006 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
|
2014-04-08 16:09:27
|
<icinga-wm>
|
RECOVERY - HTTPS on cp4013 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
|
2014-04-08 16:09:37
|
<icinga-wm>
|
RECOVERY - HTTPS on cp4009 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
|
2014-04-08 16:09:37
|
<icinga-wm>
|
RECOVERY - HTTPS on cp4010 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
|
2014-04-08 16:09:37
|
<icinga-wm>
|
RECOVERY - HTTPS on ssl3003 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
|
2014-04-08 16:09:47
|
<icinga-wm>
|
RECOVERY - HTTPS on ssl3002 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
|
2014-04-08 16:09:47
|
<icinga-wm>
|
RECOVERY - HTTPS on ssl1004 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
|
2014-04-08 16:09:56
|
<paravoid>
|
ottomata: ping
|
2014-04-08 16:09:57
|
<icinga-wm>
|
RECOVERY - HTTPS on cp4012 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
|
2014-04-08 16:10:07
|
<icinga-wm>
|
RECOVERY - HTTPS on cp4016 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
|
2014-04-08 16:10:07
|
<icinga-wm>
|
RECOVERY - HTTPS on ssl1008 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
|
2014-04-08 16:10:07
|
<icinga-wm>
|
PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: No output from Graphite for target(s): reqstats.5xx
|
2014-04-08 16:10:07
|
<icinga-wm>
|
RECOVERY - HTTPS on cp4018 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
|
2014-04-08 16:10:17
|
<icinga-wm>
|
RECOVERY - HTTPS on ssl1009 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
|
2014-04-08 16:11:23
|
<paravoid>
|
ottomata: ping ping
|
2014-04-08 16:12:47
|
<icinga-wm>
|
PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: No output from Graphite for target(s): reqstats.5xx
|
2014-04-08 16:12:49
|
<ottomata>
|
pong pong
|
2014-04-08 16:13:05
|
<ottomata>
|
paravoid
|
2014-04-08 16:13:08
|
<ottomata>
|
wassupp
|
2014-04-08 16:13:14
|
<paravoid>
|
what's with stat1's puppet?
|
2014-04-08 16:13:18
|
<paravoid>
|
why is it admin disabled?
|
2014-04-08 16:13:47
|
<ottomata>
|
because it is going to be decomed very soon
|
2014-04-08 16:13:56
|
<ottomata>
|
and i wanted to make puppet changes that would apply to stat1003 but not mess with what was on stat1
|
2014-04-08 16:14:05
|
<ottomata>
|
and I didn't want to re-write a bunch of statistics.pp stuff :/
|
2014-04-08 16:14:07
|
<_joe_>
|
ori: are you around? seems like graphite is *not* working
|
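The icinga "No output from Graphite" alerts in this log amount to asking graphite-web for recent datapoints of a target (here reqstats.5xx) and getting none back. A minimal sketch of that kind of check in Python, assuming a reachable graphite-web instance; the base URL is illustrative, not the production host:

    # Minimal sketch: does Graphite return any recent data for this target?
    # Assumes a reachable graphite-web instance; GRAPHITE_URL is illustrative only.
    import json
    import urllib.request

    GRAPHITE_URL = "http://graphite.example.org"  # assumption, not the real host
    TARGET = "reqstats.5xx"

    def recent_datapoints(target, window="-10min"):
        url = f"{GRAPHITE_URL}/render?target={target}&from={window}&format=json"
        with urllib.request.urlopen(url, timeout=10) as resp:
            series = json.load(resp)
        # graphite-web returns a list of series, each with "datapoints": [[value, ts], ...]
        return [v for s in series for v, _ts in s.get("datapoints", []) if v is not None]

    points = recent_datapoints(TARGET)
    print("OK" if points else f"CRITICAL: No output from Graphite for target(s): {TARGET}")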
2014-04-08 16:14:24
|
<paravoid>
|
ottomata: that's bad
|
2014-04-08 16:14:27
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs1002 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 01:13:54 PM UTC
|
2014-04-08 16:14:35
|
<ottomata>
|
paravoid: even if we are going to decom it soon?
|
2014-04-08 16:14:36
|
<paravoid>
|
ottomata: can you remove the "include statistics*" stuff and enable it again?
|
2014-04-08 16:14:40
|
<paravoid>
|
yes
|
2014-04-08 16:14:42
|
<ottomata>
|
yeah probably can
|
2014-04-08 16:14:47
|
<paravoid>
|
because it's messing with monitoring and all that
|
2014-04-08 16:15:06
|
<ottomata>
|
ah i see it
|
2014-04-08 16:15:20
|
<ottomata>
|
paravoid, what is the difference between the 3 numbers in each severity category in icinga?
|
2014-04-08 16:15:25
|
<mark>
|
ottomata: disabling puppet for more than a few hours max is almost always a really bad idea
|
2014-04-08 16:15:31
|
<ottomata>
|
mark, ok, noted.
|
2014-04-08 16:15:36
|
<mark>
|
thanks
|
2014-04-08 16:16:27
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs1004 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 01:16:04 PM UTC
|
2014-04-08 16:16:27
|
<icinga-wm>
|
PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: No output from Graphite for target(s): reqstats.5xx
|
2014-04-08 16:17:07
|
<_joe_>
|
:/
|
2014-04-08 16:17:27
|
<icinga-wm>
|
PROBLEM - Puppet freshness on dataset1001 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 01:16:39 PM UTC
|
2014-04-08 16:18:10
|
<ottomata>
|
mark, can you help with the current network ACL problems?
|
2014-04-08 16:18:22
|
<mark>
|
sorry, what's that?
|
2014-04-08 16:18:25
|
<ottomata>
|
analytics nodes can't talk to apt
|
2014-04-08 16:18:27
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs4001 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 01:17:50 PM UTC
|
2014-04-08 16:18:30
|
<ottomata>
|
nor statsd.eqiad.wmnet
|
2014-04-08 16:18:32
|
<ottomata>
|
https://rt.wikimedia.org/Ticket/Display.html?id=4433
|
2014-04-08 16:18:37
|
<ottomata>
|
I added to the bottom of that ticket
|
2014-04-08 16:18:51
|
<mark>
|
ok
|
2014-04-08 16:18:59
|
<ottomata>
|
i think vanadium was having the same trouble, is it on the vlan too?
|
2014-04-08 16:19:27
|
<icinga-wm>
|
PROBLEM - Puppet freshness on amslvs1 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 01:19:10 PM UTC
|
2014-04-08 16:19:31
|
<aude>
|
still working on wikiquote
|
2014-04-08 16:19:35
|
<mark>
|
we can look at getting rid of those ACLs perhaps
|
2014-04-08 16:19:41
|
<mark>
|
but we'll need to discuss what you're doing with firewalling
|
2014-04-08 16:20:18
|
<grrrit-wm>
|
('PS1') 'Ottomata': Disabling statistics roles on stat1 [operations/puppet] - 'https://gerrit.wikimedia.org/r/124621'
|
2014-04-08 16:20:18
|
<se4598>
|
the fingerprint of the wikis' SSL cert apparently changed, but it is not a newly issued cert; it has the same dates as the previous one that I saved. Is it okay that the fingerprint changed?
|
2014-04-08 16:20:34
|
<ottomata>
|
mark, yeah, hm, not sure, i kind of like them
|
2014-04-08 16:20:35
|
<paravoid>
|
se4598: yes
|
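For questions like se4598's, one quick check is to pull the certificate a server is currently presenting and hash its DER form; the fingerprint identifies the exact certificate, which can change even when the validity dates look the same. A minimal sketch in Python using only the standard library; the hostname is just an example:

    # Minimal sketch: fetch the cert a server presents and print its SHA-256 fingerprint.
    # Hostname is an example; any HTTPS endpoint works.
    import hashlib
    import ssl

    def cert_fingerprint(host, port=443):
        pem = ssl.get_server_certificate((host, port))  # PEM text of the presented cert
        der = ssl.PEM_cert_to_DER_cert(pem)             # DER bytes, what fingerprints hash
        return hashlib.sha256(der).hexdigest()

    print(cert_fingerprint("en.wikipedia.org"))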
2014-04-08 16:20:45
|
<ottomata>
|
especially since anyone with hadoop access can launch whatever mapreduce jobs they want
|
2014-04-08 16:21:37
|
<grrrit-wm>
|
('CR') 'Ottomata': [C: '2' V: '2'] Disabling statistics roles on stat1 [operations/puppet] - 'https://gerrit.wikimedia.org/r/124621' (owner: 'Ottomata')
|
2014-04-08 16:21:37
|
<icinga-wm>
|
PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: No output from Graphite for target(s): reqstats.5xx
|
2014-04-08 16:21:44
|
<ottomata>
|
hmmmm
|
2014-04-08 16:21:48
|
<ottomata>
|
that's weird
|
2014-04-08 16:21:59
|
<ottomata>
|
checking on that 5xx thing in a sec
|
2014-04-08 16:22:05
|
<ottomata>
|
that's surely my fault...
|
2014-04-08 16:22:27
|
<icinga-wm>
|
PROBLEM - Puppet freshness on amslvs4 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 01:21:21 PM UTC
|
2014-04-08 16:22:27
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs1001 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 01:21:26 PM UTC
|
2014-04-08 16:22:27
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs4002 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 01:22:07 PM UTC
|
2014-04-08 16:22:53
|
<ottomata>
|
hmm, graphite down?
|
2014-04-08 16:23:04
|
<mark>
|
ottomata: statsd access for analytics seems already there
|
2014-04-08 16:23:07
|
<ottomata>
|
maybe that 5xx thing is not my fault!
|
2014-04-08 16:23:26
|
<ottomata>
|
yeah, mark, i think we already had these set up too
|
2014-04-08 16:23:27
|
<icinga-wm>
|
PROBLEM - Puppet freshness on virt2 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 01:22:28 PM UTC
|
2014-04-08 16:23:37
|
<icinga-wm>
|
RECOVERY - Puppet freshness on stat1 is OK: puppet ran at Tue Apr 8 16:23:30 UTC 2014
|
2014-04-08 16:23:43
|
<ottomata>
|
but it seems that they aren't working right now, starting yesterday when I tried
|
2014-04-08 16:24:02
|
<grrrit-wm>
|
('PS1') 'Hashar': beta: reenable fatalmonitor script on eqiad [operations/puppet] - 'https://gerrit.wikimedia.org/r/124624'
|
2014-04-08 16:24:13
|
<mark>
|
and carbon is in there already too
|
2014-04-08 16:24:15
|
<ottomata>
|
mark, unless pings just aren't allowed and i'm checking wrong?
|
2014-04-08 16:24:24
|
<mark>
|
pings may not be allowed no
|
2014-04-08 16:24:27
|
<ottomata>
|
ori and I both had trouble running apt-get update because we couldn't talk to carbon
|
2014-04-08 16:24:31
|
<mark>
|
check again?
|
2014-04-08 16:24:35
|
<ottomata>
|
yeah checking
|
2014-04-08 16:24:48
|
<ottomata>
|
and i was trying to run sqstat on analytics1003
|
2014-04-08 16:24:52
|
<ottomata>
|
so we can decom emery
|
2014-04-08 16:24:59
|
<ottomata>
|
but it couldn't talk to statsd
|
2014-04-08 16:25:38
|
<ottomata>
|
hm.
|
2014-04-08 16:25:44
|
<ottomata>
|
yeah totally working now
|
2014-04-08 16:25:57
|
<ottomata>
|
ooooook.
|
2014-04-08 16:25:59
|
<ottomata>
|
weird.
|
2014-04-08 16:26:00
|
<_joe_>
|
ottomata: graphite is borked
|
2014-04-08 16:26:04
|
<mark>
|
i think faidon did it earlier
|
2014-04-08 16:26:05
|
<grrrit-wm>
|
('CR') 'Hashar': "puppet is broken on deployment-bastion.eqiad.wmflabs, can't deploy the change right now :-/" [operations/puppet] - 'https://gerrit.wikimedia.org/r/124624' (owner: 'Hashar')
|
2014-04-08 16:26:21
|
<ottomata>
|
oh, fixed the acl problem?
|
2014-04-08 16:26:33
|
<ottomata>
|
maybe something else was just not working, and I assumed because I couldn't ping it was an ACL thing?
|
2014-04-08 16:26:55
|
<mark>
|
ping is not a good way to test that
|
2014-04-08 16:27:10
|
<ottomata>
|
yeah, i just saw the packets being filtered from ping
|
2014-04-08 16:27:11
|
<mark>
|
we allow specific protocols/ports, ping uses different ones
|
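Since the ACLs allow specific protocols/ports rather than ICMP, a reachability test against the actual service port says more than ping. A minimal sketch in Python, TCP only (UDP needs an application-level check); host and port below are placeholders:

    # Minimal sketch: test whether a specific TCP port answers, instead of relying on ping.
    # Host and port are placeholders, not the production values.
    import socket

    def tcp_reachable(host, port, timeout=3):
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    print(tcp_reachable("apt.example.org", 80))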
2014-04-08 16:27:14
|
<ottomata>
|
aye
|
2014-04-08 16:27:30
|
<ottomata>
|
yeah, just figured if i couldn't at least ping then probably other stuff was blocked too, but ja
|
2014-04-08 16:27:57
|
<ottomata>
|
but yeah, ori couldn't use apt on vanadium either, so dunno...
|
2014-04-08 16:28:10
|
<ottomata>
|
and sqstat couldn't talk to tungsten, so hm
|
2014-04-08 16:28:12
|
<ottomata>
|
but ok!
|
2014-04-08 16:28:16
|
<mark>
|
:)
|
2014-04-08 16:28:22
|
<mark>
|
we're going for dinner in a bit
|
2014-04-08 16:28:44
|
<ottomata>
|
mark
|
2014-04-08 16:28:45
|
<ottomata>
|
hm
|
2014-04-08 16:28:53
|
<ottomata>
|
so sqstat is trying to talk to tungsten on 2003
|
2014-04-08 16:28:56
|
<hashar>
|
!log Jenkins: killed jenkins-slave java process on gallium and repooled gallium slave. It was no longer registered in Zuul :-/
|
2014-04-08 16:28:57
|
<icinga-wm>
|
RECOVERY - puppet disabled on iron is OK: OK
|
2014-04-08 16:28:57
|
<ottomata>
|
is that open?
|
2014-04-08 16:29:01
|
<morebots>
|
Logged the message, Master
|
2014-04-08 16:29:07
|
<icinga-wm>
|
RECOVERY - Disk space on iron is OK: DISK OK
|
2014-04-08 16:29:09
|
<ottomata>
|
can't seem to reach it from an03
|
2014-04-08 16:29:34
|
<manybubbles>
|
ganglia seems upset
|
2014-04-08 16:29:40
|
<mark>
|
protocol udp;
|
2014-04-08 16:29:40
|
<mark>
|
destination-port 8125;
|
2014-04-08 16:29:45
|
<aude>
|
tables added
|
2014-04-08 16:29:51
|
<mark>
|
so port 2003 isn't
|
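For reference, the two destinations being discussed speak different wire formats: statsd takes "name:value|type" datagrams on UDP 8125 (the port in the ACL term above), while Graphite's carbon line receiver takes "metric value timestamp" lines on port 2003. A minimal sketch of both sends over UDP, assuming the receivers accept UDP on those ports; hostnames are illustrative:

    # Minimal sketch of the two metric formats discussed above.
    # statsd: "name:value|type" datagram -> UDP 8125
    # carbon: "metric value timestamp"   -> 2003 (UDP here; carbon can also listen on TCP)
    # Hostnames are illustrative, not the production ones.
    import socket
    import time

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    # statsd-style counter increment
    sock.sendto(b"reqstats.5xx:1|c", ("statsd.example.org", 8125))

    # carbon plaintext line: metric path, value, unix timestamp
    line = f"reqstats.5xx 42 {int(time.time())}\n".encode()
    sock.sendto(line, ("graphite.example.org", 2003))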
2014-04-08 16:29:54
|
<ottomata>
|
ah ok
|
2014-04-08 16:30:03
|
<ottomata>
|
that's why then, could you add?
|
2014-04-08 16:30:13
|
<mark>
|
ok
|
2014-04-08 16:30:40
|
<ottomata>
|
i'm going to see if reqstats gets flaky when we move it to analytics1003
|
2014-04-08 16:30:51
|
<ottomata>
|
it was either flaky because erbium is busy
|
2014-04-08 16:30:57
|
<ottomata>
|
or because the multicast firehose is just too lossy
|
2014-04-08 16:31:37
|
<aude>
|
!log added sites and site_identifiers core tables on wikiquote
|
2014-04-08 16:31:41
|
<morebots>
|
Logged the message, Master
|
2014-04-08 16:32:22
|
<mark>
|
2003 should work now
|
2014-04-08 16:33:36
|
<icinga-wm>
|
RECOVERY - DPKG on iodine is OK: All packages OK
|
2014-04-08 16:33:36
|
<icinga-wm>
|
RECOVERY - Disk space on iodine is OK: DISK OK
|
2014-04-08 16:33:36
|
<icinga-wm>
|
RECOVERY - puppet disabled on cp3003 is OK: OK
|
2014-04-08 16:33:39
|
<ottomata>
|
ah just noticed it is udp, mark, will that work still?
|
2014-04-08 16:33:46
|
<icinga-wm>
|
RECOVERY - HTTPS on cp4014 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
|
2014-04-08 16:33:46
|
<icinga-wm>
|
RECOVERY - RAID on cp3003 is OK: OK: optimal, 2 logical, 2 physical
|
2014-04-08 16:33:46
|
<icinga-wm>
|
RECOVERY - RAID on iodine is OK: OK: no disks configured for RAID
|
2014-04-08 16:33:46
|
<icinga-wm>
|
RECOVERY - HTTPS on ssl1005 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
|
2014-04-08 16:33:46
|
<icinga-wm>
|
RECOVERY - HTTPS on cp4003 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
|
2014-04-08 16:33:47
|
<mark>
|
yes
|
2014-04-08 16:33:51
|
<ottomata>
|
ok cool
|
2014-04-08 16:33:52
|
<ottomata>
|
thanks
|
2014-04-08 16:33:53
|
<ottomata>
|
ok go eat
|
2014-04-08 16:33:55
|
<ottomata>
|
thank you!
|
2014-04-08 16:33:56
|
<icinga-wm>
|
RECOVERY - DPKG on bast1001 is OK: All packages OK
|
2014-04-08 16:33:56
|
<icinga-wm>
|
RECOVERY - puppet disabled on iodine is OK: OK
|
2014-04-08 16:33:56
|
<icinga-wm>
|
RECOVERY - HTTPS on cp4002 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
|
2014-04-08 16:33:56
|
<icinga-wm>
|
RECOVERY - HTTPS on amssq47 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
|
2014-04-08 16:33:56
|
<icinga-wm>
|
RECOVERY - HTTPS on cp4004 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
|
2014-04-08 16:33:57
|
<icinga-wm>
|
RECOVERY - HTTPS on cp4001 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
|
2014-04-08 16:33:57
|
<icinga-wm>
|
RECOVERY - HTTPS on cp4017 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
|
2014-04-08 16:33:58
|
<icinga-wm>
|
RECOVERY - HTTPS on cp4015 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
|
2014-04-08 16:33:58
|
<icinga-wm>
|
RECOVERY - HTTPS on cp4008 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
|
2014-04-08 16:33:59
|
<icinga-wm>
|
RECOVERY - HTTPS on ssl1001 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
|
2014-04-08 16:33:59
|
<icinga-wm>
|
RECOVERY - HTTPS on cp4005 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
|
2014-04-08 16:34:00
|
<icinga-wm>
|
RECOVERY - Disk space on bast1001 is OK: DISK OK
|
2014-04-08 16:34:00
|
<icinga-wm>
|
RECOVERY - HTTPS on cp4019 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
|
2014-04-08 16:34:06
|
<icinga-wm>
|
RECOVERY - RAID on bast1001 is OK: OK: no RAID installed
|
2014-04-08 16:34:06
|
<icinga-wm>
|
RECOVERY - DPKG on cp3003 is OK: All packages OK
|
2014-04-08 16:34:06
|
<icinga-wm>
|
RECOVERY - HTTPS on ssl1003 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
|
2014-04-08 16:34:06
|
<icinga-wm>
|
RECOVERY - HTTPS on cp4007 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
|
2014-04-08 16:34:16
|
<icinga-wm>
|
RECOVERY - puppet disabled on bast1001 is OK: OK
|
2014-04-08 16:34:16
|
<icinga-wm>
|
RECOVERY - Disk space on cp3003 is OK: DISK OK
|
2014-04-08 16:34:16
|
<icinga-wm>
|
RECOVERY - HTTPS on cp4011 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
|
2014-04-08 16:35:36
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs4004 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 01:35:09 PM UTC
|
2014-04-08 16:35:46
|
<icinga-wm>
|
PROBLEM - HTTPS on cp1044 is CRITICAL: SSL_CERT CRITICAL *.wikimedia.org: invalid CN (*.wikimedia.org does not match *.wikipedia.org)
|
2014-04-08 16:35:56
|
<icinga-wm>
|
PROBLEM - HTTPS on cp1043 is CRITICAL: SSL_CERT CRITICAL *.wikimedia.org: invalid CN (*.wikimedia.org does not match *.wikipedia.org)
|
2014-04-08 16:36:48
|
<grrrit-wm>
|
('PS1') 'Ottomata': Putting sqstat back on analytics1003 [operations/puppet] - 'https://gerrit.wikimedia.org/r/124630'
|
2014-04-08 16:37:16
|
<grrrit-wm>
|
('CR') 'Ottomata': [C: '2' V: '2'] Putting sqstat back on analytics1003 [operations/puppet] - 'https://gerrit.wikimedia.org/r/124630' (owner: 'Ottomata')
|
2014-04-08 16:38:30
|
<grrrit-wm>
|
('PS1') 'Springle': invalid MariaDB variable name: user_stat [operations/puppet] - 'https://gerrit.wikimedia.org/r/124632'
|
2014-04-08 16:40:40
|
<grrrit-wm>
|
('CR') 'Springle': [C: '2'] invalid MariaDB variable name: user_stat [operations/puppet] - 'https://gerrit.wikimedia.org/r/124632' (owner: 'Springle')
|
2014-04-08 16:46:50
|
<grrrit-wm>
|
('PS1') 'RobH': replace misc-web-lb cert [operations/puppet] - 'https://gerrit.wikimedia.org/r/124634'
|
2014-04-08 16:48:11
|
<grrrit-wm>
|
('CR') 'RobH': [C: '2' V: '2'] replace misc-web-lb cert [operations/puppet] - 'https://gerrit.wikimedia.org/r/124634' (owner: 'RobH')
|
2014-04-08 16:49:09
|
<aude>
|
sorry, being slow... populating sites table
|
2014-04-08 16:49:20
|
<grrrit-wm>
|
('PS1') 'Alexandros Kosiaris': Removing ethtool package from other places [operations/puppet] - 'https://gerrit.wikimedia.org/r/124637'
|
2014-04-08 16:49:22
|
<aude>
|
suppose no hurry
|
2014-04-08 16:50:08
|
<grrrit-wm>
|
('CR') 'Dzahn': [C: ''] Removing ethtool package from other places [operations/puppet] - 'https://gerrit.wikimedia.org/r/124637' (owner: 'Alexandros Kosiaris')
|
2014-04-08 16:52:03
|
<grrrit-wm>
|
('CR') 'Dzahn': [C: '2'] "now included in base" [operations/puppet] - 'https://gerrit.wikimedia.org/r/124637' (owner: 'Alexandros Kosiaris')
|
2014-04-08 16:53:08
|
<grrrit-wm>
|
('CR') 'Cmcmahon': [C: ''] "Thanks for putting this back." [operations/puppet] - 'https://gerrit.wikimedia.org/r/124624' (owner: 'Hashar')
|
2014-04-08 16:53:36
|
<icinga-wm>
|
RECOVERY - Puppet freshness on virt2 is OK: puppet ran at Tue Apr 8 16:53:29 UTC 2014
|
2014-04-08 16:53:46
|
<icinga-wm>
|
RECOVERY - Puppet freshness on dataset1001 is OK: puppet ran at Tue Apr 8 16:53:39 UTC 2014
|
2014-04-08 16:55:06
|
<icinga-wm>
|
PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
|
2014-04-08 16:55:28
|
<ottomata>
|
rats
|
2014-04-08 16:56:36
|
<icinga-wm>
|
RECOVERY - Puppet freshness on amslvs2 is OK: puppet ran at Tue Apr 8 16:56:30 UTC 2014
|
2014-04-08 16:56:46
|
<icinga-wm>
|
RECOVERY - Puppet freshness on lvs1003 is OK: puppet ran at Tue Apr 8 16:56:45 UTC 2014
|
2014-04-08 16:59:04
|
<aude>
|
waiting for jenkins
|
2014-04-08 17:01:46
|
<icinga-wm>
|
RECOVERY - Puppet freshness on ms6 is OK: puppet ran at Tue Apr 8 17:01:37 UTC 2014
|
2014-04-08 17:01:48
|
<grrrit-wm>
|
('PS2') 'Manybubbles': Turn on experimental highlighting in beta [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124003'
|
2014-04-08 17:03:06
|
<logmsgbot>
|
!log aude synchronized php-1.23wmf20/extensions/Wikidata 'Update Wikidata build, to allow populating sites table on wikiquote'
|
2014-04-08 17:03:10
|
<morebots>
|
Logged the message, Master
|
2014-04-08 17:05:20
|
<icinga-wm>
|
RECOVERY - Puppet freshness on lvs4004 is OK: puppet ran at Tue Apr 8 17:05:14 UTC 2014
|
2014-04-08 17:05:30
|
<icinga-wm>
|
PROBLEM - RAID on dataset1001 is CRITICAL: CRITICAL: 1 failed LD(s) (Partially Degraded)
|
2014-04-08 17:06:40
|
<icinga-wm>
|
PROBLEM - LVS HTTPS IPv6 on misc-web-lb.eqiad.wikimedia.org_ipv6 is CRITICAL: Connection refused
|
2014-04-08 17:07:40
|
<icinga-wm>
|
RECOVERY - LVS HTTPS IPv6 on misc-web-lb.eqiad.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 226 bytes in 0.012 second response time
|
2014-04-08 17:08:20
|
<icinga-wm>
|
RECOVERY - Puppet freshness on amslvs3 is OK: puppet ran at Tue Apr 8 17:08:15 UTC 2014
|
2014-04-08 17:08:30
|
<icinga-wm>
|
RECOVERY - Puppet freshness on lvs4003 is OK: puppet ran at Tue Apr 8 17:08:25 UTC 2014
|
2014-04-08 17:08:44
|
<grrrit-wm>
|
('CR') 'Chad': [C: '2'] Turn on experimental highlighting in beta [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124003' (owner: 'Manybubbles')
|
2014-04-08 17:08:53
|
<grrrit-wm>
|
('Merged') 'jenkins-bot': Turn on experimental highlighting in beta [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124003' (owner: 'Manybubbles')
|
2014-04-08 17:09:40
|
<icinga-wm>
|
RECOVERY - Puppet freshness on lvs1006 is OK: puppet ran at Tue Apr 8 17:09:30 UTC 2014
|
2014-04-08 17:10:10
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs3001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
|
2014-04-08 17:10:10
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs3003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
|
2014-04-08 17:10:10
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs3002 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
|
2014-04-08 17:10:10
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs3004 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
|
2014-04-08 17:10:19
|
<grrrit-wm>
|
('CR') 'QChris': "Prerequisite got merged." [operations/puppet] - 'https://gerrit.wikimedia.org/r/121546' (owner: 'Ottomata')
|
2014-04-08 17:10:52
|
<aude>
|
^demon|away: are you deploying stuff?
|
2014-04-08 17:11:14
|
<aude>
|
i'll need to sneak in at some point for a config change, but not yet
|
2014-04-08 17:11:29
|
<grrrit-wm>
|
('PS1') 'Ottomata': Moving sqstat back to emery :/ [operations/puppet] - 'https://gerrit.wikimedia.org/r/124641'
|
2014-04-08 17:11:38
|
<grrrit-wm>
|
('PS2') 'Ottomata': Moving sqstat back to emery :/ [operations/puppet] - 'https://gerrit.wikimedia.org/r/124641'
|
2014-04-08 17:11:40
|
<grrrit-wm>
|
('CR') 'jenkins-bot': [V: '-1'] Moving sqstat back to emery :/ [operations/puppet] - 'https://gerrit.wikimedia.org/r/124641' (owner: 'Ottomata')
|
2014-04-08 17:11:50
|
<grrrit-wm>
|
('CR') 'Ottomata': [C: '2' V: '2'] Moving sqstat back to emery :/ [operations/puppet] - 'https://gerrit.wikimedia.org/r/124641' (owner: 'Ottomata')
|
2014-04-08 17:12:28
|
<manybubbles>
|
aude: no, he just merged something for beta
|
2014-04-08 17:12:34
|
<aude>
|
ok
|
2014-04-08 17:12:41
|
<aude>
|
probably need 10 more minutes
|
2014-04-08 17:12:50
|
<aude>
|
done populating tables, now checking they are ok
|
2014-04-08 17:13:00
|
<aude>
|
then can do the config change and then done :)
|
2014-04-08 17:13:19
|
<^demon|away>
|
aude: Nope, just merged that for Nik for beta.
|
2014-04-08 17:13:21
|
<^demon|away>
|
Like he said :)
|
2014-04-08 17:13:22
|
<aude>
|
going slow and careful since i'm still newish
|
2014-04-08 17:13:25
|
<aude>
|
doing this stuff
|
2014-04-08 17:13:32
|
<^demon|away>
|
Someone should sync it eventually for consistency, but no biggie.
|
2014-04-08 17:13:53
|
<aude>
|
i can do
|
2014-04-08 17:14:04
|
<hoo>
|
so can I
|
2014-04-08 17:14:29
|
<aude>
|
hoo: want to check the sites tables and site_identifiers for wikiquote?
|
2014-04-08 17:14:30
|
<icinga-wm>
|
RECOVERY - Puppet freshness on lvs1002 is OK: puppet ran at Tue Apr 8 17:14:22 UTC 2014
|
2014-04-08 17:14:36
|
<aude>
|
they look ok to me
|
2014-04-08 17:15:30
|
<icinga-wm>
|
RECOVERY - Puppet freshness on lvs1005 is OK: puppet ran at Tue Apr 8 17:15:22 UTC 2014
|
2014-04-08 17:16:02
|
<grrrit-wm>
|
('CR') 'Aude': "sites table and site_identifiers are added and populated" [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124516' (owner: 'Aude')
|
2014-04-08 17:16:10
|
<icinga-wm>
|
RECOVERY - Puppet freshness on lvs1004 is OK: puppet ran at Tue Apr 8 17:16:02 UTC 2014
|
2014-04-08 17:16:28
|
<manybubbles>
|
!log finished upgrading elastic1001-1006. starting on 1007. yay progress.
|
2014-04-08 17:16:32
|
<morebots>
|
Logged the message, Master
|
2014-04-08 17:16:34
|
<hoo>
|
enwikiquote looks good to me
|
2014-04-08 17:16:39
|
<aude>
|
alright
|
2014-04-08 17:16:40
|
<hoo>
|
sites and site_identifiers
|
2014-04-08 17:16:44
|
<aude>
|
strip protocols and all
|
2014-04-08 17:16:52
|
<hoo>
|
yep
|
2014-04-08 17:16:58
|
<aude>
|
https://gerrit.wikimedia.org/r/#/c/124516/ want to merge
|
2014-04-08 17:17:07
|
<aude>
|
i can deploy it and sync the cirrus thing
|
2014-04-08 17:17:19
|
<manybubbles>
|
thanks!
|
2014-04-08 17:17:22
|
<hoo>
|
ok, also looks good on WD
|
2014-04-08 17:17:30
|
<aude>
|
ok
|
2014-04-08 17:17:45
|
<aude>
|
let me sync cirrus
|
2014-04-08 17:17:52
|
<hoo>
|
go ahead
|
2014-04-08 17:17:53
|
<Nemo_bis>
|
Oh, today is the day
|
2014-04-08 17:18:06
|
<aude>
|
it's *the* day :)
|
2014-04-08 17:18:10
|
<icinga-wm>
|
RECOVERY - Puppet freshness on lvs4001 is OK: puppet ran at Tue Apr 8 17:18:03 UTC 2014
|
2014-04-08 17:19:18
|
<hoo>
|
aude: You also sorted the wikidataclient dblist? :P
|
2014-04-08 17:19:53
|
<aude>
|
yes
|
2014-04-08 17:20:04
|
<hoo>
|
Ok, looks good to me, can approve whenever you want
|
2014-04-08 17:20:05
|
<aude>
|
they will get sorted eventually
|
2014-04-08 17:20:13
|
<aude>
|
doing chad's thing
|
2014-04-08 17:20:30
|
<icinga-wm>
|
RECOVERY - Puppet freshness on amslvs1 is OK: puppet ran at Tue Apr 8 17:20:23 UTC 2014
|
2014-04-08 17:21:30
|
<icinga-wm>
|
RECOVERY - Puppet freshness on lvs1001 is OK: puppet ran at Tue Apr 8 17:21:24 UTC 2014
|
2014-04-08 17:21:50
|
<icinga-wm>
|
RECOVERY - Puppet freshness on amslvs4 is OK: puppet ran at Tue Apr 8 17:21:45 UTC 2014
|
2014-04-08 17:22:30
|
<icinga-wm>
|
RECOVERY - Puppet freshness on lvs4002 is OK: puppet ran at Tue Apr 8 17:22:21 UTC 2014
|
2014-04-08 17:22:43
|
<logmsgbot>
|
!log aude synchronized wmf-config/CirrusSearch-labs.php 'config change for beta, to enable highlighting'
|
2014-04-08 17:22:47
|
<morebots>
|
Logged the message, Master
|
2014-04-08 17:23:06
|
<aude>
|
hoo: ready
|
2014-04-08 17:23:45
|
<grrrit-wm>
|
('CR') 'Hoo man': [C: '2'] "Preparation finished, so do this! \o/" [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124516' (owner: 'Aude')
|
2014-04-08 17:23:49
|
<aude>
|
yay!
|
2014-04-08 17:23:51
|
<hoo>
|
there you go ;)
|
2014-04-08 17:23:53
|
<grrrit-wm>
|
('Merged') 'jenkins-bot': Enable Wikibase on Wikiquote [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124516' (owner: 'Aude')
|
2014-04-08 17:27:20
|
<hoo>
|
aude: About to sync or shall I take it?
|
2014-04-08 17:27:21
|
<aude>
|
sync dblist then wmf-config?
|
2014-04-08 17:27:31
|
<Nemo_bis>
|
waiting
|
2014-04-08 17:27:43
|
<aude>
|
no other way
|
2014-04-08 17:27:52
|
<hoo>
|
other way round sounds sane
|
2014-04-08 17:28:02
|
<aude>
|
wmf-config then dblist is good
|
2014-04-08 17:28:06
|
<hoo>
|
wmf-config changes will work w/o the rest
|
2014-04-08 17:28:10
|
<aude>
|
right
|
2014-04-08 17:28:20
|
<aude>
|
that's what ree-dy did for wikisource
|
2014-04-08 17:28:52
|
<aude>
|
doing
|
2014-04-08 17:28:55
|
<hoo>
|
:)
|
2014-04-08 17:28:59
|
<logmsgbot>
|
!log aude synchronized wmf-config 'config changes to enable Wikibase on Wikiquote'
|
2014-04-08 17:29:04
|
<morebots>
|
Logged the message, Master
|
2014-04-08 17:29:12
|
<grrrit-wm>
|
('PS1') 'Matthias Mullie': Increase Flow cache version [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124646'
|
2014-04-08 17:29:52
|
<logmsgbot>
|
!log aude synchronized wikidataclient.dblist 'Enable Wikibase on Wikiquote'
|
2014-04-08 17:29:57
|
<morebots>
|
Logged the message, Master
|
2014-04-08 17:30:01
|
<hoo>
|
oO
|
2014-04-08 17:30:02
|
<hoo>
|
:)
|
2014-04-08 17:30:12
|
<aude>
|
alright time to check it's all good
|
2014-04-08 17:30:17
|
<hoo>
|
on that
|
2014-04-08 17:31:13
|
<hoo>
|
oh well... I think we have to bump wgCacheEpoch once again
|
2014-04-08 17:31:14
|
<hoo>
|
aude: ^
|
2014-04-08 17:31:36
|
<aude>
|
huh
|
2014-04-08 17:31:45
|
<aude>
|
ah, yes
|
2014-04-08 17:32:00
|
<hoo>
|
shall I patch or will you?
|
2014-04-08 17:32:26
|
<Nemo_bis>
|
https://www.wikidata.org/wiki/Q189119#sitelinks-wikiquote
|
2014-04-08 17:32:34
|
<hoo>
|
Nemo_bis: Yes, the usual stuff
|
2014-04-08 17:32:34
|
<aude>
|
go ahead
|
2014-04-08 17:33:06
|
<aude>
|
it says list of values is complete
|
2014-04-08 17:33:09
|
<aude>
|
i assume caching
|
2014-04-08 17:33:16
|
<aude>
|
on Q60
|
2014-04-08 17:33:57
|
<aude>
|
debug=true, i can add wikiquote
|
2014-04-08 17:34:23
|
<Nemo_bis>
|
yep, I did action=purge
|
2014-04-08 17:34:23
|
<grrrit-wm>
|
('PS1') 'Hoo man': Bump wgCacheEpoch for Wikidata after enabling Wikiquote langlinks [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124648'
|
2014-04-08 17:34:24
|
<hoo>
|
yep
|
2014-04-08 17:34:31
|
<hoo>
|
aude: ^
|
2014-04-08 17:34:35
|
<aude>
|
ok
|
2014-04-08 17:35:21
|
<ottomata>
|
!log restarted gmetad on nickel to fix ganglia
|
2014-04-08 17:35:26
|
<morebots>
|
Logged the message, Master
|
2014-04-08 17:35:33
|
<grrrit-wm>
|
('CR') 'Aude': [C: '2'] Bump wgCacheEpoch for Wikidata after enabling Wikiquote langlinks [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124648' (owner: 'Hoo man')
|
2014-04-08 17:35:40
|
<grrrit-wm>
|
('Merged') 'jenkins-bot': Bump wgCacheEpoch for Wikidata after enabling Wikiquote langlinks [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124648' (owner: 'Hoo man')
|
2014-04-08 17:37:00
|
<hoo>
|
aude: Syncing? I have to sync a touch out
|
2014-04-08 17:37:10
|
<aude>
|
doing
|
2014-04-08 17:37:12
|
<hoo>
|
ok
|
2014-04-08 17:37:18
|
<logmsgbot>
|
!log aude synchronized wmf-config/Wikibase.php 'bump wgCacheEpoch for wikidata after enabling wikiquote site links'
|
2014-04-08 17:37:19
|
<aude>
|
just being careful
|
2014-04-08 17:37:22
|
<morebots>
|
Logged the message, Master
|
2014-04-08 17:37:28
|
<logmsgbot>
|
!log hoo synchronized php-1.23wmf20/extensions/Wikidata/extensions/Wikibase/lib/resources/wikibase.Site.js 'touch'
|
2014-04-08 17:37:32
|
<morebots>
|
Logged the message, Master
|
2014-04-08 17:37:34
|
<hoo>
|
that should purge the sites cache
|
2014-04-08 17:37:43
|
<greg-g>
|
"13:37 < aude> just being careful" +1 ;)
|
2014-04-08 17:37:44
|
<hoo>
|
in resource loader
|
2014-04-08 17:37:47
|
<aude>
|
:)
|
2014-04-08 17:38:25
|
<aude>
|
still says complete
|
2014-04-08 17:38:30
|
<hoo>
|
mh :/
|
2014-04-08 17:38:45
|
<aude>
|
sites module has always been a pain
|
2014-04-08 17:40:24
|
<aude>
|
maybe php-1.23wmf20/extensions/Wikidata/extensions/Wikibase/lib/includes/modules/SitesModule.php ?
|
2014-04-08 17:40:43
|
<hoo>
|
aude: Won't help, RL does timestamps based on the JS scripts
|
2014-04-08 17:40:50
|
<aude>
|
hmmm, ok
|
2014-04-08 17:41:13
|
<hoo>
|
works for me
|
2014-04-08 17:41:16
|
<hoo>
|
now at least
|
2014-04-08 17:41:35
|
<aude>
|
trying in firefox
|
2014-04-08 17:41:39
|
<aude>
|
might be my caching
|
2014-04-08 17:41:42
|
<hoo>
|
\o/ Just added the first link
|
2014-04-08 17:41:46
|
<hoo>
|
https://www.wikidata.org/wiki/Q40904#sitelinks-wikiquote
|
2014-04-08 17:41:48
|
<aude>
|
already did one :)
|
2014-04-08 17:41:54
|
<aude>
|
with debug=true
|
2014-04-08 17:41:59
|
<hoo>
|
Cheating :D
|
2014-04-08 17:42:11
|
<aude>
|
heh
|
2014-04-08 17:42:23
|
<aude>
|
looks good in firefox
|
2014-04-08 17:42:30
|
<aude>
|
i have to assume it's my cache
|
2014-04-08 17:42:31
|
<Nemo_bis>
|
I did one ten minutes ago already :P
|
2014-04-08 17:42:35
|
<hoo>
|
:P
|
2014-04-08 17:42:36
|
<aude>
|
yay
|
2014-04-08 17:42:45
|
<hoo>
|
Nemo_bis: with debug true, I guess?!
|
2014-04-08 17:42:50
|
<Nemo_bis>
|
lol Heisenberg
|
2014-04-08 17:42:55
|
<Nemo_bis>
|
19.34 < Nemo_bis> yep, I did action=purge
|
2014-04-08 17:43:01
|
<hoo>
|
:P
|
2014-04-08 17:43:01
|
<aude>
|
ah
|
2014-04-08 17:43:50
|
<Guest75555>
|
Is there a procedure to delete gerrit repositories?
|
2014-04-08 17:45:00
|
<aude>
|
i can add links in wikidata now in chrome
|
2014-04-08 17:45:09
|
<hoo>
|
aude: https://en.wikiquote.org/w/index.php?title=Werner_Heisenberg&action=info mh
|
2014-04-08 17:45:14
|
<hoo>
|
why is it not showing up?
|
2014-04-08 17:45:34
|
<Nemo_bis>
|
Guest64226 / krinkle : probably you can ask on the same gerrit queue page as usual
|
2014-04-08 17:45:53
|
<hoo>
|
ah, I see
|
2014-04-08 17:45:57
|
<Nemo_bis>
|
unless it's not "your" repository, in which case maybe a bug is better
|
2014-04-08 17:46:11
|
<hoo>
|
dispatching is ... :S
|
2014-04-08 17:47:21
|
<aude>
|
hmmm
|
2014-04-08 17:47:28
|
<hoo>
|
https://www.wikidata.org/wiki/Special:DispatchStats
|
2014-04-08 17:47:44
|
<aude>
|
i did action=purge on https://en.wikiquote.org/wiki/New_York_City
|
2014-04-08 17:47:46
|
<hoo>
|
aude: Can we safely skip these changes? If not, just waiting is also fine
|
2014-04-08 17:47:54
|
<hoo>
|
it's catching up rather quickly AFAIS
|
2014-04-08 17:47:55
|
<aude>
|
removed dewikiquote
|
2014-04-08 17:48:08
|
<aude>
|
we can wait
|
2014-04-08 17:48:16
|
<bd808|deploy>
|
waits in line to do a group0 to 1.23wmf21 scap
|
2014-04-08 17:48:28
|
<aude>
|
give us 5 more minutes to poke
|
2014-04-08 17:48:43
|
<bd808|deploy>
|
aude: Sounds good
|
2014-04-08 17:48:59
|
<aude>
|
i think we're ok though...
|
2014-04-08 17:49:32
|
<aude>
|
or at least nothing we can solve in 5 min, but we didn't break anything
|
2014-04-08 17:50:51
|
<hoo>
|
aude: I can bump the chd_seen fields
|
2014-04-08 17:51:12
|
<aude>
|
ok
|
2014-04-08 17:52:05
|
<hoo>
|
Just looking for the right change id
|
2014-04-08 17:53:43
|
<hoo>
|
got that
|
2014-04-08 17:54:37
|
<aude>
|
something is weird with wikiquote... like it's not actually enabled now
|
2014-04-08 17:54:45
|
<aude>
|
but I'm sure I saw it was
|
2014-04-08 17:55:29
|
<aude>
|
thinks this happened with wikisource
|
2014-04-08 17:56:19
|
<hoo>
|
!log changed the Wikidata wb_changes_dispatch position of all wikiquote wikis to 118158153
|
2014-04-08 17:56:23
|
<morebots>
|
Logged the message, Master
|
2014-04-08 17:56:39
|
<aude>
|
enwikiquote is in wikidataclient.dblist
|
2014-04-08 17:56:42
|
<hoo>
|
20140408172900
|
2014-04-08 17:57:03
|
<hoo>
|
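"Changing the dispatch position" here means moving the chd_seen pointer in the wb_changes_dispatch table forward for the affected client wikis, so the dispatcher skips the backlog older than that change id. A rough sketch of the equivalent update, written against a generic DB-API connection purely for illustration; the chd_site column name and the LIKE filter are assumptions, and 118158153 is the change id from the log line above:

    # Rough sketch of bumping the Wikibase dispatch position, as logged above.
    # Assumptions: a DB-API connection (e.g. pymysql) to the repo database, a chd_site
    # column holding client wiki ids such as 'enwikiquote', and 118158153 as the target.
    NEW_POSITION = 118158153

    def bump_wikiquote_dispatch_position(conn):
        sql = (
            "UPDATE wb_changes_dispatch "
            "SET chd_seen = %s "
            "WHERE chd_site LIKE %s AND chd_seen < %s"
        )
        with conn.cursor() as cur:
            cur.execute(sql, (NEW_POSITION, "%wikiquote", NEW_POSITION))
        conn.commit()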
that was the timestamp, should be a few moments before anything happened regarding wikiquote
|
2014-04-08 17:57:12
|
<aude>
|
ok
|
2014-04-08 17:57:39
|
<icinga-wm>
|
PROBLEM - Varnishkafka Delivery Errors on cp3020 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 540.333313
|
2014-04-08 17:58:28
|
<hoo>
|
still https://en.wikiquote.org/w/index.php?title=Werner_Heisenberg&action=info
|
2014-04-08 17:58:56
|
<hoo>
|
Wikidata is not even loaded there... wtf
|
2014-04-08 17:58:59
|
<icinga-wm>
|
PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 645.666687
|
2014-04-08 17:59:03
|
<aude>
|
right,
|
2014-04-08 17:59:05
|
<aude>
|
i'm sure it was
|
2014-04-08 17:59:25
|
<aude>
|
do i have to sync dblist again?
|
2014-04-08 17:59:37
|
<aude>
|
did we somehow undo it?
|
2014-04-08 18:00:58
|
<hoo>
|
no, looks good on a random mw* machine
|
2014-04-08 18:01:09
|
<icinga-wm>
|
PROBLEM - Disk space on virt1000 is CRITICAL: DISK CRITICAL - free space: / 1694 MB (2% inode=86%):
|
2014-04-08 18:01:14
|
<hoo>
|
ah
|
2014-04-08 18:01:50
|
<logmsgbot>
|
!log hoo synchronized wmf-config/InitialiseSettings.php 'Touch to clear config. cache'
|
2014-04-08 18:01:54
|
<morebots>
|
Logged the message, Master
|
2014-04-08 18:01:55
|
<aude>
|
ok
|
2014-04-08 18:02:09
|
<aude>
|
it's back!
|
2014-04-08 18:02:11
|
<hoo>
|
Sorry, I forgot about that
|
2014-04-08 18:02:33
|
<aude>
|
was about to try that
|
2014-04-08 18:02:37
|
<hoo>
|
:)
|
2014-04-08 18:02:41
|
<aude>
|
touch all the wikidata things :)
|
2014-04-08 18:02:43
|
<bd808|deploy>
|
wants to fix https://bugzilla.wikimedia.org/show_bug.cgi?id=58618 so that's automatic
|
2014-04-08 18:02:56
|
<aude>
|
i think we are done!
|
2014-04-08 18:03:19
|
<aude>
|
i am sure this happened on wikisource or previously where it was enabled and then not
|
2014-04-08 18:03:38
|
<aude>
|
puzzled but we're good now
|
2014-04-08 18:04:13
|
<hoo>
|
Yep, looks good to me
|
2014-04-08 18:04:23
|
<bd808|deploy>
|
aude, hoo: All clear for me to mess with /a/common on tin and then scap?
|
2014-04-08 18:04:37
|
<hoo>
|
Yep, go ahead... we're done for now :)
|
2014-04-08 18:04:47
|
<bd808|deploy>
|
Cool
|
2014-04-08 18:05:08
|
<aude>
|
done
|
2014-04-08 18:06:11
|
<grrrit-wm>
|
('PS1') 'BryanDavis': Group0 wikis to 1.23wmf21 [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124655'
|
2014-04-08 18:06:50
|
<greg-g>
|
crosses fingers and knocks on wood
|
2014-04-08 18:07:03
|
<grrrit-wm>
|
('CR') 'BryanDavis': [C: '2'] Group0 wikis to 1.23wmf21 [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124655' (owner: 'BryanDavis')
|
2014-04-08 18:07:05
|
<aude>
|
too!
|
2014-04-08 18:07:46
|
<bd808|deploy>
|
greg-g: Aaron merged my fix so in theory I should only need one scap. I'll verify the file after the first scap to be certain
|
2014-04-08 18:08:21
|
<greg-g>
|
nods
|
2014-04-08 18:08:28
|
<grrrit-wm>
|
('Merged') 'jenkins-bot': Group0 wikis to 1.23wmf21 [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124655' (owner: 'BryanDavis')
|
2014-04-08 18:10:36
|
<logmsgbot>
|
!log bd808 Started scap: group0 wikis to 1.23wmf21 (with patch for bug 63659)
|
2014-04-08 18:10:39
|
<icinga-wm>
|
RECOVERY - Varnishkafka Delivery Errors on cp3020 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
|
2014-04-08 18:10:41
|
<morebots>
|
Logged the message, Master
|
2014-04-08 18:11:25
|
<bd808|deploy>
|
l10n cache did not rebuild which is a great sign
|
2014-04-08 18:11:58
|
<jackmcbarn>
|
Unable to open /usr/local/apache/common-local/wikiversions.cdb.
|
2014-04-08 18:11:58
|
<MatmaRex>
|
https://pl.wikipedia.org/w/index.php?title=Dyskusja_wikiprojektu:%C5%9Ar%C3%B3dziemie&oldid=prev&diff=39218000
|
2014-04-08 18:12:01
|
<MatmaRex>
|
i get a "Unable to open /usr/local/apache/common-local/wikiversions.cdb."
|
2014-04-08 18:12:10
|
<andre__>
|
...and same here.
|
2014-04-08 18:12:12
|
<manybubbles>
|
[2014-04-08 18:11:37] Fatal error: Unable to open /usr/local/apache/common-local/wikiversions.cdb.
|
2014-04-08 18:12:15
|
<rschen7754>
|
uh-oh
|
2014-04-08 18:12:19
|
<bd808|deploy>
|
Yeah. fuck
|
2014-04-08 18:12:21
|
<manybubbles>
|
yeah, you got it
|
2014-04-08 18:12:22
|
<Steinsplitter>
|
here the same
|
2014-04-08 18:12:26
|
<bd808|deploy>
|
It will be fixed in a few moments
|
2014-04-08 18:12:30
|
<manybubbles>
|
that's everything
|
2014-04-08 18:12:31
|
<greg-g>
|
well shit
|
2014-04-08 18:12:45
|
<bd808|deploy>
|
fuuuuck
|
2014-04-08 18:12:49
|
<icinga-wm>
|
PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
|
2014-04-08 18:12:57
|
<bd808|deploy>
|
There's my first crash of all the wikis
|
2014-04-08 18:12:59
|
<icinga-wm>
|
RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
|
2014-04-08 18:13:00
|
<MaxSem>
|
SNAFU?
|
2014-04-08 18:13:05
|
<aude>
|
wtf
|
2014-04-08 18:13:13
|
<Amgine>
|
down on wm
|
2014-04-08 18:13:21
|
<manybubbles>
|
damn it, I was actually reading an article and I reloaded it to test
|
2014-04-08 18:13:23
|
<bd808|deploy>
|
It was my "fix" for the scap problem
|
2014-04-08 18:13:25
|
<manybubbles>
|
now I can't read it while I wait
|
2014-04-08 18:13:29
|
<icinga-wm>
|
PROBLEM - Apache HTTP on mw1190 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal server error - 50485 bytes in 0.007 second response time
|
2014-04-08 18:13:29
|
<icinga-wm>
|
PROBLEM - Apache HTTP on mw1055 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal server error - 50485 bytes in 0.013 second response time
|
2014-04-08 18:13:29
|
<icinga-wm>
|
PROBLEM - Apache HTTP on mw1150 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal server error - 50485 bytes in 0.004 second response time
|
2014-04-08 18:13:29
|
<icinga-wm>
|
PROBLEM - Apache HTTP on mw1101 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal server error - 50485 bytes in 0.005 second response time
|
2014-04-08 18:13:29
|
<icinga-wm>
|
PROBLEM - Apache HTTP on mw1177 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal server error - 50485 bytes in 0.009 second response time
|
2014-04-08 18:13:29
|
<icinga-wm>
|
PROBLEM - Apache HTTP on mw1138 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal server error - 50485 bytes in 0.003 second response time
|
2014-04-08 18:13:30
|
<icinga-wm>
|
PROBLEM - Apache HTTP on mw1187 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal server error - 50485 bytes in 0.006 second response time
|
2014-04-08 18:13:30
|
<icinga-wm>
|
PROBLEM - Apache HTTP on mw1220 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal server error - 50485 bytes in 0.006 second response time
|
2014-04-08 18:13:31
|
<icinga-wm>
|
PROBLEM - Apache HTTP on mw1197 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal server error - 50485 bytes in 0.013 second response time
|
2014-04-08 18:13:31
|
<icinga-wm>
|
PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - check plugin (check_job_queue) or PHP errors -
|
2014-04-08 18:13:33
|
<marktraceur>
|
Whoa
|
2014-04-08 18:13:34
|
<aude>
|
cries
|
2014-04-08 18:13:39
|
<icinga-wm>
|
PROBLEM - Apache HTTP on mw1213 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal server error - 50485 bytes in 0.018 second response time
|
2014-04-08 18:13:39
|
<icinga-wm>
|
PROBLEM - Apache HTTP on mw1113 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal server error - 50485 bytes in 0.012 second response time
|
2014-04-08 18:13:39
|
<icinga-wm>
|
PROBLEM - LVS HTTP IPv4 on rendering.svc.eqiad.wmnet is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal server error - 50485 bytes in 0.008 second response time
|
2014-04-08 18:13:42
|
<icinga-wm>
|
PROBLEM - Apache HTTP on mw1200 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal server error - 50485 bytes in 0.006 second response time
|
2014-04-08 18:13:42
|
<icinga-wm>
|
PROBLEM - Apache HTTP on mw1035 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal server error - 50485 bytes in 0.022 second response time
|
2014-04-08 18:13:42
|
<icinga-wm>
|
PROBLEM - Apache HTTP on mw1031 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal server error - 50485 bytes in 0.011 second response time
|
2014-04-08 18:13:42
|
<icinga-wm>
|
PROBLEM - Apache HTTP on mw1090 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal server error - 50485 bytes in 0.010 second response time
|
2014-04-08 18:13:42
|
<icinga-wm>
|
PROBLEM - Apache HTTP on mw1154 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal server error - 50485 bytes in 0.007 second response time
|
2014-04-08 18:13:52
|
<bd808|deploy>
|
It will be fixed soon… scap will fix it at the end
|
2014-04-08 18:13:54
|
<logmsgbot>
|
!log bd808 Finished scap: group0 wikis to 1.23wmf21 (with patch for bug 63659) (duration: 03m 18s)
|
2014-04-08 18:13:59
|
<morebots>
|
Logged the message, Master
|
2014-04-08 18:14:00
|
<aude>
|
alright
|
2014-04-08 18:14:01
|
<bd808|deploy>
|
Should be fixed now
|
2014-04-08 18:14:04
|
<manybubbles>
|
fixed
|
2014-04-08 18:14:15
|
<greg-g>
|
breathes again
|
2014-04-08 18:14:22
|
<jackmcbarn>
|
can whoever's in charge of icinga-wm bring it back to life?
|
2014-04-08 18:14:35
|
<sjoerddebruin>
|
Damn it. :P
|
2014-04-08 18:14:37
|
<greg-g>
|
jackmcbarn: it'll again automatically, I *believe*
|
2014-04-08 18:14:38
|
<PiRCarre>
|
Someone
|
2014-04-08 18:14:39
|
<MaxSem>
|
so what happened?
|
2014-04-08 18:14:47
|
<PiRCarre>
|
Oh, you know about it?
|
2014-04-08 18:14:48
|
<Marybelle>
|
greg-g: You accidentally a verb.
|
2014-04-08 18:14:49
|
<PiRCarre>
|
ok
|
2014-04-08 18:14:50
|
<icinga-wm>
|
RECOVERY - Apache HTTP on mw1027 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.066 second response time
|
2014-04-08 18:14:50
|
<icinga-wm>
|
RECOVERY - Apache HTTP on mw1092 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.073 second response time
|
2014-04-08 18:14:51
|
<icinga-wm>
|
RECOVERY - Apache HTTP on mw1073 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.084 second response time
|
2014-04-08 18:14:51
|
<icinga-wm>
|
RECOVERY - Apache HTTP on mw1018 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.111 second response time
|
2014-04-08 18:14:51
|
<bd808|deploy>
|
Patch https://gerrit.wikimedia.org/r/#/c/124627/
|
2014-04-08 18:14:52
|
<icinga-wm>
|
RECOVERY - Apache HTTP on mw1163 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.062 second response time
|
2014-04-08 18:14:52
|
<icinga-wm>
|
RECOVERY - Apache HTTP on mw1217 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.059 second response time
|
2014-04-08 18:15:07
|
<greg-g>
|
Marybelle: :)
|
2014-04-08 18:15:16
|
<bd808|deploy>
|
I'll write up the email. I know exactly what I fucked up
|
2014-04-08 18:15:21
|
<PiRCarre>
|
bd808|deploy: thanks, I was just about to report "Unable to open /usr/local/apache/common-local/wikiversions.cdb." - glad to see it's under control
|
2014-04-08 18:15:29
|
<aude>
|
breathes
|
2014-04-08 18:15:54
|
<paravoid>
|
what's going on?
|
2014-04-08 18:16:08
|
<paravoid>
|
we are all at dinner
|
2014-04-08 18:16:23
|
<manybubbles>
|
fixed now
|
2014-04-08 18:16:24
|
<aude>
|
it's ok
|
2014-04-08 18:16:25
|
<bd808|deploy>
|
paravoid: My fault. Should be fixed now
|
2014-04-08 18:16:31
|
<paravoid>
|
okay
|
2014-04-08 18:16:35
|
<greg-g>
|
paravoid: go back to dinner, all's ok again :)
|
2014-04-08 18:16:36
|
<aude>
|
scap temporarily broke everything though
|
2014-04-08 18:16:36
|
<paravoid>
|
do you need anything?
|
2014-04-08 18:16:39
|
<icinga-wm>
|
PROBLEM - Varnishkafka Delivery Errors on cp3012 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 183.266663
|
2014-04-08 18:16:39
|
<paravoid>
|
ok
|
2014-04-08 18:16:44
|
<paravoid>
|
manually page us if something happens
|
2014-04-08 18:16:52
|
<greg-g>
|
paravoid: nope, known ef up
|
2014-04-08 18:16:57
|
<greg-g>
|
paravoid: will do, enjoy!
|
2014-04-08 18:17:05
|
<paravoid>
|
ciao
|
2014-04-08 18:18:17
|
<grrrit-wm>
|
('PS2') 'Gergő Tisza': Add setting to show a survey for MediaViewer users on some sites [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124036'
|
2014-04-08 18:18:56
|
<grrrit-wm>
|
('CR') 'Gerg? Tisza': "Updated to display feedback survey on beta enwiki." [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124036' (owner: 'Gerg? Tisza')
|
2014-04-08 18:19:29
|
<bd808|deploy>
|
greg-g: I just reverted my patch to scap that caused that cascade of horribleness
|
2014-04-08 18:19:36
|
<greg-g>
|
:)
|
2014-04-08 18:19:44
|
<bd808|deploy>
|
On the plus side, group0 is on wmf21 now
|
2014-04-08 18:19:50
|
<greg-g>
|
lol
|
2014-04-08 18:19:58
|
<greg-g>
|
literal-lol
|
2014-04-08 18:20:09
|
<aude>
|
scared to change it back
|
2014-04-08 18:20:20
|
<greg-g>
|
"Don't. Touch. Any. Thing."
|
2014-04-08 18:20:25
|
<aude>
|
i suppose if bd808|deploy 's patch is reverted then ok
|
2014-04-08 18:20:39
|
<greg-g>
|
well, we still have the previous issue which it was trying to fix ;)
|
2014-04-08 18:20:59
|
<greg-g>
|
1 step forward, 1 step back
|
2014-04-08 18:21:23
|
<bd808|deploy>
|
So yes we are temporarily back to needing to double-scap, but I'll make a patch that doesn't melt the world after lunch
|
2014-04-08 18:22:25
|
<greg-g>
|
bd808|deploy: :)
|
2014-04-08 18:23:15
|
<aude>
|
wikiquote etc all looks fine, so i'm going home / eating
|
2014-04-08 18:23:20
|
<aude>
|
back in hour
|
2014-04-08 18:23:26
|
<greg-g>
|
k, I'll do the same
|
2014-04-08 18:23:33
|
<Nemo_bis>
|
quite late dinner for berlin
|
2014-04-08 18:23:47
|
<manybubbles>
|
so I told my wife we broke the internet. she told me facebook was working....
|
2014-04-08 18:24:18
|
<hoo>
|
Nemo_bis: It's never too late for food :P
|
2014-04-08 18:24:41
|
<Jamesofur>
|
^
|
2014-04-08 18:28:38
|
<Nemo_bis>
|
hoo: well, I'd call death from starvation, pellagra etc. "too late" :P
|
2014-04-08 18:29:07
|
<hoo>
|
Nemo_bis: :P Too late as in time of the day...
|
2014-04-08 18:29:08
|
<hoo>
|
:D
|
2014-04-08 18:30:17
|
<ori>
|
hoo: http://p.defau.lt/?md_cbLJuORDNsGkhY6_NAg :P
|
2014-04-08 18:30:55
|
<hoo>
|
at least the other errors are gone now, I guess
|
2014-04-08 18:31:28
|
<greg-g>
|
manybubbles: :(
|
2014-04-08 18:31:42
|
<greg-g>
|
goes to lunch for real
|
2014-04-08 18:32:34
|
<ori>
|
hoo: yeah, i submitted a patch for hhvm to fix that other issue btw
|
2014-04-08 18:32:49
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1012 is CRITICAL: CRITICAL - Could not connect to server 10.64.32.144
|
2014-04-08 18:34:15
|
<hoo>
|
ori: Oh... nice that it's actually done in PHP :)
|
2014-04-08 18:35:34
|
<manybubbles>
|
yeah yeah yeah, elasticsearch 1012 is being upgraded
|
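The ElasticSearch health check output quoted by icinga above is essentially the cluster health API: during a rolling upgrade a restarting node briefly shows up as red status with unassigned shards, then recovers. A minimal sketch of polling it in Python, assuming HTTP access to a cluster node on the default port 9200; the hostname is illustrative:

    # Minimal sketch: poll Elasticsearch cluster health, like the icinga check above.
    # Hostname is illustrative; 9200 is the default ES HTTP port.
    import json
    import urllib.request

    def cluster_health(host="elastic1001.example.org", port=9200):
        url = f"http://{host}:{port}/_cluster/health"
        with urllib.request.urlopen(url, timeout=10) as resp:
            return json.load(resp)

    health = cluster_health()
    print(health["status"], health["number_of_nodes"], health["unassigned_shards"])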
2014-04-08 18:37:56
|
<ori>
|
hoo: which component should that be filed under?
|
2014-04-08 18:39:25
|
<hoo>
|
ori: already done https://bugzilla.wikimedia.org/show_bug.cgi?id=63691
|
2014-04-08 18:39:39
|
<icinga-wm>
|
PROBLEM - Varnishkafka Delivery Errors on cp3020 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 639.299988
|
2014-04-08 18:39:40
|
<ori>
|
oh cool, thanks!
|
2014-04-08 18:42:09
|
<icinga-wm>
|
PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 530.733337
|
2014-04-08 18:42:20
|
<hoo>
|
ori: Any idea who to poke about https://gerrit.wikimedia.org/r/121709 ?
|
2014-04-08 18:43:46
|
<grrrit-wm>
|
('CR') 'Matanya': add interface speed check for all hosts ('2' comments) [operations/puppet] - 'https://gerrit.wikimedia.org/r/124606' (owner: 'Cmjohnson')
|
2014-04-08 18:44:08
|
<grrrit-wm>
|
('PS2') 'Ori.livneh': Change wgServer and wgCanonicalServer for arbcom wikis [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/121709' (owner: 'Hoo man')
|
2014-04-08 18:44:53
|
<grrrit-wm>
|
('CR') 'Ori.livneh': [C: '2'] Change wgServer and wgCanonicalServer for arbcom wikis [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/121709' (owner: 'Hoo man')
|
2014-04-08 18:45:06
|
<logmsgbot>
|
!log ori updated /a/common to {{Gerrit|I4b18e4ce8}}: Change wgServer and wgCanonicalServer for arbcom wikis
|
2014-04-08 18:45:11
|
<morebots>
|
Logged the message, Master
|
2014-04-08 18:45:28
|
<hoo>
|
heh :)
|
2014-04-08 18:45:39
|
<icinga-wm>
|
RECOVERY - Varnishkafka Delivery Errors on cp3012 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
|
2014-04-08 18:45:50
|
<logmsgbot>
|
!log ori synchronized wmf-config/InitialiseSettings.php 'I4b18e4ce8: Change wgServer and wgCanonicalServer for arbcom wikis'
|
2014-04-08 18:45:55
|
<morebots>
|
Logged the message, Master
|
2014-04-08 18:53:40
|
<icinga-wm>
|
RECOVERY - Varnishkafka Delivery Errors on cp3020 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
|
2014-04-08 18:56:09
|
<icinga-wm>
|
RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
|
2014-04-08 18:57:39
|
<icinga-wm>
|
PROBLEM - Varnishkafka Delivery Errors on cp3012 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 172.800003
|
2014-04-08 18:58:59
|
<icinga-wm>
|
RECOVERY - ElasticSearch health check on elastic1012 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
|
2014-04-08 18:59:00
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1001 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1895: active_shards: 5202: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 409
|
2014-04-08 18:59:00
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1009 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1895: active_shards: 5202: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 409
|
2014-04-08 18:59:00
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1004 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1895: active_shards: 5202: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 409
|
2014-04-08 18:59:00
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1010 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1895: active_shards: 5202: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 409
|
2014-04-08 18:59:09
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1013 is CRITICAL: CRITICAL - Could not connect to server 10.64.48.10
|
2014-04-08 18:59:09
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1003 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1895: active_shards: 5202: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 409
|
2014-04-08 18:59:09
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1006 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1895: active_shards: 5202: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 409
|
2014-04-08 18:59:10
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1016 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1895: active_shards: 5202: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 409
|
2014-04-08 18:59:29
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1015 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1895: active_shards: 5202: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 409
|
2014-04-08 19:00:03
|
<manybubbles>
|
bleh
|
2014-04-08 19:00:11
|
<manybubbles>
|
it recovered in a few seconds
|
2014-04-08 19:00:16
|
<manybubbles>
|
not sure why it did that
|
2014-04-08 19:07:39
|
<icinga-wm>
|
PROBLEM - Varnishkafka Delivery Errors on cp3011 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 341.200012
|
2014-04-08 19:12:00
|
<icinga-wm>
|
RECOVERY - ElasticSearch health check on elastic1001 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
|
2014-04-08 19:12:00
|
<icinga-wm>
|
RECOVERY - ElasticSearch health check on elastic1009 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
|
2014-04-08 19:12:00
|
<icinga-wm>
|
RECOVERY - ElasticSearch health check on elastic1004 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
|
2014-04-08 19:12:00
|
<icinga-wm>
|
RECOVERY - ElasticSearch health check on elastic1010 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
|
2014-04-08 19:12:10
|
<icinga-wm>
|
RECOVERY - ElasticSearch health check on elastic1003 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
|
2014-04-08 19:12:11
|
<icinga-wm>
|
RECOVERY - ElasticSearch health check on elastic1013 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
|
2014-04-08 19:12:11
|
<icinga-wm>
|
RECOVERY - ElasticSearch health check on elastic1016 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
|
2014-04-08 19:12:11
|
<icinga-wm>
|
RECOVERY - ElasticSearch health check on elastic1006 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
|
2014-04-08 19:13:16
|
<manybubbles>
|
that's right
|
2014-04-08 19:13:18
|
<manybubbles>
|
horrible check
|
2014-04-08 19:13:36
|
<manybubbles>
|
no errors in the logs associated with those warnings
|
2014-04-08 19:18:49
|
<icinga-wm>
|
RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000
|
2014-04-08 19:20:55
|
<huh>
|
https://en.wikipedia.org/wiki/Wikipedia:VPT#Heartbleed_bug.3F
|
2014-04-08 19:23:39
|
<icinga-wm>
|
PROBLEM - Varnishkafka Delivery Errors on cp3020 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 531.166687
|
2014-04-08 19:24:29
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1015 is CRITICAL: CRITICAL - Could not connect to server 10.64.48.12
|
2014-04-08 19:24:49
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1007 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1894: active_shards: 5197: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 414
|
2014-04-08 19:24:50
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1014 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1894: active_shards: 5197: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 414
|
2014-04-08 19:24:50
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1008 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1894: active_shards: 5197: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 414
|
2014-04-08 19:24:59
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1011 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1894: active_shards: 5197: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 414
|
2014-04-08 19:24:59
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1005 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1894: active_shards: 5197: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 414
|
2014-04-08 19:24:59
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1012 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1894: active_shards: 5197: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 414
|
2014-04-08 19:24:59
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1009 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1894: active_shards: 5197: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 414
|
2014-04-08 19:24:59
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1001 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1894: active_shards: 5197: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 414
|
2014-04-08 19:24:59
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1004 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1894: active_shards: 5197: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 414
|
2014-04-08 19:25:09
|
<icinga-wm>
|
PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 635.799988
|
2014-04-08 19:25:11
|
<Jamesofur>
|
kicks icinga-wm
|
2014-04-08 19:26:39
|
<icinga-wm>
|
PROBLEM - DPKG on elastic1015 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
|
2014-04-08 19:28:39
|
<icinga-wm>
|
RECOVERY - Varnishkafka Delivery Errors on cp3012 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
|
2014-04-08 19:29:38
|
<matanya>
|
huh: it is being fixed by ops
|
2014-04-08 19:31:39
|
<icinga-wm>
|
RECOVERY - Varnishkafka Delivery Errors on cp3011 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
|
2014-04-08 19:36:39
|
<icinga-wm>
|
RECOVERY - Varnishkafka Delivery Errors on cp3020 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
|
2014-04-08 19:37:49
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1014 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1894: active_shards: 5204: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 407
|
2014-04-08 19:37:49
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1007 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1894: active_shards: 5204: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 407
|
2014-04-08 19:37:50
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1008 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1894: active_shards: 5204: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 407
|
2014-04-08 19:37:59
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1011 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1894: active_shards: 5204: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 407
|
2014-04-08 19:37:59
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1005 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1894: active_shards: 5204: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 407
|
2014-04-08 19:37:59
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1012 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1894: active_shards: 5204: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 407
|
2014-04-08 19:37:59
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1004 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1894: active_shards: 5204: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 407
|
2014-04-08 19:37:59
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1009 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1894: active_shards: 5204: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 407
|
2014-04-08 19:37:59
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1001 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1894: active_shards: 5204: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 407
|
2014-04-08 19:38:00
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1010 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1894: active_shards: 5204: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 407
|
2014-04-08 19:38:07
|
<huh>
|
again?
|
2014-04-08 19:38:09
|
<icinga-wm>
|
RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
|
2014-04-08 19:38:10
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1013 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1894: active_shards: 5204: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 407
|
2014-04-08 19:38:10
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1003 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1894: active_shards: 5204: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 407
|
2014-04-08 19:38:10
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1016 is CRITICAL: CRITICAL - Could not connect to server 10.64.48.13
|
2014-04-08 19:38:10
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1006 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1894: active_shards: 5204: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 407
|
2014-04-08 19:38:29
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1015 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1894: active_shards: 5204: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 407
|
2014-04-08 19:38:39
|
<icinga-wm>
|
PROBLEM - Varnishkafka Delivery Errors on cp3012 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 224.199997
|
2014-04-08 19:39:39
|
<icinga-wm>
|
RECOVERY - DPKG on elastic1015 is OK: All packages OK
|
2014-04-08 19:40:19
|
<manybubbles>
|
oh shut up
|
2014-04-08 19:40:52
|
<manybubbles>
|
I'm doing rolling restarts
|
2014-04-08 19:41:47
|
<manybubbles>
|
got it: labswiki_content_1394813391
|
2014-04-08 19:41:53
|
<manybubbles>
|
that thing is configured without replicas
|
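The underlying problem manybubbles identifies: an index with zero replicas goes red whenever the one node holding its shards restarts. A minimal sketch of the usual remedy, raising the replica count through the standard Elasticsearch index-settings API; the endpoint and helper name are illustrative assumptions, not the tooling actually used here.

```python
import requests

ES = "http://localhost:9200"  # assumed cluster endpoint

def set_replicas(index, replicas=1):
    """Give an index at least one replica so a single node restart
    cannot leave all copies of its shards unassigned."""
    resp = requests.put(
        f"{ES}/{index}/_settings",
        json={"index": {"number_of_replicas": replicas}},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    # Index name taken from the log above; adjust for the live cluster.
    print(set_replicas("labswiki_content_1394813391", replicas=1))
```
|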
2014-04-08 19:46:40
|
<icinga-wm>
|
PROBLEM - Varnishkafka Delivery Errors on cp3011 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 341.066681
|
2014-04-08 19:48:00
|
<icinga-wm>
|
RECOVERY - ElasticSearch health check on elastic1004 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
|
2014-04-08 19:48:01
|
<icinga-wm>
|
RECOVERY - ElasticSearch health check on elastic1009 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
|
2014-04-08 19:48:01
|
<icinga-wm>
|
RECOVERY - ElasticSearch health check on elastic1001 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
|
2014-04-08 19:48:01
|
<icinga-wm>
|
RECOVERY - ElasticSearch health check on elastic1010 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
|
2014-04-08 19:48:10
|
<icinga-wm>
|
RECOVERY - ElasticSearch health check on elastic1003 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
|
2014-04-08 19:48:10
|
<icinga-wm>
|
RECOVERY - ElasticSearch health check on elastic1013 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
|
2014-04-08 19:48:10
|
<icinga-wm>
|
RECOVERY - ElasticSearch health check on elastic1006 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
|
2014-04-08 19:48:10
|
<icinga-wm>
|
RECOVERY - ElasticSearch health check on elastic1016 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
|
2014-04-08 19:48:30
|
<icinga-wm>
|
RECOVERY - ElasticSearch health check on elastic1015 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
|
2014-04-08 19:48:43
|
<manybubbles>
|
and, more noise!
|
2014-04-08 19:48:49
|
<icinga-wm>
|
RECOVERY - ElasticSearch health check on elastic1007 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
|
2014-04-08 19:48:49
|
<icinga-wm>
|
RECOVERY - ElasticSearch health check on elastic1014 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
|
2014-04-08 19:48:49
|
<icinga-wm>
|
RECOVERY - ElasticSearch health check on elastic1008 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
|
2014-04-08 19:48:59
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1005 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1894: active_shards: 5308: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 303
|
2014-04-08 19:48:59
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1011 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1894: active_shards: 5308: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 303
|
2014-04-08 19:48:59
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1012 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1894: active_shards: 5308: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 303
|
2014-04-08 19:49:22
|
<manybubbles>
|
bit me, labswiki!
|
2014-04-08 19:52:34
|
<bd808|LUNCH>
|
cheers manybubbles on
|
2014-04-08 19:52:53
|
<manybubbles>
|
it'll spam us again in a few minutes
|
2014-04-08 19:52:59
|
<manybubbles>
|
labswiki recovered a long time ago
|
2014-04-08 19:53:05
|
<manybubbles>
|
it was only out for ~30 seconds each time
|
2014-04-08 19:53:20
|
<manybubbles>
|
but ganglia wants all the shards on all the wikis to be recovered before it is happy
|
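For context, the check being complained about amounts to a cluster-wide health query: anything short of status green, that is any unassigned shard on any wiki's index, raises an alert. A rough sketch of such a check against the standard _cluster/health API, assuming a locally reachable node; this is illustrative, not the actual Icinga plugin.

```python
import requests

ES = "http://localhost:9200"  # assumed cluster endpoint

def cluster_health(wait_for="green", timeout="60s"):
    """Return the cluster health document, optionally blocking until the
    requested status is reached or the server-side timeout expires."""
    resp = requests.get(
        f"{ES}/_cluster/health",
        params={"wait_for_status": wait_for, "timeout": timeout},
        timeout=90,
    )
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    health = cluster_health()
    # Anything short of green -- i.e. any unassigned shard on any index --
    # would make a check like this alert, even if the outage was seconds long.
    print(health["status"], health["unassigned_shards"])
```
|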
2014-04-08 19:53:59
|
<icinga-wm>
|
RECOVERY - ElasticSearch health check on elastic1005 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
|
2014-04-08 19:53:59
|
<icinga-wm>
|
RECOVERY - ElasticSearch health check on elastic1011 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
|
2014-04-08 19:53:59
|
<icinga-wm>
|
RECOVERY - ElasticSearch health check on elastic1012 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
|
2014-04-08 19:56:15
|
<manybubbles>
|
!log upgraded all elasticsearch servers except elastic1008. that is coming now.
|
2014-04-08 19:56:20
|
<morebots>
|
Logged the message, Master
|
2014-04-08 19:58:20
|
<manybubbles>
|
!log finished upgrading to Elasticsearch 1.1.0. The process went well with no issues other than knocking out search in labs 3 times for 30 seconds apiece, and logging lots of nasty warnings to IRC. I've started the process to fix search in labs so it won't happen again.
|
2014-04-08 19:58:25
|
<morebots>
|
Logged the message, Master
|
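A rolling upgrade like the one logged above typically cycles each node through disable-allocation, restart, re-enable-allocation, wait-for-green. The sketch below shows that loop under stated assumptions: standard Elasticsearch cluster-settings and health APIs, a placeholder ssh/service restart step, and a hypothetical node list; it is not the script actually used.

```python
import subprocess
import requests

ES = "http://localhost:9200"  # assumed; a real run would target each node in turn

def set_allocation(enabled):
    """Toggle shard allocation so a deliberately restarted node does not
    trigger a full reshuffle of its shards."""
    value = "all" if enabled else "none"
    requests.put(
        f"{ES}/_cluster/settings",
        json={"transient": {"cluster.routing.allocation.enable": value}},
        timeout=10,
    ).raise_for_status()

def wait_for_green():
    """Block until the cluster reports green (or the server-side timeout hits)."""
    requests.get(
        f"{ES}/_cluster/health",
        params={"wait_for_status": "green", "timeout": "30m"},
        timeout=1900,
    ).raise_for_status()

def rolling_restart(nodes):
    for node in nodes:
        set_allocation(False)
        # Placeholder restart; a real upgrade installs the new package first.
        subprocess.run(
            ["ssh", node, "sudo", "service", "elasticsearch", "restart"],
            check=True,
        )
        set_allocation(True)
        wait_for_green()

if __name__ == "__main__":
    rolling_restart(["elastic1001", "elastic1002"])  # hypothetical node list
```
|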
2014-04-08 20:05:39
|
<icinga-wm>
|
PROBLEM - Varnishkafka Delivery Errors on cp3020 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 420.066681
|
2014-04-08 20:08:09
|
<icinga-wm>
|
PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 539.900024
|
2014-04-08 20:10:29
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs3004 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
|
2014-04-08 20:10:29
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs3001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
|
2014-04-08 20:10:29
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs3003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
|
2014-04-08 20:10:29
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs3002 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
|
2014-04-08 20:10:39
|
<icinga-wm>
|
RECOVERY - Varnishkafka Delivery Errors on cp3012 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
|
2014-04-08 20:12:39
|
<icinga-wm>
|
RECOVERY - Varnishkafka Delivery Errors on cp3011 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
|
2014-04-08 20:16:56
|
<se4598>
|
Does someone here know about dns issues with wmflabs-domains or related stuff that happened recently?
|
2014-04-08 20:19:39
|
<icinga-wm>
|
RECOVERY - Varnishkafka Delivery Errors on cp3020 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
|
2014-04-08 20:20:41
|
<icinga-wm>
|
PROBLEM - Varnishkafka Delivery Errors on cp3012 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 176.399994
|
2014-04-08 20:22:09
|
<icinga-wm>
|
RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
|
2014-04-08 20:26:39
|
<icinga-wm>
|
PROBLEM - Varnishkafka Delivery Errors on cp3011 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 368.466675
|
2014-04-08 20:28:02
|
<cajoel>
|
re:heartbleed, I think we'll be wanting a new corp certificate... do you guys have a favorite vendor for star certs these days?
|
2014-04-08 20:28:21
|
<cajoel>
|
it's almost due for a re-up anyway, so it's worth the effort
|
2014-04-08 20:29:53
|
<ebernhardson>
|
r
|
2014-04-08 20:48:39
|
<icinga-wm>
|
PROBLEM - Varnishkafka Delivery Errors on cp3020 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 642.700012
|
2014-04-08 20:51:39
|
<icinga-wm>
|
RECOVERY - Varnishkafka Delivery Errors on cp3012 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
|
2014-04-08 20:51:39
|
<icinga-wm>
|
RECOVERY - Varnishkafka Delivery Errors on cp3011 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
|
2014-04-08 20:52:09
|
<icinga-wm>
|
PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 537.099976
|
2014-04-08 20:59:46
|
<odder>
|
greg-g: don't believe you
|
2014-04-08 20:59:58
|
<odder>
|
http://lists.wikimedia.org/pipermail/wikitech-ambassadors/2014-April/000666.html
|
2014-04-08 21:00:04
|
<odder>
|
This is the work of the Beast
|
2014-04-08 21:00:11
|
<bd808>
|
greg-g: Do you still want to try group1 to 1.23wmf21 today or have we had enough excitement?
|
2014-04-08 21:00:53
|
<apergos>
|
reminds folks that all ops are out at a bar except for those who are about to go to sleep :-D
|
2014-04-08 21:01:06
|
<greg-g>
|
bd808: we're back to "if you run scap, run it twice" world, right?
|
2014-04-08 21:01:10
|
<greg-g>
|
apergos: :)
|
2014-04-08 21:01:23
|
<greg-g>
|
odder: which part? :)
|
2014-04-08 21:01:36
|
<bd808>
|
greg-g: Yes, but for group1 to 1.23wmf21 we only need to run sync-wikiversions
|
2014-04-08 21:01:49
|
<greg-g>
|
right
|
2014-04-08 21:02:09
|
<greg-g>
|
the world looks sane on phase0?
|
2014-04-08 21:02:11
|
<greg-g>
|
looks
|
2014-04-08 21:02:34
|
<odder>
|
greg-g: all of it - notice the number immediately preceding .html
|
2014-04-08 21:02:39
|
<icinga-wm>
|
PROBLEM - Varnishkafka Delivery Errors on cp3012 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 232.46666
|
2014-04-08 21:02:48
|
<greg-g>
|
odder: haha
|
2014-04-08 21:03:39
|
<icinga-wm>
|
RECOVERY - Varnishkafka Delivery Errors on cp3020 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
|
2014-04-08 21:03:54
|
<greg-g>
|
this is neat: https://graphite.wikimedia.org/render/…
|
2014-04-08 21:04:36
|
<greg-g>
|
I think that's what ori told me yesterday to not worry about
|
2014-04-08 21:05:09
|
<icinga-wm>
|
RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
|
2014-04-08 21:05:25
|
<greg-g>
|
bd808: if we do, we do now, so we have 2 hours of settle-bug-report time before SWAT. May I take your whole day?
|
2014-04-08 21:06:36
|
<bd808>
|
greg-g: I'm yours to command. :)
|
2014-04-08 21:06:39
|
<icinga-wm>
|
PROBLEM - Varnishkafka Delivery Errors on cp3011 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 269.866669
|
2014-04-08 21:06:42
|
<odder>
|
http://heartbleed.com/
|
2014-04-08 21:06:48
|
<odder>
|
Q&A
|
2014-04-08 21:06:55
|
<odder>
|
:-P
|
2014-04-08 21:07:09
|
<greg-g>
|
bd808: go forth, please
|
2014-04-08 21:09:36
|
<grrrit-wm>
|
('PS1') 'BryanDavis': Group1 wikis to 1.23wmf21 [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124744'
|
2014-04-08 21:11:12
|
<grrrit-wm>
|
('CR') 'BryanDavis': [C: '2'] Group1 wikis to 1.23wmf21 [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124744' (owner: 'BryanDavis')
|
2014-04-08 21:11:20
|
<grrrit-wm>
|
('Merged') 'jenkins-bot': Group1 wikis to 1.23wmf21 [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124744' (owner: 'BryanDavis')
|
2014-04-08 21:12:17
|
<logmsgbot>
|
!log bd808 rebuilt wikiversions.cdb and synchronized wikiversions files: group1 to 1.23wmf21
|
2014-04-08 21:12:23
|
<morebots>
|
Logged the message, Master
|
2014-04-08 21:12:47
|
<hoo>
|
greg-g: Have you guys already killed all user sessions?
|
2014-04-08 21:12:52
|
<hoo>
|
Can't see a server admin log entry
|
2014-04-08 21:15:44
|
<odder>
|
greg-g: I did a https://commons.wikimedia.org/wiki/Commons:Village_pump#Users_are_being_forced_to_log_out
|
2014-04-08 21:18:21
|
<Jamesofur>
|
Thanks odder, I left a note about it on en VPT since I saw a question about the bug in general
|
2014-04-08 21:18:48
|
<odder>
|
Maybe I'll cross-post that to Meta too
|
2014-04-08 21:19:59
|
<icinga-wm>
|
PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: No output from Graphite for target(s): reqstats.5xx
|
2014-04-08 21:20:14
|
<logmsgbot>
|
!log bd808 Purged l10n cache for 1.23wmf18
|
2014-04-08 21:20:19
|
<morebots>
|
Logged the message, Master
|
2014-04-08 21:21:46
|
<logmsgbot>
|
!log bd808 Purged l10n cache for 1.23wmf19
|
2014-04-08 21:21:50
|
<morebots>
|
Logged the message, Master
|
2014-04-08 21:21:54
|
<greg-g>
|
hoo: in process
|
2014-04-08 21:22:55
|
<hoo>
|
:)
|
2014-04-08 21:23:09
|
<greg-g>
|
hoo: it takes longer than you'd imagine, maybe :)
|
2014-04-08 21:23:37
|
<bd808|deploy>
|
greg-g: group1 to 1.23wmf21 is {{done}}
|
2014-04-08 21:23:40
|
<se4598>
|
greg-g: just change the cookie name? (like last time)
|
2014-04-08 21:24:09
|
<greg-g>
|
se4598: I'm deferring to chris on it (not sure what his exact process is, honestly)
|
2014-04-08 21:24:14
|
<greg-g>
|
bd808|deploy: ty
|
2014-04-08 21:24:53
|
<se4598>
|
mh, the tokens will still be valid I think, so that wasn't a good idea
|
2014-04-08 21:25:14
|
<bd808>
|
se4598: Yeah I think that's why it takes a while
|
2014-04-08 21:26:45
|
<hoo>
|
greg-g: Well, given how many users we have and that we probably don't want to hammer the DBs too much, I can imagine this taking some time
|
2014-04-08 21:26:52
|
<greg-g>
|
nods
|
2014-04-08 21:28:16
|
<hoo>
|
csteipp: Why not run one process per shard?
|
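The idea behind hoo's question is to fan the session-token reset out across database shards instead of making one serial pass over every user. The real job is a CentralAuth maintenance script in PHP; the sketch below only illustrates the batching-plus-one-worker-per-shard pattern, with a stubbed-out invalidation step and hypothetical shard names.

```python
import multiprocessing
import time

# Hypothetical shard list; everything below is an illustrative sketch, not the
# actual CentralAuth tooling.
SHARDS = ["s1", "s2", "s3", "s4", "s5", "s6", "s7"]

def invalidate_batch(shard, offset, batch_size):
    """Stand-in for 'reset session tokens for one batch of users on `shard`'.
    Returns the number of rows touched; 0 means the shard is finished."""
    remaining = 2500 - offset            # pretend each shard holds 2500 users
    return max(0, min(batch_size, remaining))

def reset_sessions_for_shard(shard, batch_size=1000):
    """Walk one shard in fixed-size batches with a pause between batches,
    so no single pass hammers the databases."""
    offset = 0
    while True:
        touched = invalidate_batch(shard, offset, batch_size)
        if touched == 0:
            break
        offset += touched
        time.sleep(0.1)                  # throttle between batches

if __name__ == "__main__":
    # One worker per shard, as suggested, instead of one serial pass over all users.
    with multiprocessing.Pool(len(SHARDS)) as pool:
        pool.map(reset_sessions_for_shard, SHARDS)
```
|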
2014-04-08 21:29:24
|
<odder>
|
Jamesofur: if you're keeping track of things, I alerted Commons and Meta; perhaps someone would need to alert the other big Wikipedias
|
2014-04-08 21:29:35
|
<odder>
|
Dunno if the message to tech-ambassadors will be enough; maybe.
|
2014-04-08 21:30:35
|
<grrrit-wm>
|
('PS2') 'MaxSem': Put a safeguard on GeoData's usage of CirrusSearch [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/121874'
|
2014-04-08 21:30:37
|
<grrrit-wm>
|
('PS1') 'MaxSem': Enable $wgGeoDataDebug on labs [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124747'
|
2014-04-08 21:30:39
|
<icinga-wm>
|
RECOVERY - Varnishkafka Delivery Errors on cp3011 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
|
2014-04-08 21:30:39
|
<icinga-wm>
|
PROBLEM - Varnishkafka Delivery Errors on cp3020 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 535.0
|
2014-04-08 21:30:54
|
<grrrit-wm>
|
('CR') 'jenkins-bot': [V: '-1'] Enable $wgGeoDataDebug on labs [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124747' (owner: 'MaxSem')
|
2014-04-08 21:31:32
|
<csteipp>
|
se4598: Assuming attacker has the login token, they could use the new name and again spoof the user
|
2014-04-08 21:31:39
|
<icinga-wm>
|
RECOVERY - Varnishkafka Delivery Errors on cp3012 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
|
2014-04-08 21:31:46
|
<grrrit-wm>
|
('PS2') 'MaxSem': Enable $wgGeoDataDebug on labs [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124747'
|
2014-04-08 21:32:09
|
<Jamesofur>
|
odder: yeah, I'll see if we can poke people, we're going to send out SM messages as well in a couple minutes
|
2014-04-08 21:32:19
|
<Jamesofur>
|
with a recommendation to password reset
|
2014-04-08 21:33:09
|
<odder>
|
SM?
|
2014-04-08 21:33:22
|
<Jamesofur>
|
sorry, Social Media (Twitter/Facebook/G+ etc)
|
2014-04-08 21:33:42
|
<odder>
|
TMA, Too Many Abbreviations
|
2014-04-08 21:33:45
|
<odder>
|
:)
|
2014-04-08 21:33:59
|
<Jamesofur>
|
yup lol
|
2014-04-08 21:34:09
|
<icinga-wm>
|
PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 539.133362
|
2014-04-08 21:34:10
|
<Jamesofur>
|
I abuse them, I even make up my own and forget that they are just in my head
|
2014-04-08 21:34:23
|
<HaeB>
|
https://twitter.com/Wikimedia/status/453646877397757953
|
2014-04-08 21:34:49
|
<JohnLewis>
|
Jamesofur: EUS IAA. TA IANAL.
|
2014-04-08 21:34:58
|
<JohnLewis>
|
*EYS :p
|
2014-04-08 21:35:42
|
<odder>
|
thanks HaeB, retweeted
|
2014-04-08 21:40:46
|
<aude>
|
woah, new code on wikidata?
|
2014-04-08 21:40:46
|
<matanya>
|
Jamesofur: using mass-message might be a good idea
|
2014-04-08 21:41:15
|
<greg-g>
|
aude: yep, all ok?
|
2014-04-08 21:41:26
|
<Jamesofur>
|
HaeB: ^ what do you think? (about MM)
|
2014-04-08 21:41:48
|
<greg-g>
|
wdyt?
|
2014-04-08 21:42:08
|
<JohnLewis>
|
greg-g: itjdi
|
2014-04-08 21:42:12
|
<aude>
|
so we're confident?
|
2014-04-08 21:42:39
|
<icinga-wm>
|
PROBLEM - Varnishkafka Delivery Errors on cp3012 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 187.866669
|
2014-04-08 21:42:53
|
<greg-g>
|
aude: in that it won't break at 2:00 utc? yeah
|
2014-04-08 21:43:06
|
<greg-g>
|
aude: the only thing we're still not confident about is scap on thursday
|
2014-04-08 21:44:19
|
<aude>
|
alright
|
2014-04-08 21:44:39
|
<icinga-wm>
|
RECOVERY - Varnishkafka Delivery Errors on cp3020 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
|
2014-04-08 21:44:40
|
<icinga-wm>
|
PROBLEM - Varnishkafka Delivery Errors on cp3011 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 320.200012
|
2014-04-08 21:44:55
|
<HaeB>
|
Jamesofur, matanya: i think for the session ending, massmessage would be overkill. regarding the password reset, it's a judgment call (how high one estimates the risk for users who don't change it)
|
2014-04-08 21:45:24
|
<matanya>
|
HaeB: it depends on user rights as well
|
2014-04-08 21:45:27
|
<bd808>
|
aude: The bug that caused all the 1.23wmf21 l10n issues is https://bugzilla.wikimedia.org/show_bug.cgi?id=63659
|
2014-04-08 21:46:31
|
<HaeB>
|
are there any other major sites who notified all users?
|
2014-04-08 21:46:54
|
<Jamesofur>
|
not that I've seen yet, but I have a feeling some are still going through the fixing process
|
2014-04-08 21:46:55
|
<aude>
|
interesting
|
2014-04-08 21:46:59
|
<HaeB>
|
(to recommend a password change)
|
2014-04-08 21:47:10
|
<hoo>
|
eg. just got stuff from CloudBees
|
2014-04-08 21:47:15
|
<hoo>
|
github also logged me out
|
2014-04-08 21:47:37
|
<HaeB>
|
would also be interesting to know how quickly the wikis were fixed after the news broke yesterday
|
2014-04-08 21:47:40
|
<Jamesofur>
|
latimes has an article about resetting your password, but that's different
|
2014-04-08 21:48:09
|
<icinga-wm>
|
RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
|
2014-04-08 21:48:13
|
<HaeB>
|
last night (PT) i filed a RT ticket for the blog, which was vulnerable at the time, but at that point the wikis tested ok already
|
2014-04-08 21:48:36
|
<hoo>
|
The wikis auto update OpenSSL via puppet
|
2014-04-08 21:49:00
|
<Jamesofur>
|
hoo: well ya ;) the question is when we updated puppet ;)
|
2014-04-08 21:49:24
|
<hoo>
|
Jamesofur: The servers do that themselves
|
2014-04-08 21:49:39
|
<HaeB>
|
per https://wikitech.wikimedia.org/wiki/Server_admin_log , the blog (holmium) was pretty late in the game
|
2014-04-08 21:49:50
|
<bd808>
|
The timeline is all in SAL from last night
|
2014-04-08 21:49:51
|
<hoo>
|
Yesterday I posted about that to the internal ops list, but forgot to poke a root to do an apt-cache clean and force a puppet run
|
2014-04-08 21:50:08
|
<HaeB>
|
"04:03 Tim: upgrading libssl on ssl1001,ssl1002,ssl1003,ssl1004,ssl1005,ssl1006,ssl1007,ssl1008,ssl1009,ssl3001.esams.wikimedia.org,ssl3002.esams.wikimedia.org,ssl3003.esams.wikimedia.org" - is that the entry for the wikis?
|
2014-04-08 21:50:37
|
<bd808>
|
Mostly yes
|
2014-04-08 21:53:39
|
<icinga-wm>
|
RECOVERY - Varnishkafka Delivery Errors on cp3012 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
|
2014-04-08 21:53:39
|
<icinga-wm>
|
RECOVERY - Varnishkafka Delivery Errors on cp3011 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
|
2014-04-08 21:53:59
|
<icinga-wm>
|
RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000
|
2014-04-08 21:54:55
|
<grrrit-wm>
|
('PS1') 'Jean-Frédéric': Add Musées de la Haute-Saône to wgCopyUploadsDomains [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124754'
|
2014-04-08 22:01:11
|
<mwalker>
|
greg-g, poking you because I'm not sure who's on point for the i18n / scap stuff -- but I recall getting pinged a couple of days ago (on a centralnotice keyword) saying that the i18n update was failing due to exceptions on CN (and others). I'm wondering if CN's failure was due to being on a deployment branch that did not have the JSON updates (until just now).
|
2014-04-08 22:01:46
|
<greg-g>
|
shouldn't be
|
2014-04-08 22:01:57
|
<greg-g>
|
there's backward compat in l10nupdate
|
2014-04-08 22:02:17
|
<greg-g>
|
mwalker: see https://bugzilla.wikimedia.org/show_bug.cgi?id=63659 for all the gory details
|
2014-04-08 22:02:33
|
<mwalker>
|
puts on tyvek suit
|
2014-04-08 22:02:38
|
<greg-g>
|
:)
|
2014-04-08 22:30:59
|
<icinga-wm>
|
PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: No output from Graphite for target(s): reqstats.5xx
|
2014-04-08 22:33:06
|
<csteipp>
|
greg-g: Could I push a small centralauth update soon?
|
2014-04-08 22:33:44
|
<greg-g>
|
yeah, now is fine, 30 minutes until swat
|
2014-04-08 22:34:24
|
<icinga-wm>
|
PROBLEM - Puppet freshness on mw1109 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 10:30:08 PM UTC
|
2014-04-08 22:36:24
|
<icinga-wm>
|
PROBLEM - Puppet freshness on mw1109 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 10:30:08 PM UTC
|
2014-04-08 22:37:04
|
<icinga-wm>
|
PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 34.533333
|
2014-04-08 22:37:34
|
<icinga-wm>
|
PROBLEM - Varnishkafka Delivery Errors on cp3020 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 260.733337
|
2014-04-08 22:38:24
|
<icinga-wm>
|
PROBLEM - Puppet freshness on mw1109 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 10:30:08 PM UTC
|
2014-04-08 22:40:24
|
<icinga-wm>
|
PROBLEM - Puppet freshness on mw1109 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 10:30:08 PM UTC
|
2014-04-08 22:42:24
|
<icinga-wm>
|
PROBLEM - Puppet freshness on mw1109 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 10:30:08 PM UTC
|
2014-04-08 22:44:14
|
<icinga-wm>
|
PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 625.166687
|
2014-04-08 22:44:24
|
<icinga-wm>
|
PROBLEM - Puppet freshness on mw1109 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 10:30:08 PM UTC
|
2014-04-08 22:45:36
|
<se4598>
|
marktraceur: I see in the deploy calendar that you have a changeset which specifically activates MediaViewer on en-beta. You (or your PC) may get hit by https://bugzilla.wikimedia.org/show_bug.cgi?id=63709
|
2014-04-08 22:46:24
|
<icinga-wm>
|
PROBLEM - Puppet freshness on mw1109 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 10:30:08 PM UTC
|
2014-04-08 22:47:22
|
<marktraceur>
|
se4598: Is there a fix?
|
2014-04-08 22:47:50
|
<marktraceur>
|
I'm guessing it's an SSL problem
|
2014-04-08 22:48:24
|
<icinga-wm>
|
PROBLEM - Puppet freshness on mw1109 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 10:30:08 PM UTC
|
2014-04-08 22:48:43
|
<marktraceur>
|
se4598: Replied on bug
|
2014-04-08 22:49:09
|
<grrrit-wm>
|
('PS1') 'BryanDavis': Create symlink for compile-wikiversions in /usr/local/bin [operations/puppet] - 'https://gerrit.wikimedia.org/r/124763'
|
2014-04-08 22:49:23
|
<se4598>
|
marktraceur: We in #wikimedia-labs don't have one. And that's not about https but DNS resolution, so I don't understand what you mean by https?
|
2014-04-08 22:49:35
|
<marktraceur>
|
Oh, hm
|
2014-04-08 22:49:37
|
<marktraceur>
|
Never mind, sorry
|
2014-04-08 22:50:24
|
<icinga-wm>
|
PROBLEM - Puppet freshness on mw1109 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 10:30:08 PM UTC
|
2014-04-08 22:52:04
|
<icinga-wm>
|
RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
|
2014-04-08 22:52:24
|
<icinga-wm>
|
PROBLEM - Puppet freshness on mw1109 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 10:30:08 PM UTC
|
2014-04-08 22:52:34
|
<icinga-wm>
|
RECOVERY - Varnishkafka Delivery Errors on cp3020 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
|
2014-04-08 22:52:56
|
<se4598>
|
marktraceur: currently the fix is.....: it may work if you try multiple times or wait some time (minutes, hours) ;P
|
2014-04-08 22:54:24
|
<icinga-wm>
|
PROBLEM - Puppet freshness on mw1109 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 10:30:08 PM UTC
|
2014-04-08 22:56:24
|
<icinga-wm>
|
PROBLEM - Puppet freshness on mw1109 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 10:30:08 PM UTC
|
2014-04-08 22:56:54
|
<icinga-wm>
|
RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000
|
2014-04-08 22:58:24
|
<icinga-wm>
|
PROBLEM - Puppet freshness on mw1109 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 10:30:08 PM UTC
|
2014-04-08 22:58:41
|
<hoo>
|
greg-g: csteipp: got both core changes ready
|
2014-04-08 22:58:53
|
<hoo>
|
I mean changes to the deploy branch
|
2014-04-08 22:59:52
|
<csteipp>
|
hoo: Cool.. one sec and I'll merge and deploy it
|
2014-04-08 23:00:12
|
<hoo>
|
I can also jump in, am on tin still anyway
|
2014-04-08 23:00:14
|
<icinga-wm>
|
RECOVERY - Puppet freshness on mw1109 is OK: puppet ran at Tue Apr 8 23:00:04 UTC 2014
|
2014-04-08 23:02:24
|
<icinga-wm>
|
PROBLEM - Puppet freshness on mw1109 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 11:00:04 PM UTC
|
2014-04-08 23:05:24
|
<greg-g>
|
stupid puppet
|
2014-04-08 23:06:33
|
<Jasper_Deng>
|
always wondered what Puppet does anyways
|
2014-04-08 23:07:09
|
<Jamesofur>
|
pulls the strings ;)
|
2014-04-08 23:07:20
|
<Jamesofur>
|
(or, probably better 'is the strings' )
|
2014-04-08 23:07:26
|
<hoo>
|
Jasper_Deng: Playing with the servers :D
|
2014-04-08 23:08:20
|
<JohnLewis>
|
Technically, the sysadmins are a puppet in the WMF's plans, right? :p
|
2014-04-08 23:08:37
|
<logmsgbot>
|
!log csteipp synchronized php-1.23wmf21/extensions/CentralAuth/maintenance 'Push maintenance script for token reset'
|
2014-04-08 23:08:39
|
<Jamesofur>
|
or we're all just puppets in their plans, duh
|
2014-04-08 23:08:41
|
<morebots>
|
Logged the message, Master
|
2014-04-08 23:09:04
|
<JohnLewis>
|
Jamesofur: You're the past of the puppets :p
|
2014-04-08 23:09:09
|
<JohnLewis>
|
*master of the
|
2014-04-08 23:09:57
|
<csteipp>
|
greg-g: CentralAuth updates are out, so swat can go ahead if they were waiting on me
|
2014-04-08 23:10:01
|
<Jamesofur>
|
;) the user with said name may dislike me claiming the title
|
2014-04-08 23:10:40
|
<greg-g>
|
mwalker: ori ebernhardson ^
|
2014-04-08 23:10:46
|
<greg-g>
|
also, what the heck, oit_display ?
|
2014-04-08 23:10:54
|
<greg-g>
|
:)
|
2014-04-08 23:11:10
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs3001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
|
2014-04-08 23:11:10
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs3002 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
|
2014-04-08 23:11:10
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs3003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
|
2014-04-08 23:11:10
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs3004 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
|
2014-04-08 23:11:51
|
<mwalker>
|
oh
|
2014-04-08 23:11:54
|
<mwalker>
|
yes; it's 4!
|
2014-04-08 23:13:25
|
<Danny_B>
|
SUL doesn't work?
|
2014-04-08 23:14:02
|
<mwalker>
|
csteipp, ^
|
2014-04-08 23:14:03
|
<hoo>
|
Danny_B: We are logging out all users
|
2014-04-08 23:14:10
|
<hoo>
|
see http://lists.wikimedia.org/pipermail/wikitech-ambassadors/2014-April/000666.html
|
2014-04-08 23:14:32
|
<MaxSem>
|
csteipp, warn ppl with a site notice?
|
2014-04-08 23:14:35
|
<se4598>
|
hoo: you know that this isn't merged? https://gerrit.wikimedia.org/r/124756
|
2014-04-08 23:15:00
|
<hoo>
|
se4598: not this important at the very moments
|
2014-04-08 23:15:03
|
<hoo>
|
* moment
|
2014-04-08 23:15:23
|
<csteipp>
|
Danny_B: SUL should work... You should just be logged out. If you can't log in, let me know
|
2014-04-08 23:15:53
|
<Jamesofur>
|
csteipp: will we get logged out each time we hit a wiki we've visited recently? or just the once per user in theory
|
2014-04-08 23:16:15
|
<csteipp>
|
If you're a global user, just once (right now as I logout all the centralauth users)
|
2014-04-08 23:16:32
|
<csteipp>
|
If you have multiple ununified local accounts, each will get logged out
|
2014-04-08 23:16:51
|
<Danny_B>
|
csteipp: I have to log in on every single project although I have a central username
|
2014-04-08 23:16:54
|
<Amgine>
|
<grumbles about that><waves fist impotently at it.wp>
|
2014-04-08 23:17:30
|
<icinga-wm>
|
PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 135.300003
|
2014-04-08 23:17:55
|
<mwalker>
|
marktraceur, MaxSem I'm going to +2 and confirm https://gerrit.wikimedia.org/r/#/c/124036/2 , https://gerrit.wikimedia.org/r/#/c/121874/2 , https://gerrit.wikimedia.org/r/#/c/124747/
|
2014-04-08 23:18:30
|
<icinga-wm>
|
PROBLEM - Varnishkafka Delivery Errors on cp3020 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 173.666672
|
2014-04-08 23:18:32
|
<mwalker>
|
it would be wonderful if you all could +1 those so that I know you've looked and consider them good to go
|
2014-04-08 23:18:35
|
<marktraceur>
|
'kay
|
2014-04-08 23:18:53
|
<Danny_B>
|
csteipp: +1 to notice ppl with central notice
|
2014-04-08 23:18:57
|
<grrrit-wm>
|
('CR') 'MarkTraceur': [C: ''] Add setting to show a survey for MediaViewer users on some sites [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124036' (owner: 'Gergő Tisza')
|
2014-04-08 23:19:00
|
<MaxSem>
|
+1 ourselves?
|
2014-04-08 23:19:16
|
<MaxSem>
|
doesn't sound very assuring:)
|
2014-04-08 23:19:21
|
<mwalker>
|
nah; you're probably OK MaxSem :p
|
2014-04-08 23:19:27
|
<mwalker>
|
but I don't know who Gergo is
|
2014-04-08 23:19:44
|
<mwalker>
|
but mark was sponsoring the patch
|
2014-04-08 23:19:53
|
<MaxSem>
|
he's tgr :P
|
2014-04-08 23:20:00
|
<grrrit-wm>
|
('CR') 'Mwalker': [C: '2'] Put a safeguard on GeoData's usage of CirrusSearch [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/121874' (owner: 'MaxSem')
|
2014-04-08 23:20:08
|
<grrrit-wm>
|
('CR') 'Mwalker': [C: '2'] Enable $wgGeoDataDebug on labs [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124747' (owner: 'MaxSem')
|
2014-04-08 23:20:21
|
<grrrit-wm>
|
('CR') 'Mwalker': [C: '2'] Add setting to show a survey for MediaViewer users on some sites [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124036' (owner: 'Gergő Tisza')
|
2014-04-08 23:20:27
|
<ori>
|
greg-g: missed your ping; still need me?
|
2014-04-08 23:21:00
|
<greg-g>
|
dont think so
|
2014-04-08 23:23:33
|
<mwalker>
|
interesting; sync-common doesn't log to IRC?
|
2014-04-08 23:23:34
|
<csteipp>
|
Danny_B: That doesn't sound right.. At the risk of sounding cliche, can you log out and log back in, and see if that helps?
|
2014-04-08 23:23:55
|
<mwalker>
|
marktraceur, MaxSem can you tell if your configuration stuff got pushed?
|
2014-04-08 23:24:15
|
<MaxSem>
|
mwalker, mine's noop on prod
|
2014-04-08 23:24:25
|
<marktraceur>
|
Ditto, but will check on beta
|
2014-04-08 23:24:26
|
<MaxSem>
|
checking if prod still works...
|
2014-04-08 23:24:35
|
<mwalker>
|
also; marktraceur I presume you want https://gerrit.wikimedia.org/r/#/c/124510/ to go to wmf20 and wmf21?
|
2014-04-08 23:24:38
|
<HaeB>
|
Danny_B, hoo : we're still thinking about massmessage instead (more for the password changing advice)
|
2014-04-08 23:24:43
|
<marktraceur>
|
mwalker: Sorry, only 21
|
2014-04-08 23:25:24
|
<marktraceur>
|
mwalker: Confirmed, beta has the configuration we wanted
|
2014-04-08 23:26:36
|
<MaxSem>
|
mwalker, lgtm
|
2014-04-08 23:27:40
|
<icinga-wm>
|
PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: No output from Graphite for target(s): reqstats.5xx
|
2014-04-08 23:28:34
|
<Danny_B>
|
csteipp: log out from any project I'm currently logged in to, log back in to it, and then see if SUL works on the others?
|
2014-04-08 23:29:14
|
<csteipp>
|
Danny_B: Yeah
|
2014-04-08 23:29:22
|
<Danny_B>
|
csteipp: ok, sec
|
2014-04-08 23:29:38
|
<csteipp>
|
Hmm... Danny_B What's your wiki username?
|
2014-04-08 23:30:51
|
<icinga-wm>
|
RECOVERY - Puppet freshness on mw1109 is OK: puppet ran at Tue Apr 8 23:30:43 UTC 2014
|
2014-04-08 23:30:55
|
<Danny_B>
|
csteipp: Danny B.
|
2014-04-08 23:31:17
|
<Danny_B>
|
csteipp: seems to work now, will let you know if i'll spot another disconnection
|
2014-04-08 23:31:27
|
<csteipp>
|
Danny_B: Cool, thanks
|
2014-04-08 23:32:03
|
<Danny_B>
|
yw
|
2014-04-08 23:32:15
|
<Danny_B>
|
thanks for care
|
2014-04-08 23:33:30
|
<icinga-wm>
|
RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
|
2014-04-08 23:34:31
|
<logmsgbot>
|
!log mwalker synchronized php-1.23wmf21/extensions/MultimediaViewer/ 'Updating MultimediaViewer for {{gerrit|124510}}'
|
2014-04-08 23:34:35
|
<morebots>
|
Logged the message, Master
|
2014-04-08 23:35:16
|
<mwalker>
|
marktraceur, ^ if you would test what you need to test for that
|
2014-04-08 23:35:26
|
<mwalker>
|
I'm not seeing any fatals or exceptions which is good :)
|
2014-04-08 23:35:31
|
<icinga-wm>
|
RECOVERY - Varnishkafka Delivery Errors on cp3020 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
|
2014-04-08 23:35:32
|
<marktraceur>
|
mwalker: Works
|
2014-04-08 23:35:32
|
<marktraceur>
|
Ta
|
2014-04-08 23:35:39
|
<mwalker>
|
cool; greg-g SWAT done
|
2014-04-08 23:58:30
|
<icinga-wm>
|
PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 179.666672
|
2014-04-08 23:59:04
|
<jackmcbarn>
|
"Firefox can't find the server at en.wikipedia.beta.wmflabs.org."
|
2014-04-08 23:59:08
|
<jackmcbarn>
|
why?
|
2014-04-08 23:59:14
|
<grrrit-wm>
|
('CR') 'Aaron Schulz': [C: ''] Create symlink for compile-wikiversions in /usr/local/bin [operations/puppet] - 'https://gerrit.wikimedia.org/r/124763' (owner: 'BryanDavis')
|
2014-04-08 23:59:31
|
<marktraceur>
|
jackmcbarn: https://bugzilla.wikimedia.org/show_bug.cgi?id=63709 probably
|