2014-04-08 00:00:35
|
<RoanKattouw>
|
Sure thing
|
2014-04-08 00:00:42
|
<RoanKattouw>
|
I think that's the SWAT all done
|
2014-04-08 00:00:44
|
<RoanKattouw>
|
Sorry for the slowness everyone
|
2014-04-08 00:01:16
|
<bd808>
|
RoanKattouw: If it makes my mailbox less full of debate about font faces...
|
2014-04-08 00:01:36
|
<bd808>
|
is sure that muting those threads will continue
|
2014-04-08 00:02:28
|
<icinga-wm>
|
PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Server Error - 1703 bytes in 7.426 second response time
|
2014-04-08 00:08:52
|
<bd808>
|
looks for a python reviewer for: https://gerrit.wikimedia.org/r/#/c/124500/
|
2014-04-08 00:09:10
|
<bd808>
|
I think that will fix the 1.23wmf21 l10n problems
|
2014-04-08 00:09:30
|
<bd808>
|
Because … mystery action at a distance!
|
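For context: the python change being tested here (per the later scap log entries, it concerns mwversionsinuse) is about reporting which MediaWiki versions are currently live so that steps like ExtensionMessages and l10n generation cover each of them. A minimal sketch of that idea, assuming a wikiversions.dat-style file of "dbname php-1.23wmfNN" lines; the file name, line format, and function are illustrative assumptions, not the actual scap code.

```python
# Hedged sketch: report the distinct MediaWiki versions that a wikiversions
# file maps wikis to. The "dbname php-1.23wmfNN" line format is an assumption
# for illustration, not the real scap implementation.
import sys


def versions_in_use(path):
    versions = set()
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith('#'):
                continue
            _dbname, directory = line.split(None, 1)
            if directory.startswith('php-'):
                directory = directory[len('php-'):]
            versions.add(directory)
    return sorted(versions)


if __name__ == '__main__':
    path = sys.argv[1] if len(sys.argv) > 1 else 'wikiversions.dat'
    print(' '.join(versions_in_use(path)))
```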
2014-04-08 00:12:27
|
<icinga-wm>
|
RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 219118 bytes in 8.455 second response time
|
2014-04-08 00:24:56
|
<logmsgbot>
|
!log catrope synchronized php-1.23wmf20/extensions/VisualEditor 'it helps if you run git submodule update first'
|
2014-04-08 00:25:02
|
<morebots>
|
Logged the message, Master
|
2014-04-08 00:25:05
|
<logmsgbot>
|
!log catrope synchronized php-1.23wmf21/extensions/VisualEditor 'it helps if you run git submodule update first'
|
2014-04-08 00:25:11
|
<morebots>
|
Logged the message, Master
|
2014-04-08 00:27:34
|
<grrrit-wm>
|
('PS1') 'BryanDavis': test2wiki to 1.23wmf21 [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124505'
|
2014-04-08 00:28:54
|
<bd808>
|
RoanKattouw_away: Are you {{done}} done now? I'd like to run some more scap tests
|
2014-04-08 00:38:27
|
<grrrit-wm>
|
('Abandoned') 'BryanDavis': l10nupdate: Add temporary debugging captures [operations/puppet] - 'https://gerrit.wikimedia.org/r/124467' (owner: 'BryanDavis')
|
2014-04-08 00:38:40
|
<grrrit-wm>
|
('PS2') 'BryanDavis': test2wiki to 1.23wmf21 [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124505'
|
2014-04-08 00:39:44
|
<grrrit-wm>
|
('Abandoned') 'BryanDavis': test2wiki to 1.23wmf21 [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124505' (owner: 'BryanDavis')
|
2014-04-08 00:41:34
|
<grrrit-wm>
|
('PS1') 'BryanDavis': Group0 wikis to 1.23wmf21 [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124506'
|
2014-04-08 00:43:55
|
<bd808>
|
greg-g: Are you still on a bus? I'd like to scap group0 to 1.23wmf21 to test my band aid fix. I would be on the hook to revert immediately afterward if ExtensionMessages looks like it will cause a problem for l10nupdate.
|
2014-04-08 00:44:03
|
<RoanKattouw_away>
|
bd808: Yes, sorry
|
2014-04-08 00:44:43
|
<bd808>
|
RoanKattouw_away: :) thanks. I watched your idle time on tin climb until I felt safe.
|
2014-04-08 00:45:28
|
<icinga-wm>
|
PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
|
2014-04-08 00:46:57
|
<bd808>
|
decides that greg-g won't have changed his mind in the last 1:30 and proceeds
|
2014-04-08 00:48:38
|
<grrrit-wm>
|
('CR') 'BryanDavis': [C: '2'] "Approving to test band aid fix for ExtensionMessages generation problem. Will revert if ExtensionMessages doesn't look right after scap." [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124506' (owner: 'BryanDavis')
|
2014-04-08 00:48:45
|
<grrrit-wm>
|
('Merged') 'jenkins-bot': Group0 wikis to 1.23wmf21 [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124506' (owner: 'BryanDavis')
|
2014-04-08 00:50:53
|
<logmsgbot>
|
!log bd808 Started scap: group0 to 1.23wmf21 (testing python change for mwversionsinuse)
|
2014-04-08 00:50:58
|
<morebots>
|
Logged the message, Master
|
2014-04-08 00:53:12
|
<bd808>
|
sees l10n cache updating yet again for 1.23wmf21 and loses all confidence in his "fix"
|
2014-04-08 00:53:51
|
<logmsgbot>
|
!log bd808 scap aborted: group0 to 1.23wmf21 (testing python change for mwversionsinuse) (duration: 02m 57s)
|
2014-04-08 00:53:56
|
<morebots>
|
Logged the message, Master
|
2014-04-08 00:54:30
|
<logmsgbot>
|
!log bd808 Started scap: group0 to 1.23wmf21 (testing python change for mwversionsinuse) (again)
|
2014-04-08 00:54:35
|
<morebots>
|
Logged the message, Master
|
2014-04-08 00:54:56
|
<logmsgbot>
|
!log bd808 scap aborted: group0 to 1.23wmf21 (testing python change for mwversionsinuse) (again) (duration: 00m 25s)
|
2014-04-08 00:55:01
|
<morebots>
|
Logged the message, Master
|
2014-04-08 00:55:12
|
<grrrit-wm>
|
('PS1') 'BryanDavis': Revert "Group0 wikis to 1.23wmf21" [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124507'
|
2014-04-08 00:55:34
|
<grrrit-wm>
|
('CR') 'BryanDavis': [C: '2'] Revert "Group0 wikis to 1.23wmf21" [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124507' (owner: 'BryanDavis')
|
2014-04-08 00:55:42
|
<grrrit-wm>
|
('Merged') 'jenkins-bot': Revert "Group0 wikis to 1.23wmf21" [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124507' (owner: 'BryanDavis')
|
2014-04-08 00:56:51
|
<logmsgbot>
|
!log bd808 Started scap: revert group0 to 1.23wmf21 (testwiki still on 1.23wmf21)
|
2014-04-08 00:56:55
|
<morebots>
|
Logged the message, Master
|
2014-04-08 01:01:33
|
<grrrit-wm>
|
('PS3') 'Ori.livneh': Add EventLogging Kafka writer plug-in [operations/puppet] - 'https://gerrit.wikimedia.org/r/85337'
|
2014-04-08 01:06:45
|
<logmsgbot>
|
!log bd808 Finished scap: revert group0 to 1.23wmf21 (testwiki still on 1.23wmf21) (duration: 09m 54s)
|
2014-04-08 01:06:53
|
<morebots>
|
Logged the message, Master
|
2014-04-08 01:22:25
|
<StevenW>
|
ori: working now
|
2014-04-08 01:22:29
|
<StevenW>
|
\o/
|
2014-04-08 02:07:07
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs3001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
|
2014-04-08 02:07:07
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs3002 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
|
2014-04-08 02:07:08
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs3003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
|
2014-04-08 02:07:08
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs3004 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
|
2014-04-08 02:15:58
|
<logmsgbot>
|
!log LocalisationUpdate completed (1.23wmf20) at 2014-04-08 02:15:58+00:00
|
2014-04-08 02:16:06
|
<morebots>
|
Logged the message, Master
|
2014-04-08 02:34:57
|
<logmsgbot>
|
!log LocalisationUpdate completed (1.23wmf21) at 2014-04-08 02:34:56+00:00
|
2014-04-08 02:35:02
|
<morebots>
|
Logged the message, Master
|
2014-04-08 02:45:57
|
<icinga-wm>
|
PROBLEM - MySQL Idle Transactions on db1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
|
2014-04-08 02:48:37
|
<icinga-wm>
|
PROBLEM - MySQL InnoDB on db1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
|
2014-04-08 02:48:57
|
<icinga-wm>
|
RECOVERY - MySQL Idle Transactions on db1047 is OK: OK longest blocking idle transaction sleeps for 0 seconds
|
2014-04-08 02:49:06
|
<ori>
|
springle_: db1047 has been very sad lately
|
2014-04-08 02:49:27
|
<icinga-wm>
|
RECOVERY - MySQL InnoDB on db1047 is OK: OK longest blocking idle transaction sleeps for 0 seconds
|
2014-04-08 03:00:17
|
<icinga-wm>
|
PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: No output from Graphite for target(s): reqstats.5xx
|
2014-04-08 03:08:06
|
<bawolff>
|
With 1.23wmf21 not getting deployed to mediawiki.org last thursday, does that mean the deployment schedule for 1.23wmf22 will be off by a week?
|
2014-04-08 03:11:07
|
<logmsgbot>
|
!log LocalisationUpdate ResourceLoader cache refresh completed at Tue Apr 8 03:11:04 UTC 2014 (duration 11m 3s)
|
2014-04-08 03:11:11
|
<morebots>
|
Logged the message, Master
|
2014-04-08 03:31:47
|
<icinga-wm>
|
PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
|
2014-04-08 03:38:12
|
<aude>
|
greg-g: still around?
|
2014-04-08 03:53:36
|
<aude>
|
greg-g: check your mail
|
2014-04-08 04:03:35
|
<TimStarling>
|
!log upgrading libssl on ssl1001,ssl1002,ssl1003,ssl1004,ssl1005,ssl1006,ssl1007,ssl1008,ssl1009,ssl3001.esams.wikimedia.org,ssl3002.esams.wikimedia.org,ssl3003.esams.wikimedia.org
|
2014-04-08 04:03:41
|
<morebots>
|
Logged the message, Master
|
2014-04-08 04:03:57
|
<Jasper_Deng>
|
TimStarling: is this the heartbleed.com thing?
|
2014-04-08 04:04:07
|
<Jasper_Deng>
|
didn't know we used openssl
|
2014-04-08 04:15:22
|
<TimStarling>
|
Jasper_Deng: yes
|
2014-04-08 04:15:47
|
<TimStarling>
|
!log also upgraded libssl on cp4001-4019. Restarted nginx on these servers and also the previous list.
|
2014-04-08 04:15:51
|
<morebots>
|
Logged the message, Master
|
2014-04-08 04:37:40
|
<Ryan_Lane>
|
!log upgrading libssl on virt1000
|
2014-04-08 04:37:44
|
<morebots>
|
Logged the message, Master
|
2014-04-08 04:38:21
|
<Ryan_Lane>
|
!log upgrading libssl on virt0
|
2014-04-08 04:38:26
|
<morebots>
|
Logged the message, Master
|
2014-04-08 04:41:03
|
<TimStarling>
|
!log upgraded libssl on zirconium.wikimedia.org,neon.wikimedia.org,netmon1001.wikimedia.org,iodine.wikimedia.org,ytterbium.wikimedia.org,gerrit.wikimedia.org,virt1000.wikimedia.org,labs-ns1.wikimedia.org,stat1001.wikimedia.org
|
2014-04-08 04:43:13
|
<TimStarling>
|
!log restarted apache on the above list, failed on labs-ns1, virt1000, ytterbium
|
2014-04-08 04:43:18
|
<morebots>
|
Logged the message, Master
|
2014-04-08 04:43:47
|
<^d>
|
TimStarling: I'll poke ytterbium
|
2014-04-08 04:44:00
|
<^d>
|
Keep moving on to other boxes if you need.
|
2014-04-08 04:44:35
|
<^d>
|
Seems up now.
|
2014-04-08 04:45:04
|
<TimStarling>
|
yeah, labs-ns1 and virt1000 are actually the same server
|
2014-04-08 04:45:19
|
<TimStarling>
|
and apache is running there with a start time (stime) after the upgrade
|
2014-04-08 04:46:30
|
<TimStarling>
|
!log on dataset1001: upgraded libssl and restarted lighttpd
|
2014-04-08 04:46:34
|
<morebots>
|
Logged the message, Master
|
2014-04-08 04:53:47
|
<icinga-wm>
|
PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
|
2014-04-08 05:08:07
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs3001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
|
2014-04-08 05:08:07
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs3002 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
|
2014-04-08 05:08:07
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs3003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
|
2014-04-08 05:08:07
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs3004 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
|
2014-04-08 05:25:10
|
<grrrit-wm>
|
('PS1') 'Aude': Enable Wikibase on Wikiquote [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124516'
|
2014-04-08 05:26:24
|
<grrrit-wm>
|
('CR') 'Aude': [C: '-2'] "requires sites and site_identifiers tables to be added and populated on wikiquote" [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124516' (owner: 'Aude')
|
2014-04-08 05:31:00
|
<_joe_>
|
!log upgraded openssl on cp10* and cp30* servers as well
|
2014-04-08 05:31:06
|
<morebots>
|
Logged the message, Master
|
2014-04-08 05:39:29
|
<apergos>
|
!log restarted apache on fenari magnesium yterrbium antimony
|
2014-04-08 05:39:33
|
<morebots>
|
Logged the message, Master
|
2014-04-08 05:39:51
|
<apergos>
|
with some misspellings but people will get the point
|
2014-04-08 05:47:01
|
<apergos>
|
!log shot many old apache processes running as stats user from 2013, on stat1001 (restarting apache runs it as www-data user)
|
2014-04-08 05:47:06
|
<morebots>
|
Logged the message, Master
|
2014-04-08 06:34:37
|
<grrrit-wm>
|
('PS3') 'Matanya': dataset: fix module path [operations/puppet] - 'https://gerrit.wikimedia.org/r/119212'
|
2014-04-08 06:37:44
|
<grrrit-wm>
|
('PS3') 'Matanya': exim: fix scoping [operations/puppet] - 'https://gerrit.wikimedia.org/r/119496'
|
2014-04-08 06:43:48
|
<matanya>
|
springle: did you hear from otto regarding https://gerrit.wikimedia.org/r/#/c/122406/ ?
|
2014-04-08 06:45:27
|
<springle>
|
matanya: no
|
2014-04-08 06:45:41
|
<matanya>
|
:/ i need to chase him down, thanks
|
2014-04-08 06:46:04
|
<springle>
|
not sure otto knows about it? i emailed analytics lists directly
|
2014-04-08 06:46:29
|
<springle>
|
so far the answer is: probably fine to decom db67, but let's wait for everyone to chime in
|
2014-04-08 06:46:43
|
<springle>
|
i'll bump it this week
|
2014-04-08 06:47:05
|
<matanya>
|
thank you
|
2014-04-08 07:30:44
|
<grrrit-wm>
|
('PS1') 'Faidon Liambotis': base: add debian-goodies [operations/puppet] - 'https://gerrit.wikimedia.org/r/124524'
|
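debian-goodies ships checkrestart, which finds processes still mapping deleted (i.e. upgraded) libraries; that is exactly the gap the manual apache/nginx restarts earlier in the log are closing after the libssl upgrades. A rough Python sketch of the same idea on Linux; it is not a reimplementation of checkrestart, just an illustration of the /proc-based check.

```python
# Hedged sketch of a checkrestart-style scan: list processes that still map a
# deleted libssl, i.e. ones started before the upgrade that need a restart to
# pick up the fixed library. Linux-only; reads /proc/<pid>/maps.
import os


def stale_processes(library='libssl'):
    stale = {}
    for pid in filter(str.isdigit, os.listdir('/proc')):
        try:
            with open('/proc/%s/maps' % pid) as maps:
                needs_restart = any(
                    library in line and '(deleted)' in line for line in maps)
            if needs_restart:
                with open('/proc/%s/comm' % pid) as comm:
                    stale[int(pid)] = comm.read().strip()
        except (IOError, OSError):
            continue  # process exited or permission denied; skip it
    return stale


if __name__ == '__main__':
    for pid, name in sorted(stale_processes().items()):
        print('%d\t%s still maps a deleted libssl' % (pid, name))
```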
2014-04-08 07:47:07
|
<_joe|away>
|
!log restarted nginx on cp1044 and cp1043
|
2014-04-08 07:47:12
|
<morebots>
|
Logged the message, Master
|
2014-04-08 07:53:07
|
<icinga-wm>
|
PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: No output from Graphite for target(s): reqstats.5xx
|
2014-04-08 07:53:07
|
<grrrit-wm>
|
('CR') 'coren': [C: '2'] base: add debian-goodies [operations/puppet] - 'https://gerrit.wikimedia.org/r/124524' (owner: 'Faidon Liambotis')
|
2014-04-08 08:02:57
|
<icinga-wm>
|
PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: No output from Graphite for target(s): reqstats.5xx
|
2014-04-08 08:09:07
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs3001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
|
2014-04-08 08:09:07
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs3002 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
|
2014-04-08 08:09:07
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs3003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
|
2014-04-08 08:09:07
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs3004 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
|
2014-04-08 08:11:47
|
<icinga-wm>
|
PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: No output from Graphite for target(s): reqstats.5xx
|
2014-04-08 08:15:17
|
<icinga-wm>
|
PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
|
2014-04-08 08:36:30
|
<siebrand>
|
ori: still working?
|
2014-04-08 09:03:47
|
<icinga-wm>
|
PROBLEM - RAID on labstore3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
|
2014-04-08 09:04:07
|
<YuviPanda>
|
hashar: help with setting up zuul for the apps? https://gerrit.wikimedia.org/r/#/c/124539/
|
2014-04-08 09:08:37
|
<icinga-wm>
|
PROBLEM - Disk space on labstore3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
|
2014-04-08 09:08:47
|
<icinga-wm>
|
RECOVERY - RAID on labstore3 is OK: OK: optimal, 12 logical, 12 physical
|
2014-04-08 09:08:57
|
<icinga-wm>
|
PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
|
2014-04-08 09:11:47
|
<icinga-wm>
|
PROBLEM - RAID on labstore3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
|
2014-04-08 09:16:55
|
<grrrit-wm>
|
('PS1') 'RobH': Replacing the unified certificate [operations/puppet] - 'https://gerrit.wikimedia.org/r/124542'
|
2014-04-08 09:24:34
|
<grrrit-wm>
|
('CR') 'RobH': [C: '2'] Replacing the unified certificate [operations/puppet] - 'https://gerrit.wikimedia.org/r/124542' (owner: 'RobH')
|
2014-04-08 09:29:47
|
<icinga-wm>
|
RECOVERY - RAID on labstore3 is OK: OK: optimal, 12 logical, 12 physical
|
2014-04-08 09:33:47
|
<icinga-wm>
|
PROBLEM - RAID on labstore3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
|
2014-04-08 09:36:37
|
<icinga-wm>
|
RECOVERY - RAID on labstore3 is OK: OK: optimal, 12 logical, 12 physical
|
2014-04-08 09:37:37
|
<icinga-wm>
|
RECOVERY - Disk space on labstore3 is OK: DISK OK
|
2014-04-08 09:39:19
|
<hashar>
|
YuviPanda: hello
|
2014-04-08 09:39:25
|
<YuviPanda>
|
hashar: hello!
|
2014-04-08 09:40:00
|
<icinga-wm>
|
PROBLEM - RAID on labstore3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
|
2014-04-08 09:40:37
|
<icinga-wm>
|
PROBLEM - Disk space on labstore3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
|
2014-04-08 09:40:57
|
<grrrit-wm>
|
('PS1') 'Andrew Bogott': Add eth1 checks to nova compute hosts. [operations/puppet] - 'https://gerrit.wikimedia.org/r/124560'
|
2014-04-08 09:44:12
|
<hashar>
|
and we lost YuviPanda
|
2014-04-08 09:45:10
|
<sjoerddebruin>
|
Noooo not our panda. :(
|
2014-04-08 09:46:25
|
<Steinsplitter>
|
panda \O/
|
2014-04-08 09:46:28
|
<icinga-wm>
|
PROBLEM - SSH on labstore3 is CRITICAL: Connection refused
|
2014-04-08 09:46:28
|
<icinga-wm>
|
PROBLEM - DPKG on labstore3 is CRITICAL: Connection refused by host
|
2014-04-08 09:46:47
|
<icinga-wm>
|
PROBLEM - puppet disabled on labstore3 is CRITICAL: Connection refused by host
|
2014-04-08 09:47:00
|
<andrewbogott>
|
mutante: https://gerrit.wikimedia.org/r/#/c/124560/
|
2014-04-08 09:47:43
|
<icinga-wm>
|
ACKNOWLEDGEMENT - DPKG on labstore3 is CRITICAL: Connection refused by host daniel_zahn will be decomed - The acknowledgement expires at: 2014-04-09 09:46:44.
|
2014-04-08 09:47:44
|
<icinga-wm>
|
ACKNOWLEDGEMENT - Disk space on labstore3 is CRITICAL: Connection refused by host daniel_zahn will be decomed - The acknowledgement expires at: 2014-04-09 09:46:44.
|
2014-04-08 09:47:44
|
<icinga-wm>
|
ACKNOWLEDGEMENT - RAID on labstore3 is CRITICAL: Connection refused by host daniel_zahn will be decomed - The acknowledgement expires at: 2014-04-09 09:46:44.
|
2014-04-08 09:47:44
|
<icinga-wm>
|
ACKNOWLEDGEMENT - SSH on labstore3 is CRITICAL: Connection refused daniel_zahn will be decomed - The acknowledgement expires at: 2014-04-09 09:46:44.
|
2014-04-08 09:47:44
|
<icinga-wm>
|
ACKNOWLEDGEMENT - puppet disabled on labstore3 is CRITICAL: Connection refused by host daniel_zahn will be decomed - The acknowledgement expires at: 2014-04-09 09:46:44.
|
2014-04-08 09:49:57
|
<matanya>
|
so nice to see all ops in a European time zone :)
|
2014-04-08 09:50:37
|
<icinga-wm>
|
PROBLEM - Host labstore3 is DOWN: PING CRITICAL - Packet loss = 100%
|
2014-04-08 09:57:12
|
<grrrit-wm>
|
('CR') 'Dzahn': [C: '-1'] Add eth1 checks to nova compute hosts. ('3' comments) [operations/puppet] - 'https://gerrit.wikimedia.org/r/124560' (owner: 'Andrew Bogott')
|
2014-04-08 10:00:49
|
<springle>
|
ori: what is udpprofile::collector, and can i move it from db1014 to... somewhere else?
|
2014-04-08 10:02:47
|
<ori>
|
springle: oh, wow. is there any indication that it continues to see activity? mediawiki's profiler class can be configured to write to a database, but i didn't know anyone was using it in production. is it not ancient?
|
2014-04-08 10:04:56
|
<andrewbogott>
|
mutante, cmjohnson: https://wikitech.wikimedia.org/wiki/Help:Git_rebase#Don.27t_panic
|
2014-04-08 10:05:21
|
<thedj>
|
andrewbogott: 42
|
2014-04-08 10:05:57
|
<ori>
|
springle: it can go away
|
2014-04-08 10:06:34
|
<ori>
|
springle: it was added in this commit: <https://gerrit.wikimedia.org/r/#/c/83953/>. the message reads: "testing graphite 0.910 on db1014".
|
2014-04-08 10:07:04
|
<springle>
|
yeah, asher stole db1014 for graphite
|
2014-04-08 10:07:12
|
<springle>
|
trying to steal it back :)
|
2014-04-08 10:07:20
|
<springle>
|
ori: thanks
|
2014-04-08 10:07:46
|
<ori>
|
springle: it's not in any way implicated in our current graphite setup, which exists solely on tungsten.eqiad.wmnet (and labs)
|
2014-04-08 10:08:13
|
<grrrit-wm>
|
('PS2') 'Andrew Bogott': Add eth1 checks to nova compute hosts. [operations/puppet] - 'https://gerrit.wikimedia.org/r/124560'
|
2014-04-08 10:08:18
|
<andrewbogott>
|
mutante: ^
|
2014-04-08 10:09:24
|
<grrrit-wm>
|
('PS1') 'Cmjohnson': adding ethtool to standard-packages.pp to be able to monitor interface speed [operations/puppet] - 'https://gerrit.wikimedia.org/r/124572'
|
2014-04-08 10:11:07
|
<grrrit-wm>
|
('CR') 'jenkins-bot': [V: '-1'] adding ethtool to standard-packages.pp to be able to monitor interface speed [operations/puppet] - 'https://gerrit.wikimedia.org/r/124572' (owner: 'Cmjohnson')
|
2014-04-08 10:12:49
|
<grrrit-wm>
|
('CR') 'Dzahn': [C: ''] Add eth1 checks to nova compute hosts. [operations/puppet] - 'https://gerrit.wikimedia.org/r/124560' (owner: 'Andrew Bogott')
|
2014-04-08 10:15:34
|
<Jeff_Green>
|
!log update & reboot samarium
|
2014-04-08 10:15:38
|
<morebots>
|
Logged the message, Master
|
2014-04-08 10:15:48
|
<grrrit-wm>
|
('CR') 'Andrew Bogott': [C: '2'] Add eth1 checks to nova compute hosts. [operations/puppet] - 'https://gerrit.wikimedia.org/r/124560' (owner: 'Andrew Bogott')
|
2014-04-08 10:16:26
|
<grrrit-wm>
|
('PS1') 'Springle': Remove unused db1014 block. db1014 was renamed tungsten rt5871. [operations/puppet] - 'https://gerrit.wikimedia.org/r/124575'
|
2014-04-08 10:18:19
|
<grrrit-wm>
|
('CR') 'Springle': [C: '2'] Remove unused db1014 block. db1014 was renamed tungsten rt5871. [operations/puppet] - 'https://gerrit.wikimedia.org/r/124575' (owner: 'Springle')
|
2014-04-08 10:21:04
|
<Jeff_Green>
|
!log update & reboot barium
|
2014-04-08 10:21:09
|
<morebots>
|
Logged the message, Master
|
2014-04-08 10:23:09
|
<grrrit-wm>
|
('PS1') 'Dzahn': add nrpe to base [operations/puppet] - 'https://gerrit.wikimedia.org/r/124576'
|
2014-04-08 10:24:10
|
<grrrit-wm>
|
('CR') 'jenkins-bot': [V: '-1'] add nrpe to base [operations/puppet] - 'https://gerrit.wikimedia.org/r/124576' (owner: 'Dzahn')
|
2014-04-08 11:09:28
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs3002 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
|
2014-04-08 11:09:28
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs3001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
|
2014-04-08 11:09:28
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs3003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
|
2014-04-08 11:09:28
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs3004 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
|
2014-04-08 11:32:05
|
<grrrit-wm>
|
('PS20') 'Matanya': etherpad: convert into a module [operations/puppet] - 'https://gerrit.wikimedia.org/r/107567'
|
2014-04-08 11:32:32
|
<matanya>
|
akosiaris: in a meeting, or can this ^ be handled?
|
2014-04-08 11:39:18
|
<icinga-wm>
|
PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
|
2014-04-08 12:32:58
|
<grrrit-wm>
|
('PS2') 'Dzahn': add nrpe to base [operations/puppet] - 'https://gerrit.wikimedia.org/r/124576'
|
2014-04-08 12:39:13
|
<akosiaris>
|
matanya: in ops meeting
|
2014-04-08 12:39:19
|
<matanya>
|
sorry
|
2014-04-08 12:39:27
|
<akosiaris>
|
and please tell me you did not resubmit from your local repo
|
2014-04-08 12:39:48
|
<akosiaris>
|
rebase* sorry
|
2014-04-08 12:39:50
|
<grrrit-wm>
|
('PS2') 'Cmjohnson': adding ethtool to standard-packages.pp to be able to monitor interface speed [operations/puppet] - 'https://gerrit.wikimedia.org/r/124572'
|
2014-04-08 12:40:26
|
<grrrit-wm>
|
('CR') 'Andrew Bogott': [V: ''] "This looks good -- we'll see if it makes new alarms go off :)" [operations/puppet] - 'https://gerrit.wikimedia.org/r/124576' (owner: 'Dzahn')
|
2014-04-08 12:46:38
|
<grrrit-wm>
|
('PS3') 'Cmjohnson': adding ethtool to standard-packages.pp to be able to monitor interface speed [operations/puppet] - 'https://gerrit.wikimedia.org/r/124572'
|
2014-04-08 12:48:28
|
<icinga-wm>
|
PROBLEM - DPKG on strontium is CRITICAL: DPKG CRITICAL dpkg reports broken packages
|
2014-04-08 12:49:28
|
<icinga-wm>
|
RECOVERY - DPKG on strontium is OK: All packages OK
|
2014-04-08 12:49:35
|
<grrrit-wm>
|
('CR') 'Matanya': [C: ''] add nrpe to base [operations/puppet] - 'https://gerrit.wikimedia.org/r/124576' (owner: 'Dzahn')
|
2014-04-08 12:50:21
|
<cmjohnson1>
|
paravoid: can you review please https://gerrit.wikimedia.org/r/124572
|
2014-04-08 12:50:38
|
<andrewbogott>
|
mutante: https://rt.wikimedia.org/Ticket/Display.html?id=5064
|
2014-04-08 12:51:29
|
<grrrit-wm>
|
('CR') 'Dzahn': [C: ''] "yep, if we want to monitor this on everything, then standard-packages sounds good to me" [operations/puppet] - 'https://gerrit.wikimedia.org/r/124572' (owner: 'Cmjohnson')
|
2014-04-08 12:52:38
|
<icinga-wm>
|
PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
|
2014-04-08 12:53:10
|
<grrrit-wm>
|
('CR') 'Alexandros Kosiaris': [C: '2'] adding ethtool to standard-packages.pp to be able to monitor interface speed [operations/puppet] - 'https://gerrit.wikimedia.org/r/124572' (owner: 'Cmjohnson')
|
2014-04-08 12:55:34
|
<manybubbles>
|
can anyone around update Elasticsearch in apt?
|
2014-04-08 12:55:55
|
<manybubbles>
|
and ack nagios errors (so they don't spam to irc) for a couple hours?
|
2014-04-08 12:56:39
|
<logmsgbot>
|
!log reedy updated /a/common to {{Gerrit|Id15ddc665}}: Revert "Group0 wikis to 1.23wmf21"
|
2014-04-08 12:56:44
|
<morebots>
|
Logged the message, Master
|
2014-04-08 12:57:23
|
<grrrit-wm>
|
('PS1') 'Reedy': Non wikipedias to 1.23wmf21 [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124591'
|
2014-04-08 12:59:03
|
<Reedy>
|
pokes qchris_away and ^d
|
2014-04-08 13:01:42
|
<Reedy>
|
Any idea why https://gerrit.wikimedia.org/changes/?q=status:merged+age%3A0d&o=DETAILED_ACCOUNTS&n=100 doesn't work?
|
2014-04-08 13:02:00
|
<grrrit-wm>
|
('CR') 'Cmjohnson': [C: '2'] adding ethtool to standard-packages.pp to be able to monitor interface speed [operations/puppet] - 'https://gerrit.wikimedia.org/r/124572' (owner: 'Cmjohnson')
|
2014-04-08 13:03:24
|
<Reedy>
|
versus
|
2014-04-08 13:03:24
|
<Reedy>
|
http://review.cyanogenmod.org/changes/?q=status:open+age%3A0d&o=DETAILED_ACCOUNTS&n=100
|
2014-04-08 13:07:41
|
<grrrit-wm>
|
('PS3') 'Dzahn': add nrpe to base [operations/puppet] - 'https://gerrit.wikimedia.org/r/124576'
|
2014-04-08 13:12:48
|
<grrrit-wm>
|
('PS4') 'Dzahn': add nrpe to base [operations/puppet] - 'https://gerrit.wikimedia.org/r/124576'
|
2014-04-08 13:15:18
|
<apergos>
|
test
|
2014-04-08 13:15:42
|
<apergos>
|
test akosiaris
|
2014-04-08 13:15:43
|
<akosiaris>
|
apergos: :-)
|
2014-04-08 13:15:51
|
<apergos>
|
manybubbles:
|
2014-04-08 13:16:54
|
<mutante>
|
already pinged
|
2014-04-08 13:17:06
|
<grrrit-wm>
|
('PS1') 'coren': Tool Labs: forcibly upgrade libssl [operations/puppet] - 'https://gerrit.wikimedia.org/r/124594'
|
2014-04-08 13:19:25
|
<grrrit-wm>
|
('CR') 'Dzahn': [C: '2'] "RT #80 :)" [operations/puppet] - 'https://gerrit.wikimedia.org/r/124576' (owner: 'Dzahn')
|
2014-04-08 13:21:58
|
<_joe_>
|
ori: If you're here, please let me know :)
|
2014-04-08 13:26:57
|
<Reedy>
|
_joe_: Couple of hours from now
|
2014-04-08 13:27:05
|
<Reedy>
|
Though, he is around early sometimes
|
2014-04-08 13:27:31
|
<_joe_>
|
Reedy: thanks
|
2014-04-08 13:30:38
|
<grrrit-wm>
|
('CR') 'RobH': [C: ''] Tool Labs: forcibly upgrade libssl [operations/puppet] - 'https://gerrit.wikimedia.org/r/124594' (owner: 'coren')
|
2014-04-08 13:31:20
|
<manybubbles>
|
ottomata: welcome!
|
2014-04-08 13:31:34
|
<manybubbles>
|
can you help me get started today?
|
2014-04-08 13:31:42
|
<grrrit-wm>
|
('CR') 'coren': [C: '2'] Tool Labs: forcibly upgrade libssl [operations/puppet] - 'https://gerrit.wikimedia.org/r/124594' (owner: 'coren')
|
2014-04-08 13:31:50
|
<Reedy>
|
manybubbles: We have an extension for that
|
2014-04-08 13:31:51
|
<Reedy>
|
grins
|
2014-04-08 13:31:57
|
<manybubbles>
|
Reedy: thanks!
|
2014-04-08 13:32:01
|
<manybubbles>
|
I totally used it a while ago
|
2014-04-08 13:32:27
|
<qchris_away>
|
Reedy: Because we're using /r/ to mark the reverse proxy ...
|
2014-04-08 13:32:33
|
<qchris_away>
|
Reedy: https://gerrit.wikimedia.org/r/changes/?q=status:merged+age%3A0d&o=DETAILED_ACCOUNTS&n=100
|
2014-04-08 13:32:37
|
<qchris_away>
|
Reedy: ^ should work
|
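For anyone hitting the same question later: the REST endpoint has to be reached under the /r/ reverse-proxy prefix qchris points out, and Gerrit prepends a `)]}'` guard line (an anti-XSSI measure) to its JSON that must be stripped before parsing. A small standard-library sketch using the query from the log; everything beyond the guard-line detail is just illustration.

```python
# Hedged sketch: hit Gerrit's REST API through the /r/ prefix and strip the
# ")]}'" anti-XSSI guard line Gerrit prepends to its JSON responses.
import json
import urllib.request

URL = ('https://gerrit.wikimedia.org/r/changes/'
       '?q=status:merged+age%3A0d&o=DETAILED_ACCOUNTS&n=100')


def fetch_changes(url=URL):
    with urllib.request.urlopen(url) as resp:
        body = resp.read().decode('utf-8')
    if body.startswith(")]}'"):
        body = body.split('\n', 1)[1]  # drop the guard line before parsing
    return json.loads(body)


if __name__ == '__main__':
    for change in fetch_changes():
        print(change.get('_number'), change.get('subject'))
```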
2014-04-08 13:32:47
|
<Reedy>
|
Aha, sweet!
|
2014-04-08 13:33:43
|
<grrrit-wm>
|
('PS1') 'RobH': replace blog.wikimedia.org certificate [operations/puppet] - 'https://gerrit.wikimedia.org/r/124595'
|
2014-04-08 13:35:07
|
<manybubbles>
|
ottomata: I need Elasticsearch 1.1.0 shoved into apt
|
2014-04-08 13:35:37
|
<grrrit-wm>
|
('PS2') 'RobH': replace blog.wikimedia.org certificate [operations/puppet] - 'https://gerrit.wikimedia.org/r/124595'
|
2014-04-08 13:36:15
|
<Reedy>
|
qchris: thanks
|
2014-04-08 13:36:22
|
<qchris>
|
yw
|
2014-04-08 13:37:04
|
<icinga-wm>
|
PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds
|
2014-04-08 13:37:33
|
<mutante>
|
!log restarting gitblit
|
2014-04-08 13:37:33
|
<grrrit-wm>
|
('CR') 'RobH': [C: '2'] replace blog.wikimedia.org certificate [operations/puppet] - 'https://gerrit.wikimedia.org/r/124595' (owner: 'RobH')
|
2014-04-08 13:37:37
|
<morebots>
|
Logged the message, Master
|
2014-04-08 13:39:00
|
<RobH>
|
!log replacing the blog cert, if holmium crashes I didn't do it correctly.
|
2014-04-08 13:39:01
|
<grrrit-wm>
|
('PS1') 'Faidon Liambotis': Revert "Giving Nik shell access to analytics1004 to do some elasticsearch load testing" [operations/puppet] - 'https://gerrit.wikimedia.org/r/124597'
|
2014-04-08 13:39:03
|
<ottomata>
|
manybubbles: ok!
|
2014-04-08 13:39:03
|
<morebots>
|
Logged the message, RobH
|
2014-04-08 13:39:04
|
<icinga-wm>
|
RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 305803 bytes in 9.337 second response time
|
2014-04-08 13:39:08
|
<manybubbles>
|
thanks!
|
2014-04-08 13:39:28
|
<Jeff_Green>
|
!log update & reboot tellurium
|
2014-04-08 13:39:33
|
<morebots>
|
Logged the message, Master
|
2014-04-08 13:39:47
|
<grrrit-wm>
|
('CR') 'jenkins-bot': [V: '-1'] Revert "Giving Nik shell access to analytics1004 to do some elasticsearch load testing" [operations/puppet] - 'https://gerrit.wikimedia.org/r/124597' (owner: 'Faidon Liambotis')
|
2014-04-08 13:41:14
|
<icinga-wm>
|
PROBLEM - Host tellurium is DOWN: PING CRITICAL - Packet loss = 100%
|
2014-04-08 13:42:38
|
<grrrit-wm>
|
('PS2') 'Faidon Liambotis': Revert "Giving Nik shell access to analytics1004 to do some elasticsearch load testing" [operations/puppet] - 'https://gerrit.wikimedia.org/r/124597'
|
2014-04-08 13:43:27
|
<grrrit-wm>
|
('CR') 'Faidon Liambotis': [C: '2' V: '2'] Revert "Giving Nik shell access to analytics1004 to do some elasticsearch load testing" [operations/puppet] - 'https://gerrit.wikimedia.org/r/124597' (owner: 'Faidon Liambotis')
|
2014-04-08 13:44:28
|
<grrrit-wm>
|
('CR') 'Manybubbles': "Is there a better place to run this?" [operations/puppet] - 'https://gerrit.wikimedia.org/r/124597' (owner: 'Faidon Liambotis')
|
2014-04-08 13:45:14
|
<icinga-wm>
|
RECOVERY - Host tellurium is UP: PING OK - Packet loss = 0%, RTA = 1.11 ms
|
2014-04-08 13:46:13
|
<RobH>
|
!log upgraded libssl on holmium
|
2014-04-08 13:46:18
|
<morebots>
|
Logged the message, RobH
|
2014-04-08 13:48:49
|
<paravoid>
|
ottomata: kafka upgrade doesn't work on an1004
|
2014-04-08 13:49:41
|
<ottomata>
|
paravoid, analytics1004 (and analytics1003) were kafka test brokers, and were never productionized or puppetized
|
2014-04-08 13:49:50
|
<ottomata>
|
i thought I had removed kafka from analytics1004, actually
|
2014-04-08 13:50:38
|
<manybubbles>
|
ottomata: can you install git fat on tin?
|
2014-04-08 13:50:42
|
<manybubbles>
|
I cannot
|
2014-04-08 13:50:46
|
<ottomata>
|
hm, sure, why do you need git-fat there?
|
2014-04-08 13:50:55
|
<manybubbles>
|
to git deploy
|
2014-04-08 13:50:58
|
<manybubbles>
|
to Elasticsearch
|
2014-04-08 13:51:07
|
<manybubbles>
|
the plugins
|
2014-04-08 13:51:14
|
<manybubbles>
|
or is there another server
|
2014-04-08 13:51:17
|
<ottomata>
|
you don't need git-fat on tin though
|
2014-04-08 13:51:23
|
<ottomata>
|
the git-fat commands are run on deploy hosts
|
2014-04-08 13:51:27
|
<ottomata>
|
on the targets
|
2014-04-08 13:51:46
|
<manybubbles>
|
huh, I'm used to running it on the server to check the jars got there. I'll just do it without and see
|
2014-04-08 13:53:21
|
<manybubbles>
|
ottomata: that worked as you said it would
|
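ottomata's point is that git-fat pulls the real binaries on the deploy targets, so the useful check there is whether any file is still a git-fat placeholder stub rather than the jar itself. A hedged sketch of that check; the "#$# git-fat" magic string and the directory walk are illustrative assumptions, not part of the deployment tooling.

```python
# Hedged sketch: walk a deployed tree and flag files that are still git-fat
# placeholder stubs rather than the pulled binaries. The "#$# git-fat" magic
# string is assumed from git-fat's stub format; verify against your version.
import os
import sys

MAGIC = b'#$# git-fat'


def unpulled_stubs(root):
    stubs = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, 'rb') as f:
                    if f.read(len(MAGIC)) == MAGIC:
                        stubs.append(path)
            except OSError:
                continue  # unreadable file; skip it
    return stubs


if __name__ == '__main__':
    for path in unpulled_stubs(sys.argv[1] if len(sys.argv) > 1 else '.'):
        print('still a git-fat stub:', path)
```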
2014-04-08 13:53:35
|
<manybubbles>
|
!log synced first Elasticsearch plugin to production Elasticsearch servers
|
2014-04-08 13:53:39
|
<morebots>
|
Logged the message, Master
|
2014-04-08 13:54:01
|
<manybubbles>
|
!log they'll pick it up during the rolling restart today to upgrade to 1.1.0
|
2014-04-08 13:54:05
|
<morebots>
|
Logged the message, Master
|
2014-04-08 13:54:08
|
<ottomata>
|
cool
|
2014-04-08 13:54:18
|
<ottomata>
|
manybubbles: i was going to start reinstalling an elasticsearch server today
|
2014-04-08 13:54:33
|
<manybubbles>
|
ottomata: not a _great_ day for it
|
2014-04-08 13:54:37
|
<manybubbles>
|
because I'm upgrading to 1.1.0
|
2014-04-08 13:54:43
|
<ottomata>
|
ok
|
2014-04-08 13:54:45
|
<manybubbles>
|
that is on the deployment calendar and everything
|
2014-04-08 13:55:05
|
<manybubbles>
|
maybe tomorrow?
|
2014-04-08 13:57:09
|
<ottomata>
|
sure
|
2014-04-08 14:04:07
|
<manybubbles>
|
ottomata: please ping me when you get a chance to update apt
|
2014-04-08 14:04:35
|
<ottomata>
|
i was about to do it, but am in standup now
|
2014-04-08 14:04:36
|
<ottomata>
|
um
|
2014-04-08 14:04:41
|
<ottomata>
|
q for akosiaris, if you are around
|
2014-04-08 14:04:54
|
<ottomata>
|
I should change VerifyRelease, right?
|
2014-04-08 14:04:54
|
<icinga-wm>
|
PROBLEM - DPKG on labstore4 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
|
2014-04-08 14:04:59
|
<ottomata>
|
i'm trying to find the right thing to change it to
|
2014-04-08 14:05:14
|
<ottomata>
|
i downloaded 1.1's Release.gpg and am doing what the reprepro man page says to do
|
2014-04-08 14:05:17
|
<ottomata>
|
but am not sure
|
2014-04-08 14:05:23
|
<ottomata>
|
the output doesn't look like what you have
|
2014-04-08 14:05:54
|
<icinga-wm>
|
RECOVERY - DPKG on labstore4 is OK: All packages OK
|
2014-04-08 14:09:44
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs3003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
|
2014-04-08 14:09:44
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs3001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
|
2014-04-08 14:09:44
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs3002 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
|
2014-04-08 14:09:44
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs3004 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
|
2014-04-08 14:11:17
|
<grrrit-wm>
|
('PS1') 'Andrew Bogott': Install and use check_ssl_cert tool to validate certs. [operations/puppet] - 'https://gerrit.wikimedia.org/r/124601'
|
2014-04-08 14:18:13
|
<grrrit-wm>
|
('PS2') 'Andrew Bogott': Install and use check_ssl_cert tool to validate certs. [operations/puppet] - 'https://gerrit.wikimedia.org/r/124601'
|
2014-04-08 14:19:21
|
<grrrit-wm>
|
('PS1') 'Ottomata': reprepro/updates - upgrading elasticsearch to 1.1 [operations/puppet] - 'https://gerrit.wikimedia.org/r/124603'
|
2014-04-08 14:20:08
|
<grrrit-wm>
|
('CR') 'Ottomata': [C: '2' V: '2'] reprepro/updates - upgrading elasticsearch to 1.1 [operations/puppet] - 'https://gerrit.wikimedia.org/r/124603' (owner: 'Ottomata')
|
2014-04-08 14:23:54
|
<icinga-wm>
|
PROBLEM - HTTPS on ssl1002 is CRITICAL: Connection refused
|
2014-04-08 14:24:06
|
<ottomata>
|
manybubbles: http://apt.wikimedia.org/wikimedia/pool/main/e/elasticsearch/
|
2014-04-08 14:24:09
|
<ottomata>
|
look ok?
|
2014-04-08 14:28:54
|
<icinga-wm>
|
RECOVERY - HTTPS on ssl1002 is OK: OK - Certificate will expire on 01/20/2016 12:00.
|
2014-04-08 14:29:45
|
<manybubbles>
|
ottomata: looks good - let me try elastic1001
|
2014-04-08 14:30:35
|
<grrrit-wm>
|
('PS3') 'Andrew Bogott': Install and use check_ssl_cert tool to validate certs. [operations/puppet] - 'https://gerrit.wikimedia.org/r/124601'
|
2014-04-08 14:30:57
|
<andrewbogott>
|
mutante, ^ pls?
|
2014-04-08 14:31:37
|
<manybubbles>
|
!log upgrading elastic1001
|
2014-04-08 14:31:42
|
<morebots>
|
Logged the message, Master
|
2014-04-08 14:32:38
|
<manybubbles>
|
!log woops, just restarted elastic1002. silly me
|
2014-04-08 14:32:42
|
<morebots>
|
Logged the message, Master
|
2014-04-08 14:32:46
|
<manybubbles>
|
!log no harm done, just lost time
|
2014-04-08 14:32:50
|
<morebots>
|
Logged the message, Master
|
2014-04-08 14:33:53
|
<manybubbles>
|
ottomata: can you make nagios not bother us about Elasticsearch warnings over the next few hours?
|
2014-04-08 14:33:56
|
<manybubbles>
|
I'm paying attention
|
2014-04-08 14:34:25
|
<ottomata>
|
uh hm
|
2014-04-08 14:35:43
|
<ottomata>
|
i think so, how long manybubbles
|
2014-04-08 14:35:45
|
<ottomata>
|
4 hours?
|
2014-04-08 14:35:48
|
<manybubbles>
|
sure!
|
2014-04-08 14:36:14
|
<icinga-wm>
|
PROBLEM - NTP peers on linne is CRITICAL: NTP CRITICAL: Server not synchronized, Offset unknown
|
2014-04-08 14:38:14
|
<icinga-wm>
|
RECOVERY - NTP peers on linne is OK: NTP OK: Offset 0.016747 secs
|
2014-04-08 14:44:43
|
<mutante>
|
andrewbogott: https://gerrit.wikimedia.org/r/#/c/77332/7/modules/base/manifests/monitoring/host.pp
|
2014-04-08 14:44:51
|
<grrrit-wm>
|
('PS4') 'Andrew Bogott': Install and use check_ssl_cert tool to validate certs. [operations/puppet] - 'https://gerrit.wikimedia.org/r/124601'
|
2014-04-08 14:54:18
|
<grrrit-wm>
|
('PS5') 'Andrew Bogott': Install and use check_ssl_cert tool to validate certs. [operations/puppet] - 'https://gerrit.wikimedia.org/r/124601'
|
2014-04-08 14:54:59
|
<grrrit-wm>
|
('PS3') 'Cmjohnson': add interface speed check for all hosts [operations/puppet] - 'https://gerrit.wikimedia.org/r/124606'
|
2014-04-08 15:01:42
|
<cmjohnson>
|
mutante: can you review https://gerrit.wikimedia.org/r/124606
|
2014-04-08 15:02:06
|
<grrrit-wm>
|
('CR') 'Alexandros Kosiaris': [C: '-1'] "Great idea. Minor stuff here and there like making it parameterizable but looks nice." ('6' comments) [operations/puppet] - 'https://gerrit.wikimedia.org/r/124606' (owner: 'Cmjohnson')
|
2014-04-08 15:03:10
|
<ottomata>
|
manybubbles: i think I just scheduled downtime in icinga for elastic search for the next ~4 hours
|
2014-04-08 15:03:19
|
<ottomata>
|
never done that before, so not sure what it will do
|
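Scheduling downtime like this comes down to writing a SCHEDULE_HOST_SVC_DOWNTIME external command per host into Icinga's command pipe, which is what the web UI does behind the scenes. A minimal Python sketch under that assumption; the command-file path and the example host are guesses for illustration, not taken from this cluster's configuration.

```python
# Hedged sketch: schedule ~4 hours of downtime for every service on a host by
# writing a SCHEDULE_HOST_SVC_DOWNTIME external command into Icinga's command
# pipe. The pipe path and host name below are illustrative assumptions.
import time

CMD_FILE = '/var/lib/icinga/rw/icinga.cmd'  # assumed stock Icinga 1.x path


def schedule_host_svc_downtime(host, hours, author, comment,
                               cmd_file=CMD_FILE):
    start = int(time.time())
    end = start + int(hours * 3600)
    line = ('[{start}] SCHEDULE_HOST_SVC_DOWNTIME;{host};{start};{end};'
            '1;0;{dur};{author};{comment}\n').format(
                start=start, host=host, end=end, dur=end - start,
                author=author, comment=comment)
    with open(cmd_file, 'w') as pipe:
        pipe.write(line)


if __name__ == '__main__':
    schedule_host_svc_downtime('elastic1001', 4, 'ottomata',
                               'elasticsearch 1.1.0 rolling upgrade')
```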
2014-04-08 15:03:47
|
<grrrit-wm>
|
('PS1') 'Rush': module to manage new python-diamond package [operations/puppet] - 'https://gerrit.wikimedia.org/r/124608'
|
2014-04-08 15:04:54
|
<manybubbles>
|
ottomata: it's cool!
|
2014-04-08 15:04:56
|
<manybubbles>
|
thanks
|
2014-04-08 15:07:45
|
<grrrit-wm>
|
('CR') 'Ottomata': module to manage new python-diamond package ('5' comments) [operations/puppet] - 'https://gerrit.wikimedia.org/r/124608' (owner: 'Rush')
|
2014-04-08 15:08:18
|
<grrrit-wm>
|
('CR') 'Dzahn': [C: ''] Install and use check_ssl_cert tool to validate certs. [operations/puppet] - 'https://gerrit.wikimedia.org/r/124601' (owner: 'Andrew Bogott')
|
2014-04-08 15:12:34
|
<grrrit-wm>
|
('PS2') 'Rush': module to manage new python-diamond package [operations/puppet] - 'https://gerrit.wikimedia.org/r/124608'
|
2014-04-08 15:13:35
|
<grrrit-wm>
|
('CR') 'jenkins-bot': [V: '-1'] module to manage new python-diamond package [operations/puppet] - 'https://gerrit.wikimedia.org/r/124608' (owner: 'Rush')
|
2014-04-08 15:15:36
|
<grrrit-wm>
|
('PS3') 'Rush': module to manage new python-diamond package [operations/puppet] - 'https://gerrit.wikimedia.org/r/124608'
|
2014-04-08 15:16:34
|
<icinga-wm>
|
PROBLEM - Host virt1000 is DOWN: CRITICAL - Host Unreachable (208.80.154.18)
|
2014-04-08 15:16:42
|
<RobH>
|
!log all ssl servers in eqiad have been updated with new cert and restarted
|
2014-04-08 15:16:51
|
<RobH>
|
!log rolling updates on ssl3001-3003 presently
|
2014-04-08 15:17:10
|
<grrrit-wm>
|
('PS1') 'Dzahn': enable base monitoring for ALL hosts [operations/puppet] - 'https://gerrit.wikimedia.org/r/124609'
|
2014-04-08 15:17:24
|
<icinga-wm>
|
PROBLEM - Host labs-ns1.wikimedia.org is DOWN: CRITICAL - Host Unreachable (208.80.154.19)
|
2014-04-08 15:18:04
|
<icinga-wm>
|
RECOVERY - Host virt1000 is UP: PING OK - Packet loss = 0%, RTA = 0.55 ms
|
2014-04-08 15:19:03
|
<grrrit-wm>
|
('CR') 'Andrew Bogott': [C: '2'] Install and use check_ssl_cert tool to validate certs. [operations/puppet] - 'https://gerrit.wikimedia.org/r/124601' (owner: 'Andrew Bogott')
|
2014-04-08 15:19:04
|
<icinga-wm>
|
RECOVERY - Host labs-ns1.wikimedia.org is UP: PING OK - Packet loss = 0%, RTA = 0.98 ms
|
2014-04-08 15:19:07
|
<mutante>
|
apergos: https://gerrit.wikimedia.org/r/#/c/124609/1
|
2014-04-08 15:19:46
|
<mutante>
|
ugly, eh.. since i have to change all those lines because of indentation :p
|
2014-04-08 15:22:25
|
<grrrit-wm>
|
('CR') 'ArielGlenn': [C: ''] enable base monitoring for ALL hosts [operations/puppet] - 'https://gerrit.wikimedia.org/r/124609' (owner: 'Dzahn')
|
2014-04-08 15:22:39
|
<grrrit-wm>
|
('CR') 'Dzahn': [C: '2'] enable base monitoring for ALL hosts [operations/puppet] - 'https://gerrit.wikimedia.org/r/124609' (owner: 'Dzahn')
|
2014-04-08 15:23:46
|
<grrrit-wm>
|
('CR') 'Ottomata': module to manage new python-diamond package ('2' comments) [operations/puppet] - 'https://gerrit.wikimedia.org/r/124608' (owner: 'Rush')
|
2014-04-08 15:27:31
|
<icinga-wm>
|
PROBLEM - HTTPS on cp4009 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
|
2014-04-08 15:27:41
|
<icinga-wm>
|
PROBLEM - HTTPS on ssl3003 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
|
2014-04-08 15:27:41
|
<icinga-wm>
|
PROBLEM - HTTPS on ssl1006 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
|
2014-04-08 15:27:41
|
<icinga-wm>
|
PROBLEM - HTTPS on cp4014 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
|
2014-04-08 15:27:51
|
<icinga-wm>
|
PROBLEM - HTTPS on ssl1004 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
|
2014-04-08 15:27:51
|
<icinga-wm>
|
PROBLEM - HTTPS on ssl1005 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
|
2014-04-08 15:27:51
|
<icinga-wm>
|
PROBLEM - HTTPS on cp4008 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
|
2014-04-08 15:27:51
|
<icinga-wm>
|
PROBLEM - HTTPS on cp4004 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
|
2014-04-08 15:27:51
|
<icinga-wm>
|
PROBLEM - HTTPS on cp4015 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
|
2014-04-08 15:27:52
|
<icinga-wm>
|
PROBLEM - HTTPS on cp4001 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
|
2014-04-08 15:27:52
|
<icinga-wm>
|
PROBLEM - HTTPS on cp4017 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
|
2014-04-08 15:27:53
|
<icinga-wm>
|
PROBLEM - HTTPS on amssq47 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
|
2014-04-08 15:27:53
|
<icinga-wm>
|
PROBLEM - HTTPS on ssl1002 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
|
2014-04-08 15:27:54
|
<icinga-wm>
|
PROBLEM - HTTPS on ssl1001 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
|
2014-04-08 15:27:54
|
<icinga-wm>
|
PROBLEM - HTTPS on cp4005 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
|
2014-04-08 15:27:55
|
<icinga-wm>
|
PROBLEM - HTTPS on cp4012 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
|
2014-04-08 15:28:01
|
<icinga-wm>
|
PROBLEM - HTTPS on cp4016 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
|
2014-04-08 15:28:01
|
<icinga-wm>
|
PROBLEM - HTTPS on sodium is CRITICAL: SSL_CERT CRITICAL lists.wikimedia.org: invalid CN (lists.wikimedia.org does not match *.wikimedia.org)
|
2014-04-08 15:28:11
|
<icinga-wm>
|
PROBLEM - HTTPS on ssl1007 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
|
2014-04-08 15:28:11
|
<icinga-wm>
|
PROBLEM - HTTPS on iodine is CRITICAL: SSL_CERT CRITICAL ticket.wikimedia.org: invalid CN (ticket.wikimedia.org does not match *.wikimedia.org)
|
2014-04-08 15:28:11
|
<icinga-wm>
|
PROBLEM - HTTPS on ssl3002 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
|
2014-04-08 15:28:11
|
<icinga-wm>
|
PROBLEM - HTTPS on ssl3001 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
|
2014-04-08 15:28:11
|
<icinga-wm>
|
PROBLEM - HTTPS on cp4018 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
|
2014-04-08 15:28:12
|
<icinga-wm>
|
PROBLEM - HTTPS on ssl1008 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
|
2014-04-08 15:28:12
|
<icinga-wm>
|
PROBLEM - HTTPS on ssl1009 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
|
2014-04-08 15:28:13
|
<icinga-wm>
|
PROBLEM - HTTPS on ssl1003 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
|
2014-04-08 15:28:13
|
<icinga-wm>
|
PROBLEM - HTTPS on cp4013 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
|
2014-04-08 15:28:14
|
<icinga-wm>
|
PROBLEM - HTTPS on cp4003 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
|
2014-04-08 15:28:14
|
<icinga-wm>
|
PROBLEM - HTTPS on cp4007 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
|
2014-04-08 15:28:15
|
<icinga-wm>
|
PROBLEM - HTTPS on cp4011 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
|
2014-04-08 15:28:15
|
<icinga-wm>
|
PROBLEM - HTTPS on cp4010 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
|
2014-04-08 15:28:21
|
<icinga-wm>
|
PROBLEM - HTTPS on cp4020 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
|
2014-04-08 15:28:21
|
<icinga-wm>
|
PROBLEM - HTTPS on cp4006 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
|
2014-04-08 15:28:31
|
<icinga-wm>
|
PROBLEM - HTTPS on cp4002 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
|
2014-04-08 15:28:31
|
<icinga-wm>
|
PROBLEM - HTTPS on cp4019 is CRITICAL: SSL_CERT CRITICAL *.wikipedia.org: invalid CN (*.wikipedia.org does not match *.wikimedia.org)
|
2014-04-08 15:30:02
|
<greg-g>
|
holy fun :)
|
2014-04-08 15:30:37
|
<aude>
|
:o
|
2014-04-08 15:32:08
|
<greg-g>
|
aude: getting to your email :)
|
2014-04-08 15:32:13
|
<aude>
|
ok
|
2014-04-08 15:32:25
|
<aude>
|
want to see if it's ok to do today
|
2014-04-08 15:32:35
|
<aude>
|
anytime works for us, i suppose
|
2014-04-08 15:34:45
|
<greg-g>
|
aude: tl;dr of email: yep, looks good
|
2014-04-08 15:34:50
|
<aude>
|
ok
|
2014-04-08 15:35:07
|
<aude>
|
we were smart to put the i18n stuff in a while ago :)
|
2014-04-08 15:35:42
|
<icinga-wm>
|
PROBLEM - RAID on holmium is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded)
|
2014-04-08 15:35:52
|
<icinga-wm>
|
PROBLEM - DPKG on fenari is CRITICAL: NRPE: Command check_dpkg not defined
|
2014-04-08 15:36:01
|
<andrewbogott>
|
the https failures are me mucking with monitoring, nothing to worry about
|
2014-04-08 15:36:02
|
<icinga-wm>
|
PROBLEM - Disk space on fenari is CRITICAL: NRPE: Command check_disk_space not defined
|
2014-04-08 15:36:12
|
<icinga-wm>
|
PROBLEM - RAID on fenari is CRITICAL: NRPE: Command check_raid not defined
|
2014-04-08 15:36:22
|
<icinga-wm>
|
PROBLEM - puppet disabled on fenari is CRITICAL: NRPE: Command check_puppet_disabled not defined
|
2014-04-08 15:36:57
|
<hashar>
|
mutante: fenari is not happy :-D
|
2014-04-08 15:38:21
|
<mutante>
|
hashar: thanks, that's cause we just added more monitoring
|
2014-04-08 15:38:33
|
<mutante>
|
RT #80 :)
|
2014-04-08 15:38:48
|
<hashar>
|
mutante: yeah I noticed your puppet change. Guess fenari is missing some bits
|
2014-04-08 15:41:12
|
<mutante>
|
hashar: wasn't running nagios-nrpe-server
|
2014-04-08 15:41:52
|
<mutante>
|
greg-g: re: SSL certs, andrewbogott is on that one
|
2014-04-08 15:41:57
|
<mutante>
|
ops monitoring sprint over here
|
2014-04-08 15:42:11
|
<greg-g>
|
mutante: ahh, good to know who's on point for that, thanks
|
2014-04-08 15:42:23
|
<greg-g>
|
wasn't sure if it'd be an opsen party thing or not
|
2014-04-08 15:42:44
|
<mutante>
|
it is. ops in Athens
|
2014-04-08 15:43:05
|
<mutante>
|
that check is new, in that it checks for validity of cert, not just expiry
|
2014-04-08 15:43:18
|
<mutante>
|
and wikimedia vs. wikipedia thing
|
2014-04-08 15:43:30
|
<greg-g>
|
nods
|
2014-04-08 15:44:52
|
<icinga-wm>
|
PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 438.266663
|
2014-04-08 15:45:02
|
<grrrit-wm>
|
('PS1') 'Andrew Bogott': When checking unified certs, check for *.wikipedia.org [operations/puppet] - 'https://gerrit.wikimedia.org/r/124616'
|
2014-04-08 15:45:32
|
<icinga-wm>
|
PROBLEM - Varnishkafka Delivery Errors on cp3020 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 434.533325
|
2014-04-08 15:46:21
|
<grrrit-wm>
|
('CR') 'Andrew Bogott': [C: '2'] When checking unified certs, check for *.wikipedia.org [operations/puppet] - 'https://gerrit.wikimedia.org/r/124616' (owner: 'Andrew Bogott')
|
2014-04-08 15:46:22
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs1005 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 12:45:20 PM UTC
|
2014-04-08 15:53:10
|
<icinga-wm>
|
RECOVERY - RAID on fenari is OK: OK: Active: 2, Working: 2, Failed: 0, Spare: 0
|
2014-04-08 15:53:17
|
<mutante>
|
hashar: ^ :)
|
2014-04-08 15:53:20
|
<icinga-wm>
|
RECOVERY - puppet disabled on fenari is OK: OK
|
2014-04-08 15:53:26
|
<hashar>
|
nice
|
2014-04-08 15:53:40
|
<icinga-wm>
|
RECOVERY - Disk space on fenari is OK: DISK OK
|
2014-04-08 15:53:41
|
<mutante>
|
RT #80 ftw
|
2014-04-08 15:53:48
|
<andrewbogott>
|
With any luck there'll be another flood of OKs in a minute...
|
2014-04-08 15:53:50
|
<icinga-wm>
|
RECOVERY - DPKG on fenari is OK: All packages OK
|
2014-04-08 15:54:10
|
<icinga-wm>
|
PROBLEM - puppet disabled on bast1001 is CRITICAL: NRPE: Command check_puppet_disabled not defined
|
2014-04-08 15:54:10
|
<icinga-wm>
|
PROBLEM - Disk space on cp3003 is CRITICAL: NRPE: Command check_disk_space not defined
|
2014-04-08 15:54:10
|
<icinga-wm>
|
PROBLEM - Disk space on dobson is CRITICAL: Connection refused by host
|
2014-04-08 15:54:10
|
<icinga-wm>
|
PROBLEM - DPKG on pdf2 is CRITICAL: Connection refused by host
|
2014-04-08 15:54:20
|
<icinga-wm>
|
PROBLEM - puppet disabled on iron is CRITICAL: NRPE: Command check_puppet_disabled not defined
|
2014-04-08 15:54:20
|
<icinga-wm>
|
PROBLEM - RAID on dobson is CRITICAL: Connection refused by host
|
2014-04-08 15:54:20
|
<icinga-wm>
|
PROBLEM - RAID on cp3003 is CRITICAL: NRPE: Command check_raid not defined
|
2014-04-08 15:54:20
|
<icinga-wm>
|
PROBLEM - Disk space on pdf2 is CRITICAL: Connection refused by host
|
2014-04-08 15:54:30
|
<icinga-wm>
|
PROBLEM - puppet disabled on dobson is CRITICAL: Connection refused by host
|
2014-04-08 15:54:30
|
<icinga-wm>
|
PROBLEM - RAID on pdf2 is CRITICAL: Connection refused by host
|
2014-04-08 15:54:30
|
<icinga-wm>
|
PROBLEM - DPKG on iodine is CRITICAL: NRPE: Command check_dpkg not defined
|
2014-04-08 15:54:30
|
<icinga-wm>
|
PROBLEM - puppet disabled on pdf2 is CRITICAL: Connection refused by host
|
2014-04-08 15:54:40
|
<icinga-wm>
|
PROBLEM - Disk space on iodine is CRITICAL: NRPE: Command check_disk_space not defined
|
2014-04-08 15:54:40
|
<icinga-wm>
|
PROBLEM - puppet disabled on cp3003 is CRITICAL: NRPE: Command check_puppet_disabled not defined
|
2014-04-08 15:54:40
|
<icinga-wm>
|
PROBLEM - DPKG on pdf3 is CRITICAL: Connection refused by host
|
2014-04-08 15:54:48
|
<andrewbogott>
|
that's not what I meant
|
2014-04-08 15:54:50
|
<icinga-wm>
|
PROBLEM - RAID on iodine is CRITICAL: NRPE: Command check_raid not defined
|
2014-04-08 15:54:50
|
<icinga-wm>
|
PROBLEM - Disk space on pdf3 is CRITICAL: Connection refused by host
|
2014-04-08 15:54:50
|
<icinga-wm>
|
PROBLEM - DPKG on tridge is CRITICAL: NRPE: Command check_dpkg not defined
|
2014-04-08 15:54:50
|
<icinga-wm>
|
PROBLEM - DPKG on bast1001 is CRITICAL: NRPE: Command check_dpkg not defined
|
2014-04-08 15:54:51
|
<icinga-wm>
|
PROBLEM - puppet disabled on iodine is CRITICAL: NRPE: Command check_puppet_disabled not defined
|
2014-04-08 15:54:51
|
<icinga-wm>
|
PROBLEM - RAID on pdf3 is CRITICAL: Connection refused by host
|
2014-04-08 15:54:51
|
<icinga-wm>
|
PROBLEM - Disk space on tridge is CRITICAL: NRPE: Command check_disk_space not defined
|
2014-04-08 15:55:00
|
<icinga-wm>
|
PROBLEM - Disk space on bast1001 is CRITICAL: NRPE: Command check_disk_space not defined
|
2014-04-08 15:55:00
|
<icinga-wm>
|
PROBLEM - puppet disabled on pdf3 is CRITICAL: Connection refused by host
|
2014-04-08 15:55:10
|
<icinga-wm>
|
PROBLEM - Disk space on iron is CRITICAL: NRPE: Command check_disk_space not defined
|
2014-04-08 15:55:10
|
<icinga-wm>
|
PROBLEM - RAID on bast1001 is CRITICAL: NRPE: Command check_raid not defined
|
2014-04-08 15:55:10
|
<icinga-wm>
|
PROBLEM - DPKG on dobson is CRITICAL: Connection refused by host
|
2014-04-08 15:55:10
|
<icinga-wm>
|
PROBLEM - DPKG on cp3003 is CRITICAL: NRPE: Command check_dpkg not defined
|
2014-04-08 15:55:10
|
<icinga-wm>
|
PROBLEM - DPKG on virt1000 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
|
2014-04-08 15:55:10
|
<icinga-wm>
|
PROBLEM - puppet disabled on tridge is CRITICAL: NRPE: Command check_puppet_disabled not defined
|
2014-04-08 15:55:41
|
<greg-g>
|
ahhh, so today is going to be a worthless -operations channel day, more than normal, due to the sprint? :)
|
2014-04-08 15:56:03
|
<andrewbogott>
|
We're about to all go to dinner though.
|
2014-04-08 15:56:09
|
<andrewbogott>
|
So things should quiet down shortly.
|
2014-04-08 15:56:10
|
<icinga-wm>
|
PROBLEM - Puppet freshness on amslvs2 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 12:55:50 PM UTC
|
2014-04-08 15:56:19
|
<andrewbogott>
|
But the channel will still be useless if you want to talk to ops :)
|
2014-04-08 15:56:50
|
<icinga-wm>
|
PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: No output from Graphite for target(s): reqstats.5xx
|
2014-04-08 15:57:03
|
<mutante>
|
will start nagios-nrpe-server on those
|
2014-04-08 15:57:10
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs1003 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 12:56:15 PM UTC
|
2014-04-08 15:58:42
|
<icinga-wm>
|
RECOVERY - HTTPS on ssl3001 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
|
2014-04-08 15:58:42
|
<icinga-wm>
|
RECOVERY - HTTPS on ssl1006 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
|
2014-04-08 15:58:52
|
<icinga-wm>
|
RECOVERY - HTTPS on ssl1007 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
|
2014-04-08 15:58:52
|
<icinga-wm>
|
RECOVERY - HTTPS on ssl1002 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
|
2014-04-08 15:59:32
|
<icinga-wm>
|
RECOVERY - Varnishkafka Delivery Errors on cp3020 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
|
2014-04-08 15:59:52
|
<icinga-wm>
|
RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
|
2014-04-08 16:00:04
|
<aude>
|
back in 5 min or so
|
2014-04-08 16:00:06
|
<grrrit-wm>
|
('Abandoned') 'Physikerwelt': WIP: Enable orthogonal MathJax config [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/110240' (owner: 'Physikerwelt')
|
2014-04-08 16:00:42
|
<icinga-wm>
|
PROBLEM - DPKG on mchenry is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
|
2014-04-08 16:00:42
|
<icinga-wm>
|
PROBLEM - Disk space on mchenry is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
|
2014-04-08 16:00:52
|
<icinga-wm>
|
PROBLEM - RAID on mchenry is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
|
2014-04-08 16:01:02
|
<icinga-wm>
|
PROBLEM - puppet disabled on mchenry is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
|
2014-04-08 16:02:22
|
<icinga-wm>
|
PROBLEM - Puppet freshness on ms6 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 01:02:03 PM UTC
|
2014-04-08 16:04:37
|
<aude>
|
back
|
2014-04-08 16:08:22
|
<icinga-wm>
|
PROBLEM - Puppet freshness on amslvs3 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 01:07:31 PM UTC
|
2014-04-08 16:09:27
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs1006 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 01:09:07 PM UTC
|
2014-04-08 16:09:27
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs4003 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 01:08:32 PM UTC
|
2014-04-08 16:09:27
|
<icinga-wm>
|
RECOVERY - HTTPS on cp4020 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
|
2014-04-08 16:09:27
|
<icinga-wm>
|
RECOVERY - HTTPS on cp4006 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
|
2014-04-08 16:09:27
|
<icinga-wm>
|
RECOVERY - HTTPS on cp4013 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
|
2014-04-08 16:09:37
|
<icinga-wm>
|
RECOVERY - HTTPS on cp4009 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
|
2014-04-08 16:09:37
|
<icinga-wm>
|
RECOVERY - HTTPS on cp4010 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
|
2014-04-08 16:09:37
|
<icinga-wm>
|
RECOVERY - HTTPS on ssl3003 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
|
2014-04-08 16:09:47
|
<icinga-wm>
|
RECOVERY - HTTPS on ssl3002 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
|
2014-04-08 16:09:47
|
<icinga-wm>
|
RECOVERY - HTTPS on ssl1004 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
|
2014-04-08 16:09:56
|
<paravoid>
|
ottomata: ping
|
2014-04-08 16:09:57
|
<icinga-wm>
|
RECOVERY - HTTPS on cp4012 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
|
2014-04-08 16:10:07
|
<icinga-wm>
|
RECOVERY - HTTPS on cp4016 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
|
2014-04-08 16:10:07
|
<icinga-wm>
|
RECOVERY - HTTPS on ssl1008 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
|
2014-04-08 16:10:07
|
<icinga-wm>
|
PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: No output from Graphite for target(s): reqstats.5xx
|
2014-04-08 16:10:07
|
<icinga-wm>
|
RECOVERY - HTTPS on cp4018 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
|
2014-04-08 16:10:17
|
<icinga-wm>
|
RECOVERY - HTTPS on ssl1009 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
|
2014-04-08 16:11:23
|
<paravoid>
|
ottomata: ping ping
|
2014-04-08 16:12:47
|
<icinga-wm>
|
PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: No output from Graphite for target(s): reqstats.5xx
|
2014-04-08 16:12:49
|
<ottomata>
|
pong pong
|
2014-04-08 16:13:05
|
<ottomata>
|
paravoid
|
2014-04-08 16:13:08
|
<ottomata>
|
wassupp
|
2014-04-08 16:13:14
|
<paravoid>
|
what's with stat1's puppet?
|
2014-04-08 16:13:18
|
<paravoid>
|
why is it admin disabled?
|
2014-04-08 16:13:47
|
<ottomata>
|
because it is going to be decomed very soon
|
2014-04-08 16:13:56
|
<ottomata>
|
and i wanted to make puppet changes that would apply to stat1003 but not mess with what was on stat1
|
2014-04-08 16:14:05
|
<ottomata>
|
and I didn't want to re-write a bunch of statistics.pp stuff :/
|
2014-04-08 16:14:07
|
<_joe_>
|
ori: are you around? seems like graphite is *not* working
|
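The icinga "No output from Graphite" alerts in this log amount to asking graphite-web for recent datapoints of a target (here reqstats.5xx) and getting none back. A minimal sketch of that kind of check in Python, assuming a reachable graphite-web instance; the base URL is illustrative, not the production host:

    # Minimal sketch: does Graphite return any recent data for this target?
    # Assumes a reachable graphite-web instance; GRAPHITE_URL is illustrative only.
    import json
    import urllib.request

    GRAPHITE_URL = "http://graphite.example.org"  # assumption, not the real host
    TARGET = "reqstats.5xx"

    def recent_datapoints(target, window="-10min"):
        url = f"{GRAPHITE_URL}/render?target={target}&from={window}&format=json"
        with urllib.request.urlopen(url, timeout=10) as resp:
            series = json.load(resp)
        # graphite-web returns a list of series, each with "datapoints": [[value, ts], ...]
        return [v for s in series for v, _ts in s.get("datapoints", []) if v is not None]

    points = recent_datapoints(TARGET)
    print("OK" if points else f"CRITICAL: No output from Graphite for target(s): {TARGET}")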
2014-04-08 16:14:24
|
<paravoid>
|
ottomata: that's bad
|
2014-04-08 16:14:27
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs1002 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 01:13:54 PM UTC
|
2014-04-08 16:14:35
|
<ottomata>
|
paravoid: even if we are going to decom it soon?
|
2014-04-08 16:14:36
|
<paravoid>
|
ottomata: can you remove the "include statistics*" stuff and enable it again?
|
2014-04-08 16:14:40
|
<paravoid>
|
yes
|
2014-04-08 16:14:42
|
<ottomata>
|
yeah probably can
|
2014-04-08 16:14:47
|
<paravoid>
|
because it's messing with monitoring and all that
|
2014-04-08 16:15:06
|
<ottomata>
|
ah i see it
|
2014-04-08 16:15:20
|
<ottomata>
|
paravoid, what is the difference between the 3 numbers in each severity category in icinga?
|
2014-04-08 16:15:25
|
<mark>
|
ottomata: disabling puppet for more than a few hours max is almost always a really bad idea
|
2014-04-08 16:15:31
|
<ottomata>
|
mark, ok, noted.
|
2014-04-08 16:15:36
|
<mark>
|
thanks
|
2014-04-08 16:16:27
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs1004 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 01:16:04 PM UTC
|
2014-04-08 16:16:27
|
<icinga-wm>
|
PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: No output from Graphite for target(s): reqstats.5xx
|
2014-04-08 16:17:07
|
<_joe_>
|
:/
|
2014-04-08 16:17:27
|
<icinga-wm>
|
PROBLEM - Puppet freshness on dataset1001 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 01:16:39 PM UTC
|
2014-04-08 16:18:10
|
<ottomata>
|
mark, can you help with the current network ACL problems?
|
2014-04-08 16:18:22
|
<mark>
|
sorry, what's that?
|
2014-04-08 16:18:25
|
<ottomata>
|
analytics nodes can't talk to apt
|
2014-04-08 16:18:27
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs4001 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 01:17:50 PM UTC
|
2014-04-08 16:18:30
|
<ottomata>
|
nor statsd.eqiad.wmnet
|
2014-04-08 16:18:32
|
<ottomata>
|
https://rt.wikimedia.org/Ticket/Display.html?id=4433
|
2014-04-08 16:18:37
|
<ottomata>
|
I added to the bottom of that ticket
|
2014-04-08 16:18:51
|
<mark>
|
ok
|
2014-04-08 16:18:59
|
<ottomata>
|
i think vanadium was having the same trouble, is it on the vlan too?
|
2014-04-08 16:19:27
|
<icinga-wm>
|
PROBLEM - Puppet freshness on amslvs1 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 01:19:10 PM UTC
|
2014-04-08 16:19:31
|
<aude>
|
still working on wikiquote
|
2014-04-08 16:19:35
|
<mark>
|
we can look at getting rid of those ACLs perhaps
|
2014-04-08 16:19:41
|
<mark>
|
but we'll need to discuss what you're doing with firewalling
|
2014-04-08 16:20:18
|
<grrrit-wm>
|
('PS1') 'Ottomata': Disabling statistics roles on stat1 [operations/puppet] - 'https://gerrit.wikimedia.org/r/124621'
|
2014-04-08 16:20:18
|
<se4598>
|
the fingerprint of the wikis' SSL cert apparently changed, but it is not a newly issued cert; it has the same dates as the previous one that I saved. Is it okay that the fingerprint changed?
|
2014-04-08 16:20:34
|
<ottomata>
|
mark, yeah, hm, not sure, i kind of like them
|
2014-04-08 16:20:35
|
<paravoid>
|
se4598: yes
|
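For questions like se4598's, one quick check is to pull the certificate a server is currently presenting and hash its DER form; the fingerprint identifies the exact certificate, which can change even when the validity dates look the same. A minimal sketch in Python using only the standard library; the hostname is just an example:

    # Minimal sketch: fetch the cert a server presents and print its SHA-256 fingerprint.
    # Hostname is an example; any HTTPS endpoint works.
    import hashlib
    import ssl

    def cert_fingerprint(host, port=443):
        pem = ssl.get_server_certificate((host, port))  # PEM text of the presented cert
        der = ssl.PEM_cert_to_DER_cert(pem)             # DER bytes, what fingerprints hash
        return hashlib.sha256(der).hexdigest()

    print(cert_fingerprint("en.wikipedia.org"))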
2014-04-08 16:20:45
|
<ottomata>
|
especially since anyone with hadoop access can launch whatever mapreduce jobs they want
|
2014-04-08 16:21:37
|
<grrrit-wm>
|
('CR') 'Ottomata': [C: '2' V: '2'] Disabling statistics roles on stat1 [operations/puppet] - 'https://gerrit.wikimedia.org/r/124621' (owner: 'Ottomata')
|
2014-04-08 16:21:37
|
<icinga-wm>
|
PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: No output from Graphite for target(s): reqstats.5xx
|
2014-04-08 16:21:44
|
<ottomata>
|
hmmmm
|
2014-04-08 16:21:48
|
<ottomata>
|
that's weird
|
2014-04-08 16:21:59
|
<ottomata>
|
checking on that 5xx thing in a sec
|
2014-04-08 16:22:05
|
<ottomata>
|
that's surely my fault...
|
2014-04-08 16:22:27
|
<icinga-wm>
|
PROBLEM - Puppet freshness on amslvs4 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 01:21:21 PM UTC
|
2014-04-08 16:22:27
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs1001 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 01:21:26 PM UTC
|
2014-04-08 16:22:27
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs4002 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 01:22:07 PM UTC
|
2014-04-08 16:22:53
|
<ottomata>
|
hmm, graphite down?
|
2014-04-08 16:23:04
|
<mark>
|
ottomata: statsd access for analytics seems already there
|
2014-04-08 16:23:07
|
<ottomata>
|
maybe that 5xx thing is not my fault!
|
2014-04-08 16:23:26
|
<ottomata>
|
yeah, mark, i think we already had these set up too
|
2014-04-08 16:23:27
|
<icinga-wm>
|
PROBLEM - Puppet freshness on virt2 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 01:22:28 PM UTC
|
2014-04-08 16:23:37
|
<icinga-wm>
|
RECOVERY - Puppet freshness on stat1 is OK: puppet ran at Tue Apr 8 16:23:30 UTC 2014
|
2014-04-08 16:23:43
|
<ottomata>
|
but it seems that they aren't working right now, starting yesterday when I tried
|
2014-04-08 16:24:02
|
<grrrit-wm>
|
('PS1') 'Hashar': beta: reenable fatalmonitor script on eqiad [operations/puppet] - 'https://gerrit.wikimedia.org/r/124624'
|
2014-04-08 16:24:13
|
<mark>
|
and carbon is in there already too
|
2014-04-08 16:24:15
|
<ottomata>
|
mark, unless pings just aren't allowed and i'm checking wrong?
|
2014-04-08 16:24:24
|
<mark>
|
pings may not be allowed no
|
2014-04-08 16:24:27
|
<ottomata>
|
ori and I both had trouble running apt-get update because we couldn't talk to carbon
|
2014-04-08 16:24:31
|
<mark>
|
check again?
|
2014-04-08 16:24:35
|
<ottomata>
|
yeah checking
|
2014-04-08 16:24:48
|
<ottomata>
|
and i was trying to run sqstat on analytics1003
|
2014-04-08 16:24:52
|
<ottomata>
|
so we can decom emery
|
2014-04-08 16:24:59
|
<ottomata>
|
but it couldn't talk to statsd
|
2014-04-08 16:25:38
|
<ottomata>
|
hm.
|
2014-04-08 16:25:44
|
<ottomata>
|
yeah totally working now
|
2014-04-08 16:25:57
|
<ottomata>
|
ooooook.
|
2014-04-08 16:25:59
|
<ottomata>
|
weird.
|
2014-04-08 16:26:00
|
<_joe_>
|
ottomata: graphite is borked
|
2014-04-08 16:26:04
|
<mark>
|
i think faidon did it earlier
|
2014-04-08 16:26:05
|
<grrrit-wm>
|
('CR') 'Hashar': "puppet is broken on deployment-bastion.eqiad.wmflabs, can't deploy the change right now :-/" [operations/puppet] - 'https://gerrit.wikimedia.org/r/124624' (owner: 'Hashar')
|
2014-04-08 16:26:21
|
<ottomata>
|
oh, fixed the acl problem?
|
2014-04-08 16:26:33
|
<ottomata>
|
maybe something else was just not working, and I assumed because I couldn't ping it was an ACL thing?
|
2014-04-08 16:26:55
|
<mark>
|
ping is not a good way to test that
|
2014-04-08 16:27:10
|
<ottomata>
|
yeah, i just saw the packets being filtered from ping
|
2014-04-08 16:27:11
|
<mark>
|
we allow specific protocols/ports, ping uses different ones
|
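Since the ACLs allow specific protocols/ports rather than ICMP, a reachability test against the actual service port says more than ping. A minimal sketch in Python, TCP only (UDP needs an application-level check); host and port below are placeholders:

    # Minimal sketch: test whether a specific TCP port answers, instead of relying on ping.
    # Host and port are placeholders, not the production values.
    import socket

    def tcp_reachable(host, port, timeout=3):
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    print(tcp_reachable("apt.example.org", 80))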
2014-04-08 16:27:14
|
<ottomata>
|
aye
|
2014-04-08 16:27:30
|
<ottomata>
|
yeah, just figured if i couldn't at least ping then probably other stuff was blocked too, but ja
|
2014-04-08 16:27:57
|
<ottomata>
|
but yeah, ori couldn't use apt on vanadium either, so dunno...
|
2014-04-08 16:28:10
|
<ottomata>
|
and sqstat couldn't talk to tungsten, so hm
|
2014-04-08 16:28:12
|
<ottomata>
|
but ok!
|
2014-04-08 16:28:16
|
<mark>
|
:)
|
2014-04-08 16:28:22
|
<mark>
|
we're going for dinner in a bit
|
2014-04-08 16:28:44
|
<ottomata>
|
mark
|
2014-04-08 16:28:45
|
<ottomata>
|
hm
|
2014-04-08 16:28:53
|
<ottomata>
|
so sqstat is trying to talk to tungsten on 2003
|
2014-04-08 16:28:56
|
<hashar>
|
!log Jenkins: killed jenkins-slave java process on gallium and repooled gallium slave. It was no longer registered in Zuul :-/
|
2014-04-08 16:28:57
|
<icinga-wm>
|
RECOVERY - puppet disabled on iron is OK: OK
|
2014-04-08 16:28:57
|
<ottomata>
|
is that open?
|
2014-04-08 16:29:01
|
<morebots>
|
Logged the message, Master
|
2014-04-08 16:29:07
|
<icinga-wm>
|
RECOVERY - Disk space on iron is OK: DISK OK
|
2014-04-08 16:29:09
|
<ottomata>
|
can't seem to reach it from an03
|
2014-04-08 16:29:34
|
<manybubbles>
|
ganglia seems upset
|
2014-04-08 16:29:40
|
<mark>
|
protocol udp;
|
2014-04-08 16:29:40
|
<mark>
|
destination-port 8125;
|
2014-04-08 16:29:45
|
<aude>
|
tables added
|
2014-04-08 16:29:51
|
<mark>
|
so port 2003 isn't
|
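For reference, the two destinations being discussed speak different wire formats: statsd takes "name:value|type" datagrams on UDP 8125 (the port in the ACL term above), while Graphite's carbon line receiver takes "metric value timestamp" lines on port 2003. A minimal sketch of both sends over UDP, assuming the receivers accept UDP on those ports; hostnames are illustrative:

    # Minimal sketch of the two metric formats discussed above.
    # statsd: "name:value|type" datagram -> UDP 8125
    # carbon: "metric value timestamp"   -> 2003 (UDP here; carbon can also listen on TCP)
    # Hostnames are illustrative, not the production ones.
    import socket
    import time

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    # statsd-style counter increment
    sock.sendto(b"reqstats.5xx:1|c", ("statsd.example.org", 8125))

    # carbon plaintext line: metric path, value, unix timestamp
    line = f"reqstats.5xx 42 {int(time.time())}\n".encode()
    sock.sendto(line, ("graphite.example.org", 2003))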
2014-04-08 16:29:54
|
<ottomata>
|
ah ok
|
2014-04-08 16:30:03
|
<ottomata>
|
that's why then, could you add?
|
2014-04-08 16:30:13
|
<mark>
|
ok
|
2014-04-08 16:30:40
|
<ottomata>
|
i'm going to see if reqstats gets flaky when we move it to analytics1003
|
2014-04-08 16:30:51
|
<ottomata>
|
it was either flaky because erbium is busy
|
2014-04-08 16:30:57
|
<ottomata>
|
or because the multicast firehose is just too lossy
|
2014-04-08 16:31:37
|
<aude>
|
!log added sites and site_identifiers core tables on wikiquote
|
2014-04-08 16:31:41
|
<morebots>
|
Logged the message, Master
|
2014-04-08 16:32:22
|
<mark>
|
2003 should work now
|
2014-04-08 16:33:36
|
<icinga-wm>
|
RECOVERY - DPKG on iodine is OK: All packages OK
|
2014-04-08 16:33:36
|
<icinga-wm>
|
RECOVERY - Disk space on iodine is OK: DISK OK
|
2014-04-08 16:33:36
|
<icinga-wm>
|
RECOVERY - puppet disabled on cp3003 is OK: OK
|
2014-04-08 16:33:39
|
<ottomata>
|
ah just noticed it is udp, mark, will that work still?
|
2014-04-08 16:33:46
|
<icinga-wm>
|
RECOVERY - HTTPS on cp4014 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
|
2014-04-08 16:33:46
|
<icinga-wm>
|
RECOVERY - RAID on cp3003 is OK: OK: optimal, 2 logical, 2 physical
|
2014-04-08 16:33:46
|
<icinga-wm>
|
RECOVERY - RAID on iodine is OK: OK: no disks configured for RAID
|
2014-04-08 16:33:46
|
<icinga-wm>
|
RECOVERY - HTTPS on ssl1005 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
|
2014-04-08 16:33:46
|
<icinga-wm>
|
RECOVERY - HTTPS on cp4003 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
|
2014-04-08 16:33:47
|
<mark>
|
yes
|
2014-04-08 16:33:51
|
<ottomata>
|
ok cool
|
2014-04-08 16:33:52
|
<ottomata>
|
thanks
|
2014-04-08 16:33:53
|
<ottomata>
|
ok go eat
|
2014-04-08 16:33:55
|
<ottomata>
|
thank you!
|
2014-04-08 16:33:56
|
<icinga-wm>
|
RECOVERY - DPKG on bast1001 is OK: All packages OK
|
2014-04-08 16:33:56
|
<icinga-wm>
|
RECOVERY - puppet disabled on iodine is OK: OK
|
2014-04-08 16:33:56
|
<icinga-wm>
|
RECOVERY - HTTPS on cp4002 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
|
2014-04-08 16:33:56
|
<icinga-wm>
|
RECOVERY - HTTPS on amssq47 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
|
2014-04-08 16:33:56
|
<icinga-wm>
|
RECOVERY - HTTPS on cp4004 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
|
2014-04-08 16:33:57
|
<icinga-wm>
|
RECOVERY - HTTPS on cp4001 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
|
2014-04-08 16:33:57
|
<icinga-wm>
|
RECOVERY - HTTPS on cp4017 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
|
2014-04-08 16:33:58
|
<icinga-wm>
|
RECOVERY - HTTPS on cp4015 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
|
2014-04-08 16:33:58
|
<icinga-wm>
|
RECOVERY - HTTPS on cp4008 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
|
2014-04-08 16:33:59
|
<icinga-wm>
|
RECOVERY - HTTPS on ssl1001 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
|
2014-04-08 16:33:59
|
<icinga-wm>
|
RECOVERY - HTTPS on cp4005 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
|
2014-04-08 16:34:00
|
<icinga-wm>
|
RECOVERY - Disk space on bast1001 is OK: DISK OK
|
2014-04-08 16:34:00
|
<icinga-wm>
|
RECOVERY - HTTPS on cp4019 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
|
2014-04-08 16:34:06
|
<icinga-wm>
|
RECOVERY - RAID on bast1001 is OK: OK: no RAID installed
|
2014-04-08 16:34:06
|
<icinga-wm>
|
RECOVERY - DPKG on cp3003 is OK: All packages OK
|
2014-04-08 16:34:06
|
<icinga-wm>
|
RECOVERY - HTTPS on ssl1003 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
|
2014-04-08 16:34:06
|
<icinga-wm>
|
RECOVERY - HTTPS on cp4007 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
|
2014-04-08 16:34:16
|
<icinga-wm>
|
RECOVERY - puppet disabled on bast1001 is OK: OK
|
2014-04-08 16:34:16
|
<icinga-wm>
|
RECOVERY - Disk space on cp3003 is OK: DISK OK
|
2014-04-08 16:34:16
|
<icinga-wm>
|
RECOVERY - HTTPS on cp4011 is OK: SSL_CERT OK - X.509 certificate for *.wikipedia.org from DigiCert High Assurance CA-3 valid until Jan 20 12:00:00 2016 GMT (expires in 652 days)
|
2014-04-08 16:35:36
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs4004 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 01:35:09 PM UTC
|
2014-04-08 16:35:46
|
<icinga-wm>
|
PROBLEM - HTTPS on cp1044 is CRITICAL: SSL_CERT CRITICAL *.wikimedia.org: invalid CN (*.wikimedia.org does not match *.wikipedia.org)
|
2014-04-08 16:35:56
|
<icinga-wm>
|
PROBLEM - HTTPS on cp1043 is CRITICAL: SSL_CERT CRITICAL *.wikimedia.org: invalid CN (*.wikimedia.org does not match *.wikipedia.org)
|
2014-04-08 16:36:48
|
<grrrit-wm>
|
('PS1') 'Ottomata': Putting sqstat back on analytics1003 [operations/puppet] - 'https://gerrit.wikimedia.org/r/124630'
|
2014-04-08 16:37:16
|
<grrrit-wm>
|
('CR') 'Ottomata': [C: '2' V: '2'] Putting sqstat back on analytics1003 [operations/puppet] - 'https://gerrit.wikimedia.org/r/124630' (owner: 'Ottomata')
|
2014-04-08 16:38:30
|
<grrrit-wm>
|
('PS1') 'Springle': invalid MariaDB variable name: user_stat [operations/puppet] - 'https://gerrit.wikimedia.org/r/124632'
|
2014-04-08 16:40:40
|
<grrrit-wm>
|
('CR') 'Springle': [C: '2'] invalid MariaDB variable name: user_stat [operations/puppet] - 'https://gerrit.wikimedia.org/r/124632' (owner: 'Springle')
|
2014-04-08 16:46:50
|
<grrrit-wm>
|
('PS1') 'RobH': replace misc-web-lb cert [operations/puppet] - 'https://gerrit.wikimedia.org/r/124634'
|
2014-04-08 16:48:11
|
<grrrit-wm>
|
('CR') 'RobH': [C: '2' V: '2'] replace misc-web-lb cert [operations/puppet] - 'https://gerrit.wikimedia.org/r/124634' (owner: 'RobH')
|
2014-04-08 16:49:09
|
<aude>
|
sorry, being slow... populating sites table
|
2014-04-08 16:49:20
|
<grrrit-wm>
|
('PS1') 'Alexandros Kosiaris': Removing ethtool package from other places [operations/puppet] - 'https://gerrit.wikimedia.org/r/124637'
|
2014-04-08 16:49:22
|
<aude>
|
suppose no hurry
|
2014-04-08 16:50:08
|
<grrrit-wm>
|
('CR') 'Dzahn': [C: ''] Removing ethtool package from other places [operations/puppet] - 'https://gerrit.wikimedia.org/r/124637' (owner: 'Alexandros Kosiaris')
|
2014-04-08 16:52:03
|
<grrrit-wm>
|
('CR') 'Dzahn': [C: '2'] "now included in base" [operations/puppet] - 'https://gerrit.wikimedia.org/r/124637' (owner: 'Alexandros Kosiaris')
|
2014-04-08 16:53:08
|
<grrrit-wm>
|
('CR') 'Cmcmahon': [C: ''] "Thanks for putting this back." [operations/puppet] - 'https://gerrit.wikimedia.org/r/124624' (owner: 'Hashar')
|
2014-04-08 16:53:36
|
<icinga-wm>
|
RECOVERY - Puppet freshness on virt2 is OK: puppet ran at Tue Apr 8 16:53:29 UTC 2014
|
2014-04-08 16:53:46
|
<icinga-wm>
|
RECOVERY - Puppet freshness on dataset1001 is OK: puppet ran at Tue Apr 8 16:53:39 UTC 2014
|
2014-04-08 16:55:06
|
<icinga-wm>
|
PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
|
2014-04-08 16:55:28
|
<ottomata>
|
rats
|
2014-04-08 16:56:36
|
<icinga-wm>
|
RECOVERY - Puppet freshness on amslvs2 is OK: puppet ran at Tue Apr 8 16:56:30 UTC 2014
|
2014-04-08 16:56:46
|
<icinga-wm>
|
RECOVERY - Puppet freshness on lvs1003 is OK: puppet ran at Tue Apr 8 16:56:45 UTC 2014
|
2014-04-08 16:59:04
|
<aude>
|
waiting for jenkins
|
2014-04-08 17:01:46
|
<icinga-wm>
|
RECOVERY - Puppet freshness on ms6 is OK: puppet ran at Tue Apr 8 17:01:37 UTC 2014
|
2014-04-08 17:01:48
|
<grrrit-wm>
|
('PS2') 'Manybubbles': Turn on experimental highlighting in beta [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124003'
|
2014-04-08 17:03:06
|
<logmsgbot>
|
!log aude synchronized php-1.23wmf20/extensions/Wikidata 'Update Wikidata build, to allow populating sites table on wikiquote'
|
2014-04-08 17:03:10
|
<morebots>
|
Logged the message, Master
|
2014-04-08 17:05:20
|
<icinga-wm>
|
RECOVERY - Puppet freshness on lvs4004 is OK: puppet ran at Tue Apr 8 17:05:14 UTC 2014
|
2014-04-08 17:05:30
|
<icinga-wm>
|
PROBLEM - RAID on dataset1001 is CRITICAL: CRITICAL: 1 failed LD(s) (Partially Degraded)
|
2014-04-08 17:06:40
|
<icinga-wm>
|
PROBLEM - LVS HTTPS IPv6 on misc-web-lb.eqiad.wikimedia.org_ipv6 is CRITICAL: Connection refused
|
2014-04-08 17:07:40
|
<icinga-wm>
|
RECOVERY - LVS HTTPS IPv6 on misc-web-lb.eqiad.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 226 bytes in 0.012 second response time
|
2014-04-08 17:08:20
|
<icinga-wm>
|
RECOVERY - Puppet freshness on amslvs3 is OK: puppet ran at Tue Apr 8 17:08:15 UTC 2014
|
2014-04-08 17:08:30
|
<icinga-wm>
|
RECOVERY - Puppet freshness on lvs4003 is OK: puppet ran at Tue Apr 8 17:08:25 UTC 2014
|
2014-04-08 17:08:44
|
<grrrit-wm>
|
('CR') 'Chad': [C: '2'] Turn on experimental highlighting in beta [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124003' (owner: 'Manybubbles')
|
2014-04-08 17:08:53
|
<grrrit-wm>
|
('Merged') 'jenkins-bot': Turn on experimental highlighting in beta [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124003' (owner: 'Manybubbles')
|
2014-04-08 17:09:40
|
<icinga-wm>
|
RECOVERY - Puppet freshness on lvs1006 is OK: puppet ran at Tue Apr 8 17:09:30 UTC 2014
|
2014-04-08 17:10:10
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs3001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
|
2014-04-08 17:10:10
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs3003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
|
2014-04-08 17:10:10
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs3002 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
|
2014-04-08 17:10:10
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs3004 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
|
2014-04-08 17:10:19
|
<grrrit-wm>
|
('CR') 'QChris': "Prerequisite got merged." [operations/puppet] - 'https://gerrit.wikimedia.org/r/121546' (owner: 'Ottomata')
|
2014-04-08 17:10:52
|
<aude>
|
^demon|away: are you deploying stuff?
|
2014-04-08 17:11:14
|
<aude>
|
i'll need to sneak in at some point for a config change, but not yet
|
2014-04-08 17:11:29
|
<grrrit-wm>
|
('PS1') 'Ottomata': Moving sqstat back to emery :/ [operations/puppet] - 'https://gerrit.wikimedia.org/r/124641'
|
2014-04-08 17:11:38
|
<grrrit-wm>
|
('PS2') 'Ottomata': Moving sqstat back to emery :/ [operations/puppet] - 'https://gerrit.wikimedia.org/r/124641'
|
2014-04-08 17:11:40
|
<grrrit-wm>
|
('CR') 'jenkins-bot': [V: '-1'] Moving sqstat back to emery :/ [operations/puppet] - 'https://gerrit.wikimedia.org/r/124641' (owner: 'Ottomata')
|
2014-04-08 17:11:50
|
<grrrit-wm>
|
('CR') 'Ottomata': [C: '2' V: '2'] Moving sqstat back to emery :/ [operations/puppet] - 'https://gerrit.wikimedia.org/r/124641' (owner: 'Ottomata')
|
2014-04-08 17:12:28
|
<manybubbles>
|
aude: no, he just merged something for beta
|
2014-04-08 17:12:34
|
<aude>
|
ok
|
2014-04-08 17:12:41
|
<aude>
|
probably need 10 more minutes
|
2014-04-08 17:12:50
|
<aude>
|
done populating tables, now checking they are ok
|
2014-04-08 17:13:00
|
<aude>
|
then can do the config change and then done :)
|
2014-04-08 17:13:19
|
<^demon|away>
|
aude: Nope, just merged that for Nik for beta.
|
2014-04-08 17:13:21
|
<^demon|away>
|
Like he said :)
|
2014-04-08 17:13:22
|
<aude>
|
going slow and careful since i'm still newish
|
2014-04-08 17:13:25
|
<aude>
|
doing this stuff
|
2014-04-08 17:13:32
|
<^demon|away>
|
Someone should sync it eventually for consistency, but no biggie.
|
2014-04-08 17:13:53
|
<aude>
|
i can do
|
2014-04-08 17:14:04
|
<hoo>
|
so can I
|
2014-04-08 17:14:29
|
<aude>
|
hoo: want to check the sites tables and site_identifiers for wikiquote?
|
2014-04-08 17:14:30
|
<icinga-wm>
|
RECOVERY - Puppet freshness on lvs1002 is OK: puppet ran at Tue Apr 8 17:14:22 UTC 2014
|
2014-04-08 17:14:36
|
<aude>
|
they look ok to me
|
2014-04-08 17:15:30
|
<icinga-wm>
|
RECOVERY - Puppet freshness on lvs1005 is OK: puppet ran at Tue Apr 8 17:15:22 UTC 2014
|
2014-04-08 17:16:02
|
<grrrit-wm>
|
('CR') 'Aude': "sites table and site_identifiers are added and populated" [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124516' (owner: 'Aude')
|
2014-04-08 17:16:10
|
<icinga-wm>
|
RECOVERY - Puppet freshness on lvs1004 is OK: puppet ran at Tue Apr 8 17:16:02 UTC 2014
|
2014-04-08 17:16:28
|
<manybubbles>
|
!log finished upgrading elastic1001-1006. starting on 1007. yay progress.
|
2014-04-08 17:16:32
|
<morebots>
|
Logged the message, Master
|
2014-04-08 17:16:34
|
<hoo>
|
enwikiquote looks good to me
|
2014-04-08 17:16:39
|
<aude>
|
alright
|
2014-04-08 17:16:40
|
<hoo>
|
sites and site_identifiers
|
2014-04-08 17:16:44
|
<aude>
|
strip protocols and all
|
2014-04-08 17:16:52
|
<hoo>
|
yep
|
2014-04-08 17:16:58
|
<aude>
|
https://gerrit.wikimedia.org/r/#/c/124516/ want to merge
|
2014-04-08 17:17:07
|
<aude>
|
i can deploy it and sync the cirrus thing
|
2014-04-08 17:17:19
|
<manybubbles>
|
thanks!
|
2014-04-08 17:17:22
|
<hoo>
|
ok, also looks good on WD
|
2014-04-08 17:17:30
|
<aude>
|
ok
|
2014-04-08 17:17:45
|
<aude>
|
let me sync cirrus
|
2014-04-08 17:17:52
|
<hoo>
|
go ahead
|
2014-04-08 17:17:53
|
<Nemo_bis>
|
Oh, today is the day
|
2014-04-08 17:18:06
|
<aude>
|
it's *the* day :)
|
2014-04-08 17:18:10
|
<icinga-wm>
|
RECOVERY - Puppet freshness on lvs4001 is OK: puppet ran at Tue Apr 8 17:18:03 UTC 2014
|
2014-04-08 17:19:18
|
<hoo>
|
aude: You also sorted the wikidataclient dblist? :P
|
2014-04-08 17:19:53
|
<aude>
|
yes
|
2014-04-08 17:20:04
|
<hoo>
|
Ok, looks good to me, can approve whenever you want
|
2014-04-08 17:20:05
|
<aude>
|
they will get sorted eventually
|
2014-04-08 17:20:13
|
<aude>
|
doing chad's thing
|
2014-04-08 17:20:30
|
<icinga-wm>
|
RECOVERY - Puppet freshness on amslvs1 is OK: puppet ran at Tue Apr 8 17:20:23 UTC 2014
|
2014-04-08 17:21:30
|
<icinga-wm>
|
RECOVERY - Puppet freshness on lvs1001 is OK: puppet ran at Tue Apr 8 17:21:24 UTC 2014
|
2014-04-08 17:21:50
|
<icinga-wm>
|
RECOVERY - Puppet freshness on amslvs4 is OK: puppet ran at Tue Apr 8 17:21:45 UTC 2014
|
2014-04-08 17:22:30
|
<icinga-wm>
|
RECOVERY - Puppet freshness on lvs4002 is OK: puppet ran at Tue Apr 8 17:22:21 UTC 2014
|
2014-04-08 17:22:43
|
<logmsgbot>
|
!log aude synchronized wmf-config/CirrusSearch-labs.php 'config change for beta, to enable highlighting'
|
2014-04-08 17:22:47
|
<morebots>
|
Logged the message, Master
|
2014-04-08 17:23:06
|
<aude>
|
hoo: ready
|
2014-04-08 17:23:45
|
<grrrit-wm>
|
('CR') 'Hoo man': [C: '2'] "Preparation finished, so do this! \o/" [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124516' (owner: 'Aude')
|
2014-04-08 17:23:49
|
<aude>
|
yay!
|
2014-04-08 17:23:51
|
<hoo>
|
there you go ;)
|
2014-04-08 17:23:53
|
<grrrit-wm>
|
('Merged') 'jenkins-bot': Enable Wikibase on Wikiquote [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124516' (owner: 'Aude')
|
2014-04-08 17:27:20
|
<hoo>
|
aude: About to sync or shall I take it?
|
2014-04-08 17:27:21
|
<aude>
|
sync dblist then wmf-config?
|
2014-04-08 17:27:31
|
<Nemo_bis>
|
waiting
|
2014-04-08 17:27:43
|
<aude>
|
no other way
|
2014-04-08 17:27:52
|
<hoo>
|
other way round sounds sane
|
2014-04-08 17:28:02
|
<aude>
|
wmf-config then dblist is good
|
2014-04-08 17:28:06
|
<hoo>
|
wmf-config changes will work w/o the rest
|
2014-04-08 17:28:10
|
<aude>
|
right
|
2014-04-08 17:28:20
|
<aude>
|
that's what ree-dy did for wikisource
|
2014-04-08 17:28:52
|
<aude>
|
doing
|
2014-04-08 17:28:55
|
<hoo>
|
:)
|
2014-04-08 17:28:59
|
<logmsgbot>
|
!log aude synchronized wmf-config 'config changes to enable Wikibase on Wikiquote'
|
2014-04-08 17:29:04
|
<morebots>
|
Logged the message, Master
|
2014-04-08 17:29:12
|
<grrrit-wm>
|
('PS1') 'Matthias Mullie': Increase Flow cache version [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124646'
|
2014-04-08 17:29:52
|
<logmsgbot>
|
!log aude synchronized wikidataclient.dblist 'Enable Wikibase on Wikiquote'
|
2014-04-08 17:29:57
|
<morebots>
|
Logged the message, Master
|
2014-04-08 17:30:01
|
<hoo>
|
oO
|
2014-04-08 17:30:02
|
<hoo>
|
:)
|
2014-04-08 17:30:12
|
<aude>
|
alright time to check it's all good
|
2014-04-08 17:30:17
|
<hoo>
|
on that
|
2014-04-08 17:31:13
|
<hoo>
|
oh well... I think we have to bump wgCacheEpoch once again
|
2014-04-08 17:31:14
|
<hoo>
|
aude: ^
|
2014-04-08 17:31:36
|
<aude>
|
huh
|
2014-04-08 17:31:45
|
<aude>
|
ah, yes
|
2014-04-08 17:32:00
|
<hoo>
|
shall I patch or will you?
|
2014-04-08 17:32:26
|
<Nemo_bis>
|
https://www.wikidata.org/wiki/Q189119#sitelinks-wikiquote
|
2014-04-08 17:32:34
|
<hoo>
|
Nemo_bis: Yes, the usual stuff
|
2014-04-08 17:32:34
|
<aude>
|
go ahead
|
2014-04-08 17:33:06
|
<aude>
|
it says list of values is complete
|
2014-04-08 17:33:09
|
<aude>
|
i assume caching
|
2014-04-08 17:33:16
|
<aude>
|
on Q60
|
2014-04-08 17:33:57
|
<aude>
|
debug=true, i can add wikiquote
|
2014-04-08 17:34:23
|
<Nemo_bis>
|
yep, I did action=purge
|
2014-04-08 17:34:23
|
<grrrit-wm>
|
('PS1') 'Hoo man': Bump wgCacheEpoch for Wikidata after enabling Wikiquote langlinks [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124648'
|
2014-04-08 17:34:24
|
<hoo>
|
yep
|
2014-04-08 17:34:31
|
<hoo>
|
aude: ^
|
2014-04-08 17:34:35
|
<aude>
|
ok
|
2014-04-08 17:35:21
|
<ottomata>
|
!log restarted gmetad on nickel to fix ganglia
|
2014-04-08 17:35:26
|
<morebots>
|
Logged the message, Master
|
2014-04-08 17:35:33
|
<grrrit-wm>
|
('CR') 'Aude': [C: '2'] Bump wgCacheEpoch for Wikidata after enabling Wikiquote langlinks [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124648' (owner: 'Hoo man')
|
2014-04-08 17:35:40
|
<grrrit-wm>
|
('Merged') 'jenkins-bot': Bump wgCacheEpoch for Wikidata after enabling Wikiquote langlinks [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124648' (owner: 'Hoo man')
|
2014-04-08 17:37:00
|
<hoo>
|
aude: Syncing? I have to sync a touch out
|
2014-04-08 17:37:10
|
<aude>
|
doing
|
2014-04-08 17:37:12
|
<hoo>
|
ok
|
2014-04-08 17:37:18
|
<logmsgbot>
|
!log aude synchronized wmf-config/Wikibase.php 'bump wgCacheEpoch for wikidata after enabling wikiquote site links'
|
2014-04-08 17:37:19
|
<aude>
|
just being careful
|
2014-04-08 17:37:22
|
<morebots>
|
Logged the message, Master
|
2014-04-08 17:37:28
|
<logmsgbot>
|
!log hoo synchronized php-1.23wmf20/extensions/Wikidata/extensions/Wikibase/lib/resources/wikibase.Site.js 'touch'
|
2014-04-08 17:37:32
|
<morebots>
|
Logged the message, Master
|
2014-04-08 17:37:34
|
<hoo>
|
that should purge the sites cache
|
2014-04-08 17:37:43
|
<greg-g>
|
"13:37 < aude> just being careful" +1 ;)
|
2014-04-08 17:37:44
|
<hoo>
|
in resource loader
|
2014-04-08 17:37:47
|
<aude>
|
:)
|
2014-04-08 17:38:25
|
<aude>
|
still says complete
|
2014-04-08 17:38:30
|
<hoo>
|
mh :/
|
2014-04-08 17:38:45
|
<aude>
|
sites module has always been a pain
|
2014-04-08 17:40:24
|
<aude>
|
maybe php-1.23wmf20/extensions/Wikidata/extensions/Wikibase/lib/includes/modules/SitesModule.php ?
|
2014-04-08 17:40:43
|
<hoo>
|
aude: Won't help, RL does timestamps based on the JS scripts
|
2014-04-08 17:40:50
|
<aude>
|
hmmm, ok
|
2014-04-08 17:41:13
|
<hoo>
|
works for me
|
2014-04-08 17:41:16
|
<hoo>
|
now at least
|
2014-04-08 17:41:35
|
<aude>
|
trying in firefox
|
2014-04-08 17:41:39
|
<aude>
|
might be my caching
|
2014-04-08 17:41:42
|
<hoo>
|
\o/ Just added the first link
|
2014-04-08 17:41:46
|
<hoo>
|
https://www.wikidata.org/wiki/Q40904#sitelinks-wikiquote
|
2014-04-08 17:41:48
|
<aude>
|
already did one :)
|
2014-04-08 17:41:54
|
<aude>
|
with debug=true
|
2014-04-08 17:41:59
|
<hoo>
|
Cheating :D
|
2014-04-08 17:42:11
|
<aude>
|
heh
|
2014-04-08 17:42:23
|
<aude>
|
looks good in firefox
|
2014-04-08 17:42:30
|
<aude>
|
i have to assume it's my cache
|
2014-04-08 17:42:31
|
<Nemo_bis>
|
I did one ten minutes ago already :P
|
2014-04-08 17:42:35
|
<hoo>
|
:P
|
2014-04-08 17:42:36
|
<aude>
|
yay
|
2014-04-08 17:42:45
|
<hoo>
|
Nemo_bis: with debug true, I guess?!
|
2014-04-08 17:42:50
|
<Nemo_bis>
|
lol Heisenberg
|
2014-04-08 17:42:55
|
<Nemo_bis>
|
19.34 < Nemo_bis> yep, I did action=purge
|
2014-04-08 17:43:01
|
<hoo>
|
:P
|
2014-04-08 17:43:01
|
<aude>
|
ah
|
2014-04-08 17:43:50
|
<Guest75555>
|
Is there a procedure to delete gerrit repositories?
|
2014-04-08 17:45:00
|
<aude>
|
i can add links in wikidata now in chrome
|
2014-04-08 17:45:09
|
<hoo>
|
aude: https://en.wikiquote.org/w/index.php?title=Werner_Heisenberg&action=info mh
|
2014-04-08 17:45:14
|
<hoo>
|
why is it not showing up?
|
2014-04-08 17:45:34
|
<Nemo_bis>
|
Guest64226 / krinkle : probably you can ask on the same gerrit queue page as usual
|
2014-04-08 17:45:53
|
<hoo>
|
ah, I see
|
2014-04-08 17:45:57
|
<Nemo_bis>
|
unless it's not "your" repository, in which case maybe a bug is better
|
2014-04-08 17:46:11
|
<hoo>
|
dispatching is ... :S
|
2014-04-08 17:47:21
|
<aude>
|
hmmm
|
2014-04-08 17:47:28
|
<hoo>
|
https://www.wikidata.org/wiki/Special:DispatchStats
|
2014-04-08 17:47:44
|
<aude>
|
i did action=purge on https://en.wikiquote.org/wiki/New_York_City
|
2014-04-08 17:47:46
|
<hoo>
|
aude: Can we safely skip these changes? If not, just waiting is also fine
|
2014-04-08 17:47:54
|
<hoo>
|
it's catching up rather quickly AFAIS
|
2014-04-08 17:47:55
|
<aude>
|
removed dewikiquote
|
2014-04-08 17:48:08
|
<aude>
|
we can wait
|
2014-04-08 17:48:16
|
<bd808|deploy>
|
waits in line to do a group0 to 1.23wmf21 scap
|
2014-04-08 17:48:28
|
<aude>
|
give us 5 more minutes to poke
|
2014-04-08 17:48:43
|
<bd808|deploy>
|
aude: Sounds good
|
2014-04-08 17:48:59
|
<aude>
|
i think we're ok though...
|
2014-04-08 17:49:32
|
<aude>
|
or at least nothing we can solve in 5 min, but we didn't break anything
|
2014-04-08 17:50:51
|
<hoo>
|
aude: I can bump the chd_seen fields
|
2014-04-08 17:51:12
|
<aude>
|
ok
|
2014-04-08 17:52:05
|
<hoo>
|
Just looking for the right change id
|
2014-04-08 17:53:43
|
<hoo>
|
got that
|
2014-04-08 17:54:37
|
<aude>
|
something is weird with wikiquote... like it's not actually enabled now
|
2014-04-08 17:54:45
|
<aude>
|
but I'm sure I saw it was
|
2014-04-08 17:55:29
|
<aude>
|
thinks this happened with wikisource
|
2014-04-08 17:56:19
|
<hoo>
|
!log changed the Wikidata wb_changes_dispatch position of all wikiquote wikis to 118158153
|
2014-04-08 17:56:23
|
<morebots>
|
Logged the message, Master
|
2014-04-08 17:56:39
|
<aude>
|
enwikiquote is in wikidataclient.dblist
|
2014-04-08 17:56:42
|
<hoo>
|
20140408172900
|
2014-04-08 17:57:03
|
<hoo>
|
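"Changing the dispatch position" here means moving the chd_seen pointer in the wb_changes_dispatch table forward for the affected client wikis, so the dispatcher skips the backlog older than that change id. A rough sketch of the equivalent update, written against a generic DB-API connection purely for illustration; the chd_site column name and the LIKE filter are assumptions, and 118158153 is the change id from the log line above:

    # Rough sketch of bumping the Wikibase dispatch position, as logged above.
    # Assumptions: a DB-API connection (e.g. pymysql) to the repo database, a chd_site
    # column holding client wiki ids such as 'enwikiquote', and 118158153 as the target.
    NEW_POSITION = 118158153

    def bump_wikiquote_dispatch_position(conn):
        sql = (
            "UPDATE wb_changes_dispatch "
            "SET chd_seen = %s "
            "WHERE chd_site LIKE %s AND chd_seen < %s"
        )
        with conn.cursor() as cur:
            cur.execute(sql, (NEW_POSITION, "%wikiquote", NEW_POSITION))
        conn.commit()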
that was the timestamp, should be a few moments before anything happened regarding wikiquote
|
2014-04-08 17:57:12
|
<aude>
|
ok
|
2014-04-08 17:57:39
|
<icinga-wm>
|
PROBLEM - Varnishkafka Delivery Errors on cp3020 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 540.333313
|
2014-04-08 17:58:28
|
<hoo>
|
still https://en.wikiquote.org/w/index.php?title=Werner_Heisenberg&action=info
|
2014-04-08 17:58:56
|
<hoo>
|
Wikidata is not even loaded there... wtf
|
2014-04-08 17:58:59
|
<icinga-wm>
|
PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 645.666687
|
2014-04-08 17:59:03
|
<aude>
|
right,
|
2014-04-08 17:59:05
|
<aude>
|
i'm sure it was
|
2014-04-08 17:59:25
|
<aude>
|
do i have to sync dblist again?
|
2014-04-08 17:59:37
|
<aude>
|
did we somehow undo it?
|
2014-04-08 18:00:58
|
<hoo>
|
no, looks good on a random mw* machine
|
2014-04-08 18:01:09
|
<icinga-wm>
|
PROBLEM - Disk space on virt1000 is CRITICAL: DISK CRITICAL - free space: / 1694 MB (2% inode=86%):
|
2014-04-08 18:01:14
|
<hoo>
|
ah
|
2014-04-08 18:01:50
|
<logmsgbot>
|
!log hoo synchronized wmf-config/InitialiseSettings.php 'Touch to clear config. cache'
|
2014-04-08 18:01:54
|
<morebots>
|
Logged the message, Master
|
2014-04-08 18:01:55
|
<aude>
|
ok
|
2014-04-08 18:02:09
|
<aude>
|
it's back!
|
2014-04-08 18:02:11
|
<hoo>
|
Sorry, I forgot about that
|
2014-04-08 18:02:33
|
<aude>
|
was about to try that
|
2014-04-08 18:02:37
|
<hoo>
|
:)
|
2014-04-08 18:02:41
|
<aude>
|
touch all the wikidata things :)
|
2014-04-08 18:02:43
|
<bd808|deploy>
|
wants to fix https://bugzilla.wikimedia.org/show_bug.cgi?id=58618 so that's automatic
|
2014-04-08 18:02:56
|
<aude>
|
i think we are done!
|
2014-04-08 18:03:19
|
<aude>
|
i am sure this happened on wikisource or previously where it was enabled and then not
|
2014-04-08 18:03:38
|
<aude>
|
puzzled but we're good now
|
2014-04-08 18:04:13
|
<hoo>
|
Yep, looks good to me
|
2014-04-08 18:04:23
|
<bd808|deploy>
|
aude, hoo: All clear for me to mess with /a/common on tin and then scap?
|
2014-04-08 18:04:37
|
<hoo>
|
Yep, go ahead... we're done for now :)
|
2014-04-08 18:04:47
|
<bd808|deploy>
|
Cool
|
2014-04-08 18:05:08
|
<aude>
|
done
|
2014-04-08 18:06:11
|
<grrrit-wm>
|
('PS1') 'BryanDavis': Group0 wikis to 1.23wmf21 [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124655'
|
2014-04-08 18:06:50
|
<greg-g>
|
crosses fingers and knocks on wood
|
2014-04-08 18:07:03
|
<grrrit-wm>
|
('CR') 'BryanDavis': [C: '2'] Group0 wikis to 1.23wmf21 [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124655' (owner: 'BryanDavis')
|
2014-04-08 18:07:05
|
<aude>
|
too!
|
2014-04-08 18:07:46
|
<bd808|deploy>
|
greg-g: Aaron merged my fix so in theory I should only need one scap. I'll verify the file after the first scap to be certain
|
2014-04-08 18:08:21
|
<greg-g>
|
nods
|
2014-04-08 18:08:28
|
<grrrit-wm>
|
('Merged') 'jenkins-bot': Group0 wikis to 1.23wmf21 [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124655' (owner: 'BryanDavis')
|
2014-04-08 18:10:36
|
<logmsgbot>
|
!log bd808 Started scap: group0 wikis to 1.23wmf21 (with patch for bug 63659)
|
2014-04-08 18:10:39
|
<icinga-wm>
|
RECOVERY - Varnishkafka Delivery Errors on cp3020 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
|
2014-04-08 18:10:41
|
<morebots>
|
Logged the message, Master
|
2014-04-08 18:11:25
|
<bd808|deploy>
|
l10n cache did not rebuild which is a great sign
|
2014-04-08 18:11:58
|
<jackmcbarn>
|
Unable to open /usr/local/apache/common-local/wikiversions.cdb.
|
2014-04-08 18:11:58
|
<MatmaRex>
|
https://pl.wikipedia.org/w/index.php?title=Dyskusja_wikiprojektu:%C5%9Ar%C3%B3dziemie&oldid=prev&diff=39218000
|
2014-04-08 18:12:01
|
<MatmaRex>
|
i get a "Unable to open /usr/local/apache/common-local/wikiversions.cdb."
|
2014-04-08 18:12:10
|
<andre__>
|
...and same here.
|
2014-04-08 18:12:12
|
<manybubbles>
|
[2014-04-08 18:11:37] Fatal error: Unable to open /usr/local/apache/common-local/wikiversions.cdb.
|
2014-04-08 18:12:15
|
<rschen7754>
|
uh-oh
|
2014-04-08 18:12:19
|
<bd808|deploy>
|
Yeah. fuck
|
2014-04-08 18:12:21
|
<manybubbles>
|
yeah, you got it
|
2014-04-08 18:12:22
|
<Steinsplitter>
|
here the same
|
2014-04-08 18:12:26
|
<bd808|deploy>
|
It will be fixed in a few moments
|
2014-04-08 18:12:30
|
<manybubbles>
|
that's everything
|
2014-04-08 18:12:31
|
<greg-g>
|
well shit
|
2014-04-08 18:12:45
|
<bd808|deploy>
|
fuuuuck
|
2014-04-08 18:12:49
|
<icinga-wm>
|
PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
|
2014-04-08 18:12:57
|
<bd808|deploy>
|
There's my first crash of all the wikis
|
2014-04-08 18:12:59
|
<icinga-wm>
|
RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
|
2014-04-08 18:13:00
|
<MaxSem>
|
SNAFU?
|
2014-04-08 18:13:05
|
<aude>
|
wtf
|
2014-04-08 18:13:13
|
<Amgine>
|
down on wm
|
2014-04-08 18:13:21
|
<manybubbles>
|
damn it, I was actually reading an article and I reloaded it to test
|
2014-04-08 18:13:23
|
<bd808|deploy>
|
It was my "fix" for the scap problem
|
2014-04-08 18:13:25
|
<manybubbles>
|
now I can't read it while I wait
|
2014-04-08 18:13:29
|
<icinga-wm>
|
PROBLEM - Apache HTTP on mw1190 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal server error - 50485 bytes in 0.007 second response time
|
2014-04-08 18:13:29
|
<icinga-wm>
|
PROBLEM - Apache HTTP on mw1055 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal server error - 50485 bytes in 0.013 second response time
|
2014-04-08 18:13:29
|
<icinga-wm>
|
PROBLEM - Apache HTTP on mw1150 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal server error - 50485 bytes in 0.004 second response time
|
2014-04-08 18:13:29
|
<icinga-wm>
|
PROBLEM - Apache HTTP on mw1101 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal server error - 50485 bytes in 0.005 second response time
|
2014-04-08 18:13:29
|
<icinga-wm>
|
PROBLEM - Apache HTTP on mw1177 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal server error - 50485 bytes in 0.009 second response time
|
2014-04-08 18:13:29
|
<icinga-wm>
|
PROBLEM - Apache HTTP on mw1138 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal server error - 50485 bytes in 0.003 second response time
|
2014-04-08 18:13:30
|
<icinga-wm>
|
PROBLEM - Apache HTTP on mw1187 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal server error - 50485 bytes in 0.006 second response time
|
2014-04-08 18:13:30
|
<icinga-wm>
|
PROBLEM - Apache HTTP on mw1220 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal server error - 50485 bytes in 0.006 second response time
|
2014-04-08 18:13:31
|
<icinga-wm>
|
PROBLEM - Apache HTTP on mw1197 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal server error - 50485 bytes in 0.013 second response time
|
2014-04-08 18:13:31
|
<icinga-wm>
|
PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - check plugin (check_job_queue) or PHP errors -
|
2014-04-08 18:13:33
|
<marktraceur>
|
Whoa
|
2014-04-08 18:13:34
|
<aude>
|
cries
|
2014-04-08 18:13:39
|
<icinga-wm>
|
PROBLEM - Apache HTTP on mw1213 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal server error - 50485 bytes in 0.018 second response time
|
2014-04-08 18:13:39
|
<icinga-wm>
|
PROBLEM - Apache HTTP on mw1113 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal server error - 50485 bytes in 0.012 second response time
|
2014-04-08 18:13:39
|
<icinga-wm>
|
PROBLEM - LVS HTTP IPv4 on rendering.svc.eqiad.wmnet is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal server error - 50485 bytes in 0.008 second response time
|
2014-04-08 18:13:42
|
<icinga-wm>
|
PROBLEM - Apache HTTP on mw1200 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal server error - 50485 bytes in 0.006 second response time
|
2014-04-08 18:13:42
|
<icinga-wm>
|
PROBLEM - Apache HTTP on mw1035 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal server error - 50485 bytes in 0.022 second response time
|
2014-04-08 18:13:42
|
<icinga-wm>
|
PROBLEM - Apache HTTP on mw1031 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal server error - 50485 bytes in 0.011 second response time
|
2014-04-08 18:13:42
|
<icinga-wm>
|
PROBLEM - Apache HTTP on mw1090 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal server error - 50485 bytes in 0.010 second response time
|
2014-04-08 18:13:42
|
<icinga-wm>
|
PROBLEM - Apache HTTP on mw1154 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal server error - 50485 bytes in 0.007 second response time
|
2014-04-08 18:13:52
|
<bd808|deploy>
|
It will be fixed soon… scap will fix it at the end
|
2014-04-08 18:13:54
|
<logmsgbot>
|
!log bd808 Finished scap: group0 wikis to 1.23wmf21 (with patch for bug 63659) (duration: 03m 18s)
|
2014-04-08 18:13:59
|
<morebots>
|
Logged the message, Master
|
2014-04-08 18:14:00
|
<aude>
|
alright
|
2014-04-08 18:14:01
|
<bd808|deploy>
|
Should be fixed now
|
2014-04-08 18:14:04
|
<manybubbles>
|
fixed
|
2014-04-08 18:14:15
|
<greg-g>
|
breathes again
|
2014-04-08 18:14:22
|
<jackmcbarn>
|
can whoever's in charge of icinga-wm bring it back to life?
|
2014-04-08 18:14:35
|
<sjoerddebruin>
|
Damn it. :P
|
2014-04-08 18:14:37
|
<greg-g>
|
jackmcbarn: it'll again automatically, I *believe*
|
2014-04-08 18:14:38
|
<PiRCarre>
|
Someone
|
2014-04-08 18:14:39
|
<MaxSem>
|
so what happened?
|
2014-04-08 18:14:47
|
<PiRCarre>
|
Oh, you know about it?
|
2014-04-08 18:14:48
|
<Marybelle>
|
greg-g: You accidentally a verb.
|
2014-04-08 18:14:49
|
<PiRCarre>
|
ok
|
2014-04-08 18:14:50
|
<icinga-wm>
|
RECOVERY - Apache HTTP on mw1027 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.066 second response time
|
2014-04-08 18:14:50
|
<icinga-wm>
|
RECOVERY - Apache HTTP on mw1092 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.073 second response time
|
2014-04-08 18:14:51
|
<icinga-wm>
|
RECOVERY - Apache HTTP on mw1073 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.084 second response time
|
2014-04-08 18:14:51
|
<icinga-wm>
|
RECOVERY - Apache HTTP on mw1018 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.111 second response time
|
2014-04-08 18:14:51
|
<bd808|deploy>
|
Patch https://gerrit.wikimedia.org/r/#/c/124627/
|
2014-04-08 18:14:52
|
<icinga-wm>
|
RECOVERY - Apache HTTP on mw1163 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.062 second response time
|
2014-04-08 18:14:52
|
<icinga-wm>
|
RECOVERY - Apache HTTP on mw1217 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.059 second response time
|
2014-04-08 18:15:07
|
<greg-g>
|
Marybelle: :)
|
2014-04-08 18:15:16
|
<bd808|deploy>
|
I'll write up the email. I know exactly what I fucked up
|
2014-04-08 18:15:21
|
<PiRCarre>
|
bd808|deploy: thanks, I was just about to report "Unable to open /usr/local/apache/common-local/wikiversions.cdb." - glad to see it's under control
|
2014-04-08 18:15:29
|
<aude>
|
breathes
|
2014-04-08 18:15:54
|
<paravoid>
|
what's going on?
|
2014-04-08 18:16:08
|
<paravoid>
|
we are all at dinner
|
2014-04-08 18:16:23
|
<manybubbles>
|
fixed now
|
2014-04-08 18:16:24
|
<aude>
|
it's ok
|
2014-04-08 18:16:25
|
<bd808|deploy>
|
paravoid: My fault. Should be fixed now
|
2014-04-08 18:16:31
|
<paravoid>
|
okay
|
2014-04-08 18:16:35
|
<greg-g>
|
paravoid: go back to dinner, all's ok again :)
|
2014-04-08 18:16:36
|
<aude>
|
scap temporarily broke everything though
|
2014-04-08 18:16:36
|
<paravoid>
|
do you need anything?
|
2014-04-08 18:16:39
|
<icinga-wm>
|
PROBLEM - Varnishkafka Delivery Errors on cp3012 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 183.266663
|
2014-04-08 18:16:39
|
<paravoid>
|
ok
|
2014-04-08 18:16:44
|
<paravoid>
|
manually page us if something happens
|
2014-04-08 18:16:52
|
<greg-g>
|
paravoid: nope, known ef up
|
2014-04-08 18:16:57
|
<greg-g>
|
paravoid: will do, enjoy!
|
2014-04-08 18:17:05
|
<paravoid>
|
ciao
|
2014-04-08 18:18:17
|
<grrrit-wm>
|
('PS2') 'Gergő Tisza': Add setting to show a survey for MediaViewer users on some sites [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124036'
|
2014-04-08 18:18:56
|
<grrrit-wm>
|
('CR') 'Gerg? Tisza': "Updated to display feedback survey on beta enwiki." [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124036' (owner: 'Gerg? Tisza')
|
2014-04-08 18:19:29
|
<bd808|deploy>
|
greg-g: I just reverted my patch to scap that caused that cascade of horribleness
|
2014-04-08 18:19:36
|
<greg-g>
|
:)
|
2014-04-08 18:19:44
|
<bd808|deploy>
|
On the plus side, group0 is on wmf21 now
|
2014-04-08 18:19:50
|
<greg-g>
|
lol
|
2014-04-08 18:19:58
|
<greg-g>
|
literal-lol
|
2014-04-08 18:20:09
|
<aude>
|
scared to change it back
|
2014-04-08 18:20:20
|
<greg-g>
|
"Don't. Touch. Any. Thing."
|
2014-04-08 18:20:25
|
<aude>
|
i suppose if bd808|deploy 's patch is reverted then ok
|
2014-04-08 18:20:39
|
<greg-g>
|
well, we still have the previous issue which it was trying to fix ;)
|
2014-04-08 18:20:59
|
<greg-g>
|
1 step forward, 1 step back
|
2014-04-08 18:21:23
|
<bd808|deploy>
|
So yes we are temporarily back to needing to double-scap, but I'll make a patch that doesn't melt the world after lunch
|
2014-04-08 18:22:25
|
<greg-g>
|
bd808|deploy: :)
|
2014-04-08 18:23:15
|
<aude>
|
wikiquote etc all looks fine, so i'm going home / eating
|
2014-04-08 18:23:20
|
<aude>
|
back in hour
|
2014-04-08 18:23:26
|
<greg-g>
|
k, I'll do the same
|
2014-04-08 18:23:33
|
<Nemo_bis>
|
quite late dinner for berlin
|
2014-04-08 18:23:47
|
<manybubbles>
|
so I told my wife we broke the internet. she told me facebook was working....
|
2014-04-08 18:24:18
|
<hoo>
|
Nemo_bis: It's never too late for food :P
|
2014-04-08 18:24:41
|
<Jamesofur>
|
^
|
2014-04-08 18:28:38
|
<Nemo_bis>
|
hoo: well, I'd call death from starvation, pellagra etc. "too late" :P
|
2014-04-08 18:29:07
|
<hoo>
|
Nemo_bis: :P Too late as in time of the day...
|
2014-04-08 18:29:08
|
<hoo>
|
:D
|
2014-04-08 18:30:17
|
<ori>
|
hoo: http://p.defau.lt/?md_cbLJuORDNsGkhY6_NAg :P
|
2014-04-08 18:30:55
|
<hoo>
|
at least the other errors are gone now, I guess
|
2014-04-08 18:31:28
|
<greg-g>
|
manybubbles: :(
|
2014-04-08 18:31:42
|
<greg-g>
|
goes to lunch for real
|
2014-04-08 18:32:34
|
<ori>
|
hoo: yeah, i submitted a patch for hhvm to fix that other issue btw
|
2014-04-08 18:32:49
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1012 is CRITICAL: CRITICAL - Could not connect to server 10.64.32.144
|
2014-04-08 18:34:15
|
<hoo>
|
ori: Oh... nice that it's actually done in PHP :)
|
2014-04-08 18:35:34
|
<manybubbles>
|
yeah yeah yeah, elasticsearch 1012 is being upgraded
|
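The ElasticSearch health check output quoted by icinga above is essentially the cluster health API: during a rolling upgrade a restarting node briefly shows up as red status with unassigned shards, then recovers. A minimal sketch of polling it in Python, assuming HTTP access to a cluster node on the default port 9200; the hostname is illustrative:

    # Minimal sketch: poll Elasticsearch cluster health, like the icinga check above.
    # Hostname is illustrative; 9200 is the default ES HTTP port.
    import json
    import urllib.request

    def cluster_health(host="elastic1001.example.org", port=9200):
        url = f"http://{host}:{port}/_cluster/health"
        with urllib.request.urlopen(url, timeout=10) as resp:
            return json.load(resp)

    health = cluster_health()
    print(health["status"], health["number_of_nodes"], health["unassigned_shards"])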
2014-04-08 18:37:56
|
<ori>
|
hoo: which component should that be filed under?
|
2014-04-08 18:39:25
|
<hoo>
|
ori: already done https://bugzilla.wikimedia.org/show_bug.cgi?id=63691
|
2014-04-08 18:39:39
|
<icinga-wm>
|
PROBLEM - Varnishkafka Delivery Errors on cp3020 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 639.299988
|
2014-04-08 18:39:40
|
<ori>
|
oh cool, thanks!
|
2014-04-08 18:42:09
|
<icinga-wm>
|
PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 530.733337
|
2014-04-08 18:42:20
|
<hoo>
|
ori: Any idea who to poke about https://gerrit.wikimedia.org/r/121709 ?
|
2014-04-08 18:43:46
|
<grrrit-wm>
|
('CR') 'Matanya': add interface speed check for all hosts ('2' comments) [operations/puppet] - 'https://gerrit.wikimedia.org/r/124606' (owner: 'Cmjohnson')
|
2014-04-08 18:44:08
|
<grrrit-wm>
|
('PS2') 'Ori.livneh': Change wgServer and wgCanonicalServer for arbcom wikis [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/121709' (owner: 'Hoo man')
|
2014-04-08 18:44:53
|
<grrrit-wm>
|
('CR') 'Ori.livneh': [C: '2'] Change wgServer and wgCanonicalServer for arbcom wikis [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/121709' (owner: 'Hoo man')
|
2014-04-08 18:45:06
|
<logmsgbot>
|
!log ori updated /a/common to {{Gerrit|I4b18e4ce8}}: Change wgServer and wgCanonicalServer for arbcom wikis
|
2014-04-08 18:45:11
|
<morebots>
|
Logged the message, Master
|
2014-04-08 18:45:28
|
<hoo>
|
heh :)
|
2014-04-08 18:45:39
|
<icinga-wm>
|
RECOVERY - Varnishkafka Delivery Errors on cp3012 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
|
2014-04-08 18:45:50
|
<logmsgbot>
|
!log ori synchronized wmf-config/InitialiseSettings.php 'I4b18e4ce8: Change wgServer and wgCanonicalServer for arbcom wikis'
|
2014-04-08 18:45:55
|
<morebots>
|
Logged the message, Master
|
2014-04-08 18:53:40
|
<icinga-wm>
|
RECOVERY - Varnishkafka Delivery Errors on cp3020 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
|
2014-04-08 18:56:09
|
<icinga-wm>
|
RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
|
2014-04-08 18:57:39
|
<icinga-wm>
|
PROBLEM - Varnishkafka Delivery Errors on cp3012 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 172.800003
|
2014-04-08 18:58:59
|
<icinga-wm>
|
RECOVERY - ElasticSearch health check on elastic1012 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
|
2014-04-08 18:59:00
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1001 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1895: active_shards: 5202: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 409
|
2014-04-08 18:59:00
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1009 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1895: active_shards: 5202: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 409
|
2014-04-08 18:59:00
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1004 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1895: active_shards: 5202: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 409
|
2014-04-08 18:59:00
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1010 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1895: active_shards: 5202: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 409
|
2014-04-08 18:59:09
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1013 is CRITICAL: CRITICAL - Could not connect to server 10.64.48.10
|
2014-04-08 18:59:09
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1003 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1895: active_shards: 5202: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 409
|
2014-04-08 18:59:09
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1006 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1895: active_shards: 5202: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 409
|
2014-04-08 18:59:10
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1016 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1895: active_shards: 5202: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 409
|
2014-04-08 18:59:29
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1015 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1895: active_shards: 5202: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 409
|
2014-04-08 19:00:03
|
<manybubbles>
|
bleh
|
2014-04-08 19:00:11
|
<manybubbles>
|
it recovered in a few seconds
|
2014-04-08 19:00:16
|
<manybubbles>
|
not sure why it did that
|
2014-04-08 19:07:39
|
<icinga-wm>
|
PROBLEM - Varnishkafka Delivery Errors on cp3011 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 341.200012
|
2014-04-08 19:12:00
|
<icinga-wm>
|
RECOVERY - ElasticSearch health check on elastic1001 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
|
2014-04-08 19:12:00
|
<icinga-wm>
|
RECOVERY - ElasticSearch health check on elastic1009 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
|
2014-04-08 19:12:00
|
<icinga-wm>
|
RECOVERY - ElasticSearch health check on elastic1004 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
|
2014-04-08 19:12:00
|
<icinga-wm>
|
RECOVERY - ElasticSearch health check on elastic1010 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
|
2014-04-08 19:12:10
|
<icinga-wm>
|
RECOVERY - ElasticSearch health check on elastic1003 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
|
2014-04-08 19:12:11
|
<icinga-wm>
|
RECOVERY - ElasticSearch health check on elastic1013 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
|
2014-04-08 19:12:11
|
<icinga-wm>
|
RECOVERY - ElasticSearch health check on elastic1016 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
|
2014-04-08 19:12:11
|
<icinga-wm>
|
RECOVERY - ElasticSearch health check on elastic1006 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
|
2014-04-08 19:13:16
|
<manybubbles>
|
that's right
|
2014-04-08 19:13:18
|
<manybubbles>
|
horrible check
|
2014-04-08 19:13:36
|
<manybubbles>
|
no errors in the logs associated with those warnings
|
2014-04-08 19:18:49
|
<icinga-wm>
|
RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000
|
2014-04-08 19:20:55
|
<huh>
|
https://en.wikipedia.org/wiki/Wikipedia:VPT#Heartbleed_bug.3F
|
2014-04-08 19:23:39
|
<icinga-wm>
|
PROBLEM - Varnishkafka Delivery Errors on cp3020 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 531.166687
|
2014-04-08 19:24:29
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1015 is CRITICAL: CRITICAL - Could not connect to server 10.64.48.12
|
2014-04-08 19:24:49
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1007 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1894: active_shards: 5197: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 414
|
2014-04-08 19:24:50
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1014 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1894: active_shards: 5197: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 414
|
2014-04-08 19:24:50
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1008 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1894: active_shards: 5197: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 414
|
2014-04-08 19:24:59
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1011 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1894: active_shards: 5197: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 414
|
2014-04-08 19:24:59
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1005 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1894: active_shards: 5197: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 414
|
2014-04-08 19:24:59
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1012 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1894: active_shards: 5197: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 414
|
2014-04-08 19:24:59
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1009 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1894: active_shards: 5197: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 414
|
2014-04-08 19:24:59
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1001 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1894: active_shards: 5197: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 414
|
2014-04-08 19:24:59
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1004 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1894: active_shards: 5197: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 414
|
2014-04-08 19:25:09
|
<icinga-wm>
|
PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 635.799988
|
2014-04-08 19:25:11
|
<Jamesofur>
|
kicks icinga-wm
|
2014-04-08 19:26:39
|
<icinga-wm>
|
PROBLEM - DPKG on elastic1015 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
|
2014-04-08 19:28:39
|
<icinga-wm>
|
RECOVERY - Varnishkafka Delivery Errors on cp3012 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
|
2014-04-08 19:29:38
|
<matanya>
|
huh: it is being fixed by ops
|
2014-04-08 19:31:39
|
<icinga-wm>
|
RECOVERY - Varnishkafka Delivery Errors on cp3011 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
|
2014-04-08 19:36:39
|
<icinga-wm>
|
RECOVERY - Varnishkafka Delivery Errors on cp3020 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
|
2014-04-08 19:37:49
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1014 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1894: active_shards: 5204: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 407
|
2014-04-08 19:37:49
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1007 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1894: active_shards: 5204: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 407
|
2014-04-08 19:37:50
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1008 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1894: active_shards: 5204: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 407
|
2014-04-08 19:37:59
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1011 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1894: active_shards: 5204: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 407
|
2014-04-08 19:37:59
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1005 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1894: active_shards: 5204: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 407
|
2014-04-08 19:37:59
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1012 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1894: active_shards: 5204: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 407
|
2014-04-08 19:37:59
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1004 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1894: active_shards: 5204: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 407
|
2014-04-08 19:37:59
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1009 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1894: active_shards: 5204: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 407
|
2014-04-08 19:37:59
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1001 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1894: active_shards: 5204: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 407
|
2014-04-08 19:38:00
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1010 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1894: active_shards: 5204: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 407
|
2014-04-08 19:38:07
|
<huh>
|
again?
|
2014-04-08 19:38:09
|
<icinga-wm>
|
RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
|
2014-04-08 19:38:10
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1013 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1894: active_shards: 5204: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 407
|
2014-04-08 19:38:10
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1003 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1894: active_shards: 5204: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 407
|
2014-04-08 19:38:10
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1016 is CRITICAL: CRITICAL - Could not connect to server 10.64.48.13
|
2014-04-08 19:38:10
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1006 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1894: active_shards: 5204: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 407
|
2014-04-08 19:38:29
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1015 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1894: active_shards: 5204: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 407
|
2014-04-08 19:38:39
|
<icinga-wm>
|
PROBLEM - Varnishkafka Delivery Errors on cp3012 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 224.199997
|
2014-04-08 19:39:39
|
<icinga-wm>
|
RECOVERY - DPKG on elastic1015 is OK: All packages OK
|
2014-04-08 19:40:19
|
<manybubbles>
|
oh shut up
|
2014-04-08 19:40:52
|
<manybubbles>
|
I'm doing rolling restarts
|
2014-04-08 19:41:47
|
<manybubbles>
|
got it: labswiki_content_1394813391
|
2014-04-08 19:41:53
|
<manybubbles>
|
that thing is configured without replicas
|
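The underlying problem manybubbles identifies: an index with zero replicas goes red whenever the one node holding its shards restarts. A minimal sketch of the usual remedy, raising the replica count through the standard Elasticsearch index-settings API; the endpoint and helper name are illustrative assumptions, not the tooling actually used here.

```python
import requests

ES = "http://localhost:9200"  # assumed cluster endpoint

def set_replicas(index, replicas=1):
    """Give an index at least one replica so a single node restart
    cannot leave all copies of its shards unassigned."""
    resp = requests.put(
        f"{ES}/{index}/_settings",
        json={"index": {"number_of_replicas": replicas}},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    # Index name taken from the log above; adjust for the live cluster.
    print(set_replicas("labswiki_content_1394813391", replicas=1))
```
|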
2014-04-08 19:46:40
|
<icinga-wm>
|
PROBLEM - Varnishkafka Delivery Errors on cp3011 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 341.066681
|
2014-04-08 19:48:00
|
<icinga-wm>
|
RECOVERY - ElasticSearch health check on elastic1004 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
|
2014-04-08 19:48:01
|
<icinga-wm>
|
RECOVERY - ElasticSearch health check on elastic1009 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
|
2014-04-08 19:48:01
|
<icinga-wm>
|
RECOVERY - ElasticSearch health check on elastic1001 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
|
2014-04-08 19:48:01
|
<icinga-wm>
|
RECOVERY - ElasticSearch health check on elastic1010 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
|
2014-04-08 19:48:10
|
<icinga-wm>
|
RECOVERY - ElasticSearch health check on elastic1003 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
|
2014-04-08 19:48:10
|
<icinga-wm>
|
RECOVERY - ElasticSearch health check on elastic1013 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
|
2014-04-08 19:48:10
|
<icinga-wm>
|
RECOVERY - ElasticSearch health check on elastic1006 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
|
2014-04-08 19:48:10
|
<icinga-wm>
|
RECOVERY - ElasticSearch health check on elastic1016 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
|
2014-04-08 19:48:30
|
<icinga-wm>
|
RECOVERY - ElasticSearch health check on elastic1015 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
|
2014-04-08 19:48:43
|
<manybubbles>
|
and, more noise!
|
2014-04-08 19:48:49
|
<icinga-wm>
|
RECOVERY - ElasticSearch health check on elastic1007 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
|
2014-04-08 19:48:49
|
<icinga-wm>
|
RECOVERY - ElasticSearch health check on elastic1014 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
|
2014-04-08 19:48:49
|
<icinga-wm>
|
RECOVERY - ElasticSearch health check on elastic1008 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
|
2014-04-08 19:48:59
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1005 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1894: active_shards: 5308: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 303
|
2014-04-08 19:48:59
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1011 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1894: active_shards: 5308: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 303
|
2014-04-08 19:48:59
|
<icinga-wm>
|
PROBLEM - ElasticSearch health check on elastic1012 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 1894: active_shards: 5308: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 303
|
2014-04-08 19:49:22
|
<manybubbles>
|
bit me, labswiki!
|
2014-04-08 19:52:34
|
<bd808|LUNCH>
|
cheers manybubbles on
|
2014-04-08 19:52:53
|
<manybubbles>
|
it'll spam us again in a few minutes
|
2014-04-08 19:52:59
|
<manybubbles>
|
labswiki recovered a long time ago
|
2014-04-08 19:53:05
|
<manybubbles>
|
it was only out for ~30 seconds each time
|
2014-04-08 19:53:20
|
<manybubbles>
|
but ganglia wants all the shards on all the wikis to be recovered before it is happy
|
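For context, the check being complained about amounts to a cluster-wide health query: anything short of status green, that is any unassigned shard on any wiki's index, raises an alert. A rough sketch of such a check against the standard _cluster/health API, assuming a locally reachable node; this is illustrative, not the actual Icinga plugin.

```python
import requests

ES = "http://localhost:9200"  # assumed cluster endpoint

def cluster_health(wait_for="green", timeout="60s"):
    """Return the cluster health document, optionally blocking until the
    requested status is reached or the server-side timeout expires."""
    resp = requests.get(
        f"{ES}/_cluster/health",
        params={"wait_for_status": wait_for, "timeout": timeout},
        timeout=90,
    )
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    health = cluster_health()
    # Anything short of green -- i.e. any unassigned shard on any index --
    # would make a check like this alert, even if the outage was seconds long.
    print(health["status"], health["unassigned_shards"])
```
|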
2014-04-08 19:53:59
|
<icinga-wm>
|
RECOVERY - ElasticSearch health check on elastic1005 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
|
2014-04-08 19:53:59
|
<icinga-wm>
|
RECOVERY - ElasticSearch health check on elastic1011 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
|
2014-04-08 19:53:59
|
<icinga-wm>
|
RECOVERY - ElasticSearch health check on elastic1012 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0
|
2014-04-08 19:56:15
|
<manybubbles>
|
!log upgraded all elasticsearch servers except elastic1008. that is coming now.
|
2014-04-08 19:56:20
|
<morebots>
|
Logged the message, Master
|
2014-04-08 19:58:20
|
<manybubbles>
|
!log finished upgrading to Elasticsearch 1.1.0. The process went well with no issues other than knocking out search in labs 3 times for 30 seconds apiece, and logging lots of nasty warnings to IRC. I've started the process to fix search in labs so it won't happen again.
|
2014-04-08 19:58:25
|
<morebots>
|
Logged the message, Master
|
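A rolling upgrade like the one logged above typically cycles each node through disable-allocation, restart, re-enable-allocation, wait-for-green. The sketch below shows that loop under stated assumptions: standard Elasticsearch cluster-settings and health APIs, a placeholder ssh/service restart step, and a hypothetical node list; it is not the script actually used.

```python
import subprocess
import requests

ES = "http://localhost:9200"  # assumed; a real run would target each node in turn

def set_allocation(enabled):
    """Toggle shard allocation so a deliberately restarted node does not
    trigger a full reshuffle of its shards."""
    value = "all" if enabled else "none"
    requests.put(
        f"{ES}/_cluster/settings",
        json={"transient": {"cluster.routing.allocation.enable": value}},
        timeout=10,
    ).raise_for_status()

def wait_for_green():
    """Block until the cluster reports green (or the server-side timeout hits)."""
    requests.get(
        f"{ES}/_cluster/health",
        params={"wait_for_status": "green", "timeout": "30m"},
        timeout=1900,
    ).raise_for_status()

def rolling_restart(nodes):
    for node in nodes:
        set_allocation(False)
        # Placeholder restart; a real upgrade installs the new package first.
        subprocess.run(
            ["ssh", node, "sudo", "service", "elasticsearch", "restart"],
            check=True,
        )
        set_allocation(True)
        wait_for_green()

if __name__ == "__main__":
    rolling_restart(["elastic1001", "elastic1002"])  # hypothetical node list
```
|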
2014-04-08 20:05:39
|
<icinga-wm>
|
PROBLEM - Varnishkafka Delivery Errors on cp3020 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 420.066681
|
2014-04-08 20:08:09
|
<icinga-wm>
|
PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 539.900024
|
2014-04-08 20:10:29
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs3004 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
|
2014-04-08 20:10:29
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs3001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
|
2014-04-08 20:10:29
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs3003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
|
2014-04-08 20:10:29
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs3002 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
|
2014-04-08 20:10:39
|
<icinga-wm>
|
RECOVERY - Varnishkafka Delivery Errors on cp3012 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
|
2014-04-08 20:12:39
|
<icinga-wm>
|
RECOVERY - Varnishkafka Delivery Errors on cp3011 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
|
2014-04-08 20:16:56
|
<se4598>
|
Does someone here know about dns issues with wmflabs-domains or related stuff that happened recently?
|
2014-04-08 20:19:39
|
<icinga-wm>
|
RECOVERY - Varnishkafka Delivery Errors on cp3020 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
|
2014-04-08 20:20:41
|
<icinga-wm>
|
PROBLEM - Varnishkafka Delivery Errors on cp3012 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 176.399994
|
2014-04-08 20:22:09
|
<icinga-wm>
|
RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
|
2014-04-08 20:26:39
|
<icinga-wm>
|
PROBLEM - Varnishkafka Delivery Errors on cp3011 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 368.466675
|
2014-04-08 20:28:02
|
<cajoel>
|
re:heartbleed, I think we'll be wanting a new corp certificate... do you guys have a favorite vendor for star certs these days?
|
2014-04-08 20:28:21
|
<cajoel>
|
it's almost due for a re-up anyway, so it's worth the effort
|
2014-04-08 20:29:53
|
<ebernhardson>
|
r
|
2014-04-08 20:48:39
|
<icinga-wm>
|
PROBLEM - Varnishkafka Delivery Errors on cp3020 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 642.700012
|
2014-04-08 20:51:39
|
<icinga-wm>
|
RECOVERY - Varnishkafka Delivery Errors on cp3012 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
|
2014-04-08 20:51:39
|
<icinga-wm>
|
RECOVERY - Varnishkafka Delivery Errors on cp3011 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
|
2014-04-08 20:52:09
|
<icinga-wm>
|
PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 537.099976
|
2014-04-08 20:59:46
|
<odder>
|
greg-g: don't believe you
|
2014-04-08 20:59:58
|
<odder>
|
http://lists.wikimedia.org/pipermail/wikitech-ambassadors/2014-April/000666.html
|
2014-04-08 21:00:04
|
<odder>
|
This is the work of the Beast
|
2014-04-08 21:00:11
|
<bd808>
|
greg-g: Do you still want to try group1 to 1.23wmf21 today or have we had enough excitement?
|
2014-04-08 21:00:53
|
<apergos>
|
reminds folks that all ops are out at a bar except for those who are about to go to sleep :-D
|
2014-04-08 21:01:06
|
<greg-g>
|
bd808: we're back to "if you run scap, run it twice" world, right?
|
2014-04-08 21:01:10
|
<greg-g>
|
apergos: :)
|
2014-04-08 21:01:23
|
<greg-g>
|
odder: which part? :)
|
2014-04-08 21:01:36
|
<bd808>
|
greg-g: Yes, but for group1 to 1.23wmf21 we only need to run sync-wikiversions
|
2014-04-08 21:01:49
|
<greg-g>
|
right
|
2014-04-08 21:02:09
|
<greg-g>
|
the world looks sane on phase0?
|
2014-04-08 21:02:11
|
<greg-g>
|
looks
|
2014-04-08 21:02:34
|
<odder>
|
greg-g: all of it - notice the number immediately preceding .html
|
2014-04-08 21:02:39
|
<icinga-wm>
|
PROBLEM - Varnishkafka Delivery Errors on cp3012 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 232.46666
|
2014-04-08 21:02:48
|
<greg-g>
|
odder: haha
|
2014-04-08 21:03:39
|
<icinga-wm>
|
RECOVERY - Varnishkafka Delivery Errors on cp3020 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
|
2014-04-08 21:03:54
|
<greg-g>
|
this is neat: https://graphite.wikimedia.org/render/…
|
2014-04-08 21:04:36
|
<greg-g>
|
I think that's what ori told me yesterday to not worry about
|
2014-04-08 21:05:09
|
<icinga-wm>
|
RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
|
2014-04-08 21:05:25
|
<greg-g>
|
bd808: if we do, we do now, so we have 2 hours of settle-bug-report time before SWAT. May I take your whole day?
|
2014-04-08 21:06:36
|
<bd808>
|
greg-g: I'm yours to command. :)
|
2014-04-08 21:06:39
|
<icinga-wm>
|
PROBLEM - Varnishkafka Delivery Errors on cp3011 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 269.866669
|
2014-04-08 21:06:42
|
<odder>
|
http://heartbleed.com/
|
2014-04-08 21:06:48
|
<odder>
|
Q&A
|
2014-04-08 21:06:55
|
<odder>
|
:-P
|
2014-04-08 21:07:09
|
<greg-g>
|
bd808: go forth, please
|
2014-04-08 21:09:36
|
<grrrit-wm>
|
('PS1') 'BryanDavis': Group1 wikis to 1.23wmf21 [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124744'
|
2014-04-08 21:11:12
|
<grrrit-wm>
|
('CR') 'BryanDavis': [C: '2'] Group1 wikis to 1.23wmf21 [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124744' (owner: 'BryanDavis')
|
2014-04-08 21:11:20
|
<grrrit-wm>
|
('Merged') 'jenkins-bot': Group1 wikis to 1.23wmf21 [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124744' (owner: 'BryanDavis')
|
2014-04-08 21:12:17
|
<logmsgbot>
|
!log bd808 rebuilt wikiversions.cdb and synchronized wikiversions files: group1 to 1.23wmf21
|
2014-04-08 21:12:23
|
<morebots>
|
Logged the message, Master
|
2014-04-08 21:12:47
|
<hoo>
|
greg-g: Have you guys already killed all user sessions?
|
2014-04-08 21:12:52
|
<hoo>
|
Can't see a server admin log entry
|
2014-04-08 21:15:44
|
<odder>
|
greg-g: I did a https://commons.wikimedia.org/wiki/Commons:Village_pump#Users_are_being_forced_to_log_out
|
2014-04-08 21:18:21
|
<Jamesofur>
|
Thanks odder, I left a note about it on en VPT since I saw a question about the bug in general
|
2014-04-08 21:18:48
|
<odder>
|
Maybe I'll cross-post that to Meta too
|
2014-04-08 21:19:59
|
<icinga-wm>
|
PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: No output from Graphite for target(s): reqstats.5xx
|
2014-04-08 21:20:14
|
<logmsgbot>
|
!log bd808 Purged l10n cache for 1.23wmf18
|
2014-04-08 21:20:19
|
<morebots>
|
Logged the message, Master
|
2014-04-08 21:21:46
|
<logmsgbot>
|
!log bd808 Purged l10n cache for 1.23wmf19
|
2014-04-08 21:21:50
|
<morebots>
|
Logged the message, Master
|
2014-04-08 21:21:54
|
<greg-g>
|
hoo: in process
|
2014-04-08 21:22:55
|
<hoo>
|
:)
|
2014-04-08 21:23:09
|
<greg-g>
|
hoo: it takes longer than you'd imagine, maybe :)
|
2014-04-08 21:23:37
|
<bd808|deploy>
|
greg-g: group1 to 1.23wmf21 is {{done}}
|
2014-04-08 21:23:40
|
<se4598>
|
greg-g: just change the cookie name? (like last time)
|
2014-04-08 21:24:09
|
<greg-g>
|
se4598: I'm deferring to chris on it (not sure what his exact process is, honestly)
|
2014-04-08 21:24:14
|
<greg-g>
|
bd808|deploy: ty
|
2014-04-08 21:24:53
|
<se4598>
|
mh, the tokens will still be valid I think, so that wasn't a good idea
|
2014-04-08 21:25:14
|
<bd808>
|
se4598: Yeah I think that's why it takes a while
|
2014-04-08 21:26:45
|
<hoo>
|
greg-g: Well, given how many users we have and that we probably don't want to hammer the DBs too much, I can imagine this taking some time
|
2014-04-08 21:26:52
|
<greg-g>
|
nods
|
2014-04-08 21:28:16
|
<hoo>
|
csteipp: Why not run one process per shard?
|
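The idea behind hoo's question is to fan the session-token reset out across database shards instead of making one serial pass over every user. The real job is a CentralAuth maintenance script in PHP; the sketch below only illustrates the batching-plus-one-worker-per-shard pattern, with a stubbed-out invalidation step and hypothetical shard names.

```python
import multiprocessing
import time

# Hypothetical shard list; everything below is an illustrative sketch, not the
# actual CentralAuth tooling.
SHARDS = ["s1", "s2", "s3", "s4", "s5", "s6", "s7"]

def invalidate_batch(shard, offset, batch_size):
    """Stand-in for 'reset session tokens for one batch of users on `shard`'.
    Returns the number of rows touched; 0 means the shard is finished."""
    remaining = 2500 - offset            # pretend each shard holds 2500 users
    return max(0, min(batch_size, remaining))

def reset_sessions_for_shard(shard, batch_size=1000):
    """Walk one shard in fixed-size batches with a pause between batches,
    so no single pass hammers the databases."""
    offset = 0
    while True:
        touched = invalidate_batch(shard, offset, batch_size)
        if touched == 0:
            break
        offset += touched
        time.sleep(0.1)                  # throttle between batches

if __name__ == "__main__":
    # One worker per shard, as suggested, instead of one serial pass over all users.
    with multiprocessing.Pool(len(SHARDS)) as pool:
        pool.map(reset_sessions_for_shard, SHARDS)
```
|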
2014-04-08 21:29:24
|
<odder>
|
Jamesofur: if you're keeping track of things, I alerted Commons and Meta; perhaps someone would need to alert the other big Wikipedias
|
2014-04-08 21:29:35
|
<odder>
|
Dunno if the message to tech-ambassadors will be enough; maybe.
|
2014-04-08 21:30:35
|
<grrrit-wm>
|
('PS2') 'MaxSem': Put a safeguard on GeoData's usage of CirrusSearch [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/121874'
|
2014-04-08 21:30:37
|
<grrrit-wm>
|
('PS1') 'MaxSem': Enable $wgGeoDataDebug on labs [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124747'
|
2014-04-08 21:30:39
|
<icinga-wm>
|
RECOVERY - Varnishkafka Delivery Errors on cp3011 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
|
2014-04-08 21:30:39
|
<icinga-wm>
|
PROBLEM - Varnishkafka Delivery Errors on cp3020 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 535.0
|
2014-04-08 21:30:54
|
<grrrit-wm>
|
('CR') 'jenkins-bot': [V: '-1'] Enable $wgGeoDataDebug on labs [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124747' (owner: 'MaxSem')
|
2014-04-08 21:31:32
|
<csteipp>
|
se4598: Assuming attacker has the login token, they could use the new name and again spoof the user
|
2014-04-08 21:31:39
|
<icinga-wm>
|
RECOVERY - Varnishkafka Delivery Errors on cp3012 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
|
2014-04-08 21:31:46
|
<grrrit-wm>
|
('PS2') 'MaxSem': Enable $wgGeoDataDebug on labs [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124747'
|
2014-04-08 21:32:09
|
<Jamesofur>
|
odder: yeah, I'll see if we can poke people, we're going to send out SM messages as well in a couple minutes
|
2014-04-08 21:32:19
|
<Jamesofur>
|
with a recommendation to password reset
|
2014-04-08 21:33:09
|
<odder>
|
SM?
|
2014-04-08 21:33:22
|
<Jamesofur>
|
sorry, Social Media (Twitter/Facebook/G+ etc)
|
2014-04-08 21:33:42
|
<odder>
|
TMA, Too Many Abbreviations
|
2014-04-08 21:33:45
|
<odder>
|
:)
|
2014-04-08 21:33:59
|
<Jamesofur>
|
yup lol
|
2014-04-08 21:34:09
|
<icinga-wm>
|
PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 539.133362
|
2014-04-08 21:34:10
|
<Jamesofur>
|
I abuse them, I even make up my own and forget that they are just in my head
|
2014-04-08 21:34:23
|
<HaeB>
|
https://twitter.com/Wikimedia/status/453646877397757953
|
2014-04-08 21:34:49
|
<JohnLewis>
|
Jamesofur: EUS IAA. TA IANAL.
|
2014-04-08 21:34:58
|
<JohnLewis>
|
*EYS :p
|
2014-04-08 21:35:42
|
<odder>
|
thanks HaeB, retweeted
|
2014-04-08 21:40:46
|
<aude>
|
woah, new code on wikidata?
|
2014-04-08 21:40:46
|
<matanya>
|
Jamesofur: using mass-message might be a good idea
|
2014-04-08 21:41:15
|
<greg-g>
|
aude: yep, all ok?
|
2014-04-08 21:41:26
|
<Jamesofur>
|
HaeB: ^ what do you think? (about MM)
|
2014-04-08 21:41:48
|
<greg-g>
|
wdyt?
|
2014-04-08 21:42:08
|
<JohnLewis>
|
greg-g: itjdi
|
2014-04-08 21:42:12
|
<aude>
|
so we're confident?
|
2014-04-08 21:42:39
|
<icinga-wm>
|
PROBLEM - Varnishkafka Delivery Errors on cp3012 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 187.866669
|
2014-04-08 21:42:53
|
<greg-g>
|
aude: in that it won't break at 2:00 utc? yeah
|
2014-04-08 21:43:06
|
<greg-g>
|
aude: the only thing we're still not confident about is scap on thursday
|
2014-04-08 21:44:19
|
<aude>
|
alright
|
2014-04-08 21:44:39
|
<icinga-wm>
|
RECOVERY - Varnishkafka Delivery Errors on cp3020 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
|
2014-04-08 21:44:40
|
<icinga-wm>
|
PROBLEM - Varnishkafka Delivery Errors on cp3011 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 320.200012
|
2014-04-08 21:44:55
|
<HaeB>
|
Jamesofur, matanya: i think for the session ending, massmessage would be overkill. regarding the password reset, it's a judgment call (how high one estimates the risk for users who don't change it)
|
2014-04-08 21:45:24
|
<matanya>
|
HaeB: it depends on user rights as well
|
2014-04-08 21:45:27
|
<bd808>
|
aude: The bug that caused all the 1.23wmf21 l10n issues is https://bugzilla.wikimedia.org/show_bug.cgi?id=63659
|
2014-04-08 21:46:31
|
<HaeB>
|
are there any other major sites who notified all users?
|
2014-04-08 21:46:54
|
<Jamesofur>
|
not that I've seen yet, but I have a feeling some are still going through the fixing process
|
2014-04-08 21:46:55
|
<aude>
|
interesting
|
2014-04-08 21:46:59
|
<HaeB>
|
(to recommend a password change)
|
2014-04-08 21:47:10
|
<hoo>
|
eg. just got stuff from CloudBees
|
2014-04-08 21:47:15
|
<hoo>
|
github also logged me out
|
2014-04-08 21:47:37
|
<HaeB>
|
would also be interesting to know how quickly the wikis were fixed after the news broke yesterday
|
2014-04-08 21:47:40
|
<Jamesofur>
|
latimes has an article about resetting your password, but that's different
|
2014-04-08 21:48:09
|
<icinga-wm>
|
RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
|
2014-04-08 21:48:13
|
<HaeB>
|
last night (PT) i filed a RT ticket for the blog, which was vulnerable at the time, but at that point the wikis tested ok already
|
2014-04-08 21:48:36
|
<hoo>
|
The wikis auto update OpenSSL via puppet
|
2014-04-08 21:49:00
|
<Jamesofur>
|
hoo: well ya ;) the question is when we updated puppet ;)
|
2014-04-08 21:49:24
|
<hoo>
|
Jamesofur: The servers do that themselves
|
2014-04-08 21:49:39
|
<HaeB>
|
per https://wikitech.wikimedia.org/wiki/Server_admin_log , the blog (holmium) was pretty late in the game
|
2014-04-08 21:49:50
|
<bd808>
|
The timeline is all in SAL from last night
|
2014-04-08 21:49:51
|
<hoo>
|
Yesterday I posted about that to the internal ops list, but forgot to poke a root to do an apt-cache clean and force a puppet run
|
2014-04-08 21:50:08
|
<HaeB>
|
"04:03 Tim: upgrading libssl on ssl1001,ssl1002,ssl1003,ssl1004,ssl1005,ssl1006,ssl1007,ssl1008,ssl1009,ssl3001.esams.wikimedia.org,ssl3002.esams.wikimedia.org,ssl3003.esams.wikimedia.org" - is that the entry for the wikis?
|
2014-04-08 21:50:37
|
<bd808>
|
Mostly yes
|
2014-04-08 21:53:39
|
<icinga-wm>
|
RECOVERY - Varnishkafka Delivery Errors on cp3012 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
|
2014-04-08 21:53:39
|
<icinga-wm>
|
RECOVERY - Varnishkafka Delivery Errors on cp3011 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
|
2014-04-08 21:53:59
|
<icinga-wm>
|
RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000
|
2014-04-08 21:54:55
|
<grrrit-wm>
|
('PS1') 'Jean-Frédéric': Add Musées de la Haute-Saône to wgCopyUploadsDomains [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124754'
|
2014-04-08 22:01:11
|
<mwalker>
|
greg-g, poking you because I'm not sure who's on point for the i18n / scap stuff -- but I recall getting pinged a couple of days ago (on a centralnotice keyword) saying that the i18n update was failing due to exceptions on CN (and others). I'm wondering if CN's failure was due to being on a deployment branch that did not have the JSON updates (until just now).
|
2014-04-08 22:01:46
|
<greg-g>
|
shouldn't be
|
2014-04-08 22:01:57
|
<greg-g>
|
there's backward compat in l10nupdate
|
2014-04-08 22:02:17
|
<greg-g>
|
mwalker: see https://bugzilla.wikimedia.org/show_bug.cgi?id=63659 for all the gory details
|
2014-04-08 22:02:33
|
<mwalker>
|
puts on tyvek suit
|
2014-04-08 22:02:38
|
<greg-g>
|
:)
|
2014-04-08 22:30:59
|
<icinga-wm>
|
PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: No output from Graphite for target(s): reqstats.5xx
|
2014-04-08 22:33:06
|
<csteipp>
|
greg-g: Could I push a small centralauth update soon?
|
2014-04-08 22:33:44
|
<greg-g>
|
yeah, now is fine, 30 minutes until swat
|
2014-04-08 22:34:24
|
<icinga-wm>
|
PROBLEM - Puppet freshness on mw1109 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 10:30:08 PM UTC
|
2014-04-08 22:36:24
|
<icinga-wm>
|
PROBLEM - Puppet freshness on mw1109 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 10:30:08 PM UTC
|
2014-04-08 22:37:04
|
<icinga-wm>
|
PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 34.533333
|
2014-04-08 22:37:34
|
<icinga-wm>
|
PROBLEM - Varnishkafka Delivery Errors on cp3020 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 260.733337
|
2014-04-08 22:38:24
|
<icinga-wm>
|
PROBLEM - Puppet freshness on mw1109 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 10:30:08 PM UTC
|
2014-04-08 22:40:24
|
<icinga-wm>
|
PROBLEM - Puppet freshness on mw1109 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 10:30:08 PM UTC
|
2014-04-08 22:42:24
|
<icinga-wm>
|
PROBLEM - Puppet freshness on mw1109 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 10:30:08 PM UTC
|
2014-04-08 22:44:14
|
<icinga-wm>
|
PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 625.166687
|
2014-04-08 22:44:24
|
<icinga-wm>
|
PROBLEM - Puppet freshness on mw1109 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 10:30:08 PM UTC
|
2014-04-08 22:45:36
|
<se4598>
|
marktraceur: I see in the deploy calendar that you have a changeset which specifically activates MediaViewer on en-beta. You (or your PC) may get hit by https://bugzilla.wikimedia.org/show_bug.cgi?id=63709
|
2014-04-08 22:46:24
|
<icinga-wm>
|
PROBLEM - Puppet freshness on mw1109 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 10:30:08 PM UTC
|
2014-04-08 22:47:22
|
<marktraceur>
|
se4598: Is there a fix?
|
2014-04-08 22:47:50
|
<marktraceur>
|
I'm guessing it's an SSL problem
|
2014-04-08 22:48:24
|
<icinga-wm>
|
PROBLEM - Puppet freshness on mw1109 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 10:30:08 PM UTC
|
2014-04-08 22:48:43
|
<marktraceur>
|
se4598: Replied on bug
|
2014-04-08 22:49:09
|
<grrrit-wm>
|
('PS1') 'BryanDavis': Create symlink for compile-wikiversions in /usr/local/bin [operations/puppet] - 'https://gerrit.wikimedia.org/r/124763'
|
2014-04-08 22:49:23
|
<se4598>
|
marktraceur: We in #wikimedia-labs don't have one. And that's not about https but DNS resolution, so I don't understand what you mean by https?
|
2014-04-08 22:49:35
|
<marktraceur>
|
Oh, hm
|
2014-04-08 22:49:37
|
<marktraceur>
|
Never mind, sorry
|
2014-04-08 22:50:24
|
<icinga-wm>
|
PROBLEM - Puppet freshness on mw1109 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 10:30:08 PM UTC
|
2014-04-08 22:52:04
|
<icinga-wm>
|
RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
|
2014-04-08 22:52:24
|
<icinga-wm>
|
PROBLEM - Puppet freshness on mw1109 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 10:30:08 PM UTC
|
2014-04-08 22:52:34
|
<icinga-wm>
|
RECOVERY - Varnishkafka Delivery Errors on cp3020 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
|
2014-04-08 22:52:56
|
<se4598>
|
marktraceur: currently the fix is.....: it may work if you try multiple times or wait some time (minutes, hours) ;P
|
2014-04-08 22:54:24
|
<icinga-wm>
|
PROBLEM - Puppet freshness on mw1109 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 10:30:08 PM UTC
|
2014-04-08 22:56:24
|
<icinga-wm>
|
PROBLEM - Puppet freshness on mw1109 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 10:30:08 PM UTC
|
2014-04-08 22:56:54
|
<icinga-wm>
|
RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000
|
2014-04-08 22:58:24
|
<icinga-wm>
|
PROBLEM - Puppet freshness on mw1109 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 10:30:08 PM UTC
|
2014-04-08 22:58:41
|
<hoo>
|
greg-g: csteipp: got both core changes ready
|
2014-04-08 22:58:53
|
<hoo>
|
I mean changes to the deploy branch
|
2014-04-08 22:59:52
|
<csteipp>
|
hoo: Cool.. one sec and I'll merge and deploy it
|
2014-04-08 23:00:12
|
<hoo>
|
I can also jump in, am on tin still anyway
|
2014-04-08 23:00:14
|
<icinga-wm>
|
RECOVERY - Puppet freshness on mw1109 is OK: puppet ran at Tue Apr 8 23:00:04 UTC 2014
|
2014-04-08 23:02:24
|
<icinga-wm>
|
PROBLEM - Puppet freshness on mw1109 is CRITICAL: Last successful Puppet run was Tue 08 Apr 2014 11:00:04 PM UTC
|
2014-04-08 23:05:24
|
<greg-g>
|
stupid puppet
|
2014-04-08 23:06:33
|
<Jasper_Deng>
|
always wondered what Puppet does anyways
|
2014-04-08 23:07:09
|
<Jamesofur>
|
pulls the strings ;)
|
2014-04-08 23:07:20
|
<Jamesofur>
|
(or, probably better 'is the strings' )
|
2014-04-08 23:07:26
|
<hoo>
|
Jasper_Deng: Playing with the servers :D
|
2014-04-08 23:08:20
|
<JohnLewis>
|
Technically, the sysadmins are a puppet in the WMF's plans, right? :p
|
2014-04-08 23:08:37
|
<logmsgbot>
|
!log csteipp synchronized php-1.23wmf21/extensions/CentralAuth/maintenance 'Push maintenance script for token reset'
|
2014-04-08 23:08:39
|
<Jamesofur>
|
or we're all just puppets in their plans, duh
|
2014-04-08 23:08:41
|
<morebots>
|
Logged the message, Master
|
2014-04-08 23:09:04
|
<JohnLewis>
|
Jamesofur: You're the past of the puppets :p
|
2014-04-08 23:09:09
|
<JohnLewis>
|
*master of the
|
2014-04-08 23:09:57
|
<csteipp>
|
greg-g: CentralAuth updates are out, so swat can go ahead if they were waiting on me
|
2014-04-08 23:10:01
|
<Jamesofur>
|
;) the user with said name may dislike me claiming the title
|
2014-04-08 23:10:40
|
<greg-g>
|
mwalker: ori ebernhardson ^
|
2014-04-08 23:10:46
|
<greg-g>
|
also, what the heck, oit_display ?
|
2014-04-08 23:10:54
|
<greg-g>
|
:)
|
2014-04-08 23:11:10
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs3001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
|
2014-04-08 23:11:10
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs3002 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
|
2014-04-08 23:11:10
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs3003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
|
2014-04-08 23:11:10
|
<icinga-wm>
|
PROBLEM - Puppet freshness on lvs3004 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
|
2014-04-08 23:11:51
|
<mwalker>
|
oh
|
2014-04-08 23:11:54
|
<mwalker>
|
yes; it's 4!
|
2014-04-08 23:13:25
|
<Danny_B>
|
SUL doesn't work?
|
2014-04-08 23:14:02
|
<mwalker>
|
csteipp, ^
|
2014-04-08 23:14:03
|
<hoo>
|
Danny_B: We are logging out all users
|
2014-04-08 23:14:10
|
<hoo>
|
see http://lists.wikimedia.org/pipermail/wikitech-ambassadors/2014-April/000666.html
|
2014-04-08 23:14:32
|
<MaxSem>
|
csteipp, warn ppl with a site notice?
|
2014-04-08 23:14:35
|
<se4598>
|
hoo: you know that this isn't merged? https://gerrit.wikimedia.org/r/124756
|
2014-04-08 23:15:00
|
<hoo>
|
se4598: not this important at the very moments
|
2014-04-08 23:15:03
|
<hoo>
|
* moment
|
2014-04-08 23:15:23
|
<csteipp>
|
Danny_B: SUL should work... You should just be logged out. If you can't log in, let me know
|
2014-04-08 23:15:53
|
<Jamesofur>
|
csteipp: will we get logged out each time we hit a wiki we've visited recently? or just the once per user in theory
|
2014-04-08 23:16:15
|
<csteipp>
|
If you're a global user, just once (right now as I logout all the centralauth users)
|
2014-04-08 23:16:32
|
<csteipp>
|
If you have multiple ununified local accounts, each will get logged out
|
2014-04-08 23:16:51
|
<Danny_B>
|
csteipp: I have to log in on every single project although I have a central username
|
2014-04-08 23:16:54
|
<Amgine>
|
<grumbles about that><waves fist impotently at it.wp>
|
2014-04-08 23:17:30
|
<icinga-wm>
|
PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 135.300003
|
2014-04-08 23:17:55
|
<mwalker>
|
marktraceur, MaxSem I'm going to +2 and confirm https://gerrit.wikimedia.org/r/#/c/124036/2 , https://gerrit.wikimedia.org/r/#/c/121874/2 , https://gerrit.wikimedia.org/r/#/c/124747/
|
2014-04-08 23:18:30
|
<icinga-wm>
|
PROBLEM - Varnishkafka Delivery Errors on cp3020 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 173.666672
|
2014-04-08 23:18:32
|
<mwalker>
|
it would be wonderful if you all could +1 those so that I know you've looked and consider them good to go
|
2014-04-08 23:18:35
|
<marktraceur>
|
'kay
|
2014-04-08 23:18:53
|
<Danny_B>
|
csteipp: +1 to notice ppl with central notice
|
2014-04-08 23:18:57
|
<grrrit-wm>
|
('CR') 'MarkTraceur': [C: ''] Add setting to show a survey for MediaViewer users on some sites [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124036' (owner: 'Gergő Tisza')
|
2014-04-08 23:19:00
|
<MaxSem>
|
+1 ourselves?
|
2014-04-08 23:19:16
|
<MaxSem>
|
doesn't sound very assuring:)
|
2014-04-08 23:19:21
|
<mwalker>
|
nah; you're probably OK MaxSem :p
|
2014-04-08 23:19:27
|
<mwalker>
|
but I don't know who Gergo is
|
2014-04-08 23:19:44
|
<mwalker>
|
but mark was sponsoring the patch
|
2014-04-08 23:19:53
|
<MaxSem>
|
he's tgr :P
|
2014-04-08 23:20:00
|
<grrrit-wm>
|
('CR') 'Mwalker': [C: '2'] Put a safeguard on GeoData's usage of CirrusSearch [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/121874' (owner: 'MaxSem')
|
2014-04-08 23:20:08
|
<grrrit-wm>
|
('CR') 'Mwalker': [C: '2'] Enable $wgGeoDataDebug on labs [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124747' (owner: 'MaxSem')
|
2014-04-08 23:20:21
|
<grrrit-wm>
|
('CR') 'Mwalker': [C: '2'] Add setting to show a survey for MediaViewer users on some sites [operations/mediawiki-config] - 'https://gerrit.wikimedia.org/r/124036' (owner: 'Gergő Tisza')
|
2014-04-08 23:20:27
|
<ori>
|
greg-g: missed your ping; still need me?
|
2014-04-08 23:21:00
|
<greg-g>
|
dont think so
|
2014-04-08 23:23:33
|
<mwalker>
|
interesting; sync-common doesn't log to IRC?
|
2014-04-08 23:23:34
|
<csteipp>
|
Danny_B: That doesn't sound right.. At the risk of sounding cliche, can you log out and log back in, and see if that helps?
|
2014-04-08 23:23:55
|
<mwalker>
|
marktraceur, MaxSem can you tell if your configuration stuff got pushed?
|
2014-04-08 23:24:15
|
<MaxSem>
|
mwalker, mine's noop on prod
|
2014-04-08 23:24:25
|
<marktraceur>
|
Ditto, but will check on beta
|
2014-04-08 23:24:26
|
<MaxSem>
|
checking if prod still works...
|
2014-04-08 23:24:35
|
<mwalker>
|
also; marktraceur I presume you want https://gerrit.wikimedia.org/r/#/c/124510/ to go to wmf20 and wmf21?
|
2014-04-08 23:24:38
|
<HaeB>
|
Danny_B, hoo : we're still thinking about massmessage instead (more for the password changing advice)
|
2014-04-08 23:24:43
|
<marktraceur>
|
mwalker: Sorry, only 21
|
2014-04-08 23:25:24
|
<marktraceur>
|
mwalker: Confirmed, beta has the configuration we wanted
|
2014-04-08 23:26:36
|
<MaxSem>
|
mwalker, lgtm
|
2014-04-08 23:27:40
|
<icinga-wm>
|
PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: No output from Graphite for target(s): reqstats.5xx
|
2014-04-08 23:28:34
|
<Danny_B>
|
csteipp: log out from any project I'm currently logged in to, log back in to it, and then see if SUL works on the others?
|
2014-04-08 23:29:14
|
<csteipp>
|
Danny_B: Yeah
|
2014-04-08 23:29:22
|
<Danny_B>
|
csteipp: ok, sec
|
2014-04-08 23:29:38
|
<csteipp>
|
Hmm... Danny_B What's your wiki username?
|
2014-04-08 23:30:51
|
<icinga-wm>
|
RECOVERY - Puppet freshness on mw1109 is OK: puppet ran at Tue Apr 8 23:30:43 UTC 2014
|
2014-04-08 23:30:55
|
<Danny_B>
|
csteipp: Danny B.
|
2014-04-08 23:31:17
|
<Danny_B>
|
csteipp: seems to work now, will let you know if i'll spot another disconnection
|
2014-04-08 23:31:27
|
<csteipp>
|
Danny_B: Cool, thanks
|
2014-04-08 23:32:03
|
<Danny_B>
|
yw
|
2014-04-08 23:32:15
|
<Danny_B>
|
thanks for care
|
2014-04-08 23:33:30
|
<icinga-wm>
|
RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
|
2014-04-08 23:34:31
|
<logmsgbot>
|
!log mwalker synchronized php-1.23wmf21/extensions/MultimediaViewer/ 'Updating MultimediaViewer for {{gerrit|124510}}'
|
2014-04-08 23:34:35
|
<morebots>
|
Logged the message, Master
|
2014-04-08 23:35:16
|
<mwalker>
|
marktraceur, ^ if you would test what you need to test for that
|
2014-04-08 23:35:26
|
<mwalker>
|
I'm not seeing any fatals or exceptions which is good :)
|
2014-04-08 23:35:31
|
<icinga-wm>
|
RECOVERY - Varnishkafka Delivery Errors on cp3020 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0
|
2014-04-08 23:35:32
|
<marktraceur>
|
mwalker: Works
|
2014-04-08 23:35:32
|
<marktraceur>
|
Ta
|
2014-04-08 23:35:39
|
<mwalker>
|
cool; greg-g SWAT done
|
2014-04-08 23:58:30
|
<icinga-wm>
|
PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 179.666672
|
2014-04-08 23:59:04
|
<jackmcbarn>
|
"Firefox can't find the server at en.wikipedia.beta.wmflabs.org."
|
2014-04-08 23:59:08
|
<jackmcbarn>
|
why?
|
2014-04-08 23:59:14
|
<grrrit-wm>
|
('CR') 'Aaron Schulz': [C: ''] Create symlink for compile-wikiversions in /usr/local/bin [operations/puppet] - 'https://gerrit.wikimedia.org/r/124763' (owner: 'BryanDavis')
|
2014-04-08 23:59:31
|
<marktraceur>
|
jackmcbarn: https://bugzilla.wikimedia.org/show_bug.cgi?id=63709 probably
|