[00:09:08] about to sync-dir [00:09:35] !log ori synchronized php-1.23wmf8/extensions/WikimediaEvents 'Ibe918eb96: Update WikimediaEvents to master for Ia6f758068 (Log event when new editors reach edit milestone)' [00:09:53] Logged the message, Master [00:10:15] !log ori synchronized php-1.23wmf7/extensions/WikimediaEvents 'Ibe918eb96: Update WikimediaEvents to master for Ia6f758068 (Log event when new editors reach edit milestone)' [00:10:31] Logged the message, Master [00:11:09] Reedy, do you know why fluorine still doesn't have the new bug log file? Do i need to do any extra steps? [00:16:52] Need to get someone to touch it [00:18:01] yurik: i can touch an empty file for you if you tell me exactly what and thats all you need right now [00:19:44] mutante, they don't auto-create? Bug58676.log [00:20:00] thx! [00:20:27] yurik: no idea, where is that on fluorine? [00:20:41] /a/mw-log [00:21:25] mutante, capitalized i think [00:21:43] aa..really i almost asked [00:21:52] and removed capitalization [00:22:51] !log create /a/mw-log/Bug58676.log for yurik, touch, chown udplog ..etc [00:22:55] yurik: done [00:23:05] Logged the message, Master [00:23:12] thx mutante ! now if only i can see some messages in it ... :) [00:23:53] no problem, actually you should hope for NO messages in a bug log,no?:) i didnt read the bug yet:) [00:23:57] do you know where all the generic stuff from debug logs should go? [00:24:19] not really, using the bug number seemed good for now if it's special [00:24:20] if they don't match specific name [00:24:32] i dont know of any schemas here [00:24:38] oki, thx! [00:25:29] Invalid message parameter .. 
i see [00:25:33] well good luck with debugging then [00:25:40] heh :) [00:26:27] i see some coming in yurik [00:26:32] you got those now, right [00:26:48] mutante, no, tail is blank [00:26:51] or should i paste [00:27:07] please paste (unless private of course :)) [00:27:24] i just see wiki page names and id numbers and mw host numbers [00:27:31] hold on [00:27:37] should be ok... as long as you don't see cookies ;) [00:28:24] mutante, where do you see them? [00:28:33] yurik: it should NOT be capitalized [00:28:35] thats why [00:28:50] because i happened to first not capitalize it and then moved it [00:28:57] ?? that's strange: https://gerrit.wikimedia.org/r/#/c/102604/2/includes/Message.php [00:28:58] the first one caught those in the seconds in between [00:29:13] there it uses capitalization [00:29:28] checking tin... [00:29:32] yurik: http://paste.debian.net/71862/ [00:29:55] duh, wrong bug? [00:29:59] mutante, yep :) [00:30:01] sorry [00:30:33] well then, yea, it's empty [00:30:40] and capitalized [00:30:58] heh... maybe its a randomness factor of sorts? like there is some throtling... doubtful [00:31:31] yurik: fwiw, one capitalized, 2 are not .. so shrug [00:31:39] of other bug log files like that [00:31:53] yes, i saw them, but only after the patch went live [00:31:58] would have made it noncap [00:34:04] or your bug is just not happening that often [00:38:50] well, for completeness yurik [00:38:56] 0 -rw-r--r-- 1 udp2log udp2log 0 Dec 20 00:21 Bug58676.log [00:39:12] in case somebody sees later i made a mistake or whatever [00:40:25] thanks mutante , maybe something else is missing, or it will work after some cron refresh [00:41:25] greg-g: OK, we're ready. It's just three files to sync to 1.23wmf8, OK to do it now? [00:42:46] spagewmf: i need to do another two sync-dirs; i need another moment, please. 
[00:43:12] ori_ np, brings back happy memories of EventLogging tweaks :) [00:43:30] spagewmf: yeah [00:45:05] jenkins jenkins jenkins jenkins jenkins jenkins jenkins [00:46:05] ori_ do api 500 errors show up in the "MediaWiki errors last 2hr" graph? [00:47:48] !log ori synchronized php-1.23wmf7/extensions/WikimediaEvents 'I7cad8fd35fd: Update WikimediaEvents to master' [00:48:05] Logged the message, Master [00:48:31] !log ori synchronized php-1.23wmf8/extensions/WikimediaEvents 'I7cad8fd35fd: Update WikimediaEvents to master' [00:48:38] spagewmf: done, sorry for the holdup [00:48:49] Logged the message, Master [00:49:00] spagewmf: do they result in PHP exceptions getting thrown? I forget [00:50:52] ori_: I'm not sure either. fluorine's api.log just has the request, not the error status. I'll look into it later today [00:52:47] spagewmf: well, I think exceptions thrown in PHP while serving an API request get logged to the exception log; they are not special. fatals ditto. but a 500 response could be coming from varnish, too, in which case it won't be in either log. [00:54:01] for example, if the backend server is slow to process the request, varnish will eventually give up, and issue a 500 response to the user [00:54:34] but the request could still complete successfully. it just won't make it to the user. 
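A rough sketch of the behaviour described just above, where the proxy gives up on a slow backend and answers with an error while the backend still finishes the request. This is plain Python threads, not Varnish; the names `slow_backend`, `proxy` and `demo`, the timings, and the 500 status are all assumptions made for illustration.

```python
import concurrent.futures
import time

# Illustrative status only; the real code a proxy returns depends on its config.
PROXY_ERROR = 500


def slow_backend(completed, delay):
    """Pretend backend: takes `delay` seconds, then records that it finished."""
    time.sleep(delay)
    completed.append("request finished")
    return 200


def proxy(pool, completed, backend_delay, proxy_timeout):
    """Forward a request, but stop waiting after `proxy_timeout` seconds."""
    future = pool.submit(slow_backend, completed, backend_delay)
    try:
        status = future.result(timeout=proxy_timeout)
    except concurrent.futures.TimeoutError:
        # The proxy gives up; the backend thread keeps running regardless.
        status = PROXY_ERROR
    return status, future


def demo():
    completed = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        client_status, future = proxy(
            pool, completed, backend_delay=0.2, proxy_timeout=0.05
        )
        # Wait for the backend so we can see what it did after the timeout.
        backend_status = future.result()
    return client_status, backend_status, completed


if __name__ == "__main__":
    print(demo())
```

The client sees the error status, yet the backend's own log would show a normal completion, which is why such a response would appear in neither the exception log nor the fatal log.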
[00:57:36] (PS1) Tim Landscheidt: Fix quoting error in generic_vhost.erb [operations/puppet] - https://gerrit.wikimedia.org/r/102862 [00:57:44] RoanKattouw_away: fyi, trying to unblock a ticket you created on Wed Dec 12 00:49:54 2012 :) [00:57:53] about the times being off on LVS [01:00:28] (PS2) Tim Landscheidt: Fix quoting error in generic_vhost.erb [operations/puppet] - https://gerrit.wikimedia.org/r/102862 [01:06:58] !log spage synchronized php-1.23wmf8/extensions/Echo/formatters/BasicFormatter.php 'Bug 58705 notification fix' [01:07:16] Logged the message, Master [01:08:32] !log spage synchronized php-1.23wmf8/extensions/Echo/includes/EmailFormatter.php 'Bug 58705 notification fix pt.2' [01:08:49] Logged the message, Master [01:10:50] (CR) Ori.livneh: "Reviewers: please wait for me to be around before merging, so I can verify log parsing of the updated format on vanadium." [operations/puppet] - https://gerrit.wikimedia.org/r/102817 (owner: Ori.livneh) [01:12:32] !log spage synchronized php-1.23wmf8/extensions/Flow/includes/Notifications/Controller.php 'Bug 58705 notification fix pt.3' [01:12:52] Logged the message, Master [01:14:32] greg-g bug 58705 fixed, http://tools.wmflabs.org/gerrit-patch-uploader/ working. All because of i18n adding an innocent $wgLang->getDir() to support RTL notification icons [01:36:26] PROBLEM - Puppet freshness on analytics1021 is CRITICAL: Last successful Puppet run was Wed 18 Dec 2013 10:28:53 PM UTC [01:36:26] PROBLEM - Puppet freshness on analytics1022 is CRITICAL: Last successful Puppet run was Wed 18 Dec 2013 10:29:11 PM UTC [01:55:51] (PS8) Springle: role and module structure for ishmael [operations/puppet] - https://gerrit.wikimedia.org/r/96403 (owner: Dzahn) [01:58:02] (CR) Springle: [C: 2] "I'll watch it."
[operations/puppet] - https://gerrit.wikimedia.org/r/96403 (owner: Dzahn) [02:01:58] (PS1) Springle: 'vslow' should be sufficient after https://gerrit.wikimedia.org/r/100727 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/102870 [02:03:13] (CR) Springle: [C: 2] 'vslow' should be sufficient after https://gerrit.wikimedia.org/r/100727 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/102870 (owner: Springle) [02:04:39] !log springle synchronized wmf-config/db-eqiad.php 'vslow query load balancing' [02:04:56] Logged the message, Master [02:16:24] !log LocalisationUpdate completed (1.23wmf7) at Fri Dec 20 02:16:24 UTC 2013 [02:20:06] PROBLEM - puppetmaster https on virt0 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:25:06] PROBLEM - HTTP on virt0 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:35:55] !log LocalisationUpdate completed (1.23wmf8) at Fri Dec 20 02:35:55 UTC 2013 [02:37:29] mutante: 6536 is too meta [02:37:32] Coren: too [02:38:47] jeremyb: The James thing? [02:39:06] yeah [02:42:32] yea, give me code review, not more reasons to slow it down and waiting periods pls [02:44:57] jeremyb: suggestions? [02:47:09] wikitech.wikimedia.org not responding... is it me? [02:48:27] spagewmf: It's not just you! http://wikitech.wikimedia.org looks down from here. [02:48:33] http://www.downforeveryoneorjustme.com/http://wikitech.wikimedia.org/ [02:48:47] sigh.. what... [02:49:05] spagewmf: checked labs channel yet. i am about to [02:50:41] http://www.downforeveryoneorjustme.com/http://wikitech.wikimedia.org/ agrees. It's pingable [02:51:32] geee..why now... [02:52:29] mutante: the ishamel puppet changes broke puppet on neon. could that be related? [02:52:44] springle: i didnt know it was merged [02:52:46] found a bug and about to commit the fix [02:53:04] springle: it doesnt sound very related at all [02:53:10] but with puppet you never know [02:53:22] no? ok.
just saying :) [02:53:23] ping others, i'm looking [02:53:30] on the actual server [02:55:19] !log restarting apache on virt0, wikitech is down [02:55:49] springle: eh, back [02:55:53] spagewmf: [02:55:56] RECOVERY - HTTP on virt0 is OK: HTTP OK: HTTP/1.1 302 Found - 457 bytes in 0.071 second response time [02:55:57] it works again [02:56:04] i dunno, apache restart [02:56:08] but it was running before too [02:56:47] springle: thanks for that, but seems entirely unrelated [02:56:55] spagewmf: confirmed working? [02:56:57] RECOVERY - puppetmaster https on virt0 is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.159 second response time [02:57:32] mutante: cool. was jfyi [02:57:45] springle: yea, thanks, i really hadnt even seen the merge yet:) [02:59:19] !log LocalisationUpdate ResourceLoader cache refresh completed at Fri Dec 20 02:59:19 UTC 2013 [03:00:06] PROBLEM - puppetmaster https on virt0 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:00:08] mutante I thought it was OK but I tried to login again and that's hanging or very slow [03:00:16] springle: oh yea, i figured you might be interested because its ui for mkquery digest:) meant you anyways as one reviewer on it [03:01:20] mutante ^ also virt0 complaint above. [03:01:23] spagewmf: i think it's just catching up a bit, appears relatively normal to me [03:01:36] i can click around etc [03:01:57] RECOVERY - puppetmaster https on virt0 is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.166 second response time [03:01:59] hmm... yea, icinga .. grr [03:02:31] there can be a little delay there on the way to icinga-wm [03:02:39] as long as it doesnt keep doing this all the time now [03:03:03] mutante I'm in, yup was just very slow. 
Thanks [03:03:22] spagewmf: it might also just be that so many users are running stuff now [03:03:29] or the puppetmaster tries a whole bunch at a time [03:06:46] spagewmf: root cause [03:06:50] from /usr/lib/phusion_passenger/passenger-spawn-server:61 [03:06:55] *** Exception Errno::EPIPE in Passenger RequestHandler (Broken pipe) (process 23592): [03:07:13] bug in the passenger module it appears [03:11:22] wikitech uses Phusion Passenger unlike our other MW servers? [03:11:28] springle: fyi, i would say blame phusionpassenger.com for that [03:11:30] learn something new every day :) [03:11:46] spagewmf: well, it's not a normal wiki, it is labs [03:11:52] it has been merged into labsconsole [03:12:00] wikitech and labsconsole used to be different [03:12:02] but not anymore [03:12:17] and we have wikitech-static that reedy linked earlier for that reason [03:12:24] that we still have docs when site is down [03:12:40] we had that covered before by putting it on 3rd party VM out of band [03:17:25] (PS1) Springle: Debug https://gerrit.wikimedia.org/r/96403 [operations/puppet] - https://gerrit.wikimedia.org/r/102874 [03:20:00] (CR) Springle: [C: 2] Debug https://gerrit.wikimedia.org/r/96403 [operations/puppet] - https://gerrit.wikimedia.org/r/102874 (owner: Springle) [03:20:36] (CR) Dzahn: "oh, yea, it will be just that: class { '::ishmael': , missing :: , did that before and we wondered why it recincludes the ROLE class" [operations/puppet] - https://gerrit.wikimedia.org/r/102874 (owner: Springle) [03:22:05] springle: it would declare itself there [03:22:09] without the :: [03:22:20] we had it before in another _to module_ change [03:22:42] will just try to do role::ishmael where you think it's ishmael itself [03:23:33] ah ok [03:23:52] * springle will just get it working :) [03:24:00] awesome:) [03:26:41] (PS1) Springle: Debug https://gerrit.wikimedia.org/r/96403 [operations/puppet] - https://gerrit.wikimedia.org/r/102875 [03:27:52] (CR)
Springle: [C: 2] Debug https://gerrit.wikimedia.org/r/96403 [operations/puppet] - https://gerrit.wikimedia.org/r/102875 (owner: Springle) [03:32:59] mutante: for what? :) [03:35:04] jeremyb: code review on private changes [03:35:36] mutante: uhhhh, idk what to tell you about that... [03:35:41] mutante: normal process is? [03:36:10] jeremyb: RT, git log, mail [03:36:18] jeremyb: just thought i heard criticism there [03:37:01] mutante: no, re approval process. you got referral, explicit approval, approval of the approval, etc. :) [03:37:13] seemed kinda meta/recursive :) [03:37:24] jeremyb: we were just having fun and being serious at the same time:) [03:37:28] (CR) Springle: "Is there a way to get the ssl_cert in properly? puppet said:" [operations/puppet] - https://gerrit.wikimedia.org/r/102874 (owner: Springle) [03:37:32] mutante: i figured :) [03:37:47] mutante: you've seen [[list of lists of lists]]? [03:38:15] jeremyb: sounds like a mailing list run by thehelpfulone [03:38:19] that explains how to admin lists [03:38:25] we should have [03:38:39] https://en.wikipedia.org/wiki/List_of_lists_of_lists [03:38:54] hahaa, nice [03:39:17] find one in another language!! go for next level [03:39:43] i don't see one in de [03:39:44] and why arent you already on wikidata editing that there [03:40:12] i'm not editing it anywhere!
[03:40:26] :) [03:42:06] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000 [03:44:23] https://gdash.wikimedia.org/dashboards/reqerror/ did have a spike recently [04:37:26] PROBLEM - Puppet freshness on analytics1021 is CRITICAL: Last successful Puppet run was Wed 18 Dec 2013 10:28:53 PM UTC [04:37:26] PROBLEM - Puppet freshness on analytics1022 is CRITICAL: Last successful Puppet run was Wed 18 Dec 2013 10:29:11 PM UTC [04:40:06] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000 [06:07:53] (CR) Faidon Liambotis: "What Bryan said, let this be handled on the app side. Also, I'm pretty sure not /everything/ in the cache should be Varied on Cookie (e.g." [operations/puppet] - https://gerrit.wikimedia.org/r/102744 (owner: Ori.livneh) [06:19:16] is mark around by any chance? i'm trying to figure out the current IP ranges for Carolynne [06:19:45] for what? [13:40:26] PROBLEM - Puppet freshness on analytics1021 is CRITICAL: Last successful Puppet run was Wed 18 Dec 2013 10:28:53 PM UTC [13:40:26] PROBLEM - Puppet freshness on analytics1022 is CRITICAL: Last successful Puppet run was Wed 18 Dec 2013 10:29:11 PM UTC [13:42:46] PROBLEM - Host cp1065 is DOWN: PING CRITICAL - Packet loss = 100% [13:46:09] again...
[13:46:42] hah, exactly the same hour too [13:46:46] as yesterday [13:52:06] RECOVERY - Host cp1065 is UP: PING OK - Packet loss = 0%, RTA = 0.20 ms [13:52:13] !log powercycling cp1065 [13:52:29] Logged the message, Master [14:28:55] !log Jenkins setting http_proxy and https_proxy on lanthanum.eqiad.wmnet to point to carbon.wikimedia.org (eqiad web proxy) [14:29:11] Logged the message, Master [14:34:48] !log Jenkins removed http_proxy env variables from lanthanum.eqiad.wmnet does not play well with git urls :D [14:35:05] Logged the message, Master [14:59:13] (PS1) Nemo bis: Relative path in varnish error message: remove excess / [operations/puppet] - https://gerrit.wikimedia.org/r/102945 [15:00:43] (PS2) Nemo bis: Relative path in varnish error message: remove excess / [operations/puppet] - https://gerrit.wikimedia.org/r/102945 [15:00:58] (CR) Nemo bis: "Followed up in Ie06c65f665d1d2bcca7e89152ff819da8810d379" [operations/puppet] - https://gerrit.wikimedia.org/r/65792 (owner: Mark Bergsma) [15:57:56] (PS3) John F. Lewis: Redirect m.wikidata to www.wikidata.org [operations/apache-config] - https://gerrit.wikimedia.org/r/101787 [16:00:33] (PS3) John F. Lewis: Redirect kr.wikimedia [operations/apache-config] - https://gerrit.wikimedia.org/r/101220 [16:13:37] !jenkins visualeditor-doitall [16:13:38] https://integration.wikimedia.org/ci/job/visualeditor-doitall [16:18:24] !log merged a couple of tiny OpenStackManager patches on wikitech [16:18:41] Logged the message, Master [16:20:41] (CR) Jeremyb: [C: -1] "Also, needs to be ported to the new redirects system if it's ever ready on the DNS/registration side."
[operations/apache-config] - https://gerrit.wikimedia.org/r/88705 (owner: Dzahn) [16:41:26] PROBLEM - Puppet freshness on analytics1021 is CRITICAL: Last successful Puppet run was Wed 18 Dec 2013 10:28:53 PM UTC [16:41:26] PROBLEM - Puppet freshness on analytics1022 is CRITICAL: Last successful Puppet run was Wed 18 Dec 2013 10:29:11 PM UTC [18:44:16] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000 [19:01:56] PROBLEM - Host cp1065 is DOWN: PING CRITICAL - Packet loss = 100% [19:08:49] yep, that died, on it [19:09:34] !log powercycling cp1065 [19:09:51] Logged the message, Master [19:09:52] !log cp1065 - [19060.904886] BUG: scheduling while atomic: kworker/11:0/27073/0x00000200 [19:10:10] Logged the message, Master [19:12:26] RECOVERY - Host cp1065 is UP: PING OK - Packet loss = 0%, RTA = 0.27 ms [19:12:33] mutante: i think that's at least the 3rd time in 2 days? [19:12:52] and don't see it in RT [19:13:10] bonus, services upstart vs. salt [19:13:12] Rather than invoking init scripts through /etc/init.d, use the service(8) [19:13:15] utility, e.g. service S20salt-minion start [19:13:18] initctl: Unknown job: S20salt-minion [19:13:59] jeremyb: not seeing it in RT may explain i didnt see it either recently at least [19:14:19] mutante: did you try salt-minion instead of S20salt-minion? [19:15:03] mutante: oh, whoops.
actually that's 3x in <1 day [19:15:07] jeremyb: i dont think i need to do anything more, salt-minion is actually running [19:15:19] 10:19:15 13:52:12 19:09:34 [19:15:27] so it is more annoyance in the order/output when it comes up [19:34:36] PROBLEM - Host cp1065 is DOWN: PING CRITICAL - Packet loss = 100% [19:35:35] that was fast [19:39:16] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000 [19:42:26] PROBLEM - Puppet freshness on analytics1021 is CRITICAL: Last successful Puppet run was Wed 18 Dec 2013 10:28:53 PM UTC [19:42:26] PROBLEM - Puppet freshness on analytics1022 is CRITICAL: Last successful Puppet run was Wed 18 Dec 2013 10:29:11 PM UTC [20:05:26] that cp1065 crash has got to be hardware related at this point, imho [20:24:57] (PS1) Jgreen: add cname for giantrabbit civi testing on lutetium [operations/dns] - https://gerrit.wikimedia.org/r/102986 [20:27:28] (CR) Jgreen: [C: 2 V: 1] add cname for giantrabbit civi testing on lutetium [operations/dns] - https://gerrit.wikimedia.org/r/102986 (owner: Jgreen) [20:33:03] bah, tons of .exe on my computer because from a disk somebody wrote with windows and rsync is now busy on syncing his trashcan and system restore as well :p [20:34:31] bblack: apergos oh, it's down again? was away for a couple minutes [20:34:39] do we still care now to bring it back? [20:34:58] good question [20:35:36] those that are actively deubgging the issue should weigh in I guess [20:35:38] well or shut it down completely? [20:36:00] or bring it back but leave varnish off? and see if it dies with no load? :-) [20:36:11] pooled or unpooled? [20:36:17] can you bring it up with 3.2?
[20:36:21] just in case [20:36:36] ok [20:38:48] !log powercycled cp1065 again, will upgrade kernel this time [20:39:06] Logged the message, Master [20:40:06] RECOVERY - Host cp1065 is UP: PING OK - Packet loss = 0%, RTA = 0.25 ms [20:40:16] paravoid: all pending upgrades ok or do you want to make sure it is JUST kernel [20:40:23] didnt get to see how many yet [20:40:29] its coming back up slowly [20:41:08] just kernel please [20:42:06] PROBLEM - Varnish HTTP text-backend on cp1065 is CRITICAL: Connection refused [20:42:43] paravoid: on it, rebooting again, do i need to look at depooling at all? [20:43:18] nah [20:43:45] !cp1065 - was 3.11.0-13-generic, rebooting for 3.2.0-57-generic [20:43:56] thank you [20:43:57] !log cp1065 - was 3.11.0-13-generic, rebooting for 3.2.0-57-generic [20:44:08] it's pretty late here, sorry for not doing it myself [20:44:12] Logged the message, Master [20:44:14] no problem [20:45:46] PROBLEM - Host cp1065 is DOWN: PING CRITICAL - Packet loss = 100% [20:46:21] it worked [20:47:06] RECOVERY - Host cp1065 is UP: PING OK - Packet loss = 0%, RTA = 0.41 ms [20:47:07] Linux cp1065 3.11.0-13-generic #20~precise2-Ubuntu SMP Thu Oct 24 21:04:34 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux [20:47:12] and now we shall wait [20:47:19] that's 3.11 [20:47:28] damn it is [20:48:44] but before i get to fix it even [20:48:46] [ 132.718522] BUG: soft lockup - CPU#1 stuck for 22s! [swapper/1:0] [20:49:10] really? [20:49:56] PROBLEM - Host cp1065 is DOWN: PING CRITICAL - Packet loss = 100% [20:50:09] sure i just need to updategrub.. but already hard to use [20:50:20] maybe you should invest in a roller coaster? [20:55:56] ok, it has one more chance now. updated grub [20:56:49] just take it out back and shoot it [20:59:07] RECOVERY - Host cp1065 is UP: PING OK - Packet loss = 0%, RTA = 0.27 ms [21:00:39] paravoid: wth, what else besides installing package, and update-grub ..it still booted into 3.11 .. 
and fwiw it still just gives me a few seconds of time to change things before the next issue [21:00:56] so didnt get to grub conf again :p [21:01:46] PROBLEM - Host cp1065 is DOWN: PING CRITICAL - Packet loss = 100% [21:01:46] mutante: disable varnish from starting by itself, fix kernel, reenable varnish? [21:02:30] jeremyb: worth a try,thx [21:06:14] ACKNOWLEDGEMENT - DPKG on cp1065 is CRITICAL: Timeout while attempting connection daniel_zahn kernel bug and/or hardware issue [21:06:14] ACKNOWLEDGEMENT - Disk space on cp1065 is CRITICAL: Timeout while attempting connection daniel_zahn kernel bug and/or hardware issue [21:06:14] ACKNOWLEDGEMENT - NTP on cp1065 is CRITICAL: NTP CRITICAL: No response from NTP server daniel_zahn kernel bug and/or hardware issue [21:06:14] ACKNOWLEDGEMENT - RAID on cp1065 is CRITICAL: Timeout while attempting connection daniel_zahn kernel bug and/or hardware issue [21:06:14] ACKNOWLEDGEMENT - SSH on cp1065 is CRITICAL: Connection timed out daniel_zahn kernel bug and/or hardware issue [21:06:15] ACKNOWLEDGEMENT - Varnish HTCP daemon on cp1065 is CRITICAL: Timeout while attempting connection daniel_zahn kernel bug and/or hardware issue [21:06:15] ACKNOWLEDGEMENT - Varnish HTTP text-backend on cp1065 is CRITICAL: Connection timed out daniel_zahn kernel bug and/or hardware issue [21:06:16] ACKNOWLEDGEMENT - Varnish HTTP text-frontend on cp1065 is CRITICAL: Connection timed out daniel_zahn kernel bug and/or hardware issue [21:06:16] ACKNOWLEDGEMENT - Varnish traffic logger on cp1065 is CRITICAL: Timeout while attempting connection daniel_zahn kernel bug and/or hardware issue [21:06:17] ACKNOWLEDGEMENT - puppet disabled on cp1065 is CRITICAL: Timeout while attempting connection daniel_zahn kernel bug and/or hardware issue [21:06:42] uh, Daniel also is an NTP server, who knew [21:06:49] better than Kant [21:08:06] RECOVERY - Host cp1065 is UP: PING OK - Packet loss = 0%, RTA = 0.99 ms [21:10:34] jeremyb: not fast enough 2/3 services or so:) 
[21:10:46] PROBLEM - Host cp1065 is DOWN: PING CRITICAL - Packet loss = 100% [21:11:33] mutante: you can get to console? boot it up single user and run your grub update/etc. there? [21:14:36] jeremyb: yea, init 1 after a sandwich or so:) [21:15:57] mutante: i meant straight to single user on boot. append ' s' to kernel params [21:16:37] yea, either or, it usually gives me enough and was already coming up just now [21:17:06] RECOVERY - Host cp1065 is UP: PING OK - Packet loss = 0%, RTA = 0.25 ms [21:19:06] PROBLEM - Varnish HTCP daemon on cp1065 is CRITICAL: Connection refused by host [21:19:56] PROBLEM - puppet disabled on cp1065 is CRITICAL: Connection refused by host [21:20:06] PROBLEM - SSH on cp1065 is CRITICAL: Connection refused [21:20:06] PROBLEM - Disk space on cp1065 is CRITICAL: Connection refused by host [21:20:06] PROBLEM - DPKG on cp1065 is CRITICAL: Connection refused by host [21:20:06] PROBLEM - RAID on cp1065 is CRITICAL: Connection refused by host [21:26:07] (PS2) Yurik: Fixed a major bug with evaluation of netmapper usage [operations/puppet] - https://gerrit.wikimedia.org/r/102887 [21:26:34] (PS6) Yurik: Handle HTTPS for Zero traffic [operations/puppet] - https://gerrit.wikimedia.org/r/102316 [21:35:33] (PS2) Ebrahim: Use local Wiki.png for Persian Wikipedia Logo [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/102454 [21:37:48] !log cp1065 - currently init 1 sitting at maintenance but not powered down, i will poke later or anyone can [21:38:06] Logged the message, Master [21:47:11] greg-g, when you say ZERO DEPLOYS on dec 23rd, are you saying we are the only ones that should deploy that week? thanks!
[21:50:04] yurik: :P [21:50:16] all zero, all week [21:55:58] all the cluster HDDs will be zero'ed [21:57:12] so many interpretations :) [22:06:59] (CR) Odder: [C: 1] Use local Wiki.png for Persian Wikipedia Logo [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/102454 (owner: Ebrahim) [22:07:47] (PS1) Nemo bis: [Planet] Update wikimedia.fi URL [operations/puppet] - https://gerrit.wikimedia.org/r/103047 [22:29:22] anyone has any idea why Bug58676.log is still empty? there are plenty of them in fatalmonitor: /usr/local/apache/common-local/php-1.23wmf7/includes/Message.php on line 822 [22:43:26] PROBLEM - Puppet freshness on analytics1021 is CRITICAL: Last successful Puppet run was Wed 18 Dec 2013 10:28:53 PM UTC [22:43:26] PROBLEM - Puppet freshness on analytics1022 is CRITICAL: Last successful Puppet run was Wed 18 Dec 2013 10:29:11 PM UTC [22:44:25] ok, i'm out of IRC for now, if there is anything for me, use RT please, it's RT duty, not also IRC duty:) or memoserv is fine as stated before.. cya [22:55:09] Coren: reminder re: . If you're already in Christmas mode, it can wait until January. [22:55:59] I'm not in vacation mode yet, but that will require a bit of quiet time and concentration so I'll probably do it between xmas and the new year. [22:56:25] Totally cool, thanks very much.