[00:00:08] PROBLEM - Host analytics1020 is DOWN: CRITICAL - Plugin timed out after 15 seconds [00:01:09] ah I see dhcp now [00:01:23] on brewster [00:03:30] New patchset: GWicke; "WIP: Disable caching in Parsoid front-ends" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67929 [00:08:12] RECOVERY - Host ms-fe1004 is UP: PING OK - Packet loss = 0%, RTA = 0.28 ms [00:08:27] wow, nagios is lagging behind a lot [00:10:12] RECOVERY - Host analytics1020 is UP: PING OK - Packet loss = 0%, RTA = 0.25 ms [00:10:23] Rats, LeslieCarr. [00:10:24] no go [00:10:32] oh ? what happened ? [00:10:35] it sat at 'scanning for devices' for a long time [00:10:40] and then just booted the OS [00:14:01] New review: Asher; "(1 comment)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67929 [00:14:32] New patchset: GWicke; "Disable caching in Parsoid front-ends" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67929 [00:16:14] New review: GWicke; "(1 comment)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67929 [00:16:59] binasher: mid-air collision ;) [00:17:15] :) [00:18:02] New patchset: Aaron Schulz; "Re-enabled TTM server message jobs." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67933 [00:18:34] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67933 [00:19:40] !log aaron synchronized wmf-config/CommonSettings.php 'Re-enabled TTM server message jobs.' [00:19:49] Logged the message, Master [00:23:46] New patchset: GWicke; "Point Parsoid updates to load balancer only" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67935 [00:25:16] New review: Tim Landscheidt; "misctools isn't installed in exec_environ.pp, so we would need another package." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/66266 [00:27:36] binasher: can you review https://gerrit.wikimedia.org/r/#/c/67929/ once more? [00:28:30] sure [00:28:52] thanks! [00:28:53] * aude buys beer for whoever can deploy https://gerrit.wikimedia.org/r/#/c/67937/ for us [00:29:08] New review: Tim Starling; "It would be much faster if you combined the two into a single dsh command." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/64449 [00:29:23] aude: wrong channel to ask, isn't it? [00:29:33] nah [00:31:03] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67935 [00:32:00] gwicke: that looks fine - though i wonder.. if instead there's a return(pipe) in vcl_recv, it may pass the request directly to the backend with all original request headers in place. if so, no need to do a cache lookup -> miss -> set beresp.http.cache-control on every request. though i don't think avoiding the lookup matters here. [00:33:11] binasher: hmm, that sounds like a better option [00:33:37] might bring down the CPU usage from 0.1 to 0.09% ;) [00:34:08] gwicke: i'm not 100% sure that all headers are preserved through to the backend in pipe mode [00:34:09] lol [00:34:26] i'd be happy to merge 67929/2 [00:34:39] I'd need to test the pipe thing a bit [00:34:49] can do that in another patch [00:34:55] !log catrope synchronized wmf-config/CommonSettings.php 'Point Parsoid updates to LVS IP' [00:35:03] Logged the message, Master [00:35:04] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67929 [00:35:27] gwicke: merged on the puppetmaster [00:35:38] binasher: awesome, thanks!
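A minimal sketch of the return (pipe) idea binasher floats above, written for the Varnish 3 VCL in use at the time; this illustrates the general technique and is not the merged 67929 configuration, so the subroutine bodies are assumptions:

    sub vcl_recv {
        # Hand the connection straight to the backend: no cache lookup,
        # no miss, and no per-request beresp.http.cache-control override,
        # so the original request headers reach the backend untouched.
        return (pipe);
    }

    sub vcl_pipe {
        # The caveat binasher raises: it is not guaranteed that every
        # header survives pipe mode, which is why gwicke wants to test
        # it in a follow-up patch before relying on it.
        return (pipe);
    }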
[00:38:10] !log mw1020: @ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @ [00:38:17] Logged the message, Mr. Obvious [00:43:49] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67544 [00:45:08] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67548 [00:45:59] !log aaron synchronized wmf-config/jobqueue-eqiad.php 'Added parsoid jobs to $wgJobTypesExcludedFromDefaultQueue' [00:46:06] Logged the message, Master [00:46:42] New patchset: Hazard-SJ; "Some fixes:" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67939 [00:54:54] !log catrope synchronized php-1.22wmf5/extensions/Parsoid [00:55:02] Logged the message, Master [00:55:19] !log catrope synchronized php-1.22wmf6/extensions/Parsoid [00:55:26] Logged the message, Master [00:55:43] !log catrope synchronized php-1.22wmf5/extensions/Wikibase [00:55:51] Logged the message, Master [01:00:35] James_F: it should be done to 0 and back for a while [01:00:41] till the next puppet run [01:01:00] AaronSchulz: It's meant to be dead? [01:01:23] *down to [01:01:27] 0 jobs/sec [01:02:07] James_F: https://gerrit.wikimedia.org/r/#/c/67544/ [01:02:21] RECOVERY - NTP on ssl3003 is OK: NTP OK: Offset -0.006657838821 secs [01:02:58] AaronSchulz: Ah, right. [01:06:34] !log aaron synchronized php-1.22wmf6/includes/db/LoadMonitor.php '2e62b818a01d714b8464689eec2394a993209a70' [01:06:44] Logged the message, Master [01:06:55] binasher: mw1171 giving "Read-only file system (30)", is that known? [01:07:25] AaronSchulz: yeah, it's disabled in pybal with a comment pointing to an rt ticket [01:07:51] right, I vaguely thought I saw some ticket or something [01:09:17] also SAL [01:11:17] What about mw1020? [01:11:26] And mw1173 :) [01:12:18] mw1171 is the only one I know of [01:17:50] New patchset: Faidon; "Ceph: adjust a few OSD settings" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67945 [01:18:31] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67945 [01:23:31] paravoid: can you restart all job runners? [01:24:44] why? [01:25:32] AaronSchulz: ^ [01:25:51] sorry, I'm tailing a loop :) [01:26:06] * AaronSchulz kicks Wikidata exceptions [01:26:12] hmm? [01:26:18] should I? [01:27:25] paravoid: why not :) [01:28:50] done for eqiad [01:28:53] I should do tampa too, shouldn't I [01:29:08] it's late and I'm getting lazier than usual [01:29:50] ah, not running the job loop there at all [01:29:58] I guess it makes more sense that way [01:30:06] !log restarting all jobrunners per Aaron's request [01:30:16] Logged the message, Master [01:33:11] RECOVERY - NTP on ssl3002 is OK: NTP OK: Offset -0.01218688488 secs [01:43:56] paravoid: something is off, not sure what...more boxes should be running ParsoidCacheUpdateJob [01:44:29] anyone in the office that can help? [01:44:36] I'm about to go to bed [01:44:42] or TimStarling maybe?
[01:44:51] I'm like one of the few people left here :) [01:45:00] hm [01:45:07] I think that's more of a reason to go to bed [01:45:09] paravoid: anyway, I guess it can be dealt with tomorrow [01:45:15] I did european morning today too [01:45:27] the jobs are mostly acting as load testing [01:46:02] ok, see you tomorrow [01:46:03] bye [01:46:10] see you [02:09:34] !log LocalisationUpdate completed (1.22wmf6) at Tue Jun 11 02:09:34 UTC 2013 [02:09:44] Logged the message, Master [02:13:34] hrm, definitely mistracking proc count [02:16:57] !log LocalisationUpdate completed (1.22wmf5) at Tue Jun 11 02:16:57 UTC 2013 [02:17:07] Logged the message, Master [02:30:24] !log LocalisationUpdate ResourceLoader cache refresh completed at Tue Jun 11 02:30:24 UTC 2013 [02:30:32] Logged the message, Master [04:07:46] RECOVERY - Puppet freshness on kuo is OK: puppet ran at Tue Jun 11 04:07:38 UTC 2013 [04:08:28] PROBLEM - Puppet freshness on kuo is CRITICAL: No successful Puppet run in the last 10 hours [… icinga-wm flaps kuo, constable, wtp1, mexia, lardner, tola and (once) ms-be1001 between RECOVERY and PROBLEM every minute or two from here until about 06:39; the repeated notices are trimmed …] [04:20:26] 2013-06-11 04:20:08 ParsoidCacheUpdateJob Mirror's_Edge_2 STARTING [04:20:30] don't remind me :) [04:30:59] New patchset: Aaron Schulz; "Made job list command skip "stopped" jobs." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67948 [04:51:47] PROBLEM - NTP on ssl3003 is CRITICAL: NTP CRITICAL: No response from NTP server [04:58:27] PROBLEM - NTP on ssl3002 is CRITICAL: NTP CRITICAL: No response from NTP server [05:40:11] PROBLEM - Puppet freshness on erzurumi is CRITICAL: No successful Puppet run in the last 10 hours [… the same 05:40:11 notice also fires for lvs1004, lvs1005, lvs1006, mc15, ms-be1, mw1020, ms-fe3001, sockpuppet, spence, virt1, virt3 and virt4 …] [06:26:26] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:27:16] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.138 second response time [… stafford's HTTPS check keeps timing out and recovering like this through the morning; further repeats are trimmed …] [06:50:09] New patchset: Nikerabbit; "ULS config for deployment phase 1" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/63113 [06:55:07] New patchset: Nikerabbit; "ULS config for deployment phase 1" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/63113 [07:13:45] PROBLEM - RAID on mc15 is CRITICAL: Timeout while attempting connection [07:14:46] RECOVERY - RAID on mc15 is OK: OK: Active: 2, Working: 2, Failed: 0, Spare: 0 [07:17:25] PROBLEM - Disk space on mc15 is CRITICAL: Timeout while attempting connection [07:18:15] RECOVERY - Disk space on mc15 is OK: DISK OK [07:20:22] !log updating Board Election translations on cluster [07:20:30] Logged the message, Master [07:34:36] RECOVERY - NTP on ssl3003 is OK: NTP OK: Offset 0.002106189728 secs [07:49:30] apergos: good morning :-) [07:52:02] New patchset: Hashar; "Run db list tests for Wikivoyage as well" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67927 [07:52:21] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:52:25] Change merged: Hashar; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67927 [07:52:31] grrg [07:53:11] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [08:02:42] RECOVERY - NTP on ssl3002 is OK: NTP OK: Offset 0.003076553345 secs [08:03:39] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/63113 [08:06:46] morning hashar and YuviPanda [08:07:07] hi Nikerabbit :) [08:07:16] Nikerabbit: jenkins is happy again [08:07:32] Nikerabbit: that was a regression in Gerrit. Thank you again to have raised the issue!
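For context on the flood above: the "Puppet freshness" checks alert when a host has not completed a successful puppet run recently. A sketch of asking the same question by hand on one of the flapping hosts, assuming stock puppet agent paths rather than WMF's exact Icinga plumbing:

    # Epoch timestamp of the last completed puppet run on this host:
    grep last_run /var/lib/puppet/state/last_run_summary.yaml

    # Or trigger a one-off foreground run and watch whether it succeeds:
    sudo puppet agent --test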
[08:09:00] RECOVERY - Puppet freshness on constable is OK: puppet ran at Tue Jun 11 08:08:50 UTC 2013 [08:09:12] !log Starting ULS deployment phase 1 [08:09:19] Logged the message, Master [08:09:19] ohai Nikerabbit [08:09:39] hashar: the thing I wanted to ask you / file a bug about kinda solved itself, so nevermind for now :) [08:09:40] PROBLEM - Puppet freshness on constable is CRITICAL: No successful Puppet run in the last 10 hours [… the same RECOVERY/PROBLEM flapping for kuo, constable, wtp1, mexia, lardner and tola runs underneath the whole conversation below; the repeats are trimmed …] [08:09:56] (it was about switching from maven to gradle for our build system (Android), but looks like Gradle isn't ready yet) [08:10:05] Nikerabbit: ULS today? nice! [08:10:47] YuviPanda: I guess qchris / ^demon had a similar request to build Gerrit. Though I am not sure it migrated to Gradle [08:11:01] hashar: IIRC it migrated to BUCK [08:11:06] which is far less sexy [08:11:14] or even 'nice' [08:11:28] ^demon was sad about it for a while :) [08:12:14] YuviPanda: you will need Gradle installed on gallium. Ubuntu Precise has Gradle version 1.0~m3-1. You can get it installed by submitting a puppet change in modules/contint/manifests/packages.pp . [08:12:29] not sure gradle 1.0~3.1 will be enough to build your app though [08:12:44] hashar: nope, needs 1.6. But Gradle on Android at least prefers to just ship a jar along [08:13:10] hashar: besides, we're postponing the migration now - gonna wait at least a month, so nevermind for now :) [08:13:20] hashar: you mentioned a bug you wanted to file.... [08:13:25] !log nikerabbit synchronized wmf-config/CommonSettings.php 'Disable narayam and WebFonts' [08:13:33] Logged the message, Master [08:13:48] YuviPanda: let's switch to #wikimedia-dev [08:13:54] ok [08:13:57] YuviPanda: icinga-wm is too noisy [08:16:01] !log nikerabbit synchronized wmf-config/InitialiseSettings.php 'Enable ULS' [08:16:08] Logged the message, Master [08:23:35] Nikerabbit: all looks ok on dv.wikt and I even reverted a 1y old vandalism [08:26:41] * Nemo_bis suggests icinga-wm a digest format for puppet freshness complaints [08:29:53] fubar or snafu? [08:30:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:32:07] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Jun 11 08:30:46 UTC 2013 PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [08:32:10] wtf? [08:32:15] make up your mind [08:33:13] Nemo_bis: something is wrong with the bot? [08:33:21] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.131 second response time [08:42:33] AzaToth: or puppet is broken :D [08:42:59] hashar: perhaps the puppet went into puberty [08:54:04] New patchset: Raimond Spekking; "Exclude *.wmflabs.org from the need to enter captchas" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67953 [09:09:07] New patchset: Hashar; "beta: fill 'rendering' cache backend" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67954 [09:09:53] hey mark, are you still interested in adding to beta a varnish text cache ?
I have noticed it missed a 'rendering' backend which is https://gerrit.wikimedia.org/r/67954 :D [09:10:07] sure [09:10:29] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67954 [09:10:34] that will let us phase out the old squid instances that got a ton of lame hacks :-) [09:10:44] yep [09:11:02] if you have any idea for beta future, I have created a roadmap placeholder at https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/Roadmap [09:11:12] * mark checks it out [09:11:22] feel free to add your ideas there [09:11:26] apergos: ^^^ :) [09:11:43] the issue with purging is multicast? [09:11:48] I don't have that many ideas myself, will probably look at bugzilla and find out what could be added to that list [09:12:11] the fact that multicast doesn't work is pretty lame [09:12:21] but anyway [09:12:30] purging can be done with unicast too, if you have only a few hosts [09:12:41] I think that is how I have set it up [09:12:51] by having Mediawiki send PURGE requests to each of the caches [09:13:00] though I haven't looked at it in a while [09:13:08] it can also send htcp messages to be more like production [09:13:08] but it seems to purge some of the pages [09:13:29] it doesn't necessarily need to send that to a multicast address I think [09:13:33] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:13:34] (if it does, we should fix that :) [09:14:14] err: /Stage[main]/Role::Cache::Text/Varnish::Setup_filesystem[sdb3]/Mount[/srv/sdb3]: Could not evaluate: Execution of '/bin/mount -o noatime,nodiratime,nobarrier,comment=cloudconfig /srv/sdb3' returned 32: mount: special device /dev/sdb3 does not exist [09:14:16] :-D [09:14:34] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time [09:15:32] sigh [09:15:52] wanna put in a patchset for that or shall I? [09:16:02] similar to how we solved it for the other clusters [09:16:28] go ahead [09:17:18] ok [09:17:28] there is a lot of repetition, I am wondering whether we could create a define to generate the mount options [09:22:50] experiencing weirdness... [09:22:58] cannot reach *.wikimedia.org at the moment. [09:23:06] gerrit, bugzilla [09:23:21] (hmm, meta does work) [09:24:00] maybe a DNS issue? [09:25:01] From Finland and India it apparently works. [09:25:05] Will try to find out more... [09:25:08] do you have the 'dig' utility ? [09:25:13] that lets you do DNS queries [09:25:48] example output http://paste.openstack.org/show/38354/ [09:26:06] hashar: I cannot ping kaulen.wikimedia.org from my location. I can from for example the translatewiki.net server. [09:26:06] at the bottom is a SERVER entry that shows which DNS server gave the response [09:26:33] try pinging 208.80.152.149 (kaulen address) [09:27:03] hashar: Not responding from my local machine. [09:27:25] weird. [09:27:44] ahh [09:27:55] hmm?
[09:28:03] tracert is your friend now [09:28:10] or traceroute [09:28:11] or mtr [09:28:23] will show the path from your machine to the server [09:28:30] there must be something wrong on the way [09:28:37] checking [09:29:03] I can do the same from kaulen to your ip (give it to me in private :p ) [09:29:15] got it [09:29:18] you are not cloaked [09:32:43] New patchset: Mark Bergsma; "Steps towards more uniform varnish storage backends between production and labs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67957 [09:34:03] PROBLEM - DPKG on mc15 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [09:34:54] RECOVERY - DPKG on mc15 is OK: All packages OK [09:36:16] hashar: ^ [09:36:34] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:36:34] i think it's probably wise to make the varnish storage backend names independent of their locations [09:37:00] and meanwhile I looked at the squid purge configuration on beta, it lists the squid cache instance IPs in $wgSquidPurge and has no multicast setup [09:37:10] ok [09:37:23] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.123 second response time [09:41:20] I lose the route to manganese after: [09:41:21] 5 adm-b5-link.telia.net (80.239.167.229) 8.405 ms 11.850 ms 10.983 ms [09:41:31] New review: Hashar; "(1 comment)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67957 [09:42:00] mark: I am not sure puppet lets you chain ternary operators such as varying on $::realm and then inside the choices vary by $::hostname [09:42:06] Guess I should call my provider's help desk… I'm sure they won't understand... [09:42:34] hashar: it does, but not inside hashes [09:46:22] siebrand: i'm trying something... [09:47:38] New review: Hashar; "(1 comment)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67957 [09:49:10] mark: I'm calling Ziggo. They're telling me of course the web server on the other side is broken. [09:49:22] it's not :) [09:49:25] mark: I of course told them colleagues in Finland and India already told me that's not true :) [09:49:41] do you have traceroute output for me? [09:49:47] i traced towards you, and it breaks in amsterdam [09:49:51] I do -> private msg in a sec. [09:49:51] but it might be different in the other direction [09:50:22] New patchset: Akosiaris; "Revert "Update to 3b470f56b479f618b7e90577f9857b4e25b38c1a"" [operations/debs/kafka] (master) - https://gerrit.wikimedia.org/r/67958 [09:50:35] adm-b5-link.telia.net looks like AMS [09:51:53] New review: Akosiaris; "Will import latest upstream tar in upstream branch soon" [operations/debs/kafka] (master); V: 2 C: 2; - https://gerrit.wikimedia.org/r/67958 [09:51:53] Change merged: Akosiaris; [operations/debs/kafka] (master) - https://gerrit.wikimedia.org/r/67958 [09:53:12] mark / hashar : Routing just recovered... [09:53:13] PROBLEM - DPKG on mc15 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [09:53:18] yep [09:53:55] Still a bit flaky. Getting 3 different IPs for hops 6/7/8/9. [09:54:03] RECOVERY - DPKG on mc15 is OK: All packages OK [09:54:10] thanks for the help. [09:54:15] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67957 [09:54:48] Ziggo help desk cannot help. They advised me to call "Service+" which is a paid help line. I asked them why I should pay to get a problem report on routing by a transit provider in. He couldn't tell me...
[09:54:58] :-D [09:54:59] When I was in that discussion, routing recovered... [09:55:38] I would have contacted them then [09:55:48] that would probably have been easier [09:56:00] have you fixed anything mark ? [09:56:01] their network engineers directly [09:56:07] no [09:57:35] 11:53:54 Still a bit flaky. Getting 3 different IPs for hops 6/7/8/9. [09:57:41] siebrand: that's usually normal :) [09:57:51] oh, didn't know [10:02:40] ahh puppet ran successfully [10:04:04] hashar: I still want to make multiple storage backends in labs also [10:04:12] I want to use the same setup and same VCL [10:04:19] but at least it works for now [10:04:28] i'll have to convert upload to use that too [10:04:33] fortunately new upload servers are arriving tomorrow [10:04:38] i hate init.d [10:04:41] and I'll just convert the new boxes, not the old ones that will go away [10:06:43] Your cache administrator is nobody.
[10:06:43] ahah [10:06:47] lovely [10:07:27] :) [10:07:35] Error: (G) size "-spersistent": Invalid number [10:07:35] :D [10:07:47] where does it say that? [10:07:59] on deployment-cache-text1.pmtpa.wmflabs [10:08:00] when starting the service 'varnish' [10:08:04] oh I see it [10:08:06] stupid ' ' [10:08:47] New patchset: Mark Bergsma; "Single quotes don't expand variables" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67960 [10:09:04] ohhh [10:09:09] I should have spotted that one [10:09:20] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67960 [10:11:07] -VARNISH_STORAGE="-s main1=persistent,/srv/vdb/varnish.persist,${storage_size_main}G" [10:11:08] +VARNISH_STORAGE="-s main1=persistent,/srv/vdb/varnish.persist,19G" [10:11:13] better [10:12:03] Symbol not found: 'pmtpa' (expected type BACKEND): [10:12:04] ('input' Line 199 Pos 27) [10:12:05] set req.backend = pmtpa; [10:12:06] --------------------------#####- [10:12:11] I have been hit by that previously [10:12:38] so the pmtpa backend isn't defined [10:13:54] quite possibly that's broken in production too ;) [10:13:58] * mark checks [10:14:50] anyone can fix a git repo for moi? [10:15:02] need operations/debs/buck setup [10:15:18] what is buck? [10:15:27] the shit that builds gerrit [10:15:47] added a req here yesterday in my anger: http://www.mediawiki.org/w/index.php?title=Git/New_repositories/Requests/Entries&curid=84638&diff=708815&oldid=708814 [10:16:46] buck is made by facebook to use primarily for android but is also now used by gerrit [10:16:58] don't ask me why [10:17:18] it is in ubuntu isn't it ? [10:17:25] hashar: nope [10:17:40] made a package (works) myself [10:17:56] though I skipped the daemon part [10:18:08] and all jar deps are bundled [10:18:20] yeah we had that discussion internally [10:18:31] about shipping jar in debian packages. i can't remember the status quo [10:18:37] but that is certainly a no [10:18:39] let me find it [10:19:13] hashar: then I probably need to set up 20 more repos [10:19:31] Andrew Otto has a similar issue package Kafka [10:19:35] packaging [10:19:41] that got tons of jar dependencies [10:19:46] http://paste.debian.net/9647/ [10:19:53] yea [10:20:03] yeah that [10:20:05] I wasn't sure if they had modified the jars even [10:20:16] but I at least need a repo for buck [10:20:35] akosiaris1 has the package work on https://github.com/akosiaris/kafka [10:20:40] ok [10:21:04] it falls back to the Ubuntu provided jar [10:21:19] well, I could upload it to github, but he who shall not be named who asked me to fix the gerrit package might be sad then [10:21:20] quoting him: The extra jars are installed in /usr/share/java as per debian policy [10:21:21] while kafka's jars go to /usr/share/kafka. [10:21:59] https://github.com/akosiaris/kafka/blob/debian/debian/kafka.install nasty :D [10:22:18] AzaToth: mind you I had to reinvent the wheel with that package because we also had to avoid various problems sbt (simple build tool) was creating [10:22:24] that's what I hate with java [10:22:28] and here he is :) [10:22:29] the culture of bundling jars [10:22:49] you are not alone in this club man...
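The jar-placement convention akosiaris quotes (shared jars under /usr/share/java per Debian Java policy, kafka's own jars under /usr/share/kafka) would translate to a debhelper install file along these lines for buck; the file below is a hypothetical debian/buck.install with illustrative source paths, not the actual package:

    third-party/*.jar    usr/share/java
    build/buck.jar       usr/share/buck
    bin/buck             usr/bin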
[10:23:40] let me find out how to create the repo :D [10:24:28] and in buck's case, I had to write a totally new launch script as they assumed you always run from the git repo [10:24:41] and they don't have binary dists [10:24:44] yet [10:25:05] well, uploaded the shit at https://github.com/azatoth/buck for the moment [10:25:25] https://github.com/azatoth/buck/commit/e2387f9456a85723666d1d2a27d4fef43592e44f [10:25:49] AzaToth: I think I have it created [10:26:07] AzaToth: git clone ssh://hashar@gerrit.wikimedia.org:29418/operations/debs/buck [10:26:09] eerrrr [10:26:13] git clone ssh://gerrit.wikimedia.org:29418/operations/debs/buck [10:26:43] wow.... man page ?? nice [10:27:04] Permission denied [10:27:07] that's something you never see with java ... [10:27:14] akosiaris: ran help2man [10:27:25] I made the man page stupido!!!! [10:27:34] lol [10:27:59] hashar: have no access to that repo [10:28:32] erm, ignore that [10:31:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:32:14] damn, i hate that gerrit for me is so slow [10:32:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time [10:32:22] Writing objects: 32% (986/3075), 24.81 MiB | 294 KiB/s [10:32:37] I really need to talk to my ISP again [10:38:44] hashar: I don't want to make patchsets for the whole buck repo [10:38:52] i.e. upstream branch [10:39:08] Hi. Asking for permission to perform a bigdelete on commonswiki. Revisions: +/- 9,404 [10:39:09] https://gerrit.googlesource.com/buck [10:39:25] AzaToth: I can't give you push access to upstream branch [10:39:49] Can you prefill it? [10:39:51] AzaToth: do you really need to put all upstream code in the debs repo ? [10:40:11] they don't have any release [10:40:24] AzaToth: to me our ops/debs/buck should just contain the debian files. [10:40:25] ah [10:40:26] although vvv_ tool says there's only 4900+ revs [10:40:32] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:41:09] maurelio: sorry I have zero idea how to delete those pages :/ [10:41:22] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time [10:41:43] I'm a steward, I can do that. However we were told to ask for permission before doing that from the tech people, hashar [10:43:02] ohhh [10:45:27] however idk which one should approve... last time I got the 'ok' from Ariel T. Glenn. [10:45:42] AzaToth: Looks like you did not package upstream buck, but gerrit's "fork" of it. On purpose? Can we use that from Jenkins? [10:46:03] maurelio: sorry I have no idea what the consequences are. I guess that might cause some database replication lag. [10:46:29] apergos: maurelio is asking for permission to delete a 5000 revision page from commons. Seems you granted such a permission before :) [10:46:38] "serious database disruption" according to the email, hashar [10:46:58] maurelio: so I am afraid I have to step out :-] I am not a db folk :D [10:47:27] hashar: no problem and thanks for the help. [10:47:34] I'm here [10:47:57] Hello apergos: I've been asked to perform a couple of bigdeletes on commons and enwiki. [10:48:32] well if you were going to do commons this is not a horrible time of day... what are the current limits I wonder [10:49:22] PROBLEM - DPKG on mc15 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[10:49:27] 'wgDeleteRevisionsLimit' => array( [10:49:27] 'default' => 5000, [10:49:27] ), [10:49:33] yep [10:49:33] so sayeth InitialiseSettings [10:50:10] so this page is right on the border is it? [10:50:23] the enwiki page has 9370 revs according to vvv tool [10:50:32] the commons page, ~5000 [10:51:19] if all is OK, I'll delete; waiting for your advice. [10:51:36] I'd do the commons one now and let's see how it behaves (check in wikimedia-tech and wikimedia-dev that nothing exciting is happening, I see a gap in the deployment calendar though) [10:51:55] apergos: I believe Tim was saying last time, that 5000 is an old, outdated figure from when there was only one db server and it's unlikely to cause issues these days [10:52:11] we'll see soon enough :-) [10:52:20] ok, doing commons now [10:52:36] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:53:12] commons done [10:53:22] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.148 second response time [10:53:42] pretty quick [10:54:38] ok, I'd say to go ahead with enwiki now [10:54:42] no time like the present [10:54:43] AzaToth: so potentially you could use the date as a version number. Something like 0.0.0~git.-1 [10:55:09] AzaToth: there is a bunch of packages like that (reference: apt-cache dump|grep Version:|grep git ) [10:55:45] maurelio: [10:55:59] yes apergos ? [10:56:12] PROBLEM - DPKG on mc15 is CRITICAL: Timeout while attempting connection [10:56:16] go ahead with enwiki I'd say [10:56:30] ok, doing [10:56:33] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:56:47] doing... [10:57:12] RECOVERY - DPKG on mc15 is OK: All packages OK [10:57:12] PROBLEM - Disk space on mc15 is CRITICAL: Timeout while attempting connection [10:57:17] Done. [10:57:33] great [10:58:12] RECOVERY - Disk space on mc15 is OK: DISK OK [10:58:22] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.171 second response time [10:58:32] hope that all is ok [10:58:37] thanks for your help [10:58:46] I'm watching replag, seems fine [10:58:50] happy trails [10:59:16] maurelio: thanks :) [10:59:28] :D [11:01:32] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:02:22] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time [11:09:15] PROBLEM - RAID on mc15 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
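For reference, the two MediaWiki settings in play in the bigdelete exchange above, sketched in the CommonSettings.php style the channel is already quoting; the group grant shown is illustrative, not WMF's actual rights configuration:

    // Deleting a page with more revisions than this is refused unless
    // the deleting user has the 'bigdelete' right -- hence asking first.
    $wgDeleteRevisionsLimit = 5000;

    // Stewards carry 'bigdelete', which is what lets maurelio perform
    // the deletions once given the go-ahead (illustrative grant).
    $wgGroupPermissions['steward']['bigdelete'] = true;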
[11:10:15] RECOVERY - RAID on mc15 is OK: OK: Active: 2, Working: 2, Failed: 0, Spare: 0 [11:22:36] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:24:25] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time [12:09:03] RECOVERY - Puppet freshness on kuo is OK: puppet ran at Tue Jun 11 12:08:59 UTC 2013 [12:09:33] RECOVERY - Puppet freshness on constable is OK: puppet ran at Tue Jun 11 12:09:28 UTC 2013 [12:09:43] RECOVERY - Puppet freshness on wtp1 is OK: puppet ran at Tue Jun 11 12:09:42 UTC 2013 [12:09:53] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Jun 11 12:09:42 UTC 2013 [12:09:53] PROBLEM - Puppet freshness on kuo is CRITICAL: No successful Puppet run in the last 10 hours [12:09:53] RECOVERY - Puppet freshness on lardner is OK: puppet ran at Tue Jun 11 12:09:46 UTC 2013 [12:09:53] PROBLEM - Puppet freshness on constable is CRITICAL: No successful Puppet run in the last 10 hours [12:09:53] PROBLEM - Puppet freshness on lardner is CRITICAL: No successful Puppet run in the last 10 hours [12:09:54] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [12:10:13] RECOVERY - Puppet freshness on tola is OK: puppet ran at Tue Jun 11 12:10:06 UTC 2013 [12:10:33] PROBLEM - Puppet freshness on tola is CRITICAL: No successful Puppet run in the last 10 hours [12:10:33] PROBLEM - Puppet freshness on wtp1 is CRITICAL: No successful Puppet run in the last 10 hours [12:10:43] RECOVERY - Puppet freshness on kuo is OK: puppet ran at Tue Jun 11 12:10:34 UTC 2013 [12:10:53] PROBLEM - Puppet freshness on kuo is CRITICAL: No successful Puppet run in the last 10 hours [12:12:03] RECOVERY - Puppet freshness on constable is OK: puppet ran at Tue Jun 11 12:11:56 UTC 2013 [12:12:15] RECOVERY - Puppet freshness on wtp1 is OK: puppet ran at Tue Jun 11 12:12:08 UTC 2013 [12:12:15] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Jun 11 12:12:08 UTC 2013 [12:12:23] RECOVERY - Puppet freshness on lardner is OK: puppet ran at Tue Jun 11 12:12:22 UTC 2013 [12:12:33] PROBLEM - Puppet freshness on wtp1 is CRITICAL: No successful Puppet run in the last 10 hours [12:12:43] RECOVERY - Puppet freshness on tola is OK: puppet ran at Tue Jun 11 12:12:39 UTC 2013 [12:12:53] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [12:12:53] PROBLEM - Puppet freshness on lardner is CRITICAL: No successful Puppet run in the last 10 hours [12:12:53] PROBLEM - Puppet freshness on constable is CRITICAL: No successful Puppet run in the last 10 hours [12:13:03] RECOVERY - Puppet freshness on kuo is OK: puppet ran at Tue Jun 11 12:12:54 UTC 2013 [12:13:33] PROBLEM - Puppet freshness on tola is CRITICAL: No successful Puppet run in the last 10 hours [12:13:43] RECOVERY - Puppet freshness on constable is OK: puppet ran at Tue Jun 11 12:13:41 UTC 2013 [12:13:54] PROBLEM - Puppet freshness on kuo is CRITICAL: No successful Puppet run in the last 10 hours [12:13:54] PROBLEM - Puppet freshness on constable is CRITICAL: No successful Puppet run in the last 10 hours [12:14:03] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Jun 11 12:13:54 UTC 2013 [12:14:03] RECOVERY - Puppet freshness on wtp1 is OK: puppet ran at Tue Jun 11 12:13:56 UTC 2013 [12:14:03] RECOVERY - Puppet freshness on lardner is OK: puppet ran at Tue Jun 11 12:14:01 UTC 2013 [12:14:13] RECOVERY - Puppet freshness on tola is 
OK: puppet ran at Tue Jun 11 12:14:06 UTC 2013 [12:14:24] RECOVERY - Puppet freshness on kuo is OK: puppet ran at Tue Jun 11 12:14:20 UTC 2013 [12:14:33] PROBLEM - Puppet freshness on tola is CRITICAL: No successful Puppet run in the last 10 hours [12:14:33] PROBLEM - Puppet freshness on wtp1 is CRITICAL: No successful Puppet run in the last 10 hours [12:14:53] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [12:14:53] PROBLEM - Puppet freshness on kuo is CRITICAL: No successful Puppet run in the last 10 hours [12:14:53] PROBLEM - Puppet freshness on lardner is CRITICAL: No successful Puppet run in the last 10 hours [12:15:03] RECOVERY - Puppet freshness on constable is OK: puppet ran at Tue Jun 11 12:15:01 UTC 2013 [12:15:13] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Jun 11 12:15:09 UTC 2013 [12:15:13] RECOVERY - Puppet freshness on wtp1 is OK: puppet ran at Tue Jun 11 12:15:09 UTC 2013 [12:15:13] RECOVERY - Puppet freshness on lardner is OK: puppet ran at Tue Jun 11 12:15:11 UTC 2013 [12:15:33] PROBLEM - Puppet freshness on wtp1 is CRITICAL: No successful Puppet run in the last 10 hours [12:15:33] RECOVERY - Puppet freshness on tola is OK: puppet ran at Tue Jun 11 12:15:26 UTC 2013 [12:15:43] RECOVERY - Puppet freshness on kuo is OK: puppet ran at Tue Jun 11 12:15:40 UTC 2013 [12:15:53] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [12:15:53] PROBLEM - Puppet freshness on kuo is CRITICAL: No successful Puppet run in the last 10 hours [12:15:53] PROBLEM - Puppet freshness on lardner is CRITICAL: No successful Puppet run in the last 10 hours [12:15:53] PROBLEM - Puppet freshness on constable is CRITICAL: No successful Puppet run in the last 10 hours [12:16:23] RECOVERY - Puppet freshness on constable is OK: puppet ran at Tue Jun 11 12:16:20 UTC 2013 [12:16:33] PROBLEM - Puppet freshness on tola is CRITICAL: No successful Puppet run in the last 10 hours [12:16:33] RECOVERY - Puppet freshness on wtp1 is OK: puppet ran at Tue Jun 11 12:16:27 UTC 2013 [12:16:33] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Jun 11 12:16:28 UTC 2013 [12:16:33] RECOVERY - Puppet freshness on lardner is OK: puppet ran at Tue Jun 11 12:16:29 UTC 2013 [12:16:44] RECOVERY - Puppet freshness on tola is OK: puppet ran at Tue Jun 11 12:16:34 UTC 2013 [12:16:54] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [12:16:54] PROBLEM - Puppet freshness on lardner is CRITICAL: No successful Puppet run in the last 10 hours [12:16:54] PROBLEM - Puppet freshness on constable is CRITICAL: No successful Puppet run in the last 10 hours [12:16:54] RECOVERY - Puppet freshness on kuo is OK: puppet ran at Tue Jun 11 12:16:52 UTC 2013 [12:17:24] RECOVERY - Puppet freshness on constable is OK: puppet ran at Tue Jun 11 12:17:21 UTC 2013 [12:17:34] PROBLEM - Puppet freshness on wtp1 is CRITICAL: No successful Puppet run in the last 10 hours [12:17:34] PROBLEM - Puppet freshness on tola is CRITICAL: No successful Puppet run in the last 10 hours [12:17:34] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Jun 11 12:17:29 UTC 2013 [12:17:34] RECOVERY - Puppet freshness on wtp1 is OK: puppet ran at Tue Jun 11 12:17:29 UTC 2013 [12:17:34] RECOVERY - Puppet freshness on lardner is OK: puppet ran at Tue Jun 11 12:17:31 UTC 2013 [12:17:44] RECOVERY - Puppet freshness on tola is OK: puppet ran at Tue Jun 11 12:17:36 UTC 2013 [12:17:53] PROBLEM - Puppet freshness 
on mexia is CRITICAL: No successful Puppet run in the last 10 hours [12:17:53] PROBLEM - Puppet freshness on kuo is CRITICAL: No successful Puppet run in the last 10 hours [12:17:53] PROBLEM - Puppet freshness on lardner is CRITICAL: No successful Puppet run in the last 10 hours [12:17:53] PROBLEM - Puppet freshness on constable is CRITICAL: No successful Puppet run in the last 10 hours [12:17:53] RECOVERY - Puppet freshness on kuo is OK: puppet ran at Tue Jun 11 12:17:51 UTC 2013 [12:18:13] RECOVERY - Puppet freshness on constable is OK: puppet ran at Tue Jun 11 12:18:10 UTC 2013 [12:18:24] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Jun 11 12:18:18 UTC 2013 [12:18:24] RECOVERY - Puppet freshness on lardner is OK: puppet ran at Tue Jun 11 12:18:20 UTC 2013 [12:18:34] PROBLEM - Puppet freshness on wtp1 is CRITICAL: No successful Puppet run in the last 10 hours [12:18:34] PROBLEM - Puppet freshness on tola is CRITICAL: No successful Puppet run in the last 10 hours [12:18:53] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [12:18:54] PROBLEM - Puppet freshness on kuo is CRITICAL: No successful Puppet run in the last 10 hours [12:18:54] PROBLEM - Puppet freshness on lardner is CRITICAL: No successful Puppet run in the last 10 hours [12:18:54] PROBLEM - Puppet freshness on constable is CRITICAL: No successful Puppet run in the last 10 hours [12:26:53] RECOVERY - Puppet freshness on constable is OK: puppet ran at Tue Jun 11 12:26:52 UTC 2013 [12:27:53] PROBLEM - Puppet freshness on constable is CRITICAL: No successful Puppet run in the last 10 hours [12:28:43] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Jun 11 12:28:38 UTC 2013 [12:28:53] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [12:29:53] RECOVERY - Puppet freshness on lardner is OK: puppet ran at Tue Jun 11 12:29:45 UTC 2013 [12:29:53] PROBLEM - Puppet freshness on lardner is CRITICAL: No successful Puppet run in the last 10 hours [12:30:43] RECOVERY - Puppet freshness on wtp1 is OK: puppet ran at Tue Jun 11 12:30:38 UTC 2013 [12:31:33] PROBLEM - Puppet freshness on wtp1 is CRITICAL: No successful Puppet run in the last 10 hours [12:33:24] RECOVERY - Puppet freshness on tola is OK: puppet ran at Tue Jun 11 12:33:19 UTC 2013 [12:33:33] PROBLEM - Puppet freshness on tola is CRITICAL: No successful Puppet run in the last 10 hours [13:40:56] New patchset: Cmjohnson; "fixing rdb1003/4 entry" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67978 [13:41:48] re [13:42:11] Change merged: Cmjohnson; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67978 [13:45:13] hashar: was it possible? [13:45:33] anything is possible [13:52:27] hashar: https://gerrit.googlesource.com/buck [13:52:38] you just need to clone it into gerrit [13:53:00] New patchset: Akosiaris; "Imported Upstream version 3b470f" [operations/debs/kafka] (master) - https://gerrit.wikimedia.org/r/67979 [13:53:20] AzaToth: what are you talking about ? [13:53:29] buck [13:53:29] New patchset: Reedy; "Add favico for testwikidatawiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67980 [13:54:57] hashar: I needed the repo to be filled, so you don't need to close 110 patchsets [14:01:21] mark, can you sign off on https://gerrit.wikimedia.org/r/67881 and https://gerrit.wikimedia.org/r/67882? Should be quick reads.
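On AzaToth's "clone it into gerrit" point above: one plausible way to seed a Gerrit-hosted packaging repo from an upstream mirror is a mirror clone followed by a direct branch push, assuming the pusher has branch-creation rights on the target project (the branch mapping below is an assumption, not what was actually run):

    # Mirror the upstream, then push its master as the packaging repo's 'upstream' branch
    git clone --mirror https://gerrit.googlesource.com/buck buck-mirror
    cd buck-mirror
    git push ssh://gerrit.wikimedia.org:29418/operations/debs/buck \
        refs/heads/master:refs/heads/upstream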
[14:01:46] ok [14:03:57] thanks [14:04:12] New review: Mark Bergsma; "This is needed on McHenry (the MediaWiki outgoing mail relay), but mchenry doesn't use the Puppet Ex..." [operations/puppet] (production) C: -2; - https://gerrit.wikimedia.org/r/67882 [14:04:18] done [14:04:28] +1 & -2 [14:05:10] Change merged: Andrew Bogott; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67881 [14:05:51] New patchset: Hashar; "pep8 cleanup for udpprofile" [operations/software] (master) - https://gerrit.wikimedia.org/r/67893 [14:05:51] New patchset: Hashar; "Pep8 cleanup for swiftcleaner and fwconfigtool." [operations/software] (master) - https://gerrit.wikimedia.org/r/67903 [14:05:51] New patchset: Hashar; "pep8: removes udpprofile.init filename" [operations/software] (master) - https://gerrit.wikimedia.org/r/67981 [14:06:25] New review: Hashar; "trivial enough that I should get +2 on ops repos :-]" [operations/software] (master) - https://gerrit.wikimedia.org/r/67981 [14:07:32] New review: Hashar; "The udpprofile.init was listed in the 'filename' section of the .pep8 file, that forced it to parse ..." [operations/software] (master) - https://gerrit.wikimedia.org/r/67893 [14:08:19] New review: Mark Bergsma; "quick enough that you don't need +2 on ops repos ;-)" [operations/software] (master) - https://gerrit.wikimedia.org/r/67981 [14:08:26] :-] [14:08:33] New review: Mark Bergsma; "quick enough that you don't need +2 on ops repos ;-)" [operations/software] (master); V: 2 C: 2; - https://gerrit.wikimedia.org/r/67981 [14:08:33] Change merged: Mark Bergsma; [operations/software] (master) - https://gerrit.wikimedia.org/r/67981 [14:12:58] andrewbogott: udpprofile pep8 works for me ( https://gerrit.wikimedia.org/r/#/c/67893/ ) :D [14:13:02] andrewbogott: good morning ! [14:13:17] Ryan_Lane: need to do more stuff in the buck package, because my current package bundles all the jars, waiting for hashar to init the repo with upstream [14:14:08] AzaToth: so I told you already I can not give you push rights to the upstream branch. Although I can create a repo, I can not actually administer it under the ops/ hierarchy. [14:14:31] I know you can't give me push rights to a branch [14:14:35] New patchset: Akosiaris; "Imported Upstream version 3b470f" [operations/debs/kafka] (upstream) - https://gerrit.wikimedia.org/r/67982 [14:14:39] mark that template refers to enable_mediawiki_relay but the class defines $mediawiki_relay. Are those meant to be the same thing? Neither of those are used elsewhere best I can tell. [14:14:40] I just thought you could init the repo [14:14:47] AzaToth: I am not sure it is a good idea to actually fork the repo in our Gerrit install when git build package would take care of fetching it for us. But there might be something I am missing. [14:14:58] andrewbogott: yeah probably [14:15:02] it's just never been used in the template [14:15:10] you mean I should pull it as a submodule? [14:15:38] could be possible [14:16:00] AzaToth: I have no idea honestly. Maybe that is how we are using build package, that might even be how we are doing it on other repos.
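For reference on the git-buildpackage question: its usual convention is the two-branch layout hashar and AzaToth circle around here, with pristine upstream code on an 'upstream' branch and upstream plus debian/ on the packaging branch. A sketch of building against that layout (real git-buildpackage flags, but not necessarily how this repo ended up configured):

    # Build from a repo where 'upstream' tracks upstream and 'master' carries debian/
    git-buildpackage --git-upstream-branch=upstream \
                     --git-debian-branch=master \
                     --git-ignore-new

The same settings can live in debian/gbp.conf instead of on the command line.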
[14:16:30] hashar: anyway I can't run "git review" when remotes/gerrit/master is nil [14:16:53] hashar: I'm using git buildpackage [14:17:41] looking at the operations/debs/varnish repo, it has the upstream branch :D [14:18:07] there are normally two ways it's used, either you have the upstream git directly as a branch, or you make upstream out of releases [14:18:19] but you _always_ have an upstream branch [14:18:22] and they do not do any release [14:18:29] not yet [14:19:01] hashar: read gerrit/Documentation/dev-buck.txt [14:19:12] "There is currently no binary distribution of Buck, so it has to be manually [14:19:12] built and installed. Apache Ant is required. Currently only Linux and Mac [14:19:12] OS are supported. [14:19:12] " [14:19:13] New patchset: Andrew Bogott; "Replaced enable_mediawiki_relay with mediawiki_relay." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67882 [14:19:48] mark: OK, resubmitted ^ [14:19:58] of course I could make a "fake" release and have it as a base upstream [14:20:31] but as long as they don't have a release, it's difficult [14:20:35] I hate gerrit [14:20:40] (sometimes) [14:20:50] you might have noticed I set the version to 0.0+g410fcf3-0 [14:21:05] Change merged: Andrew Bogott; [operations/software] (master) - https://gerrit.wikimedia.org/r/67893 [14:21:42] hashar: I hate java programmers in general ツ [14:21:53] New review: Mark Bergsma; "(1 comment)" [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/67882 [14:22:16] andrewbogott: so that looks like it was wrong before in the template [14:22:32] AzaToth: bahhhh 50MBytes :( [14:22:42] hashar: *bundled jars* [14:23:21] 20M lib/ [14:23:21] 12M third-party/ [14:23:43] New patchset: Hashar; ".gitreview file" [operations/debs/buck] (master) - https://gerrit.wikimedia.org/r/67984 [14:23:44] as I said, I hate in general java programmers who do that [14:23:55] hmm [14:24:37] ^demon: hello :) Do you have any idea how to create an orphan / empty branch in Gerrit GUI ? [14:24:59] ^demon: it complains about the "initial revision" that must be filled despite the repo having nothing in it (it is new) [14:25:02] New patchset: Andrew Bogott; "Replaced enable_mediawiki_relay with mediawiki_relay." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67882 [14:25:04] mark: OK, once more... [14:25:26] andrewbogott: could you merge in https://gerrit.wikimedia.org/r/67984 , .gitreview file for operations/debs/buck [14:25:35] +1d [14:25:44] <^demon> hashar: You can't create a branch from nothing? [14:26:04] ^demon: maybe I can force push [14:26:05] hmm [14:26:11] hashar: sure, if I have merge rights there... [14:26:19] hashar: if you have the rights, "git push gerrit" should suffice [14:26:21] Change merged: Andrew Bogott; [operations/debs/buck] (master) - https://gerrit.wikimedia.org/r/67984 [14:26:26] <^demon> hashar: If you've got a local repo, you can push to a non-existing branch yeah [14:26:43] hashar: you didn't fork the upstream I assume [14:26:47] andrewbogott: oops [14:26:49] Change merged: Andrew Bogott; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67882 [14:27:08] but now it's too late [14:27:13] …oops?
[14:27:14] :( [14:27:27] actually it's not [14:27:31] andrewbogott: ah you got merge rights :-] [14:27:47] run "git push -f master:upstream gerrit" I think [14:28:01] AzaToth: na you can't create branches this way [14:28:11] i don't think so either [14:28:20] true [14:28:31] make an upstream branch and git push -f upstream gerrit [14:28:55] so I have created an 'upstream' branch that points to 8b27c0bdfff71aabec1c0b3c654001a1a2627c9d which is the change andrew merged [14:28:59] now going to force push [14:29:07] hmm [14:29:24] the one before is 410fcf3420cb06e62cbe9ee93eff931fa9a9b1a2 then I assume [14:29:51] git push -f gerrit github/master:upstream [14:29:59] luckily I still have my super powers on that repo :-] [14:30:29] AzaToth: I am pushing * 952ceec - (github/master) Check that we have Java 7 installed before building (2 weeks ago) [14:31:04] oh [14:31:14] is everything in that repo open source? [14:31:20] including all the jars [14:31:23] I'm using https://gerrit.googlesource.com/buck [14:31:35] mark: they are, but they are jars [14:31:50] mark: if not, then we can't use gerrit [14:31:57] yeah that's my point [14:32:18] but they all look like open source jars [14:32:42] all code is Apache-2.0 and is made by facebook [14:32:49] AzaToth: https://gerrit.googlesource.com/buck seems to be a fork by the Gerrit folks [14:32:59] hashar: yea [14:33:02] ok [14:33:16] I used it because it was for gerrit primarily [14:33:19] AzaToth: I used upstream ( git://github.com/facebook/buck.git ) [14:33:22] ah [14:33:29] no probs [14:33:38] it's only a few extra commits [14:33:49] hopefully Shawn Pearce (Gerrit guy at Google) will submit pull requests [14:34:21] hopefully [14:34:31] Writing objects: 9% (276/3032), 30.05 MiB | 105 KiB/s [14:34:40] though, google helps facebook? [14:34:41] that is like 300MB ? :( [14:34:59] azatoth@azaboxen:~/build/buck «master % u+2»$ du -sh .git [14:34:59] 51M .git [14:35:09] AzaToth: so yeah don't tell anyone, there is an internet cabal that newbies know as "open source software" [14:35:21] AzaToth: those guys are really smart and managed to sneak in all the biggest companies [14:35:30] AzaToth: and are working together to take over the world :-] [14:35:33] hehe [14:36:12] AzaToth: in short, yes, engineers do work together even when in different companies. They have their legals to double check what they are doing though, such as making sure no IP is wrongly open sourced. [14:36:27] hey [14:36:29] AzaToth: you usually don't want to open source an algorithm which is the core of your business [14:36:35] need anything from me? [14:36:48] paravoid: 20 java packages please [14:37:08] paravoid: AzaToth is packaging Facebook buck, an android build tool written in java that has tons of java dependencies [14:37:14] paravoid: http://paste.debian.net/9688/ [14:37:17] just like kafka handled by andrew / alexandros [14:37:32] paravoid: haven't done any jar-minimization yet [14:37:47] paravoid: buck is required to even build gerrit [14:38:28] To ssh://hashar@gerrit.wikimedia.org:29418/operations/debs/buck.git [14:38:29] ! [remote rejected] github/master -> upstream (non-fast forward) [14:38:30] error: failed to push some refs to 'ssh://hashar@gerrit.wikimedia.org:29418/operations/debs/buck.git' [14:38:32] I HATE YOU [14:38:32] :D [14:38:36] hahaha [14:38:39] forgot -f? [14:38:54] what I thought [14:38:59] !
[remote rejected] github/master -> upstream (non-fast forward) [14:39:06] must not be allowed [14:39:50] can you give me push rights to the repo as a whole for a sec and I can try [14:40:03] !log granted myself force push rights on operations/debs/buck to push in git://github.com/facebook/buck.git@master to 'upstream' branch [14:40:10] ah [14:40:10] Logged the message, Master [14:40:23] that bloody force push right [14:40:33] 104 KiB/s [14:40:39] I need to get optical fiber at home [14:40:41] akosiaris is our expert on java build systems [14:40:43] * paravoid ducks [14:40:55] hashar: I'm only getting like 300KiB/s myself :( [14:41:01] paravoid: hehe [14:41:01] paravoid: if you are looking for something, I could use a cherry pick on our PHP package :-] [14:41:13] damn, completely dropped the ball on that [14:42:57] paravoid: me gonna hit you with smt ... [14:42:58] paravoid: need the RT / bug ? [14:43:20] paravoid: https://rt.wikimedia.org/Ticket/Display.html?id=5209 :-D [14:43:56] paravoid: ideally we should get it deployed on gallium only to make sure it is actually fixing the bug I am facing then schedule a deployment on production. I am not sure how we push new PHP versions. [14:44:08] apt usually but dpkg -i works too [14:46:49] AzaToth: Writing objects: 33% (1025/3032), 44.80 MiB | 105 KiB/s .. [14:47:49] New patchset: Jgreen; "remove db1013, fix typo" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67990 [14:49:00] AzaToth: so it should have the upstream branch now. [14:49:09] hashar: ok [14:49:30] you take care of Shawn's commits as well? [14:49:37] if you need to have it updated, push upstream:refs/for/upstream and get someone from ops to approve / merge your change [14:49:40] or should I skip them? [14:49:57] maybe that could be another branch like upstream-google [14:50:04] yea [14:51:39] New patchset: Reedy; "Sync w at the same time as docroot" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/64449 [14:51:52] you make that branch now? [14:52:10] * AzaToth has nil powah [14:52:40] AzaToth: so our git repo now has an 'upstream' branch which is the Facebook master branch, and it has an 'upstream-google' branch which is Shawn's fork [14:52:47] ok [14:52:51] sounds good [14:53:17] then the 'master' branch only has one commit which is the .gitreview file to use with our installation [14:53:36] ouch [14:54:02] need to rebase master on top of upstream-google [14:54:58] just push upstream-google to master and I'll make a patchset for the gitreview file [14:55:17] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67990 [14:55:59] AzaToth: isn't git buildpackage taking care of the merge ? [14:56:29] no [14:57:03] I've not heard about such a procedure [15:00:07] AzaToth: I guess you could submit a merge commit :-] [15:01:26] hashar: you don't want me to do that [15:01:35] New patchset: Hashar; "merging in Google fork" [operations/debs/buck] (master) - https://gerrit.wikimedia.org/r/67995 [15:01:54] http://paste.debian.net/9695/ [15:02:17] New patchset: Hashar; "merging Google fork into our master branch" [operations/debs/buck] (master) - https://gerrit.wikimedia.org/r/67995 [15:02:22] oh god [15:03:12] hashar: just force push upstream-google to master [15:03:13] AzaToth: so here you have the upstream from Facebook and the Google fork. + a merge commit for master. [15:03:46] AzaToth: if you get any Debian package already, you can rebase your work on top of https://gerrit.wikimedia.org/r/67995 :D [15:04:11] you can't force push?
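Expanding slightly on hashar's "push upstream:refs/for/upstream" suggestion above, the review-based flow for future upstream refreshes might look like this (the remote name 'gerrit' is an assumption, following the .gitreview setup):

    # Propose an upstream update through review instead of a direct force push
    git fetch git://github.com/facebook/buck.git master
    git push gerrit FETCH_HEAD:refs/for/upstream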
[15:04:28] New patchset: Mark Bergsma; "Add patch for CVE-2013-4090" [operations/debs/varnish] (testing/3.0.3plus-rc1) - https://gerrit.wikimedia.org/r/67996 [15:04:28] New patchset: Mark Bergsma; "varnish (3.0.3plus~rc1-wm11) precise; urgency=high" [operations/debs/varnish] (testing/3.0.3plus-rc1) - https://gerrit.wikimedia.org/r/67997 [15:04:43] to just reset master to upstream-google [15:05:23] after that I can make patchsets [15:05:59] New review: AzaToth; "dont like" [operations/debs/buck] (master) C: -1; - https://gerrit.wikimedia.org/r/67995 [15:08:12] !log Inserted varnish 3.0.3plus~rc1-wm11 packages into the precise-wikimedia APT repository [15:08:21] Logged the message, Master [15:11:06] !log reedy synchronized php-1.22wmf5/extensions/Wikibase/ [15:11:13] Logged the message, Master [15:12:04] !log reedy synchronized php-1.22wmf6/extensions/Wikibase/ [15:12:12] Logged the message, Master [15:12:53] !log reedy synchronized php-1.22wmf6/extensions/DataValues/ [15:13:01] Logged the message, Master [15:13:04] hashar: ping [15:13:17] AzaToth: pong [15:14:10] could you reset master to upstream-google or not? [15:15:03] Change merged: Andrew Bogott; [operations/software] (master) - https://gerrit.wikimedia.org/r/67903 [15:15:18] AzaToth: potentially [15:15:25] AzaToth: isn't it nicer to have a merge commit ? ;D [15:15:35] AzaToth: this way you know what got imported in the debian package [15:15:55] AzaToth: whenever upstream releases a new version the upstream branch will get updated and merged in master again [15:15:58] not relevant for the initial upstream [15:16:08] yea [15:16:12] and I had to create a first commit to create the upstream branch [15:17:02] not really [15:17:36] you never even merged the merge commit, so I can't actually use it [15:18:20] Change merged: Mark Bergsma; [operations/debs/buck] (master) - https://gerrit.wikimedia.org/r/67995 [15:18:29] but for initial, I feel it's best to reset master to upstream-google [15:18:43] mark merged it :-] [15:19:07] so you can rebase your debian commit on top of it, submit for review, done [15:20:26] New patchset: AzaToth; "Initial debian build" [operations/debs/buck] (master) - https://gerrit.wikimedia.org/r/67999 [15:20:39] \O/ [15:21:06] added the debian folks to it [15:21:20] still, I always add the .gitreview on top of the latest upstream commit [15:22:00] !log cleaned up permissions on operations/debs/buck , that is now a regular ops repo (ops are owner + force push ) [15:22:08] Logged the message, Master [15:22:17] AzaToth: yeah that does not really matter [15:23:37] as long as upstream doesn't get rebased on top of it [15:23:55] New patchset: Andrew Bogott; "pep8 cleanup of swiftrepl" [operations/software] (master) - https://gerrit.wikimedia.org/r/67898 [15:26:42] mark: do you know whom I can ask to review a ganglia plugin addition ?
[15:26:48] (for jenkins : https://gerrit.wikimedia.org/r/#/c/66960/ ) [15:26:57] i can do that [15:27:54] hashar: I assume I would need to clean up the jar mess before approving the patchset [15:28:07] AzaToth: I am not going to review that patchset :-] [15:28:12] heh [15:28:44] I assume I would need to set up 10-20 more repos for the deps [15:29:23] the 3 people I have added as reviewers have the expertise in debian packaging and at least two of them worked on packaging Kafka which has tons of java deps as well :D [15:29:27] I'll poke you to death then ツ [15:29:52] hashar: make a group of them [15:30:28] did kafka solve the deps issue [15:30:39] yeah I wish we had some reviewers-java reviewers-python reviewers-debian groups [15:30:56] no idea about Kafka [15:31:08] will need to dig in https://github.com/akosiaris/kafka :-] [15:33:35] New review: Mark Bergsma; "(1 comment)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/66960 [15:34:27] aha... so i am reviewing another java app [15:34:40] Change merged: Mark Bergsma; [operations/debs/varnish] (testing/3.0.3plus-rc1) - https://gerrit.wikimedia.org/r/67996 [15:34:42] * akosiaris runs to update LinkedIn profile: Java reviewer... [15:35:01] Change merged: Mark Bergsma; [operations/debs/varnish] (testing/3.0.3plus-rc1) - https://gerrit.wikimedia.org/r/67997 [15:35:31] hehe [15:35:34] suddenly I don't want coffee but indonesian food [15:36:03] what you want is a greasy hamburger with extra cheese and bacon [15:36:14] no [15:36:46] * woosters thinks mark wants satay, gado-gado and rendang [15:36:54] yes [15:36:56] :-) [15:36:59] :-) [15:38:51] MMmm gadogado [15:38:54] love gado gado [15:38:57] I had some in amsterdam [15:40:21] PROBLEM - Puppet freshness on erzurumi is CRITICAL: No successful Puppet run in the last 10 hours [15:40:21] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours [15:40:21] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours [15:40:21] PROBLEM - Puppet freshness on ms-be1 is CRITICAL: No successful Puppet run in the last 10 hours [15:40:21] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours [15:40:21] PROBLEM - Puppet freshness on mc15 is CRITICAL: No successful Puppet run in the last 10 hours [15:40:21] PROBLEM - Puppet freshness on mw1020 is CRITICAL: No successful Puppet run in the last 10 hours [15:40:22] PROBLEM - Puppet freshness on ms-fe3001 is CRITICAL: No successful Puppet run in the last 10 hours [15:40:22] PROBLEM - Puppet freshness on spence is CRITICAL: No successful Puppet run in the last 10 hours [15:40:23] PROBLEM - Puppet freshness on sockpuppet is CRITICAL: No successful Puppet run in the last 10 hours [15:40:23] PROBLEM - Puppet freshness on virt1 is CRITICAL: No successful Puppet run in the last 10 hours [15:40:24] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours [15:40:24] PROBLEM - Puppet freshness on virt4 is CRITICAL: No successful Puppet run in the last 10 hours [15:43:18] u can get that in NY City [15:43:29] SF too! [15:43:40] but not a lot [15:49:10] AaronSchulz: So you said load on the Parsoid cluster was gonna go away for a little bit and then come back, right? I just looked at it and it's still flat. Any idea what's going on there?
[15:49:14] ( gwicke_away ---^^ ) [15:51:12] New patchset: Reedy; "(bug 49334) Create an 'arbcom' group on the Russian Wikipedia" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67598 [15:51:38] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67598 [15:55:57] !log asher restarted twemproxy on all servers [15:56:06] Logged the message, Master [15:57:45] New patchset: Reedy; "Some fixes:" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67939 [15:58:03] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67939 [15:59:20] New patchset: Hashar; "jenkins: add in ganglia monitoring" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/66960 [15:59:27] New review: Hashar; "(1 comment)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/66960 [16:00:56] New patchset: Reedy; "Add favico for testwikidatawiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67980 [16:01:36] New review: Hashar; "I have made the ganglia files to belong to root:root with mode 0444. Also replaced trailing semicol..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/66960 [16:01:40] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67980 [16:02:09] mark: I have fixed the Jenkins ganglia plugin file permissions. pep8 is an upstream issue for which I submitted them a pull request :) ( https://gerrit.wikimedia.org/r/#/c/66960/ ) [16:02:41] !log reedy synchronized docroot/bits/favicon/testwikidata.ico [16:02:48] Logged the message, Master [16:03:03] New patchset: Mark Bergsma; "jenkins: add in ganglia monitoring" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/66960 [16:03:20] !log reedy synchronized wmf-config/InitialiseSettings.php [16:03:27] Logged the message, Master [16:04:22] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/66960 [16:04:48] New patchset: Reedy; "Fixup docroot code to work for wikimanias all from one docroot folder" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/67341 [16:05:49] hashar: it's live [16:07:59] RECOVERY - Puppet freshness on kuo is OK: puppet ran at Tue Jun 11 16:07:51 UTC 2013 [16:08:49] RECOVERY - Puppet freshness on constable is OK: puppet ran at Tue Jun 11 16:08:44 UTC 2013 [16:09:01] PROBLEM - Puppet freshness on kuo is CRITICAL: No successful Puppet run in the last 10 hours [16:09:01] RECOVERY - Puppet freshness on wtp1 is OK: puppet ran at Tue Jun 11 16:08:53 UTC 2013 [16:09:01] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Jun 11 16:08:54 UTC 2013 [16:09:01] RECOVERY - Puppet freshness on lardner is OK: puppet ran at Tue Jun 11 16:08:57 UTC 2013 [16:09:09] PROBLEM - Puppet freshness on wtp1 is CRITICAL: No successful Puppet run in the last 10 hours [16:09:22] RECOVERY - Puppet freshness on tola is OK: puppet ran at Tue Jun 11 16:09:16 UTC 2013 [16:09:22] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [16:09:39] PROBLEM - Puppet freshness on lardner is CRITICAL: No successful Puppet run in the last 10 hours [16:09:39] RECOVERY - Puppet freshness on kuo is OK: puppet ran at Tue Jun 11 16:09:35 UTC 2013 [16:09:39] PROBLEM - Puppet freshness on constable is CRITICAL: No successful Puppet run in the last 10 hours [16:09:49] PROBLEM - Puppet freshness on tola is CRITICAL: No 
successful Puppet run in the last 10 hours [16:09:59] PROBLEM - Puppet freshness on kuo is CRITICAL: No successful Puppet run in the last 10 hours [16:10:29] RECOVERY - Puppet freshness on constable is OK: puppet ran at Tue Jun 11 16:10:20 UTC 2013 [16:10:39] RECOVERY - Puppet freshness on wtp1 is OK: puppet ran at Tue Jun 11 16:10:31 UTC 2013 [16:10:39] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Jun 11 16:10:35 UTC 2013 [16:10:39] PROBLEM - Puppet freshness on constable is CRITICAL: No successful Puppet run in the last 10 hours [16:10:39] RECOVERY - Puppet freshness on lardner is OK: puppet ran at Tue Jun 11 16:10:36 UTC 2013 [16:10:49] RECOVERY - Puppet freshness on tola is OK: puppet ran at Tue Jun 11 16:10:47 UTC 2013 [16:11:09] PROBLEM - Puppet freshness on wtp1 is CRITICAL: No successful Puppet run in the last 10 hours [16:11:09] RECOVERY - Puppet freshness on kuo is OK: puppet ran at Tue Jun 11 16:11:05 UTC 2013 [16:11:20] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [16:11:40] PROBLEM - Puppet freshness on lardner is CRITICAL: No successful Puppet run in the last 10 hours [16:11:49] PROBLEM - Puppet freshness on tola is CRITICAL: No successful Puppet run in the last 10 hours [16:11:49] RECOVERY - Puppet freshness on constable is OK: puppet ran at Tue Jun 11 16:11:46 UTC 2013 [16:11:59] PROBLEM - Puppet freshness on kuo is CRITICAL: No successful Puppet run in the last 10 hours [16:11:59] RECOVERY - Puppet freshness on wtp1 is OK: puppet ran at Tue Jun 11 16:11:54 UTC 2013 [16:12:00] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Jun 11 16:11:55 UTC 2013 [16:12:00] RECOVERY - Puppet freshness on lardner is OK: puppet ran at Tue Jun 11 16:11:57 UTC 2013 [16:12:09] PROBLEM - Puppet freshness on wtp1 is CRITICAL: No successful Puppet run in the last 10 hours [16:12:09] RECOVERY - Puppet freshness on tola is OK: puppet ran at Tue Jun 11 16:12:05 UTC 2013 [16:12:19] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [16:12:29] RECOVERY - Puppet freshness on kuo is OK: puppet ran at Tue Jun 11 16:12:26 UTC 2013 [16:12:39] PROBLEM - Puppet freshness on lardner is CRITICAL: No successful Puppet run in the last 10 hours [16:12:40] PROBLEM - Puppet freshness on constable is CRITICAL: No successful Puppet run in the last 10 hours [16:12:49] PROBLEM - Puppet freshness on tola is CRITICAL: No successful Puppet run in the last 10 hours [16:12:59] PROBLEM - Puppet freshness on kuo is CRITICAL: No successful Puppet run in the last 10 hours [16:13:10] RECOVERY - Puppet freshness on constable is OK: puppet ran at Tue Jun 11 16:13:02 UTC 2013 [16:13:19] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Jun 11 16:13:12 UTC 2013 [16:13:19] RECOVERY - Puppet freshness on wtp1 is OK: puppet ran at Tue Jun 11 16:13:12 UTC 2013 [16:13:19] RECOVERY - Puppet freshness on lardner is OK: puppet ran at Tue Jun 11 16:13:14 UTC 2013 [16:13:19] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [16:13:30] RECOVERY - Puppet freshness on tola is OK: puppet ran at Tue Jun 11 16:13:20 UTC 2013 [16:13:41] PROBLEM - Puppet freshness on lardner is CRITICAL: No successful Puppet run in the last 10 hours [16:13:42] PROBLEM - Puppet freshness on constable is CRITICAL: No successful Puppet run in the last 10 hours [16:13:54] PROBLEM - Puppet freshness on tola is CRITICAL: No successful Puppet run in the last 10 hours [16:13:54] mark: awesome. 
thanks [16:14:09] PROBLEM - Puppet freshness on wtp1 is CRITICAL: No successful Puppet run in the last 10 hours [16:14:09] RECOVERY - Puppet freshness on constable is OK: puppet ran at Tue Jun 11 16:14:07 UTC 2013 [16:14:21] RECOVERY - Puppet freshness on wtp1 is OK: puppet ran at Tue Jun 11 16:14:15 UTC 2013 [16:14:21] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Jun 11 16:14:16 UTC 2013 [16:14:21] RECOVERY - Puppet freshness on lardner is OK: puppet ran at Tue Jun 11 16:14:17 UTC 2013 [16:14:35] RECOVERY - Puppet freshness on tola is OK: puppet ran at Tue Jun 11 16:14:28 UTC 2013 [16:14:39] PROBLEM - Puppet freshness on lardner is CRITICAL: No successful Puppet run in the last 10 hours [16:14:39] PROBLEM - Puppet freshness on constable is CRITICAL: No successful Puppet run in the last 10 hours [16:14:49] PROBLEM - Puppet freshness on tola is CRITICAL: No successful Puppet run in the last 10 hours [16:15:11] PROBLEM - Puppet freshness on wtp1 is CRITICAL: No successful Puppet run in the last 10 hours [16:15:12] RECOVERY - Puppet freshness on constable is OK: puppet ran at Tue Jun 11 16:15:10 UTC 2013 [16:15:20] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [16:15:21] RECOVERY - Puppet freshness on wtp1 is OK: puppet ran at Tue Jun 11 16:15:18 UTC 2013 [16:15:21] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Jun 11 16:15:18 UTC 2013 [16:15:29] RECOVERY - Puppet freshness on lardner is OK: puppet ran at Tue Jun 11 16:15:22 UTC 2013 [16:15:29] RECOVERY - Puppet freshness on tola is OK: puppet ran at Tue Jun 11 16:15:26 UTC 2013 [16:15:39] RECOVERY - Puppet freshness on kuo is OK: puppet ran at Tue Jun 11 16:15:35 UTC 2013 [16:15:39] PROBLEM - Puppet freshness on lardner is CRITICAL: No successful Puppet run in the last 10 hours [16:15:39] PROBLEM - Puppet freshness on constable is CRITICAL: No successful Puppet run in the last 10 hours [16:15:50] PROBLEM - Puppet freshness on tola is CRITICAL: No successful Puppet run in the last 10 hours [16:16:00] PROBLEM - Puppet freshness on kuo is CRITICAL: No successful Puppet run in the last 10 hours [16:16:00] RECOVERY - Puppet freshness on constable is OK: puppet ran at Tue Jun 11 16:15:57 UTC 2013 [16:16:09] PROBLEM - Puppet freshness on wtp1 is CRITICAL: No successful Puppet run in the last 10 hours [16:16:09] RECOVERY - Puppet freshness on wtp1 is OK: puppet ran at Tue Jun 11 16:16:03 UTC 2013 [16:16:09] RECOVERY - Puppet freshness on lardner is OK: puppet ran at Tue Jun 11 16:16:07 UTC 2013 [16:16:19] RECOVERY - Puppet freshness on tola is OK: puppet ran at Tue Jun 11 16:16:11 UTC 2013 [16:16:19] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [16:16:29] RECOVERY - Puppet freshness on kuo is OK: puppet ran at Tue Jun 11 16:16:26 UTC 2013 [16:16:39] PROBLEM - Puppet freshness on lardner is CRITICAL: No successful Puppet run in the last 10 hours [16:16:39] PROBLEM - Puppet freshness on constable is CRITICAL: No successful Puppet run in the last 10 hours [16:16:49] PROBLEM - Puppet freshness on tola is CRITICAL: No successful Puppet run in the last 10 hours [16:16:59] PROBLEM - Puppet freshness on kuo is CRITICAL: No successful Puppet run in the last 10 hours [16:17:09] PROBLEM - Puppet freshness on wtp1 is CRITICAL: No successful Puppet run in the last 10 hours [16:23:40] dinner time see you later [16:25:09] RECOVERY - Puppet freshness on constable is OK: puppet ran at Tue Jun 11 16:25:05 UTC 2013 [16:25:40] PROBLEM - 
Puppet freshness on constable is CRITICAL: No successful Puppet run in the last 10 hours [16:28:39] RECOVERY - Puppet freshness on wtp1 is OK: puppet ran at Tue Jun 11 16:28:38 UTC 2013 [16:28:59] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Jun 11 16:28:49 UTC 2013 [16:29:09] PROBLEM - Puppet freshness on wtp1 is CRITICAL: No successful Puppet run in the last 10 hours [16:29:19] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [16:29:27] RoanKattouw: the dedicated Parsoid job runner does not seem to be working so well yet [16:29:49] RECOVERY - Puppet freshness on lardner is OK: puppet ran at Tue Jun 11 16:29:47 UTC 2013 [16:30:17] I pinged AaronSchulz earlier, but I guess he is on his way to the office [16:30:39] PROBLEM - Puppet freshness on lardner is CRITICAL: No successful Puppet run in the last 10 hours [16:33:20] RECOVERY - Puppet freshness on tola is OK: puppet ran at Tue Jun 11 16:33:11 UTC 2013 [16:33:49] PROBLEM - Puppet freshness on tola is CRITICAL: No successful Puppet run in the last 10 hours [16:39:51] RECOVERY - Puppet freshness on kuo is OK: puppet ran at Tue Jun 11 16:39:48 UTC 2013 [16:40:31] PROBLEM - Puppet freshness on kuo is CRITICAL: No successful Puppet run in the last 10 hours [16:46:55] heya ^demon, I forget…can we rename repositories? [16:47:01] in gerrit? [16:47:06] <^demon> Not yet, work in progress. [16:47:12] hm, ok [16:47:12] <^demon> It's hugely disruptive atm. [16:47:38] heya paravoid, I need to push a brand new repo to ops/debs/kafka where we've been doing the reviews [16:47:48] which means I need to either push to a different repo name [16:47:54] or delete the existing one [16:48:06] we'd lose the reviews thus far [16:48:09] objections? [16:49:12] New patchset: Andrew Bogott; "Typo fix" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/68006 [16:51:08] mark, thanks for the heads up about the ACL bug. Is it in production everywhere? I am still seeing one too many issues in the zero.log :( [16:51:50] yurik: yes [16:54:41] RECOVERY - Puppet freshness on constable is OK: puppet ran at Tue Jun 11 16:54:39 UTC 2013 [16:54:48] ottomata: why? [16:55:09] alex pushed his stuff on top of my 0.7.2 attempt [16:55:20] which was created using git-import-orig and a 0.7.2 tarball [16:55:27] it doesn't have trunk or the 0.8 tag in it [16:55:42] PROBLEM - Puppet freshness on constable is CRITICAL: No successful Puppet run in the last 10 hours [16:55:46] i'm actually not sure how he's building it, but he's having weird troubles too, and I told him we should just fork from apache's git repo and add a debian branch [16:55:50] there are outstanding review comments on that patchset, as long as these are not lost I don't mind much what you do [16:56:18] ok i'll see if I can place them in a comment on the new review patchset I will create [16:58:38] well [16:58:43] can we just finish up this round of reviews? [16:58:50] have something that we are all happy with [16:59:02] RECOVERY - Puppet freshness on wtp1 is OK: puppet ran at Tue Jun 11 16:58:50 UTC 2013 [16:59:02] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Jun 11 16:58:50 UTC 2013 [16:59:06] then mess with the repos all you want?
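On the 0.7.2 import mentioned above: git-import-orig is git-buildpackage's tool for importing an upstream tarball onto the upstream branch and merging it into the packaging branch. A minimal sketch of such an import (the file name and the --pristine-tar choice are assumptions, not what was actually run for kafka):

    # Import an upstream tarball; it gets committed to 'upstream' and merged into the debian branch
    git-import-orig --pristine-tar ../kafka_0.7.2.orig.tar.gz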
[16:59:21] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [16:59:51] PROBLEM - Puppet freshness on wtp1 is CRITICAL: No successful Puppet run in the last 10 hours [16:59:51] RECOVERY - Puppet freshness on lardner is OK: puppet ran at Tue Jun 11 16:59:45 UTC 2013 [17:00:41] PROBLEM - Puppet freshness on lardner is CRITICAL: No successful Puppet run in the last 10 hours [17:02:04] paravoid: i dug through the jar deps in buck and compiled the following list: http://paste.debian.net/9714/ [17:02:33] *sigh* [17:02:57] AzaToth: are you giving up? :) [17:03:21] RECOVERY - Puppet freshness on tola is OK: puppet ran at Tue Jun 11 17:03:19 UTC 2013 [17:03:41] paravoid: no, not yet [17:03:47] cool! [17:03:51] PROBLEM - Puppet freshness on tola is CRITICAL: No successful Puppet run in the last 10 hours [17:03:58] am I getting to sponsor a dozen java packages then? :-) [17:04:23] paravoid: dunno what to do with sdklib.jar (I assume it's android sdk) and wtf is ddmlib [17:04:50] google says some android thing [17:05:21] and I'm uncertain if normal cassandra/thrift is needed as it's using something astyanaxy [17:05:32] libguava just needs to update [17:05:41] gerrit needs android libs?! [17:05:52] buck is made for android [17:06:03] "Buck is a build system for Android that encourages the creation of small, reusable modules consisting of code and resources." [17:06:26] don't ask me why they decided to use it [17:06:59] I don't know if all libs are needed for normal tasks [17:07:09] or if they are only needed for their development/test [17:07:53] and if it really needs guava 14 [17:13:24] New patchset: Asher; "twemproxy: set server_connections to 2, from default of 1" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/68010 [17:18:56] Change merged: Asher; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/68010 [17:19:07] binasher: ugh, can you look at https://gerrit.wikimedia.org/r/#/c/67948/ ? [17:19:48] Aaron|home: sure [17:19:57] !log asher synchronized wmf-config [17:20:05] Logged the message, Master [17:20:53] !log asher restarted twemproxy on all servers [17:21:01] Logged the message, Master [17:23:40] Aaron|home: why'd you add the -r in: local subproccount=$((`jobs -pr | wc -w`)) [17:24:03] Aaron|home: are there even suspended children? [17:24:17] same for what gets killed in the sigterm trap [17:24:40] they aren't actually suspended, they don't exist [17:25:04] in theory, that should happen...I couldn't find this phenomenon mentioned anywhere [17:25:08] jobs -p returns nonexistent pids? [17:25:28] after a while there is always/often 1 bogus one in it [17:25:32] o_O [17:25:50] it probably was valid but closed and the table still had it for some reason [17:26:01] it seems to eventually get replaced with another one though [17:26:07] so they don't build up and up [17:26:34] (I was testing around with this script a bit and with some echo statements) [17:26:41] New patchset: Andrew Bogott; "Refactor exim::rt to use the new exim template." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/68011 [17:26:46] binasher: all the more reason to do that rewrite someday ;) [17:27:42] New review: Andrew Bogott; "I'm largely flying blind here since I don't know much about exim and don't have a good setup. This ..."
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/68011 [17:27:58] Aaron|home: instead of what trap does [ trap 'pids=$(jobs -pr); if [ -n "$pids" ]; then kill $pids; fi; exit' SIGTERM ] more than once in the script, could you move that to a cleanup function that the trap calls? [17:29:57] I can make a follow-up commit [17:31:33] ok [17:31:45] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/67948 [17:32:33] Aaron|home: merged on sockpuppet, will be out in ~30min [17:34:46] tx [17:38:12] New patchset: Aaron Schulz; "Factored the exit trap into a function." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/68016 [17:40:04] binasher: ^ [17:40:37] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/68016 [17:40:45] btw, what is the sample duration for graphite? [17:42:05] AzaToth: what's buck? [17:42:12] Ryan_Lane: hell on earth [17:42:16] :D [17:42:28] Aaron|home: see https://git.wikimedia.org/blob/operations%2Fpuppet/47aefff683cfa6ffc232ff6f81425a51e836f6c2/files%2Fgraphite%2Fstorage-schemas.conf [17:42:29] well, it sounds like java, so that's not a surprise [17:42:31] Ryan_Lane: the shit gerrit devs have decided is the "ultimate build tool" [17:42:35] ahhhhhh [17:42:39] fun [17:43:02] paravoid: I think I can remove half of the deps, if we just focus on making a package that can build gerrit [17:43:40] have the following still loaded: http://paste.debian.net/9726/ [17:43:47] and it seems to build [17:44:28] of course I can't install it, as install requires com/android/ddmlib/AndroidDebugBridge$IDebugBridgeChangeListener [17:45:16] Ryan_Lane: did I say I hate java "enlightened" programmers [17:45:18] ? [17:48:25] New patchset: AzaToth; "Initial debian build" [operations/debs/buck] (master) - https://gerrit.wikimedia.org/r/67999 [17:48:33] AzaToth: :D [17:48:36] AzaToth: yeah, me too [17:49:03] New review: AzaToth; "removed half of the deps as they doesn't seems to prevent gerrit from building" [operations/debs/buck] (master) - https://gerrit.wikimedia.org/r/67999 [17:57:27] mutante, if you're interested I have an rt/exim patch for you to test out. I'm happy to set it aside though if you're busy. [18:01:16] PROBLEM - Parsoid on wtp1012 is CRITICAL: Connection refused [18:01:36] PROBLEM - Parsoid on wtp1006 is CRITICAL: Connection refused [18:01:46] PROBLEM - Parsoid on wtp1010 is CRITICAL: Connection refused [18:01:56] PROBLEM - Parsoid on wtp1024 is CRITICAL: Connection refused [18:01:56] PROBLEM - Parsoid on wtp1023 is CRITICAL: Connection refused [18:01:56] PROBLEM - Parsoid on wtp1011 is CRITICAL: Connection refused [18:02:27] RoanKattouw: ^^ [18:02:30] what's up with parsoid? [18:02:40] Ahm...
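Backing up to the trap refactor merged above ("Factored the exit trap into a function"): a minimal bash sketch of the shape binasher asked for, not the actual patch:

    # One cleanup function, registered once, instead of repeating the trap body
    cleanup() {
        local pids
        pids=$(jobs -pr)    # -r: only currently running jobs, skipping stale entries
        if [ -n "$pids" ]; then
            kill $pids
        fi
        exit
    }
    trap cleanup SIGTERM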
[18:02:47] PROBLEM - LVS HTTP IPv4 on parsoid.svc.eqiad.wmnet is CRITICAL: Connection refused [18:03:03] Crap [18:03:05] Looking [18:03:23] PROBLEM - Parsoid on wtp1004 is CRITICAL: Connection refused [18:03:23] PROBLEM - Parsoid on wtp1013 is CRITICAL: Connection refused [18:03:23] PROBLEM - Parsoid on wtp1022 is CRITICAL: Connection refused [18:03:23] PROBLEM - Parsoid on wtp1015 is CRITICAL: Connection refused [18:03:32] PROBLEM - Parsoid on wtp1002 is CRITICAL: Connection refused [18:03:32] PROBLEM - Parsoid on wtp1003 is CRITICAL: Connection refused [18:03:32] PROBLEM - Parsoid on wtp1005 is CRITICAL: Connection refused [18:03:33] PROBLEM - Parsoid on wtp1014 is CRITICAL: Connection refused [18:03:33] PROBLEM - Parsoid on wtp1021 is CRITICAL: Connection refused [18:03:33] PROBLEM - Parsoid on wtp1016 is CRITICAL: Connection refused [18:03:33] PROBLEM - Parsoid on wtp1001 is CRITICAL: Connection refused [18:03:41] PROBLEM - Parsoid on wtp1007 is CRITICAL: Connection refused [18:03:43] PROBLEM - Parsoid on wtp1020 is CRITICAL: Connection refused [18:03:43] PROBLEM - Parsoid on wtp1018 is CRITICAL: Connection refused [18:04:03] Ryan_Lane: Did you touch salt? [18:04:05] nope [18:04:16] let me see if it restarted on any of them [18:04:32] RECOVERY - Parsoid on wtp1023 is OK: HTTP OK: HTTP/1.1 200 OK - 1373 bytes in 0.007 second response time [18:04:32] RECOVERY - Parsoid on wtp1002 is OK: HTTP OK: HTTP/1.1 200 OK - 1373 bytes in 0.004 second response time [18:04:32] RECOVERY - Parsoid on wtp1024 is OK: HTTP OK: HTTP/1.1 200 OK - 1373 bytes in 0.007 second response time [18:04:32] RECOVERY - Parsoid on wtp1011 is OK: HTTP OK: HTTP/1.1 200 OK - 1373 bytes in 0.007 second response time [18:04:32] RECOVERY - Parsoid on wtp1003 is OK: HTTP OK: HTTP/1.1 200 OK - 1373 bytes in 0.003 second response time [18:04:32] RECOVERY - Parsoid on wtp1005 is OK: HTTP OK: HTTP/1.1 200 OK - 1373 bytes in 0.011 second response time [18:04:32] RECOVERY - Parsoid on wtp1014 is OK: HTTP OK: HTTP/1.1 200 OK - 1373 bytes in 0.009 second response time [18:04:33] RECOVERY - Parsoid on wtp1021 is OK: HTTP OK: HTTP/1.1 200 OK - 1373 bytes in 0.008 second response time [18:04:33] RECOVERY - Parsoid on wtp1012 is OK: HTTP OK: HTTP/1.1 200 OK - 1373 bytes in 0.009 second response time [18:04:34] RECOVERY - Parsoid on wtp1006 is OK: HTTP OK: HTTP/1.1 200 OK - 1373 bytes in 0.007 second response time [18:04:34] RECOVERY - Parsoid on wtp1016 is OK: HTTP OK: HTTP/1.1 200 OK - 1373 bytes in 0.014 second response time [18:04:35] RECOVERY - Parsoid on wtp1001 is OK: HTTP OK: HTTP/1.1 200 OK - 1373 bytes in 0.004 second response time [18:04:35] I restarted everything just now [18:04:44] RECOVERY - Parsoid on wtp1007 is OK: HTTP OK: HTTP/1.1 200 OK - 1373 bytes in 0.002 second response time [18:04:44] RECOVERY - LVS HTTP IPv4 on parsoid.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 1373 bytes in 0.009 second response time [18:04:45] I don't know why they suddenly went down [18:04:54] RECOVERY - Parsoid on wtp1004 is OK: HTTP OK: HTTP/1.1 200 OK - 1373 bytes in 0.003 second response time [18:04:54] RECOVERY - Parsoid on wtp1013 is OK: HTTP OK: HTTP/1.1 200 OK - 1373 bytes in 0.011 second response time [18:04:54] RECOVERY - Parsoid on wtp1022 is OK: HTTP OK: HTTP/1.1 200 OK - 1373 bytes in 0.018 second response time [18:04:54] RECOVERY - Parsoid on wtp1020 is OK: HTTP OK: HTTP/1.1 200 OK - 1373 bytes in 0.008 second response time [18:04:54] RECOVERY - Parsoid on wtp1018 is OK: HTTP OK: HTTP/1.1 200 OK - 1373 bytes in 0.004 second 
response time [18:05:04] RECOVERY - Parsoid on wtp1015 is OK: HTTP OK: HTTP/1.1 200 OK - 1373 bytes in 0.006 second response time [18:05:04] RECOVERY - Parsoid on wtp1010 is OK: HTTP OK: HTTP/1.1 200 OK - 1373 bytes in 0.007 second response time [18:05:44] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:06:14] RoanKattouw: salt has been up on them since June 07 [18:06:21] New patchset: Andrew Bogott; "Added a basic nginx module and two (labs) use cases." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/43886 [18:06:32] also, I think the way salt is restarting the service will fork the process anyway [18:06:34] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.140 second response time [18:06:46] Well, somehow they all just died [18:06:54] I checked top, no running Parsoid processes [18:06:56] I'm restarting it on one box with salt [18:07:05] then I'm going to restart salt [18:07:07] to see what happens [18:07:08] OK [18:07:13] Take wtp1008 [18:07:20] too late [18:07:25] did it on 1004 [18:07:25] OK nm [18:07:38] seems the salt restart worked fine [18:07:51] so, the workaround I'm using solves our issue [18:07:59] something else is crashing it now [18:08:15] They just mysteriously went away, logs don't say why [18:08:27] It wasn't a botched start/restart attempt because then the exception would've been logged [18:08:28] getting a lot of CPU core power limit messages in dmesg [18:08:29] on 1004 [18:08:40] 1008 is depooled for exactly that issue [18:08:52] 1004 is showing it in dmesg [18:08:52] cmjohnson1: Looks like wtp1004 may have the same issue as 1008? ---^^ [18:08:57] and 1001 [18:09:04] and 1002 [18:09:11] and 1003 [18:09:17] * RoanKattouw dsh's it [18:09:21] all of them [18:09:32] sorry to be the bearer of bad news ;) [18:09:44] Yup [18:09:45] Crap [18:09:57] dsh -cM -g parsoid -o -lroot 'dmesg | grep "power limit" | wc -l' [18:10:27] cmjohnson1: Strike that, ALL of the wtp* boxes have the CPU power limit warning :( [18:10:36] * RoanKattouw files a ticket [18:13:13] https://rt.wikimedia.org/Ticket/Display.html?id=5271 [18:14:25] !log Parsoid jobs pipeline fixed with 866e66104fa6d285911da1888b9f9dcece2f53a4, pipeline should be working now [18:14:33] Logged the message, Master [18:15:00] can someone send me the output of "SELECT * FROM transcode WHERE transcode_image_name = 'Einstein.ogg';" on commonswiki [18:17:59] roankattouw: going to be moving wtp1001-2 and wtp1005-1008 (8 total) within the hour...still ok? [18:18:20] New patchset: Andrew Bogott; "Added a basic nginx module and two (labs) use cases." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/43886 [18:18:33] cmjohnson1: Yeah go ahead [18:18:51] k..cool [18:20:12] Change merged: Andrew Bogott; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/43886 [18:25:54] j^: PM [18:26:43] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:27:33] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time [18:28:26] Aaron|home: thanks, we are back in business! 
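For the single-box restart via salt that Ryan describes during the incident above, roughly this shape (the minion id and service name are assumptions):

    # Restart the service on one minion via salt's service module
    salt 'wtp1004.eqiad.wmnet' service.restart parsoid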
[18:32:57] New patchset: Bsitu; "Enable Echo interaction schema and cohort study" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/68023 [18:34:51] Change merged: Bsitu; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/68023 [18:39:17] !log bsitu synchronized wmf-config/InitialiseSettings.php 'Enable Echo interaction schema and cohort study' [18:39:25] Logged the message, Master [18:39:57] !log depooling and powering down wtp1001-1002 and wtp1005-wtp1008 [18:40:05] Logged the message, Master [18:40:08] !log bsitu synchronized wmf-config/CommonSettings.php 'Enable Echo interaction schema and cohort study' [18:40:15] Logged the message, Master [18:55:53] !log Cleared rows for queued transcodes on commonswiki [18:56:01] Logged the message, Master [18:56:32] Aaron|home: hmm? [18:56:45] Aaron|home: I ran a query the other day [18:56:51] Aaron|home: per j^'s request [18:57:12] maybe the same one then ;) [18:57:33] no that was related to failed jobs [18:58:53] PROBLEM - Host wtp1001 is DOWN: PING CRITICAL - Packet loss = 100% [18:58:53] PROBLEM - Host wtp1002 is DOWN: PING CRITICAL - Packet loss = 100% [18:59:13] PROBLEM - Host wtp1005 is DOWN: PING CRITICAL - Packet loss = 100% [18:59:23] PROBLEM - Host wtp1007 is DOWN: PING CRITICAL - Packet loss = 100% [18:59:23] PROBLEM - Host wtp1006 is DOWN: PING CRITICAL - Packet loss = 100% [18:59:31] RoanKattouw: ^^^ [18:59:43] PROBLEM - Host wtp1008 is DOWN: PING CRITICAL - Packet loss = 100% [18:59:49] * paravoid senses a page coming [19:00:22] paravoid: That's cmjohnson1's doing [19:00:27] He's moving the machines between racks [19:00:34] ah [19:00:48] And there are 24 backends so LVS won't freak out until >12 are down [19:00:55] k [19:01:11] (We might want to decrease the depool threshold, actually, 0.5 is a bit high for a 24-machine cluster with our load characteristics) [19:03:22] New patchset: Ottomata; "Debianize Kafka" [operations/debs/kafka] (debian) - https://gerrit.wikimedia.org/r/68026 [19:05:39] New review: Ottomata; "This change was initially submitted under another repository at the same location. I have deleted t..." [operations/debs/kafka] (debian) - https://gerrit.wikimedia.org/r/68026 [19:08:41] Aaron|home: if the new job code with more error messages is deployed it might be worth queueing some of the jobs again with: mwscript extensions/TimedMediaHandler/maintenance/retryTranscodes.php --wiki commonswiki --error "timeout" [19:10:27] New review: Ottomata; "Alex, this builds just fine by specifying full relative paths in kafka.install, so I've removed the ..." [operations/debs/kafka] (debian) - https://gerrit.wikimedia.org/r/68026 [19:10:30] ok [19:10:46] ottomata: sigh, why did we have to move repos in the middle of the review...? [19:11:42] we had 6 patchsets or more over there [19:12:57] i thought you said that was fine [19:13:11] because it doesn't actually build properly as is, alex and I were both doing weird things to make it work [19:13:15] (not sure what he was doing) [19:13:39] hume testwikidatawiki Error selecting database testwikidatawiki on server 10.0.6.74 [19:13:45] starting scap [19:14:04] e.g. the old repo in gerrit didn't have upstream trunk committed as a branch [19:14:12] which is what we were telling gbp to use as upstream [19:14:16] 19:58 < paravoid> can we just finish up this round of reviews? [19:14:17] 19:58 < paravoid> have something that we are all happy with [19:14:17] 19:59 < paravoid> then mess with the repos all you want?
[19:14:19] that was my main problem, not sure what alex's was [19:14:20] is what I said :) [19:14:25] hume votewiki Error selecting database votewiki on server 10.0.6.74 [19:14:25] oop, i think i missed those [19:14:38] Reedy: ? [19:14:38] i think I got the "i'm fine with it as long as we keep the review stuff" or whatever it was you said [19:14:40] any idea what that is? [19:14:56] what's done is done [19:14:58] too late now [19:14:58] paravoid [19:14:58] there are outstanding review comments on that patchset, as long as these are not lost I don't mind much what you do [19:15:03] didn't read your ones after that eek [19:15:05] sorry [19:15:08] i was in meetings just after that [19:15:11] Aaron|home: It's for securepoll use only [19:15:21] i was really careful about bringing over the outstanding comments though [19:15:26] why is it running on hume? [19:15:29] i looked at all of the comments in all of the files [19:15:34] and also on the changeset [19:15:34] It's not? [19:15:35] and do those DBs not exist? [19:15:41] and brought over the unresolved ones [19:16:00] mysql:wikiadmin@db1019 [votewiki]> Bye [19:16:12] Reedy: unless there is another server named "hume" [19:16:23] nfi [19:16:27] It's on s3 [19:20:41] * Aaron|home wonders wtf the popularity-contest cron is [19:21:03] tells debian/ubuntu what packages you have installed [19:21:08] so "what's popular" [19:21:10] what Reedy said [19:21:43] http://popcon.debian.org/ [19:22:37] !log kaldari Started syncing Wikimedia installation... : [19:22:49] Logged the message, Master [19:23:15] Aaron|home: it does tell you what it is and asks if you want to install it during Debian installation [19:25:58] New patchset: Andrew Bogott; "Add a robots.txt for all labs proxy access" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/68034 [19:28:07] PROBLEM - RAID on analytics1015 is CRITICAL: Timeout while attempting connection [19:28:07] PROBLEM - RAID on analytics1012 is CRITICAL: Timeout while attempting connection [19:28:07] PROBLEM - RAID on analytics1013 is CRITICAL: Timeout while attempting connection [19:28:16] PROBLEM - RAID on analytics1017 is CRITICAL: Timeout while attempting connection [19:28:33] New patchset: Hashar; "jenkins: fix up ganglia monitoring" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/68035 [19:30:16] PROBLEM - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:30:36] PROBLEM - HTTP radosgw on ms-fe1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:30:36] PROBLEM - HTTP radosgw on ms-fe1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:30:42] ignore that [19:30:46] PROBLEM - HTTP radosgw on ms-fe1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:30:56] PROBLEM - HTTP radosgw on ms-fe1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:31:15] looking [19:31:56] RECOVERY - HTTP radosgw on ms-fe1004 is OK: HTTP OK: HTTP/1.1 200 OK - 311 bytes in 9.811 second response time [19:32:06] RECOVERY - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 311 bytes in 0.003 second response time [19:32:27] RECOVERY - HTTP radosgw on ms-fe1002 is OK: HTTP OK: HTTP/1.1 200 OK - 311 bytes in 0.003 second response time [19:32:36] RECOVERY - HTTP radosgw on ms-fe1003 is OK: HTTP OK: HTTP/1.1 200 OK - 311 bytes in 0.004 second response time [19:32:36] RECOVERY - HTTP radosgw on ms-fe1001 is OK: HTTP OK: HTTP/1.1 200 OK - 311 bytes in 0.005 second response time [19:34:08] !log kaldari Finished syncing
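
The repo-move argument above boils down to git-buildpackage needing the upstream source committed as a branch it can diff against. A hedged sketch of that setup; the branch names and tag are illustrative, and the flags are standard git-buildpackage options rather than the exact commands used for the kafka repo:

    # commit upstream trunk as its own branch, then build with the packaging
    # branch (carrying debian/) layered on top of it
    git branch upstream v0.8.0          # hypothetical upstream tag or sha
    git checkout debian
    git-buildpackage --git-upstream-branch=upstream \
                     --git-debian-branch=debian -us -uc
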
Wikimedia installation... : [19:34:19] Logged the message, Master [19:34:24] did we have any network hiccups? [19:35:40] hard to know when the monitoring system does not provide root cause analysis :-] [19:35:52] New patchset: Andrew Bogott; "Add a robots.txt for all labs proxy access" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/68034 [19:36:47] New patchset: Ottomata; "Debianize Kafka" [operations/debs/kafka] (debian) - https://gerrit.wikimedia.org/r/68026 [19:37:03] could anyone look at https://gerrit.wikimedia.org/r/68035 ? [19:37:57] New patchset: Andrew Bogott; "Add a robots.txt for all labs proxy access" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/68034 [19:39:53] RECOVERY - Host wtp1005 is UP: PING OK - Packet loss = 0%, RTA = 0.60 ms [19:39:54] RECOVERY - Host wtp1007 is UP: PING OK - Packet loss = 0%, RTA = 0.43 ms [19:40:04] RECOVERY - Host wtp1008 is UP: PING OK - Packet loss = 0%, RTA = 0.87 ms [19:40:13] RECOVERY - Host wtp1006 is UP: PING OK - Packet loss = 0%, RTA = 0.34 ms [19:40:15] no fucking wonder, reporting-setup.php is not versioned [19:40:29] * Aaron|home can stop pulling his hair out [19:40:49] $wgContributionReportingReadDBserver = 'db1025.eqiad.wmnet'; [19:40:53] RECOVERY - Host wtp1002 is UP: PING OK - Packet loss = 0%, RTA = 1.23 ms [19:40:53] RECOVERY - Host wtp1001 is UP: PING OK - Packet loss = 0%, RTA = 1.06 ms [19:42:35] roankattouw finished with the move and repooled [19:42:58] Yay thanks [19:43:36] cmjohnson1: sweet, was just wondering about it ;) [19:44:36] andrewbogott: hey :) I got a mixup for a ganglia plugin monitoring jenkins https://gerrit.wikimedia.org/r/68035 should be simple enough [19:45:08] Change merged: Andrew Bogott; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/68034 [19:45:39] LeslieCarr: hey [19:45:53] LeslieCarr: so the above ceph outage wasn't a ceph issue, surprisingly [19:46:00] !log updated Parsoid to ab8d15596f [19:46:01] LeslieCarr: network outage, asw-c-eqiad isn't happy [19:46:08] Logged the message, Master [19:46:44] Jun 11 19:27:53 asw-c-eqiad chassism[1093]: cm_pic_cleanup FPC 1 PIC 1 NULL [19:46:47] Jun 11 19:27:53 asw-c-eqiad chassism[1093]: CM_CHANGE: Member 1->1, Mode B->B, -1M 1B, GID 0, Master Changed, Members Changed [19:46:50] Jun 11 19:27:53 asw-c-eqiad chassism[1093]: CM_CHANGE: 1B 2L 3L 4L 5L 6L 7L [19:46:53] Jun 11 19:27:53 asw-c-eqiad chassism[1093]: CM_CHANGE: Signaling license service [19:46:56] ετψ. [19:46:59] er, etc. [19:47:32] New patchset: Aaron Schulz; "Disabled broken ContributionReporting crons that do nothing and clutter dberror.log." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/68039 [19:47:32] it complains about licenses now [19:49:04] Change merged: Andrew Bogott; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/68035 [19:49:29] andrewbogott: thanks. will run puppet on ganglia. [19:57:55] Aaron|home: I am going to run a maintenance script that does a massive update on an Echo table@extension1 db, is it okay to run it now? [19:58:56] I don't see why not [19:59:16] Aaron|home: thx, just want to make sure, :) [19:59:31] I'm assuming it's batched and so on, so it should be fine [19:59:47] yes, it's batched [20:00:34] there isn't anything going on AFAIK [20:02:35] bsitu: yeah, echo maint on ex1 is ok now [20:03:06] binasher: thx [20:03:09] hmm, sync-common on testwiki is suspiciously fast...
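
A "suspiciously fast" sync-common, like the unversioned reporting-setup.php above, is the kind of drift a checksum sweep makes visible. A sketch assuming the dsh group and deploy-path conventions suggested elsewhere in this log; the exact path may differ:

    # identical files yield a single md5 across the fleet; stragglers stand out
    dsh -cM -g mediawiki-installation -o -lroot \
        'md5sum /usr/local/apache/common/wmf-config/CommonSettings.php' \
      | awk '{print $2}' | sort | uniq -c
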
[20:03:41] binasher: https://bugs.launchpad.net/graphite/+bug/949046 please [20:03:59] PROBLEM - Solr on vanadium is CRITICAL: Average request time is 1003.2449 (gt 1000) [20:04:54] Aaron|home: yep, i started building pkgs of 0.9.10 but got distracted.. we'll be running it when graphite moves to eqiad [20:06:45] paravoid: oh noes [20:06:58] :-) [20:07:51] lesliecarr and paravoid: the error may be from me unplugging the switch on c8. I am going to be replacing the 4200 with the 4500 [20:08:02] RECOVERY - Puppet freshness on kuo is OK: puppet ran at Tue Jun 11 20:07:59 UTC 2013 [20:08:05] cmjohnson1: that is exactly it [20:08:07] the switch is plugged back in until I am ready [20:08:18] i checked the alarms and there are none [20:08:18] ah [20:08:23] cmjohnson1: yeah, so next time we should switch the mastership before unplugging [20:08:27] lots of licensing errors on that log, but that's irrelevant [20:08:30] because right now c8 is the "brain" [20:08:32] PROBLEM - Puppet freshness on kuo is CRITICAL: No successful Puppet run in the last 10 hours [20:08:43] oh yeah, licensing errors always come up whenever you put a "new" switch in… oi [20:08:52] this produced a network cut that lasted more than 20s [20:08:59] ceph nodes noticed and started marking each other as down [20:09:28] probably because i pulled a piece of the network fabric out reducing redundancy (my guess) [20:09:32] RECOVERY - Puppet freshness on constable is OK: puppet ran at Tue Jun 11 20:09:23 UTC 2013 [20:09:34] and we have another bug (the one that's holding back full on production deployment right now) which makes recovery a PITA [20:09:43] a ceph bug that is [20:09:52] RECOVERY - Puppet freshness on wtp1 is OK: puppet ran at Tue Jun 11 20:09:43 UTC 2013 [20:09:52] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Jun 11 20:09:43 UTC 2013 [20:09:52] PROBLEM - Puppet freshness on constable is CRITICAL: No successful Puppet run in the last 10 hours [20:09:52] PROBLEM - Puppet freshness on wtp1 is CRITICAL: No successful Puppet run in the last 10 hours [20:09:52] RECOVERY - Puppet freshness on lardner is OK: puppet ran at Tue Jun 11 20:09:48 UTC 2013 [20:10:01] it makes convergence a lengthy process during which no requests can be replied to [20:10:12] hence the page [20:10:12] RECOVERY - Puppet freshness on tola is OK: puppet ran at Tue Jun 11 20:10:11 UTC 2013 [20:10:16] maybe this warrants an outage report [20:10:25] although it didn't affect production it seems [20:10:32] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [20:10:42] PROBLEM - Puppet freshness on lardner is CRITICAL: No successful Puppet run in the last 10 hours [20:10:42] RECOVERY - Puppet freshness on kuo is OK: puppet ran at Tue Jun 11 20:10:38 UTC 2013 [20:10:53] PROBLEM - Puppet freshness on tola is CRITICAL: No successful Puppet run in the last 10 hours [20:11:43] PROBLEM - Puppet freshness on kuo is CRITICAL: No successful Puppet run in the last 10 hours [20:12:02] RECOVERY - Puppet freshness on constable is OK: puppet ran at Tue Jun 11 20:12:00 UTC 2013 [20:12:22] RECOVERY - Puppet freshness on wtp1 is OK: puppet ran at Tue Jun 11 20:12:13 UTC 2013 [20:12:22] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Jun 11 20:12:17 UTC 2013 [20:12:32] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [20:12:32] RECOVERY - Puppet freshness on lardner is OK: puppet ran at Tue Jun 11 20:12:27 UTC 2013 [20:12:35] lesliecarr: I am going to replace the
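
For a planned cut like this switch swap, Ceph can be told in advance not to start marking OSDs out when they briefly disappear. A sketch using stock Ceph CLI commands, offered as the usual precaution rather than a record of what was actually done here:

    ceph osd set noout     # suspend automatic out-marking for the window
    ceph -s                # health shows HEALTH_WARN while the flag is set
    # ...swap the switch, wait for the network to converge...
    ceph osd unset noout   # restore normal down/out handling afterwards
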
switches in about 10 minutes...do you wanna switch the mastership now? [20:12:42] PROBLEM - Puppet freshness on lardner is CRITICAL: No successful Puppet run in the last 10 hours [20:12:52] PROBLEM - Puppet freshness on wtp1 is CRITICAL: No successful Puppet run in the last 10 hours [20:12:52] PROBLEM - Puppet freshness on constable is CRITICAL: No successful Puppet run in the last 10 hours [20:13:02] RECOVERY - Puppet freshness on tola is OK: puppet ran at Tue Jun 11 20:12:52 UTC 2013 [20:13:12] RECOVERY - Puppet freshness on kuo is OK: puppet ran at Tue Jun 11 20:13:07 UTC 2013 [20:13:28] cmjohnson1: yes [20:13:42] PROBLEM - Puppet freshness on kuo is CRITICAL: No successful Puppet run in the last 10 hours [20:13:52] RECOVERY - Puppet freshness on constable is OK: puppet ran at Tue Jun 11 20:13:46 UTC 2013 [20:13:52] PROBLEM - Puppet freshness on constable is CRITICAL: No successful Puppet run in the last 10 hours [20:13:52] PROBLEM - Puppet freshness on tola is CRITICAL: No successful Puppet run in the last 10 hours [20:14:02] RECOVERY - Puppet freshness on wtp1 is OK: puppet ran at Tue Jun 11 20:14:00 UTC 2013 [20:14:02] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Jun 11 20:14:00 UTC 2013 [20:14:12] RECOVERY - Puppet freshness on lardner is OK: puppet ran at Tue Jun 11 20:14:02 UTC 2013 [20:14:12] RECOVERY - Puppet freshness on tola is OK: puppet ran at Tue Jun 11 20:14:09 UTC 2013 [20:14:32] RECOVERY - Puppet freshness on kuo is OK: puppet ran at Tue Jun 11 20:14:22 UTC 2013 [20:14:32] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [20:14:42] PROBLEM - Puppet freshness on kuo is CRITICAL: No successful Puppet run in the last 10 hours [20:14:42] PROBLEM - Puppet freshness on lardner is CRITICAL: No successful Puppet run in the last 10 hours [20:14:57] PROBLEM - Puppet freshness on wtp1 is CRITICAL: No successful Puppet run in the last 10 hours [20:14:57] PROBLEM - Puppet freshness on tola is CRITICAL: No successful Puppet run in the last 10 hours [20:15:09] New patchset: Jgreen; "remove db1013" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/68042 [20:15:12] RECOVERY - Puppet freshness on constable is OK: puppet ran at Tue Jun 11 20:15:05 UTC 2013 [20:15:22] RECOVERY - Puppet freshness on wtp1 is OK: puppet ran at Tue Jun 11 20:15:13 UTC 2013 [20:15:22] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Jun 11 20:15:14 UTC 2013 [20:15:22] RECOVERY - Puppet freshness on lardner is OK: puppet ran at Tue Jun 11 20:15:19 UTC 2013 [20:15:26] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/68042 [20:15:32] RECOVERY - Puppet freshness on tola is OK: puppet ran at Tue Jun 11 20:15:26 UTC 2013 [20:15:32] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [20:15:42] PROBLEM - Puppet freshness on lardner is CRITICAL: No successful Puppet run in the last 10 hours [20:15:52] RECOVERY - Puppet freshness on kuo is OK: puppet ran at Tue Jun 11 20:15:42 UTC 2013 [20:15:52] PROBLEM - Puppet freshness on wtp1 is CRITICAL: No successful Puppet run in the last 10 hours [20:15:52] PROBLEM - Puppet freshness on constable is CRITICAL: No successful Puppet run in the last 10 hours [20:15:52] PROBLEM - Puppet freshness on tola is CRITICAL: No successful Puppet run in the last 10 hours [20:16:01] RobH: When you have time, please depool the Parsoid backends in pmtpa. 
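
The mastership handover being agreed above can be driven over ssh from a management host. A hedged sketch: the operational command is quoted from memory and should be verified against the Junos documentation for this platform before use:

    ssh admin@asw-c-eqiad 'request chassis routing-engine master switch'
    ssh admin@asw-c-eqiad 'show virtual-chassis status'   # confirm the new master
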
In the meantime I'll try to make icinga stop whining about them [20:16:20] yea won't be today, at ulsfo [20:16:20] RoanKattouw: plz open ticket to depool if you have not yet [20:16:22] RECOVERY - Puppet freshness on constable is OK: puppet ran at Tue Jun 11 20:16:13 UTC 2013 [20:16:22] RECOVERY - Puppet freshness on wtp1 is OK: puppet ran at Tue Jun 11 20:16:19 UTC 2013 [20:16:22] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Jun 11 20:16:19 UTC 2013 [20:16:24] lesliecarr: let me know when to move fwd....also will need to replace the stacking cables...and I am sure that you will need to update the OS and firmware [20:16:25] since RobH is in ulsfo with me [20:16:28] LeslieCarr: Will do [20:16:32] RECOVERY - Puppet freshness on lardner is OK: puppet ran at Tue Jun 11 20:16:22 UTC 2013 [20:16:32] RECOVERY - Puppet freshness on tola is OK: puppet ran at Tue Jun 11 20:16:26 UTC 2013 [20:16:32] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [20:16:42] PROBLEM - Puppet freshness on kuo is CRITICAL: No successful Puppet run in the last 10 hours [20:16:42] PROBLEM - Puppet freshness on lardner is CRITICAL: No successful Puppet run in the last 10 hours [20:16:42] RECOVERY - Puppet freshness on kuo is OK: puppet ran at Tue Jun 11 20:16:37 UTC 2013 [20:16:52] PROBLEM - Puppet freshness on wtp1 is CRITICAL: No successful Puppet run in the last 10 hours [20:16:52] PROBLEM - Puppet freshness on constable is CRITICAL: No successful Puppet run in the last 10 hours [20:16:52] PROBLEM - Puppet freshness on tola is CRITICAL: No successful Puppet run in the last 10 hours [20:17:00] !log disabling asw-c8-eqiad [20:17:13] Logged the message, Mistress of the network gear. [20:17:18] !log halting asw-c8-eqiad [20:17:23] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Tue Jun 11 20:17:15 UTC 2013 [20:17:23] RECOVERY - Puppet freshness on wtp1 is OK: puppet ran at Tue Jun 11 20:17:15 UTC 2013 [20:17:23] RECOVERY - Puppet freshness on lardner is OK: puppet ran at Tue Jun 11 20:17:19 UTC 2013 [20:17:26] Logged the message, Mistress of the network gear. [20:17:34] RECOVERY - Puppet freshness on tola is OK: puppet ran at Tue Jun 11 20:17:25 UTC 2013 [20:17:34] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [20:17:43] PROBLEM - Puppet freshness on kuo is CRITICAL: No successful Puppet run in the last 10 hours [20:17:43] PROBLEM - Puppet freshness on lardner is CRITICAL: No successful Puppet run in the last 10 hours [20:17:43] RECOVERY - Puppet freshness on kuo is OK: puppet ran at Tue Jun 11 20:17:41 UTC 2013 [20:17:53] PROBLEM - Puppet freshness on wtp1 is CRITICAL: No successful Puppet run in the last 10 hours [20:17:54] PROBLEM - Puppet freshness on constable is CRITICAL: No successful Puppet run in the last 10 hours [20:17:54] PROBLEM - Puppet freshness on tola is CRITICAL: No successful Puppet run in the last 10 hours [20:18:42] PROBLEM - Puppet freshness on kuo is CRITICAL: No successful Puppet run in the last 10 hours [20:19:22] LeslieCarr: How do I log into the icinga web interface again?
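
The OS/firmware update cmjohnson1 anticipates above is, on Junos, a single install-and-reboot request. A sketch with a placeholder package path; the actual upgrade path for a mixed EX4200/EX4500 virtual chassis should come from the vendor documentation:

    ssh admin@asw-c-eqiad \
      'request system software add /var/tmp/jinstall-ex-XXXX.tgz reboot'
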
[20:19:28] icinga-admin.wm.org [20:19:33] Aha [20:19:34] Thanks paravoid [20:20:25] ACKNOWLEDGEMENT - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours Catrope To be decommissioned [20:20:53] ACKNOWLEDGEMENT - Puppet freshness on lardner is CRITICAL: No successful Puppet run in the last 10 hours Catrope To be decommissioned [20:21:11] New patchset: Hashar; "ganglia: define a graph view for continuous integration" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/68044 [20:21:18] ACKNOWLEDGEMENT - Puppet freshness on kuo is CRITICAL: No successful Puppet run in the last 10 hours Catrope To be decommissioned [20:22:00] ACKNOWLEDGEMENT - Puppet freshness on wtp1 is CRITICAL: No successful Puppet run in the last 10 hours Catrope To be decommissioned [20:24:53] RECOVERY - Puppet freshness on constable is OK: puppet ran at Tue Jun 11 20:24:44 UTC 2013 [20:24:53] PROBLEM - Puppet freshness on constable is CRITICAL: No successful Puppet run in the last 10 hours [20:25:21] ACKNOWLEDGEMENT - Puppet freshness on constable is CRITICAL: No successful Puppet run in the last 10 hours Catrope To be decommissioned [20:27:23] !log powering off asw-c8-eqiad and removing from rack [20:27:31] Logged the message, Master [20:27:33] PROBLEM - RAID on analytics1016 is CRITICAL: Timeout while attempting connection [20:27:33] PROBLEM - RAID on analytics1018 is CRITICAL: Timeout while attempting connection [20:29:43] PROBLEM - HTTP radosgw on ms-fe1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:30:03] PROBLEM - HTTP radosgw on ms-fe1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:30:13] PROBLEM - HTTP radosgw on ms-fe1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:30:13] PROBLEM - HTTP radosgw on ms-fe1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:30:23] PROBLEM - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:30:53] RECOVERY - HTTP radosgw on ms-fe1004 is OK: HTTP OK: HTTP/1.1 200 OK - 311 bytes in 2.876 second response time [20:31:03] RECOVERY - HTTP radosgw on ms-fe1002 is OK: HTTP OK: HTTP/1.1 200 OK - 311 bytes in 0.014 second response time [20:31:04] RECOVERY - HTTP radosgw on ms-fe1001 is OK: HTTP OK: HTTP/1.1 200 OK - 311 bytes in 0.006 second response time [20:31:15] RECOVERY - LVS HTTP IPv4 on ms-fe.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 311 bytes in 0.008 second response time [20:31:35] RECOVERY - HTTP radosgw on ms-fe1003 is OK: HTTP OK: HTTP/1.1 200 OK - 311 bytes in 0.003 second response time [20:32:57] !log deploying bugzilla frontpage css/html changes (bug 22170) [20:33:07] Logged the message, Master [20:35:39] check Bugzilla front page [20:46:22] ooo, fancy underlines [20:47:13] mutante: oi. [20:47:55] odder: oi [20:48:53] mutante: http://tools.wikimedia.pl/~odder/screenshots/Bugzilla-MainPage.png [20:51:34] odder: ugh, that doesn't look that good, what browser is it? looks fine to use in Firefox,Iceweasel,Chrome etc [20:52:00] odder: did you force refresh? [20:53:19] oh I see now [20:53:25] looks really nice :) [21:00:41] about to scap... [21:06:59] PROBLEM - Puppet freshness on lardner is CRITICAL: No successful Puppet run in the last 10 hours [21:10:42] !log maxsem Started syncing Wikimedia installation... 
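
The acknowledgements above can also be pushed from a shell through Icinga's external-command file, which is handy when a batch of hosts needs the same note. A sketch assuming the stock Debian command-file location; the path may differ on this install:

    # ACKNOWLEDGE_SVC_PROBLEM fields: host;service;sticky;notify;persistent;author;comment
    now=$(date +%s)
    for h in mexia lardner kuo wtp1 constable; do
      printf '[%s] ACKNOWLEDGE_SVC_PROBLEM;%s;Puppet freshness;2;1;1;catrope;To be decommissioned\n' \
        "$now" "$h"
    done > /var/lib/icinga/rw/icinga.cmd
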
: Weekly mobile deployment [21:10:50] Logged the message, Master [21:13:19] New patchset: Demon; "Typofix and block another bot" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/68109 [21:17:37] New patchset: Demon; "Rewrite old LocalizationUpdate url for core" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/68110 [21:18:46] !log maxsem Finished syncing Wikimedia installation... : Weekly mobile deployment [21:18:52] cmjohnson1, can you power down mw1171 again please? [21:18:53] Logged the message, Master [21:20:57] !log maxsem synchronized php-1.22wmf5/extensions/MobileFrontend/ [21:21:04] Logged the message, Master [21:24:38] ^demon: is that for third party wikis or does it help us too [21:24:57] maxsem: sure [21:24:58] <^demon> Both? [21:25:14] oh, I'm never optimistic enough it seems [21:25:58] !log mw1171 powering down via racadm [21:26:11] Logged the message, Master [21:27:49] PROBLEM - Host mw1171 is DOWN: PING CRITICAL - Packet loss = 100% [21:29:15] paravoid, around? [21:30:12] cmjohnson1, cheers [21:30:36] yurik...he is afk atm [21:31:04] cmjohnson1, thx! [21:34:39] New patchset: Demon; "Typofix and block another bot" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/68109 [21:35:33] New patchset: Demon; "Rewrite old LocalizationUpdate url for core" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/68110 [21:41:29] PROBLEM - Host wtp1008 is DOWN: PING CRITICAL - Packet loss = 100% [21:42:09] RECOVERY - Host wtp1008 is UP: PING OK - Packet loss = 0%, RTA = 0.29 ms [21:44:39] Aaron|home: want me to merge https://gerrit.wikimedia.org/r/#/c/68039/ ? [21:44:47] are you sure that it's not just a feature? ;) [21:45:35] heh [21:45:50] it can be [21:46:02] ok, merging [21:46:18] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/68039 [21:50:41] andrewbogott: we were still using the ve-change-marking VM you just deleted [21:51:26] New review: Cmcmahon; "Wikilove is working now, this should probably be abandoned." [operations/mediawiki-config] (master) C: -1; - https://gerrit.wikimedia.org/r/62606 [21:54:29] gwicke: Roan just told me I could delete it :( [21:54:43] andrewbogott: yeah, I just read that mail [21:54:55] sorry. Let me know if I can help getting it set back up. [21:55:00] it is not a catastrophe, we'll just need to set up a similar VM again [21:55:52] Well, the new one will be up to date I guess :) [21:55:57] Again, let me know if I can help [21:57:23] it was a VM with MW & VE that was configured to use parsoid.wmflabs.org as the Parsoid backend [21:57:38] the db had some test cases we used for VE testing [21:58:16] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/68109 [21:58:49] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/68110 [22:00:55] ^demon: tell him he has until you move, once you are in SF you never want to hear about SVN again ;] [22:13:50] ^demon: merged all the way through for you [22:13:56] <^demon> ty [22:13:59] yw [22:33:19] PROBLEM - RAID on analytics1019 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [22:37:27] does anyone know how HTTPS traffic goes through varnish? [22:37:42] ZERO is apparently not identified at all if it's going over https [22:45:38] I'm back now [22:51:06] paravoid: can you expand on your comment on https://gerrit.wikimedia.org/r/#/c/63558/1 by any chance?
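
The racadm power-down logged above would be run remotely against the machine's DRAC; a sketch where the management hostname and credentials are placeholders:

    racadm -r mw1171.mgmt.example -u root -p '<password>' serveraction powerdown
    racadm -r mw1171.mgmt.example -u root -p '<password>' serveraction powerstatus
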
sure [22:52:27] PROBLEM - NTP on ssl3002 is CRITICAL: NTP CRITICAL: No response from NTP server [22:53:28] basically, it's what you said on your commit message [22:56:01] paravoid: oh, ok. I don't know why, but I somehow thought you were thinking of some performance issue that would result from referencing it in site.pp. But I see what you're saying. [22:56:41] oh no [22:56:53] not one I can think of anyway [22:56:57] it's just ugly :-) [22:57:55] !log wtp1008 rebooting to check bios setting [22:58:04] Logged the message, Master [22:58:04] yes. I'm rewriting the module. [22:58:28] yurik: https traffic terminates at SSL terminators and gets proxied with X-Forwarded-Proto set to https and X-Forwarded-For set to the original IP [22:58:56] paravoid, hi, i have been looking at the logs - and all entries don't have either opera or X-CS detection [22:59:05] (https traffic) [22:59:07] PROBLEM - Host wtp1008 is DOWN: PING CRITICAL - Packet loss = 100% [23:04:27] PROBLEM - NTP on ssl3003 is CRITICAL: NTP CRITICAL: No response from NTP server [23:06:19] RECOVERY - Host wtp1008 is UP: PING OK - Packet loss = 0%, RTA = 1.31 ms [23:11:33] yurik, these are for the ones that SHOULD have matched a Wikipedia Zero partner, correct (i.e., matching IP, subdomain, and language)? i see 112 unique IP addresses making HTTPS requests in that sample. yurik, i see what you mean about all of X-Subdomain: M *never* being an HTTPS request in these logs, which is strange. [23:13:02] yurik: don't have opera? [23:13:24] yurik: https shouldn't modify anything other than XFP and XFF [23:13:27] dr0ptp4kt, according to paravoid, current varnish code does not handle our SSL proxy correctly because everywhere it compares ACL values against client.ip -- which in the SSL case will be our SSL proxy [23:13:58] (as paravoid mentioned) [23:14:01] which means neither opera-mini nor carrier detection is broken [23:14:22] i meant - both broken :) [23:16:15] I don't think we ever provided SSL for zero officially [23:16:30] we don't even have a *.zero.wikipedia.org certificate [23:16:50] so if you go to https://en.zero you'll get a browser security alert [23:17:02] that's true, but zero is also working on m. for all newer carriers [23:17:42] the real question is if carrier will properly detect when users connect to https for a whitelisted host [23:17:56] without it we shouldn't show "free by XXX" [23:19:13] that's another issue, yes. [23:19:28] the point is that this was never designed/officially offered [23:19:50] feel free to coordinate this new feature internally and supply us with requirements from our side [23:19:58] I think you know well what needs changed by now [23:20:27] needs to be changed even [23:20:27] :) yes, thanks paravoid, now at least i understand the issue, will organize it through RT. Thx for your help!!! [23:21:13] I see SSL cert, carrier detection for SSL & carrier detection + opera_mini + SSL as three semi-distinct items that need fixing from our side [23:21:31] possibly more that I can't think of, but this isn't a very organized way to think :-) [23:21:42] for me to think I mean [23:24:08] paravoid, yes, thanks, i will talk to dfoy about it to schedule it and put some RT tickets in [23:24:16] cool [23:24:17] thanks! [23:27:55] how do we currently authenticate users? through SSL?
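
Given paravoid's description above, one way to probe the problem yurik is chasing is to replay a request the way the SSL terminator would hand it to Varnish and see whether carrier tagging survives. A hedged sketch: the backend address is a placeholder, and X-CS is simply the carrier header named in this conversation:

    curl -s -D - -o /dev/null 'http://mobile-frontend.example/wiki/Main_Page' \
      -H 'Host: en.zero.wikipedia.org' \
      -H 'X-Forwarded-Proto: https' \
      -H 'X-Forwarded-For: 203.0.113.7' \
    | grep -i '^x-cs'
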
if you mean exclusively via SSL, no, that's not the case yet afaik [23:28:58] there's been some trouble doing so I think [23:29:00] paravoid, i meant during HTTP session [23:29:12] csteipp said the other day that this will be enabled after SUL [23:29:27] ohh boy, i hope we don't break zero with it :) [23:29:47] like if zero starts doing magic redirects to random places :) [23:29:49] there's threads about it on wikitech among others [23:30:00] Yeah, after we fix some stuff with SUL, we'll turn on wgSecureLogin [23:33:40] paravoid: mobile authenticates purely through https [23:33:52] ah [23:34:00] they have for quite a while now [23:34:12] didn't know that [23:34:15] yurik: ^ [23:34:15] that was the big push for getting the m domains working certs [23:34:40] and once they are logged in, I'd imagine all other requests are over https as well [23:34:46] could be wrong on that, though [23:35:03] Ryan_Lane, does it auto-drop back to HTTP after authentication? [23:35:09] the apps have been using https for even longer, but I think they just use the api [23:35:17] yurik: you'd need to ask the mobile team that [23:35:26] Ryan_Lane, thx, will do [23:35:29] yw [23:35:38] i.e. your team :P [23:35:46] I'd imagine it doesn't drop back, because the cookie is marked as secure [23:35:50] unless that's been chanegd [23:35:52] *changed [23:36:14] oh yes, we have a forceHTTPS cookie thing on varnish too [23:37:00] I wonder how much time I'll get on https this year [23:37:23] thanks to prism/gfw stuff I'd really like to push that forward some [23:38:10] Ryan_Lane: did you see my mail on ops@ about having it as our team's goal this year? [23:38:18] oh. I didn't [23:38:27] no one did, it seems [23:38:30] I'd love to have https by default for anons [23:38:36] Ryan_Lane, it won't save anyone from zomgsecret court orders [23:38:38] that's a tough goal [23:38:50] MaxSem: of course not, but there's nothing we can do about that [23:39:39] even with https by default we'd still need to protect against downgrade attacks [23:39:54] doing HSTS properly scares me, though [23:40:35] one step at a time :-) [23:40:41] indeed [23:40:52] we can actually switch one wiki at a time [23:40:54] to anon [23:41:02] using rel=canonical [23:42:02] paravoid: ahh. it's in a reply to another email [23:42:04] now I see it [23:42:41] I'm all for this [23:46:03] MaxSem: for read logs, it would at least mean they have to have a court order / letter... which right now they can just sniff our traffic, no agreement required [23:46:45] For IP<->author, timing attacks are still going to be possible, even with https :( [23:52:17] csteipp: indeed. timing attacks are a problem with tor too
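
For the HSTS/forceHTTPS/secure-cookie thread that closes this log, the relevant headers are easy to inspect from outside. A purely illustrative sketch; the hostname is used only because it appears in the conversation:

    # does the site send HSTS, and do any cookies carry the secure flag?
    curl -sI https://en.wikipedia.org/ | grep -i 'strict-transport-security'
    curl -sI https://en.wikipedia.org/ | grep -i '^set-cookie' | grep -i secure
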