[00:00:11] New patchset: coren; "Tool Labs: HBA apparently only works with RSA" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63352 [00:00:53] Change merged: coren; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63352 [00:07:57] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Sun May 12 00:07:55 UTC 2013 [00:08:27] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [00:09:17] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Sun May 12 00:09:07 UTC 2013 [00:09:27] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [00:10:17] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Sun May 12 00:10:11 UTC 2013 [00:10:27] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [00:11:17] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Sun May 12 00:11:10 UTC 2013 [00:11:27] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [00:12:07] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Sun May 12 00:12:03 UTC 2013 [00:12:27] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [00:12:57] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Sun May 12 00:12:48 UTC 2013 [00:13:27] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [00:13:27] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Sun May 12 00:13:25 UTC 2013 [00:14:36] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [00:14:57] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Sun May 12 00:14:49 UTC 2013 [00:15:27] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [00:34:45] New patchset: coren; "Tools Labs: Attempt clean override of access.conf" [operations/puppet] (production) - 
https://gerrit.wikimedia.org/r/63354 [00:35:21] Change merged: coren; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63354 [00:39:30] New patchset: coren; "Tool Labs: Minor tweak, still access.conf" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63355 [00:40:19] Change merged: coren; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63355 [01:04:50] New patchset: coren; "Tool Labs: More data in the project store" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63357 [01:05:37] Change merged: coren; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63357 [01:22:44] !log aaron synchronized php-1.22wmf3/includes/filebackend/FileBackendMultiWrite.php '308ebddc237e76a9602b20585386fc26876f0bee' [01:28:11] !log aaron synchronized php-1.22wmf3/maintenance/copyFileBackend.php '3e103e93f01c24d7e454aaa5db07418405d273c3' [02:07:14] New patchset: Diederik; "Added per dc / server role breakdown of udp2log packetloss monitoring." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63220 [02:08:40] New patchset: coren; "Tool Labs: finish with the access.conf" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63361 [02:09:24] Change merged: coren; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63361 [02:11:47] New patchset: Diederik; "Added per dc / server role breakdown of udp2log packetloss monitoring." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63220 [02:13:17] !log LocalisationUpdate completed (1.22wmf3) at Sun May 12 02:13:17 UTC 2013 [02:14:31] New patchset: Diederik; "Added per dc / server role breakdown of udp2log packetloss monitoring." 
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/63220 [02:16:23] New patchset: coren; "Tool Labs: Minor fix" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63362 [02:16:59] Change merged: coren; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63362 [02:30:10] PROBLEM - Puppet freshness on db45 is CRITICAL: No successful Puppet run in the last 10 hours [02:34:10] PROBLEM - Puppet freshness on db26 is CRITICAL: No successful Puppet run in the last 10 hours [03:03:03] !log LocalisationUpdate ResourceLoader cache refresh completed at Sun May 12 03:03:03 UTC 2013 [03:16:26] PROBLEM - Puppet freshness on cp3003 is CRITICAL: No successful Puppet run in the last 10 hours [03:16:26] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours [03:55:17] PROBLEM - Puppet freshness on ms-fe3001 is CRITICAL: No successful Puppet run in the last 10 hours [04:08:03] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Sun May 12 04:07:56 UTC 2013 [04:08:53] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [04:09:13] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Sun May 12 04:09:05 UTC 2013 [04:09:53] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [04:10:13] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Sun May 12 04:10:07 UTC 2013 [04:10:53] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [04:11:13] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Sun May 12 04:11:04 UTC 2013 [04:11:53] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [04:11:53] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Sun May 12 04:11:52 UTC 2013 [04:12:53] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [04:13:13] RECOVERY - Puppet freshness on ms2 is OK: 
puppet ran at Sun May 12 04:13:08 UTC 2013 [04:13:53] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [04:15:13] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Sun May 12 04:15:06 UTC 2013 [04:15:53] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [04:49:23] PROBLEM - Puppet freshness on db44 is CRITICAL: No successful Puppet run in the last 10 hours [05:13:53] PROBLEM - RAID on searchidx2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:14:43] RECOVERY - RAID on searchidx2 is OK: OK: State is Optimal, checked 4 logical device(s) [05:39:29] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:40:29] RECOVERY - RAID on searchidx1001 is OK: OK: State is Optimal, checked 4 logical device(s) [06:21:55] PROBLEM - Puppet freshness on mc15 is CRITICAL: No successful Puppet run in the last 10 hours [06:22:55] PROBLEM - Puppet freshness on colby is CRITICAL: No successful Puppet run in the last 10 hours [06:30:15] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Sun May 12 06:30:12 UTC 2013 [06:31:05] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [06:31:45] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Sun May 12 06:31:35 UTC 2013 [06:32:05] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [06:32:15] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Sun May 12 06:32:05 UTC 2013 [06:33:05] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [06:35:33] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[06:36:23] RECOVERY - RAID on searchidx1001 is OK: OK: State is Optimal, checked 4 logical device(s) [06:53:13] PROBLEM - search indices - check lucene status page on search19 is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern found - 60051 bytes in 0.124 second response time [06:57:33] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:58:23] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.140 second response time [07:17:33] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:18:23] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.134 second response time [07:56:02] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours [07:56:02] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours [07:56:02] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours [08:01:32] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:02:22] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.130 second response time [08:07:57] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Sun May 12 08:07:49 UTC 2013 [08:07:57] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [08:08:37] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Sun May 12 08:08:28 UTC 2013 [08:08:57] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [08:09:07] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Sun May 12 08:09:04 UTC 2013 [08:09:57] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [08:14:27] PROBLEM - Packetloss_Average on 
analytics1003 is CRITICAL: CRITICAL: packet_loss_average is 15.0887134091 (gt 8.0) [08:14:37] PROBLEM - Packetloss_Average on analytics1006 is CRITICAL: CRITICAL: packet_loss_average is 15.2498793077 (gt 8.0) [08:14:38] PROBLEM - Packetloss_Average on analytics1004 is CRITICAL: CRITICAL: packet_loss_average is 14.6372049618 (gt 8.0) [08:14:57] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Sun May 12 08:14:47 UTC 2013 [08:14:57] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [08:18:27] RECOVERY - Packetloss_Average on analytics1003 is OK: OK: packet_loss_average is 0.64198720339 [08:18:37] PROBLEM - Packetloss_Average on oxygen is CRITICAL: CRITICAL: packet_loss_average is 14.8711309302 (gt 8.0) [08:18:38] PROBLEM - Packetloss_Average on gadolinium is CRITICAL: CRITICAL: packet_loss_average is 13.2656689167 (gt 8.0) [08:18:39] RECOVERY - Packetloss_Average on analytics1006 is OK: OK: packet_loss_average is -0.139292672414 [08:18:39] PROBLEM - Packetloss_Average on analytics1008 is CRITICAL: CRITICAL: packet_loss_average is 14.2065396947 (gt 8.0) [08:18:40] RECOVERY - Packetloss_Average on analytics1004 is OK: OK: packet_loss_average is 0.175152916667 [08:22:37] RECOVERY - Packetloss_Average on oxygen is OK: OK: packet_loss_average is 0.655485726496 [08:22:38] RECOVERY - Packetloss_Average on gadolinium is OK: OK: packet_loss_average is 0.676446956522 [08:22:39] RECOVERY - Packetloss_Average on analytics1008 is OK: OK: packet_loss_average is -0.419277350427 [08:23:17] PROBLEM - Packetloss_Average on analytics1005 is CRITICAL: CRITICAL: packet_loss_average is 14.1588751145 (gt 8.0) [08:27:17] RECOVERY - Packetloss_Average on analytics1005 is OK: OK: packet_loss_average is 0.373761680672 [08:31:37] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[08:33:37] RECOVERY - RAID on searchidx1001 is OK: OK: State is Optimal, checked 4 logical device(s) [09:29:06] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/62998 [09:32:18] !log olivneh synchronized wmf-config/throttle.php '(Bug 48301) Add throttle exception for Haifa University workshop' [09:37:07] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [09:39:07] RECOVERY - RAID on searchidx1001 is OK: OK: State is Optimal, checked 4 logical device(s) [10:57:54] PROBLEM - RAID on searchidx2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [10:59:44] RECOVERY - RAID on searchidx2 is OK: OK: State is Optimal, checked 4 logical device(s) [11:03:04] PROBLEM - DPKG on mc15 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [11:03:54] RECOVERY - DPKG on mc15 is OK: All packages OK [11:31:48] New patchset: Faidon; "swiftrepl: fetch all containers, not just the first 10k" [operations/software] (master) - https://gerrit.wikimedia.org/r/63370 [11:33:24] Change merged: Faidon; [operations/software] (master) - https://gerrit.wikimedia.org/r/63370 [12:06:18] bblack: paravoid: want to give RT a kick? 
(not an emergency i guess but I don't see an explanation in the SAL or from skimming a couple channels a little) [12:06:33] it's showing the default index.html that dpkg provided [12:06:44] bblack: (SAL = server admin log) [12:08:14] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Sun May 12 12:08:04 UTC 2013 [12:09:04] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [12:09:14] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Sun May 12 12:09:05 UTC 2013 [12:10:04] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [12:10:14] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Sun May 12 12:10:07 UTC 2013 [12:10:59] jeremyb: I have less a clue than you do about the rt machine :) [12:11:04] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [12:11:05] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Sun May 12 12:11:02 UTC 2013 [12:11:06] trying puppet, maybe it knows what to fix [12:12:04] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [12:12:22] bblack: i was making a wild guess that maybe puppet broke it just because i remember recent (in the last week or two) puppet changes about puppetizing it [12:12:34] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Sun May 12 12:12:33 UTC 2013 [12:12:39] or about making it work on labs or something [12:12:51] well, so far puppet doesn't seem to finish running in a timely manner there at all. 
So yeah, maybe [12:13:04] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [12:13:14] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Sun May 12 12:13:08 UTC 2013 [12:14:04] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [12:14:54] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Sun May 12 12:14:50 UTC 2013 [12:15:04] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [12:16:12] anyway, i guess it's not an emergency but leaving it broken will eventually bite someone. (i wonder if the mail is getting queued?) [12:19:51] bblack, https://wikitech.wikimedia.org/wiki/RT might help? [12:22:16] Thehelpfulone: you know who last edited that page? :-) [12:22:22] i didn't until i just checked [12:22:42] Daniel? [12:22:52] oh heh you :) [12:23:26] i guess that page *might* be useful for RT in general [12:24:08] but I'm thinking maybe this is the sort of problem that doesn't fix itself [12:24:41] it could just be as simple as apache site is in -available but not -enabled. but i guess it's probably more work to fix than that [12:24:54] and in any case if puppet's taking forever that's also a problem [12:31:04] PROBLEM - Puppet freshness on db45 is CRITICAL: No successful Puppet run in the last 10 hours [12:32:18] yup [12:32:37] it's actually lighttpd for rt apparently, although apache's also on the same host for observium [12:35:04] PROBLEM - Puppet freshness on db26 is CRITICAL: No successful Puppet run in the last 10 hours [12:48:11] PROBLEM - Lighttpd HTTP on streber is CRITICAL: Connection refused [12:52:50] heh, i knew it was lighttpd from the SAL but i guess i wasn't careful enough about not mentioning apache. i did think about it as i typed. 
:-P [12:58:11] RECOVERY - Lighttpd HTTP on streber is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 169 bytes in 0.056 second response time [13:03:07] New patchset: Matmarex; "Remove definition of wgHandheldStyle" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/63371 [13:04:19] ah I've found the basic issue [13:06:07] the commit that had the typo is old, I guess some of the newer traffic just finally enabled those bits on the prod rt server instead of just labs [13:07:30] but in puppet commit e6a10f8b, the update to manifests/misc/rt-server.pp has a typo. The rt hostname for prod there is rt.wikmedia.org (no second eye in "wiki") [13:08:00] hence lighttpd doesn't match the host header and doesn't invoke the rt software for the request [13:10:57] aha! [13:11:32] what I'm lost on now is where the doc is for our procedure for making commits to the puppet repo. I assume from previous traffic we never commit directly to master there, there's some merge host [13:11:37] I haven't found the doc on it yet [13:12:38] well, first of all that repo's special: there is no master. HEAD is production [13:13:00] otherwise it's pretty much the same as any gerrit repo. you push for review [13:13:09] i'll do the commit for you if you like [13:13:23] go for it [13:14:01] that still doesn't seem to fix another issue with the root https://rt.wikimedia.org , but it does fix https://rt.wikimedia.org/Ticket/Display.html (etc) [13:14:17] did you disable puppet? 
[13:14:29] yeah temporarily so I could try the fix locally and not have it step on me [13:14:34] ok, good :) [13:16:48] PROBLEM - Puppet freshness on cp3003 is CRITICAL: No successful Puppet run in the last 10 hours [13:16:48] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours [13:17:14] New patchset: Jeremyb; "RT manifest: typo fix wikmedia -> wikimedia" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63372 [13:21:05] Change merged: BBlack; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63372 [13:21:06] New patchset: Jeremyb; "add wikmedia to typos" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63373 [13:21:40] i don't know for sure if that will work but it's easy enough to test i guess [13:22:32] https://gerrit.wikimedia.org/r/gitweb?p=integration/jenkins-job-builder-config.git;a=blob;f=operations-puppet.yaml;h=a6ee6617589881bd2fd2afc5ddfcac626d57c3a3;hb=refs/heads/master#l47 [13:23:25] bblack: you know how to merge on the puppetmaster? [13:23:33] nope [13:23:45] I know nothing about the jenkins/puppetmaster stuff, other than seeing traffic about it here [13:23:51] is there a wiki doc about it somewhere? [13:23:59] yeah, i'm sure. looking already [13:24:38] probably documented somewhere at wikitech.wikimedia.org [13:24:45] right [13:25:22] oh the typo fix you committed, it does fix everything. I was looking at a cached broken https://rt.wikimedia.org/ [13:25:36] ohhh, cached for me too then [13:25:48] i has a login prompt now [13:26:57] ah, after much fruitless searching, I found the obviously-named https://wikitech.wikimedia.org/wiki/Puppet :) [13:27:07] aha, https://wikitech.wikimedia.org/wiki/Puppet_usage#Updating_operations.2Fpuppet_on_production_nodes [13:27:22] right, but that page doesn't really have the answer. see my link :) [13:28:28] ok [13:28:33] have you done that stuff already? 
[13:28:41] no, i don't have root :) [13:28:43] ok [13:28:52] or well even shell [13:31:49] New patchset: Jeremyb; "this is a test..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63374 [13:33:00] Change abandoned: Jeremyb; "yay!" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63374 [13:33:43] New review: Jeremyb; "seems to work, see I0bc0d399b06e56d8b5813e0462603bc48eb5d6f7" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/63373 [13:34:40] New review: Jeremyb; "fu I5305c89ccf7ef3f548a551ea05f1253a085445e6" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47026 [13:37:02] I did the sockpuppet merge a few mins ago, but what I can't find now is: how does stafford (the puppetmaster) get /var/lib/git/operations/puppet updated from there? [13:38:21] did you forward your key to sockpuppet? [13:38:33] i think it's a git hook that pushes it on to stafford [13:40:42] Change merged: BBlack; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63373 [13:42:25] yup, thanks, that was it [13:42:42] nice how it's just silent about the hook not working without the key [13:43:45] but where's the hook? [13:44:04] i see files/puppet/git/p*/* [13:44:08] but none of those look like it [13:44:14] it is on sockpuppet, there's a post-merge that basically ssh's to stafford and does a git pull [13:44:23] it's probably just local [13:44:43] I logged back in with the correctly-forwarded key and merged the typos change and it all went through [13:47:20] woot [13:47:27] && danke! [13:48:25] thanks for walking me through all the basics :) [13:52:00] np! 
[13:52:29] i even just did some RT duty now that you let me back in :) [13:52:35] 5121 [13:55:57] PROBLEM - Puppet freshness on ms-fe3001 is CRITICAL: No successful Puppet run in the last 10 hours [13:58:50] New patchset: coren; "Tool Labs: Moar comments in the module" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63375 [13:59:49] New review: coren; "Is just comments." [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/63375 [13:59:49] Change merged: coren; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63375 [14:10:04] New review: Krinkle; "fixme: How is $wmgUseZeroNamespace relevant?" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/62814 [14:10:40] wtf, how does Coren get a lowercase name. i wants! [14:11:57] jeremyb: I think I was grandfathered in from a bygone age. [14:12:07] ugh [14:17:37] New patchset: coren; "Tool Labs: Add mosh for non-NA friends." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63377 [14:17:46] !log fixed typo in rt.wikimedia.org vhost hostname, was breaking RT http stuff [14:17:48] Coren: NA? [14:17:55] North-American [14:17:58] ahhhhh [14:18:26] aka for Tim and yurik when he goes to south africa? [14:18:27] :-) [14:19:05] Or for the EU peeps. Sometimes get up to 300-400ms lag, and that makes interactive use teh pain. [14:19:05] bblack: so morebots isn't here. so that's the next thing to fix! :-) [14:19:13] lol [14:19:31] !log morebots is broken, someone should fix that when they read this [14:19:38] New review: coren; "Yeah, that works." [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/63377 [14:19:39] Change merged: coren; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63377 [14:19:41] it missed the last few !logs [14:20:17] bblack: fenari's ~root should have docs with a password or something for wikitech-static which is where morebots lives. 
should just be a `service $servicename restart` [14:20:27] ok [14:20:28] or /home/w/doc [14:21:35] That bot really needs to learn to love netsplitmas [14:21:57] https://bugzilla.wikimedia.org/show_bug.cgi?id=47228 [14:22:01] hi morebots! [14:22:08] !log fixed typo in rt.wikimedia.org vhost hostname, was breaking RT http stuff [14:22:16] Logged the message, Master [14:22:22] !log restarted adminbot on wikitech-static [14:22:31] Logged the message, Master [14:25:21] i'm filling in the missing !logs now [14:26:18] New review: Krinkle; "I won't delete the files, just trim them. So re-creation won't be an issue." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/61997 [14:30:17] !log imported missing !log entries from channel log [14:30:26] Logged the message, Master [14:50:06] PROBLEM - Puppet freshness on db44 is CRITICAL: No successful Puppet run in the last 10 hours [14:58:47] !log Graceful reload of Zuul to deploy Ic22d7c6cb811fd7e3 [14:58:55] Logged the message, Master [15:43:43] New review: Anomie; "@Krinkle: I1b47571c added inclusion of CodeEditor when that variable is set. Ask them." 
[operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/62814 [16:08:02] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Sun May 12 16:07:55 UTC 2013 [16:08:22] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [16:09:12] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Sun May 12 16:09:06 UTC 2013 [16:09:23] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [16:10:12] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Sun May 12 16:10:04 UTC 2013 [16:10:23] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [16:11:02] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Sun May 12 16:10:58 UTC 2013 [16:11:23] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [16:11:52] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Sun May 12 16:11:49 UTC 2013 [16:12:23] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [16:12:32] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Sun May 12 16:12:28 UTC 2013 [16:13:23] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [16:14:52] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Sun May 12 16:14:48 UTC 2013 [16:15:22] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [16:16:23] PROBLEM - Apache HTTP on mw1220 is CRITICAL: Connection refused [16:18:02] PROBLEM - SSH on cp1044 is CRITICAL: Server answer: [16:19:02] RECOVERY - SSH on cp1044 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [16:19:22] RECOVERY - Apache HTTP on mw1220 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.065 second response time [16:22:02] PROBLEM - Puppet freshness on mc15 is CRITICAL: No successful Puppet run in the last 10 hours [16:23:02] PROBLEM - Puppet freshness on colby 
is CRITICAL: No successful Puppet run in the last 10 hours [16:24:32] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [16:26:32] RECOVERY - RAID on searchidx1001 is OK: OK: State is Optimal, checked 4 logical device(s) [17:15:26] New review: MZMcBride; "This looks fine." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/62817 [17:23:54] New review: Thehelpfulone; "Actually with https://bugzilla.wikimedia.org/show_bug.cgi?id=48379 the only one that would need chan..." [operations/mediawiki-config] (master) C: -1; - https://gerrit.wikimedia.org/r/62817 [17:29:15] New patchset: Odder; "(bug 48236) Fix login.wm.o's (and other wikis') logo" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/62817 [17:29:34] Change abandoned: Odder; "(no reason)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/62817 [17:56:10] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours [17:56:10] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours [17:56:10] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours [18:16:35] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[18:18:25] RECOVERY - RAID on searchidx1001 is OK: OK: State is Optimal, checked 4 logical device(s) [18:22:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:23:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [18:23:16] New patchset: Odder; "(bug 48236) Update login.wikimedia.org logo" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/63384 [18:23:48] New patchset: coren; "Tool Labs: Package request" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63385 [18:24:10] hey [18:24:16] hoy [18:24:17] dont know if you know [18:24:28] but wikipedia.de says 502 bad gateway [18:24:31] Change merged: coren; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63385 [18:24:37] de.wikipedia.org works though [18:24:40] hey coren [18:24:45] no weekend too? ;) [18:25:29] What is this weekend of which you speak? [18:25:32] :-) [18:25:39] Final sprint before the Hackaton. [18:27:14] Coren: ah :) [18:41:40] New review: Reedy; "(1 comment)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/63141 [19:09:23] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:10:13] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time [19:12:01] New review: MZMcBride; "All right." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/63384 [19:13:21] Reedy, could you merge that ^ when you get a minute? There'll be a few more later for some other logo changes so if you wanted to do them all in one go then that'd be fine too [19:13:39] Heh. [19:18:59] Thehelpfulone: hm? what logo changes? 
[19:20:47] Nemo_bis, wikimania ones mostly [19:20:48] https://bugzilla.wikimedia.org/show_bug.cgi?id=48376 [19:20:50] https://bugzilla.wikimedia.org/show_bug.cgi?id=48379 [19:20:53] https://bugzilla.wikimedia.org/show_bug.cgi?id=48236 [19:20:55] https://bugzilla.wikimedia.org/show_bug.cgi?id=48382 [19:22:23] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:23:13] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time [19:24:10] New patchset: Odder; "(bug 48376) Customise Wikimania team wiki logo" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/63389 [19:28:51] Nemo_bis: yes, where are my logos changes. [19:29:32] WHERE ARE MY DRAGONS? [19:40:26] odder: true, let's make them today [19:42:47] New review: Reedy; "The text is split onto 2 lines" [operations/mediawiki-config] (master) C: -1; - https://gerrit.wikimedia.org/r/63384 [19:55:10] LOL. 
[19:57:27] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:58:17] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.182 second response time [20:03:09] New patchset: Odder; "(bug 48236) Update login.wikimedia.org logo" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/63384 [20:08:02] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Sun May 12 20:07:54 UTC 2013 [20:08:32] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [20:09:02] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Sun May 12 20:09:00 UTC 2013 [20:09:32] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [20:10:02] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Sun May 12 20:10:01 UTC 2013 [20:10:32] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [20:11:02] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Sun May 12 20:10:56 UTC 2013 [20:11:32] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [20:11:52] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Sun May 12 20:11:44 UTC 2013 [20:12:32] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [20:13:02] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Sun May 12 20:12:59 UTC 2013 [20:13:32] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [20:14:52] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Sun May 12 20:14:51 UTC 2013 [20:15:32] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [20:21:46] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/63384 [20:24:51] New patchset: Sanja pavlovic; "Patch for worker.py. 
It checks for external programs existence in the initialization part." [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/63390 [20:24:55] odder: There's 4 other wikis still using 135px-Wikimedia-logo.svg.png [20:25:00] I guess they should be fixed too ;) [20:25:20] Reedy: I'll be creating new logos for those wikis [20:25:45] https://bugzilla.wikimedia.org/show_bug.cgi?id=48379 [20:25:50] There's only FDC that really needs another logo [20:26:11] * Reedy shrugs [20:26:26] I didn't do anything yet! Feel free to comment :) [20:26:28] Thehelpfulone: ^^ [20:27:03] fdc, iegcom, ombudsmen, transitionteam [20:27:22] Reedy, why not iegcom? transitionteam probably doesn't need one but ombudsmen could do with one [20:27:38] Why do they need one? [20:27:50] by new logo I just mean take the wikimedia one and put text underneath it [20:27:52] Based on this logic, loginwiki should have something more than just the logo [20:28:32] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:28:49] I asked James about that, he said it didn't need one [20:29:22] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time [20:29:40] New patchset: Sanja pavlovic; "Per bug #48012. Patch for worker.py. It checks for external programs existence in the initialization part." [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/63390 [20:30:11] Reedy, almost all wikis have something other than this logo - including office, collab, board, checkuser, steward, otrs, internal etc etc [20:30:19] collab's got a really fancy one too in comparison! 
[20:30:41] New patchset: Hashar; "contint: proxy only enabled on wikimedia.org" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63391 [20:38:32] New patchset: Odder; "(bug 48382) Customise Wikimania wiki logos with years" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/63392 [20:39:28] PROBLEM - RAID on ms-be10 is CRITICAL: Timeout while attempting connection [20:40:55] New review: Odder; "Just by the way: in case the change of the 2005 Wikimania wiki seems controversial to you, please no..." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/63392 [20:41:34] stalker. [20:42:09] New review: Thehelpfulone; "And not split on 2 lines either! ;)" [operations/mediawiki-config] (master) C: 1; - https://gerrit.wikimedia.org/r/63392 [20:43:29] Yeah, I checked it twice this time [20:43:51] Dunno what happened the last time, probably some copy+paste error in my editor :) [20:45:08] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Sun May 12 20:45:01 UTC 2013 [20:45:28] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [20:48:42] odder: Which editor do you use? [20:49:32] emacs [20:50:27] but I guess it depends on my terminal width, and that's why I didn't notice [21:57:34] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:59:24] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time [22:31:19] PROBLEM - Puppet freshness on db45 is CRITICAL: No successful Puppet run in the last 10 hours [22:32:29] RECOVERY - search indices - check lucene status page on search19 is OK: HTTP OK: HTTP/1.1 200 OK - 60075 bytes in 0.113 second response time [22:35:14] PROBLEM - Puppet freshness on db26 is CRITICAL: No successful Puppet run in the last 10 hours [22:38:51] New patchset: Aaron Schulz; "Added hook to purge CDN for thumbnails only in swift." 
[operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/63396 [23:08:59] RECOVERY - search indices - check lucene status page on search20 is OK: HTTP OK: HTTP/1.1 200 OK - 60075 bytes in 0.115 second response time [23:16:29] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:16:49] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours [23:16:49] PROBLEM - Puppet freshness on cp3003 is CRITICAL: No successful Puppet run in the last 10 hours [23:17:19] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time [23:22:29] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:23:00] PROBLEM - RAID on mc15 is CRITICAL: Timeout while attempting connection [23:23:19] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [23:24:59] RECOVERY - RAID on mc15 is OK: OK: Active: 2, Working: 2, Failed: 0, Spare: 0 [23:31:29] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:32:19] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.162 second response time [23:56:05] PROBLEM - Puppet freshness on ms-fe3001 is CRITICAL: No successful Puppet run in the last 10 hours