[08:49:09] !log making (almost) all private wikis https-only per RT-2565, vi remnant.conf,sync,graceful...
[08:49:12] Logged the message, Master
[08:56:07] PROBLEM - swift-container-auditor on ms-be4 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[08:58:13] RECOVERY - swift-container-auditor on ms-be4 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[09:00:23] wow nice
[09:00:33] and long overdue :-)
[09:10:25] apergos: :) hi, linked them all from ticket now. did all i could find regardless if they are closed. "almost" because a few left i could not find (i expect closed) and 2 look kind of broken, redirecting to incubator
[09:11:02] !log nomcom and langcom wikis look kind of broken, redirecting to pages on incubator with "Error: This page is unprefixed!"
[09:11:05] Logged the message, Master
[09:11:24] yeah those are probably done wrong
[09:11:33] that's a great cleanup though
[09:47:23] alright, sent
[09:47:33] mail to the list.. cya later
[09:48:04] have a good one
[09:48:16] ty
[10:43:53] PROBLEM - Host cp1017 is DOWN: PING CRITICAL - Packet loss = 100%
[11:37:26] PROBLEM - Puppet freshness on amslvs4 is CRITICAL: Puppet has not run in the last 10 hours
[12:11:10] PROBLEM - Puppet freshness on brewster is CRITICAL: Puppet has not run in the last 10 hours
[12:22:16] PROBLEM - RAID on searchidx2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[12:24:13] RECOVERY - RAID on searchidx2 is OK: OK: State is Optimal, checked 4 logical device(s)
[13:03:26] hello
[13:03:36] is someone awake? :) someone with access to lab
[13:03:38] labs
[13:11:02] PROBLEM - swift-container-auditor on ms-be2 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[13:13:08] RECOVERY - swift-container-auditor on ms-be2 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[13:23:02] RECOVERY - MySQL Replication Heartbeat on db1047 is OK: OK replication delay 0 seconds
[13:23:29] RECOVERY - MySQL Slave Delay on db1047 is OK: OK replication delay 0 seconds
[13:35:59] the container auditor seems to be the one with the issues
[13:36:04] I wonder what is going on with that
[13:56:47] PROBLEM - swift-container-auditor on ms-be4 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[13:57:02] and there it goes
[13:57:06] grrrr
[13:57:25] I'll wait a couple minutes and see if it magically restarts or something
[13:58:10] it's called 'puppet' :P
[13:58:36] puppet doesn't run every two minutes last I looked :-P
[13:59:22] because I don't know what the auditor actually does, I don't know what the consequences are if it doesn't run for a little while
[13:59:45] it audits objects
[13:59:56] checks for corruption
[14:00:05] I figured that
[14:00:08] but on what basis?
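The auditor alerts above tend to clear on their own a couple of minutes later; the "it's called 'puppet'" quip refers to the common pattern of letting Puppet re-ensure the daemon on its next agent run rather than restarting it by hand. A minimal sketch of that pattern — hypothetical, not the actual swift manifest, which may manage the auditors differently (e.g. via swift-init):

    # Hypothetical: keep the container auditor running; Puppet restarts it
    # on its next agent run if the process has died.
    service { 'swift-container-auditor':
        ensure => running,
        enable => true,
    }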
[14:02:24] I dunno, comparisons with other servers, checksums, etc.
[14:02:45] I meant more like, how does it know which objects are next to check
[14:02:52] anyways I can ask ben about it later
[14:05:39] or you could look it up in the swift manual
[14:05:45] so ben doesn't need to :P
[14:05:57] I could but my head is full of my actual work :-P
[14:06:03] so is ben's
[14:06:18] and his actual work is swift, where mine isn't :-P
[14:08:38] but he doesn't have to look something up in the manual...he just could tell me if it's important to restart that job as soon as possible or if it's ok that there are delays
[14:08:57] I can tell you
[14:09:07] nothing bad will happen if you don't restart that job quickly
[14:09:19] ok, well that was my original question here
[14:09:24] I know
[14:09:31] I was just hoping you'd be able to answer it yourself
[14:10:26] I could look up absolutely everything myself (I already do that for a lot) but the consequences are that it takes me much longer than everyone else to do things, and I don't remember the details afterwards, even with notes
[14:10:44] so at a certain point I decide to ask people about some things
[14:12:17] RECOVERY - swift-container-auditor on ms-be4 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[14:12:37] and there it is
[14:19:10] He who asks a question is a fool for five minutes; he who does not ask a question remains a fool forever
[14:30:20] funny statement from someone who has a nickname 'closedmouth' ;)
[14:33:13] indeed
[14:56:04] this seems like it should be a stupid question, but I'm tired of wasting time searching--where the hell is the puppet documentation for the "user" resource(?) ?
[15:01:07] ahhh, nm. of course now I finally find it.
[15:36:20] PROBLEM - swift-container-auditor on ms-be2 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[15:37:50] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 10.8406035135 (gt 8.0)
[15:44:03] why?!?!?!? "Invalid parameter manages_homedir at /etc/puppet/manifests/misc/mwlib.pp:30"
[15:50:53] RECOVERY - swift-container-auditor on ms-be2 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[15:52:32] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 10.0018782883 (gt 8.0)
[15:56:44] RECOVERY - Packetloss_Average on locke is OK: OK: packet_loss_average is 3.50495070796
[16:00:30] Jeff_Green: because the parameter is called 'managehome'
[16:13:42] mark: ha. i misread the doc that talks about the feature manages_home
[16:14:05] all puppet types are documented here: http://docs.puppetlabs.com/references/stable/type.html
[16:14:26] right--search "manages_home" on that page and you'll see what I mean
[16:14:43] yeah I figured
[16:14:55] documentation + user fail
[16:15:07] fixing . . .
[16:15:37] caught someone else--thumbs.pp has the same typo
[16:15:59] hehe
[16:21:59] hrm, wtf? "drwxr-xr-x 2 grunny mwlib 4096 2012-03-19 15:52 pp"
[16:22:29] nice, but what is it?
[16:22:34] puppet should have created that as user pp, plus there's no user grunny in /etc/passwd
[16:22:44] ldap madness?
[16:22:50] uh oh
[16:22:56] in labs?
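The mwlib.pp error above comes from using the provider-feature name as a resource parameter: the user type's parameter is managehome, while manages_homedir is only the name of the feature listed on the type reference page. A sketch of the corrected resource — the account details are taken from the systemuser call quoted later in this log and are illustrative, not the actual manifest:

    # Hypothetical corrected resource for misc/mwlib.pp: 'managehome' is the
    # parameter; 'manages_homedir' only names the provider feature.
    user { 'pp':
        ensure     => present,
        home       => '/opt/pp',
        shell      => '/bin/bash',
        system     => true,
        managehome => true,
    }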
probably
[16:23:02] yeah labs
[16:23:35] probably the same uid as pp, which puppet created
[16:23:46] but that means puppet is creating conflicting users with ldap, that's worrying
[16:23:52] yes it is
[16:23:56] it's user 999
[16:24:18] ah
[16:24:23] you're not using the 'systemuser' definition
[16:24:27] see generic-definitions.pp
[16:24:48] probably it won't matter though
[16:24:53] since it effectively does the same
[16:24:55] the example I found in git was as I did it, with system => true
[16:25:18] useradd talks about system users as not having the usual time limits set, and not automatically getting a homedir unless you explicitly set it
[16:25:20] there are lots of bad examples around, doesn't mean you should create more ;-)
[16:25:41] mark: I don't believe there's a good example anywhere in puppet for anything :-P
[16:25:43] but I agree that it probably doesn't matter in this case
[16:25:51] now now
[16:25:52] how so?
[16:26:21] safer to stick with "now now"
[16:27:16] is there a style guide (even a few words? :P) for wmf puppet? or who decides how puppet should be used? just whoever happens to be making the changes?
[16:27:31] there's hardly any style guide
[16:27:34] we should write one I guess
[16:27:40] in practice it's up to the people reviewing it
[16:27:44] yes, it would be very helpful
[16:27:47] I review everything that goes in
[16:28:12] but the fact that it's often after the fact (after the merge), and that it's hard to comment on and keep track of fixes after it's merged makes it not work very well
[16:29:57] yeah, what's the deal with allowing post merge comments/followups? on a roadmap?
[16:30:08] no idea
[16:31:31] where's the appropriate place to ticket for bugs in labs?
[16:31:44] i'm not sure, i think it's bugzilla
[16:32:01] k, i'll double check with ryan
[16:32:11] what's the bug? you might want to also mention in #-labs
[16:32:43] depending on whether systemuser does the right thing I'll ticket the userid conflict
[16:33:18] also ran into a delete instance bug--the web UI barfed an error that it failed to remove DNS entries for an instance I deleted
[16:35:02] no idea about the DNS. the systemuser thing should work...
[16:36:07] I don't see how systemuser is going to do anything different than user does
[16:36:13] agreed
[16:36:17] but you should use it anyway
[16:36:21] sure
[16:36:36] it just sets overrides for homedir and managehome
[16:36:43] yeah
[16:36:47] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 14.6404059459 (gt 8.0)
[16:36:53] Jeff_Green: and a different uid range
[16:37:02] not in puppet
[16:37:04] Jeff_Green: that presumably doesn't overlap with LDAP
[16:37:05] it seems
[16:37:17] I don't see any references to uid
[16:37:22] system user isn't just a puppet thing
[16:37:31] right it's a useradd flag
[16:37:31] it's a linux thing with many distros
[16:37:54] and I'm assuming user "system => true" triggers it from puppet
[16:38:01] so, i have to assume it just passes it on and does whatever that underlying tool does for "system" users
[16:38:29] yeah, I reviewed all this re. ubuntu--it's mainly about account expiration
[16:39:12] so our bug is probably with the local system ldap integration itself, nothing to do with puppet
[16:42:22] useradd --system is creating users counting down from uid 999
[16:42:37] without the --system flag they're counting up in the 2000's range
[16:47:05] if ldap has users in the <1000 range, that's stupid
[16:47:20] I can only assume it does, but I haven't figured out how to check
[16:48:25] well some distros use <1000 for system, some use <500
[16:48:43] what distro are we using? :)
[16:48:53] ubuntu?
[16:48:56] i guess
[16:48:57] unless we've configured ldap to set that for labs hosts we are broken
[16:49:45] the labs environment should be setting up the UID ranges for new instances to accommodate its ldap range, not the other way around
[16:51:55] * mark is looking at varnish code
[16:52:12] well, if you want to change it on the ubuntu side, try /etc/login.defs
[16:52:19] i've no idea about the ldap side
[16:54:16] I'm going to submit it as a labs bug once ryan appears and tells me where to do so
[16:55:39] RECOVERY - check_job_queue on spence is OK: JOBQUEUE OK - all job queues below 10,000
[17:06:50] holy crap (evening news)... a 70 year old guy goes into the tax office, after being sent around to a few different windows he pulls out a carbine and starts shooting (luckily no one hurt)... they think he had been sent some sort of repossession notice ... took a bunch of people hostage... they were able to arrest him an hour or so later
[17:06:59] things are getting waaaay out of hand here
[17:07:11] yikes
[17:08:15] I guess we'll know more in a few days
[17:10:48] ;-(
[17:14:42] * Jeff_Green deeper down the puppet rathole "Duplicate definition: Systemuser[gmetric] is already defined in file /etc/puppet/manifests/ganglia.pp at line 169; cannot redefine at /etc/puppet/manifests/generic-definitions.pp:45"
[17:14:57] I didn't touch either ganglia.pp or generic-definitions.pp
[17:15:03] I simply invoked:
[17:15:09] systemuser { "pp": name => "pp", home => "/opt/pp", shell => "/bin/bash" }
[17:15:41] that may just be something someone else did in labs, which is broken
[17:16:00] i see
[17:16:15] I think i'd better go for a bike ride before I break something.
[17:16:46] it's test
[17:16:52] everyone breaks stuff there ;)
[17:17:00] but I guess you don't mean puppet huh ;-)
[17:17:33] yeah, no
[17:19:17] that is curious
[17:19:37] scoping fail?
[17:20:18] oh euh
[17:20:26] I see
[17:20:48] you did touch generic-definitions :)
[17:20:54] whaaa?
[17:20:58] woosters: https://rt.wikimedia.org/Ticket/Display.html?id=2355 ping?
[17:21:18] 17:15:36 caught someone else--thumbs.pp has the same typo
[17:21:23] did you do a search and replace or something?
[17:21:29] systemuser is now reentrant ;)
[17:21:37] nope, I didn't touch thumbs.pp
[17:21:46] maybe you didn't
[17:21:46] didn't even open the file afaik
[17:21:53] but you did touch generic-definitions.pp according to my git blame
[17:22:00] de059228 (Ryan Lane 2011-09-07 22:28:35 +0000 31) }
[17:22:00] de059228 (Ryan Lane 2011-09-07 22:28:35 +0000 32)
[17:22:00] 50cb3ea1 (jgreen 2012-03-19 16:58:06 +0000 33) systemuser { $name:
[17:22:00] de059228 (Ryan Lane 2011-09-07 22:28:35 +0000 34) require => Group[$name],
[17:22:04] i had generic-defs open for sure
[17:22:26] interesting
[17:22:34] https://gerrit.wikimedia.org/r/#change,3281
[17:22:59] coffee? ;)
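For context, systemuser in generic-definitions.pp is essentially a thin wrapper around the user type that fills in the group, home directory and managehome details. The duplicate-definition error above came from an edit that accidentally turned the user declaration inside that define into a systemuser declaration (visible in the git blame above), so every use of the define re-declared itself. A simplified sketch of the intended shape, not the exact WMF definition:

    # Roughly what a systemuser-style wrapper looks like; details differ
    # from the real generic-definitions.pp.
    define systemuser($home, $shell = '/bin/false') {
        group { $title:
            ensure => present,
        }
        # Must declare 'user' here, not 'systemuser': the accidental
        # self-reference is what produced "Duplicate definition:
        # Systemuser[gmetric] ... cannot redefine".
        user { $title:
            ensure     => present,
            require    => Group[$title],
            home       => $home,
            shell      => $shell,
            managehome => true,
        }
    }

Invoked much as in the log — systemuser { 'pp': home => '/opt/pp', shell => '/bin/bash' } — it expands into a matching group plus user (the real define also accepts a name parameter, as in the call quoted above).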
[17:23:00] arghabargha
[17:23:12] fixing
[17:23:55] trying to blow up puppet eh, i see what you're doing
[17:24:10] yes, that's my plan
[17:24:43] I'm going to get us off puppet by adding a drop of subterfuge to the bucket of no-subterfuge-required :-P
[17:27:45] yay! that bit is unbroken. thanks for finding it.
[17:27:52] yw
[17:28:23] interestingly--systemuser behaves very differently than user { system=>true
[17:28:40] the uid is in the >2000 range like normal non-system users
[17:28:47] yeah
[17:28:53] for some reason it doesn't have system => true set
[17:28:59] probably that didn't exist at the time
[17:29:08] and probably it should have that now
[17:29:31] well . . . the problem is that will put you back in the conflicting uid range
[17:29:36] yes
[17:29:39] but that should be fixed anyway
[17:29:49] yeah
[17:32:04] hexmode - will have mutante work on it, now that r1.19 is done
[17:32:31] woosters: I shall ping mutante, then :)
[17:34:44] New patchset: Mark Bergsma; "Make systemuser make system users" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3284
[17:34:57] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3284
[17:35:15] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3284
[17:35:18] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3284
[17:35:43] mark: if you fix that without fixing the other issue, labs stuff is going to break
[17:36:08] ryan should fix that by redefining the uid range in labs
[17:36:15] but that's why i'm not merging this into test branch yet
[17:36:26] ah ok
[17:36:42] but please mention that when you're doing the bug report anyway ;)
[17:37:06] i will
[17:38:02] * mark considers writing a new varnish director for consistent hashing
[17:42:00] PROBLEM - swift-container-auditor on ms-be2 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[17:43:39] PROBLEM - Puppet freshness on owa3 is CRITICAL: Puppet has not run in the last 10 hours
[17:45:36] PROBLEM - Puppet freshness on amslvs2 is CRITICAL: Puppet has not run in the last 10 hours
[17:53:33] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours
[17:53:33] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours
[17:58:11] RECOVERY - swift-container-auditor on ms-be2 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[18:07:56] is brewster intentionally nonresponsive?
[18:09:01] or is there a new web proxy for apt fu?
[18:10:25] brewster should work
[18:10:45] but appears to have disk problems
[18:10:50] uff ok
[18:11:42] wow it's all horked up
[18:11:47] yeah
[18:11:49] checking
[18:12:11] /home is gone
[18:12:22] brewster had /home ?
[18:12:34] was it an nfs mount?
[18:12:38] definitely not
[18:12:41] all I know is there's nothing in /home
[18:13:03] when I see / at 100% I assume /tmp, /var, or /home has taken over
[18:13:08] /var/log
[18:13:17] it is a tiny /
[18:13:29] it's mostly lighttpd
[18:13:37] trash right?
[18:13:42] removed
[18:14:13] squid's pretty bloaty too, blasting that
[18:14:43] !log Running smartctl -t long /dev/sdb on brewster
[18:14:46] Logged the message, Master
[18:16:02] RECOVERY - Squid on brewster is OK: TCP OK - 0.009 second response time on port 8080
[18:16:58] how long was it down? i see no mention of that check that just recovered changing before just now
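The "Make systemuser make system users" change discussed above boils down to passing system => true through to the wrapped user resource, so useradd allocates the uid from the system range (the SYS_UID_MIN/SYS_UID_MAX range in /etc/login.defs, i.e. counting down from 999 on Ubuntu) instead of the regular >= 1000 range — which is exactly why it can't go to labs until the LDAP uid ranges are sorted out. A hedged sketch of the relevant resource after that change, not the literal patch:

    # With system => true the wrapped account lands in the system uid range
    # (below 1000 on Ubuntu), which currently collides with labs LDAP uids.
    user { $title:
        ensure     => present,
        require    => Group[$title],
        home       => $home,
        shell      => $shell,
        managehome => true,
        system     => true,
    }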
[18:17:15] (going back 24 hrs)
[18:17:28] good question
[18:18:17] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 8.68127149123 (gt 8.0)
[18:19:57] 18 06:40:11 <+nagios-wm> PROBLEM - Squid on brewster is CRITICAL: Connection refused
[18:20:00] (UTC)
[18:20:04] ah
[18:20:15] i was trying to get the answer from nagios, which is proving full of fail
[18:21:01] i think it just doesn't go back far enough
[18:21:47] what I just pasted seems to be a full 24 hrs earlier than the earliest thing in the nagios log now
[18:21:58] yeah
[18:22:01] SMART Self-test log structure revision number 1
[18:22:02] Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
[18:22:02] # 1 Extended offline Completed: read failure 90% 20931 959747534
[18:22:11] ugh
[18:22:16] http://nagios.wikimedia.org/nagios/cgi-bin/history.cgi?host=all&archive=0&statetype=0&type=0&oldestfirst=on
[18:22:20] reallocated sectors, the works
[18:22:31] creating a ticket for chris
[18:22:33] mark: i wonder if that explains some of brewster's other crummy behavior over the past while
[18:22:47] might have, or space issues
[18:26:30] !log killed kill-slow-queries on db1008 for the duration of the civicrm upgrade
[18:26:33] Logged the message, Master
[18:41:43] mark: i have a 1tb hdd on site...can you please bring brewster down for hdd swap
[18:42:05] yes
[18:42:16] going down now
[18:42:22] !log Shutting down brewster for HDD replacement
[18:42:25] Logged the message, Master
[18:44:07] binasher: I'm thinking of writing a consistent hashing director for varnish
[18:45:53] PROBLEM - Host brewster is DOWN: CRITICAL - Host Unreachable (208.80.152.171)
[18:45:54] mark: that would be great. as a patch?
[18:45:58] yeah
[18:46:13] I think as a new director because changing the existing code to do that would be quite hard
[18:46:25] currently it's a very simple hashing method which shares all code with the random and client directors
[18:46:38] bolting consistent hashing onto that would be harder than just making a new director
[18:46:44] a director is only like 200 lines of code anyway
[18:47:03] a new director would probably be easier to get accepted upstream too
[18:47:11] indeed
[18:47:41] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 8.19666785714 (gt 8.0)
[18:54:09] !log brewster HDD replacement complete
[18:54:13] Logged the message, Master
[18:56:08] mark: the hdd has been replaced...can you check to see if it is rebuilding
[18:56:27] once it's up
[18:56:40] it won't rebuild automatically, I'll have to do that manually
[18:56:44] but it should come up on its existing drive
[18:59:31] but somehow it doesn't :P
[18:59:34] checking drac
[19:04:08] mark: did you reboot brewster
[19:04:15] yes
[19:04:56] RECOVERY - Host brewster is UP: PING OK - Packet loss = 0%, RTA = 0.15 ms
[19:05:04] it's coming up
[19:06:56] great!
[19:08:04] !log Rebuilding RAID arrays on brewster
[19:08:07] Logged the message, Master
[19:12:26] !log shutting down virt3 for memory reseating
[19:12:29] Logged the message, Master
[19:16:02] PROBLEM - Host virt3 is DOWN: PING CRITICAL - Packet loss = 100%
[19:22:05] RECOVERY - Packetloss_Average on locke is OK: OK: packet_loss_average is 3.42543630631
[19:25:09] ryan_lane: it is the dimm...i moved b1 and b4 to b2 and b5 and the errors are now with b2 and b5.
[19:25:24] just one dimm?
[19:25:31] or both?
[19:25:50] both
[19:26:42] ok
[19:27:11] RECOVERY - MySQL Replication Heartbeat on db1033 is OK: OK replication delay 0 seconds
[19:27:16] both virts are under warranty so I will call about them both
[19:27:38] RECOVERY - MySQL Slave Delay on db1033 is OK: OK replication delay 0 seconds
[19:29:52] ok
[19:29:56] can we bring virt3 back up?
[19:30:36] yes
[19:31:32] RECOVERY - Host virt3 is UP: PING OK - Packet loss = 0%, RTA = 0.51 ms
[19:33:58] !log deploying new frontend squid conf to add support for mf_useformat cookie [rt 2645]
[19:34:02] Logged the message, Master
[19:35:12] ok
[19:41:50] !log bringing virt3 instances back up
[19:41:53] Logged the message, Master
[19:45:20] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 12.9389685345 (gt 8.0)
[20:12:29] RECOVERY - Packetloss_Average on locke is OK: OK: packet_loss_average is 3.797791875
[20:16:36] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:20:44] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 0.026 seconds
[20:22:15] is this a larger issue that we should investigate or can this just be closed: https://bugzilla.wikimedia.org/show_bug.cgi?id=35293
[20:53:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:59:48] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 7.488 seconds
[21:17:32] !log started enwiki.revision sha1 alter on production side
[21:17:35] Logged the message, Master
[21:17:58] yay
[21:18:09] I have a bunch of dumps with empty sha1's in them :-/
[21:18:16] and with that, good night
[21:20:12] PROBLEM - MySQL Replication Heartbeat on db36 is CRITICAL: CRIT replication delay 299 seconds
[21:20:39] PROBLEM - MySQL Slave Delay on db36 is CRITICAL: CRIT replication delay 327 seconds
[21:21:21] woo
[21:33:33] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[21:34:57] New patchset: Catrope; "Apply a custom skin to Gerrit" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3285
[21:35:10] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3285
[21:35:34] New review: Catrope; "Setting -1 because the CSS is still marked as unstable by Timo, need to clear that with him." [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/3285
[21:35:59] Ryan_Lane: ---^^
[21:38:13] !log creating "ops" db and related grants on prod db clusters 2-7 to prep rollout of ishmael / pt-digest beyond s1
[21:38:17] Logged the message, Master
[21:39:15] PROBLEM - Puppet freshness on amslvs4 is CRITICAL: Puppet has not run in the last 10 hours
[21:39:42] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 3.011 seconds
[21:46:45] PROBLEM - check_job_queue on spence is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , frwiktionary (25984)
[21:52:23] re: frwiktionary jobqueue alert, they are all of the htmlCacheUpdate and refreshLinks2 variety
[22:01:41] binasher: why do those silly users have to change templates? :p
[22:12:24] PROBLEM - Puppet freshness on brewster is CRITICAL: Puppet has not run in the last 10 hours
[22:15:33] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:23:48] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 0.019 seconds
[22:53:54] RoanKattouw_away: ping me on that when you are ready for a full review ;)
[22:53:57] until then I can ignore it
[22:56:57] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:58:37] Ryan_Lane: I'm back now, had to run an errand
[22:58:48] * Ryan_Lane nods
[22:58:54] I'm going to do that proxy thing now
[22:58:56] or attempt to
[22:58:58] sweet
[22:59:15] you wanted the url to be...?
[22:59:24] Anything
[22:59:28] wikimania-videos?
[22:59:31] Sure
[22:59:33] maybe we can reuse it for next time
[22:59:41] Yeah that's a good idea
[22:59:48] maybe just videos
[23:01:00] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 3.889 seconds
[23:01:24] New patchset: Ryan Lane; "Adding a proxy for wikimania videos" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3288
[23:01:36] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3288
[23:03:15] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3288
[23:03:18] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3288
[23:06:17] RoanKattouw: thanks for pinging me about the videos, I'll never do it otherwise :D
[23:06:28] heh no worries
[23:06:36] I need to move over that prototype thing now
[23:06:42] Hopefully I'll get to that before my flight
[23:06:48] Or maybe I don't get my passport in time and there is no flight
[23:07:15] heh
[23:07:58] seems to be working. now to push it out to all of them
[23:09:42] having a transparent proxy that can be configured this easily is wonderful :)
[23:20:15] !log added a new proxy to the ssl configuration to temporarily proxy access to wikimania videos being transcoded
[23:20:19] Logged the message, Master
[23:20:25] !log restarting all nginx servers
[23:20:28] Logged the message, Master
[23:26:23] RoanKattouw: hm. what's a valid url for the videos stuff?
[23:26:30] I'm not sure if this is working
[23:26:50] Let's see here
[23:27:09] Whoa, someone fixed cadmium so my user exists now
[23:27:29] Ryan_Lane: /Wikimania should work
[23:27:50] And it works for me
[23:27:59] it doesn't for me
[23:28:02] using vidoes.wikimedia.org?
[23:28:05] Dude even straight up https://videos.wikimedia.org/ works
[23:28:05] err
[23:28:07] videos?
[23:28:09] really?
[23:28:15] maybe I have bad cache or something
[23:28:18] oh
[23:28:19] Maybe http doesn't, I wouldn't know
[23:28:20] DUH
[23:28:22] http doesn't work
[23:28:32] heh
[23:28:43] I wouldn't notice, I have HTTPSEverywhere and the ruleset assumes HTTPS capability by default
[23:28:52] this is of course the issue with doing it this way
[23:29:03] I don't plan on fixing squid to make this work over http
[23:29:10] I don't care, this is awesome
[23:29:13] so, just make sure people use https :)
[23:29:19] Yeah
[23:30:47] !log videos is only accessible over https properly… I don't plan on fixing that ;)
[23:31:00] -_-
[23:31:05] where's that damn bot?
[23:31:09] It's here
[23:31:15] It's just not logging
[23:31:19] morebots: poke poke
[23:31:55] Krinkle: Hey what's the status of GerritSite.css? Is it still "unstable"?
[23:36:51] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:40:45] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 9.05151869919 (gt 8.0)
[23:43:00] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 0.039 seconds
[23:46:00] PROBLEM - Packetloss_Average on emery is CRITICAL: CRITICAL: packet_loss_average is 9.22160618321 (gt 8.0)
[23:57:24] RECOVERY - Packetloss_Average on locke is OK: OK: packet_loss_average is -5.39289333333