[17:52:56] hmm. apparently my change didn't take? [17:53:00] New patchset: Ottomata; "git::clone define improvements." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5977 [17:53:00] mark: take 3 :) [17:53:03] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/5977 [17:53:08] :) [17:53:54] meh. puppet was running while I made the change. it'll get fixed again with the next run (that I just started) [17:54:04] hmm no not like this [17:54:04] ^^^ that's about the m. LVS page that just went out. [17:54:13] git::user should not be put in that definition [17:54:17] but in your statistics classes [17:54:27] and we don't want it made on every system that uses git::clone [17:54:29] some don't need it [17:54:43] I think you should make a mwdeploy user in your stat1 classes [17:54:51] and then pass that user to git::clone [17:55:04] PROBLEM - MySQL Slave Delay on db24 is CRITICAL: CRIT replication delay 192 seconds [17:55:22] PROBLEM - MySQL Replication Heartbeat on db1018 is CRITICAL: CRIT replication delay 223 seconds [17:55:49] PROBLEM - MySQL Slave Delay on db1018 is CRITICAL: CRIT replication delay 242 seconds [17:56:07] PROBLEM - MySQL Replication Heartbeat on db24 is CRITICAL: CRIT replication delay 235 seconds [17:57:10] PROBLEM - MySQL Idle Transactions on db13 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [17:59:03] ottomata: i'm going off for food now, may be back later today [17:59:20] if someone else is willing to babysit that change into production that's fine as well [18:02:59] mark, who should the default owner be then? [18:03:25] if not root, and not some generic system user? [18:11:54] RECOVERY - LVS HTTP on m.wikimedia.org is OK: HTTP OK HTTP/1.1 200 OK - 0.108 second response time [18:14:33] ^^^ yay puppet finished on spence. [18:14:47] only 23 minutes! 
[18:17:10] haha [18:17:23] oh man, i *heart* naginator… neon is so much faster now [18:17:47] :-) [18:19:06] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:22:54] why is stafford going critical all the time? [18:26:23] hi guys (binasher or maybe preilly), [18:26:27] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 5.606 seconds [18:26:28] i'm really confused about varnishncsa [18:26:31] default log format [18:26:40] i've got the mw version of it installed in my local vm [18:26:44] and i get 14 fields of output [18:26:55] but I can't find anywhere where those 14 fields are defined [18:26:58] not even in the source [18:27:14] * jeremyb hands ottomata an strace ;) [18:28:33] !log making routing change, higher risk [18:28:35] Logged the message, Mistress of the network gear. [18:28:48] LeslieCarr: feeling better? [18:29:07] Ryan_Lane: test.m.wikipedia.org. 3600 mobile-lb.wikimedia.org. [18:29:54] jeremyb: mostly, ankle's sorta messed up [18:30:00] ok, please scream if anyone sees any issues [18:30:28] jeremyb, not sure how strace will help me [18:30:37] i'm looking for the string definition of a log format [18:30:57] ottomata: wrap the thing that's generating the log msgs in strace. 
then look at the strace output [18:31:06] RECOVERY - MySQL Idle Transactions on db13 is OK: OK longest blocking idle transaction sleeps for 0 seconds [18:31:07] ottomata: it's defined in the source [18:31:16] or just ask binasher :P [18:31:18] right, but it is not defined in the source the same way it is outputting [18:31:20] i know because i put it there [18:31:39] i get 14 fields [18:31:43] there are two definitions in the source [18:31:45] there are two possibilities in the source [18:31:46] yeah [18:32:01] both of those only have 9 fields [18:32:26] the only diff between them is one uses %h, the other uses X-Forwarded-For [18:32:40] i'm betting your source isn't patched with our changes.. [18:32:50] oh hm [18:34:26] ottomata: default can be root [18:34:32] binasher, i downloaded from here: http://apt.wikimedia.org/wikimedia/pool/main/v/varnish/ [18:34:35] mark: ok thanks [18:34:37] but we need to be careful when we use that [18:34:40] k [18:35:14] which gave you what, a deb src package? [18:35:36] PROBLEM - MySQL Idle Transactions on db13 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [18:36:07] AH, i grabbed the .orig.! [18:36:19] heh yeah [18:36:53] ottomata: clone operations/debs/varnish from gerrit [18:37:53] then look at debian/patches/ [18:38:59] ahhhhhhh [18:39:01] thank you [18:39:02] there it is [18:39:14] 14 fields [18:40:36] so binasher, I have some RT tickets to add some HTTP headers to log sources [18:40:56] should I make a patch for the .deb? or just specify -F in the init.d file in puppet? [18:43:01] change the invocation of varnishncsa [18:43:11] after testing on our binaries [18:43:42] ok cool, in the init.d file, right? [18:43:46] installing new varnish packages results in restarting varnish everywhere, so its much more invasive [18:43:52] yeah, [18:44:08] i have to do this for squid and nginx as well [18:44:19] i think so [18:44:31] q: [18:44:37] what is the puppet:///volatile source? 
[18:44:47] i think i need files from there in order to do this for squid [18:45:09] the ability to log response headers is new to varnishncsa 3.0.2, so it may or may not actually work as advertised, or be stable [18:45:23] (even though the documentation said it was there in prior versions) [18:45:24] ok [18:45:54] i don't think puppet://volatile should ever be touched directly [18:46:11] check wikitech for squid config/deploy docs [18:46:26] ok [18:46:28] thanks, will do [18:48:34] New patchset: Ottomata; "git::clone define improvements." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5977 [18:48:43] mark: take 4 [18:48:52] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/5977 [18:52:42] did you notice that misc::statistics::mediwiki has a typo? ;) [18:52:45] fix in a next commit [18:54:04] and you're missing a requirement on mwdeploy being ready before git::clonse [18:54:05] clone [18:54:08] so change include into require [18:55:52] require like that will auto include? [18:56:05] yes [18:56:08] ok [18:56:10] require class as a statement [18:56:13] um, typo?... [18:56:15] means include, plus add requirement [18:56:18] cool [18:56:20] mediAwiki [18:57:27] still don't see typo... [18:57:56] line? [18:59:45] the class name itself [19:00:46] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:00:54] class misc::statistics::mediwiki { [19:00:59] ah [19:01:03] ha, was looking FOR a capital A [19:01:08] k [19:01:12] hehe [19:01:43] New patchset: Ottomata; "git::clone define improvements." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5977 [19:01:50] take 5 [19:02:01] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/5977 [19:02:24] New patchset: Ryan Lane; "Change gerrit manifests to use and require the gerrit package, rather than installing it from puppet." 
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/6312 [19:02:38] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." [operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/6312 [19:02:44] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/5977 [19:02:47] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5977 [19:03:10] woot [19:03:37] New patchset: Ryan Lane; "Change gerrit manifests to use and require the gerrit package, rather than installing it from puppet." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6312 [19:03:57] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6312 [19:08:16] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 4.184 seconds [19:18:16] ottomata: some problems [19:18:23] ja? [19:18:36] so $directory is partly being used as the containing dir now [19:18:40] and partly as the actual clone [19:19:12] i'll revert for now [19:19:21] New patchset: Mark Bergsma; "Revert "git::clone define improvements."" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6315 [19:19:37] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6315 [19:19:43] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/6315 [19:19:46] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6315 [19:20:02] ? [19:20:08] where is it being used for the containing dir? 
[19:21:24] hmm I don't get it [19:21:34] err: Could not retrieve catalog from remote server: Error 400 on SERVER: Duplicate definition: File[/var/lib/git/operations] is already defined in file /var/lib/git/operations/puppet/manifests/puppetmaster.pp at line 119; cannot redefine at /var/lib/git/operations/puppet/manifests/generic-definitions.pp:639 on node stafford.pmtpa.wmnet [19:21:42] but i'm not sure why, as the correct $directory was being passed [19:22:23] can I fix this tomorrow? [19:22:27] sure [19:22:33] ah, i see though [19:22:44] puppetmaster and git clone are both defining the file [19:22:46] it's getting late and my internet here is flaky ;) [19:22:51] yes [19:23:00] if it's the CONTAINING dir I don't think git::clone should touch that [19:23:03] only the clone itself [19:23:16] yeah [19:23:18] the caller can/should take care of that [19:23:36] especially since you can have multiple clones in one dir etc [19:23:38] ah [19:23:38] directory => "$puppetmaster::config::gitdir/operations", [19:23:43] instead of operations/puppet [19:23:47] oh [19:23:50] was I looking at the wrong part [19:23:57] oh wait [19:23:57] ah [19:23:58] damn labs [19:24:09] ag, i'm looking at the wrong checkout too :p [19:24:19] yeah no [19:24:20] and you didn't edit the labs part in puppetmaster [19:24:21] i changed it properly [19:24:25] directory => "$puppetmaster::config::gitdir/operations/puppet", [19:24:33] yeah but look above that [19:24:42] if $is_labs_puppet_master { [19:24:42] git::clone { [19:24:43] "operations/puppet": [19:24:43] require => File["$puppetmaster::config::gitdir/operations"], [19:24:43] directory => "$puppetmaster::config::gitdir/operations/puppet", [19:24:44] yeah, just operations/ [19:24:50] oh [19:25:00] that's right? [19:25:02] right? [19:25:09] that's the labs one [19:25:10] that one is right [19:25:13] the production one, is not [19:25:14] ohhhh [19:25:21] ah! 
[19:25:22] i see it ok [19:25:25] yeah below that [19:25:27] yes [19:25:32] i'll fix that tomorrow though [19:25:38] as I cannot seem to stay online now anyway [19:25:40] ok? [19:25:41] ok, you're on it now then? i shouldn't touch it? [19:25:43] yeah that's fine [19:25:46] yeah I'll do that [19:25:51] cooooooool [19:25:53] awesome [19:25:54] thanks [19:25:59] thanks for your work [19:26:11] yup! :) [19:29:45] !log switching vrrp mastership of row a to cr1-eqiad [19:29:47] Logged the message, Mistress of the network gear. [19:31:46] mark, if you are still there, could you give me the history of varnish::udplogger vs. varnish::logging [19:31:47] ? [19:32:36] !log reverting vrrp mastership of row a to cr2-eqiad [19:32:38] Logged the message, Mistress of the network gear. [19:41:34] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:45:37] RECOVERY - MySQL Idle Transactions on db13 is OK: OK longest blocking idle transaction sleeps for 0 seconds [19:49:22] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 5.250 seconds [19:59:52] RECOVERY - mysqld processes on db52 is OK: PROCS OK: 1 process with command name mysqld [20:03:10] PROBLEM - MySQL Replication Heartbeat on db52 is CRITICAL: CRIT replication delay 1885 seconds [20:04:50] PROBLEM - MySQL Slave Delay on db52 is CRITICAL: CRIT replication delay 1812 seconds [20:09:08] PROBLEM - Puppet freshness on blondel is CRITICAL: Puppet has not run in the last 10 hours [20:20:15] * Jamesofur just had Oracle try to do a MYSQL sales call on my work number… as much as I tried it's hard not to be a bit of an ass.... [20:21:23] <^demon> "Sorry, I only use web scale databases." 
[20:21:55] heh [20:23:32] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:29:23] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 9.447 seconds [20:31:56] PROBLEM - ps1-b5-sdtpa-infeed-load-tower-A-phase-Z on ps1-b5-sdtpa is CRITICAL: ps1-b5-sdtpa-infeed-load-tower-A-phase-Z CRITICAL - *2563* [20:33:17] RECOVERY - ps1-b5-sdtpa-infeed-load-tower-A-phase-Z on ps1-b5-sdtpa is OK: ps1-b5-sdtpa-infeed-load-tower-A-phase-Z OK - 2388 [20:35:56] can I ask what might be a stupid question? [20:35:58] yes? ok good. [20:36:29] i'm trying to pass a quoted format argument to varnishncsa daemon in an init.d script that uses start-stop-daemon [20:36:56] the problem is that the value of the option is a quoted string of space separated fields [20:37:18] and I can't get bash or init or the daemon or something to use the quotes correctly [20:38:35] the opt and values are stored in a variable [20:38:47] all of the options are [20:39:02] but when they get passed to start-stop-daemon [20:39:09] it is like the quotes are removed [20:39:31] i'm almost certain this is a bash quoting problem that I am being extra dumb about right now [20:43:49] ottomata: So what is the daemon actually getting? Just the first item in the list? [20:44:35] yeah [20:44:46] What does your line in init.d look like? [20:46:10] And, have you tried desperate overquoting? '"like this"' or "\"like this\""? 
[20:46:24] have tried lots of desperateness [20:46:28] on to something mayyybe [20:46:29] but right now [20:46:38] DAEMON_OPTS="-n wmvm -w 127.0.0.1:8420 -m RxRequest:^(?!PURGE$) -F '%l %n %t %{Varnish:time_firstbyte}x %h %{Varnish:handling}x/%s %b %m http://%{Host}i%U%q - %{Content-Type}o %{Referer}i %{X-Forwarded-For}i %{User-agent}i'" [20:46:56] then later [20:46:57] if start-stop-daemon --start --quiet --pidfile ${PIDFILE} \ [20:46:57] --chuid $USER --exec ${DAEMON} -- ${DAEMON_OPTS} \ [20:47:06] i think the problem is the unquoated ${DAEMON_OPTS} [20:47:17] trying to quote that as a whole, but i'm not sure if start-stop-daemon will like that [20:47:25] the problem is that the last option [20:47:26] -F [20:47:37] takes a string that needs to be quoted due to whitespace [20:48:08] RECOVERY - MySQL Replication Heartbeat on db52 is OK: OK replication delay 3 seconds [20:48:35] RECOVERY - MySQL Slave Delay on db52 is OK: OK replication delay 0 seconds [20:49:02] if I remove the -F ... [20:49:06] from $DAEMON_OPTS [20:49:14] and pass it quoted normally to start-stop-daemon [20:49:20] right after [20:49:21] if start-stop-daemon --start --quiet --pidfile ${PIDFILE} \ [20:49:21] --chuid $USER --exec ${DAEMON} -- "${DAEMON_OPTS}" \ [20:49:22] then it works [20:49:38] (without quotes, oops) [20:49:40] but yeah [20:49:48] but I want it to be in the variable! [20:53:36] OK, and what command is actually getting run by that 'if' clause? [20:55:24] log_end_msg 0 [20:55:30] the real command is the start-stop-daemon thing [20:55:35] the if is just logging something based on the return [20:56:18] Sorry, I meant: What does start-stop-daemon --start --quiet --pidfile ${PIDFILE} etc. resolve to when the script executes? 
[20:57:20] start-stop-daemon --start --quiet --pidfile /var/run/varnishncsa/varnishncsa-wmvm.pid --chuid varnishlog --exec /usr/bin/varnishncsa -- -n wmvm -w 127.0.0.1:8420 -m RxRequest:^(?!PURGE$) -F '%l %n %t %{Varnish:time_firstbyte}x %h %{Varnish:handling}x/%s %b %m http://%{Host}i%U%q - %{Content-Type}o %{Referer}i %{X-Forwarded-For}i %{User-agent}i' [20:58:18] And what do you want it to look like? [20:58:25] ('cause that's what I'd expect.) [20:58:46] i want it like that [20:58:55] but it isn't working like you think it would [20:59:01] the -F '…' looks right [20:59:16] to get that for you I just copy/pasted the line, quoted it, and echoed it [20:59:31] but whatever happens after, with start-stop-daemon [20:59:36] seems to be not interpreted correctly [20:59:57] if I paste that output directly on the CLI [21:00:03] and invoke start-stop-daemon myself [21:00:05] !log reinstalling db53. this time with correct raid! [21:00:05] it works [21:00:07] Logged the message, notpeter [21:00:24] but if the init.d script does it, no good. [21:00:59] Curious. [21:01:15] Can you set -x in your script and then see what actually happens in a log someplace? [21:01:33] * andrewbogott is not being very helpful [21:02:28] the other thing I'd do is create a script with just those two lines, and see what that does. In case you haven't done that already. 
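A minimal script of the kind suggested here might look like the following sketch, assuming a POSIX sh. The `count_args` helper and the shortened `DAEMON_OPTS` value are illustrative stand-ins and are not taken from the real init.d script. It counts how many arguments a command receives when the variable is expanded unquoted versus quoted, which is one way to see that quote characters stored inside a variable are treated as ordinary characters at expansion time.

```shell
#!/bin/sh
# Hypothetical helper: report how many arguments were received.
count_args() {
    echo "$#"
}

# Shortened stand-in for the DAEMON_OPTS value quoted in the log.
DAEMON_OPTS="-w 127.0.0.1:8420 -F '%l %n %t'"

# Unquoted expansion: the value is split on whitespace, and the single
# quotes inside it are literal characters, not shell syntax.
unquoted_count=$(count_args ${DAEMON_OPTS})

# Quoted expansion: the whole string arrives as a single argument.
quoted_count=$(count_args "${DAEMON_OPTS}")

echo "unquoted: ${unquoted_count} arguments"   # 6, not the 4 one might expect
echo "quoted:   ${quoted_count} argument"      # 1
```

Neither form yields the four arguments the daemon needs (`-w`, the address, `-F`, and the format as one word), which is consistent with varnishncsa only seeing the first field of the format.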
[21:02:50] PROBLEM - Host db53 is DOWN: PING CRITICAL - Packet loss = 100% [21:02:54] whoaaa, never used set -x before [21:02:55] awesome [21:03:30] Yeah, bash is hard enough without wearing a blindfold :) [21:03:35] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:03:41] start-stop-daemon --start --quiet --pidfile /var/run/varnishncsa/varnishncsa-wmvm.pid --chuid varnishlog --exec /usr/bin/varnishncsa -- -n wmvm -w 127.0.0.1:8420 -m RxRequest:^(?!PURGE$) -F '%l %n %t %{Varnish:time_firstbyte}x %h %{Varnish:handling}x/%s %b %m http://%{Host}i%U%q - %{Content-Type}o %{Referer}i %{X-Forwarded-For}i %{User-agent}i' [21:04:32] Well, ok, that's exactly the same... [21:04:40] yeah.. [21:04:52] So probably there's nothing wrong with your arg construction, and instead something interesting happening with the context in which the init script is getting run. [21:05:02] Unless it fails when you run the script by hand as well... [21:06:17] going to write a more minimal start-stop-daemon script [21:06:21] see if i can reproduce [21:06:31] * andrewbogott nods [21:09:48] New patchset: Asher; "allocate 87% of free space to data on dbs, leaving more room for lvm snapshots" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6321 [21:10:05] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6321 [21:10:18] New review: Asher; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/6321 [21:10:21] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6321 [21:11:05] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 0.027 seconds [21:22:32] RECOVERY - Host db53 is UP: PING WARNING - Packet loss = 37%, RTA = 0.20 ms [21:25:32] PROBLEM - Full LVS Snapshot on db53 is CRITICAL: Connection refused by host [21:25:32] PROBLEM - SSH on db53 is CRITICAL: Connection refused [21:26:08] PROBLEM - MySQL disk space on db53 is CRITICAL: Connection refused by host [21:26:08] PROBLEM - MySQL Slave Running on db53 is CRITICAL: Connection refused by host [21:26:08] PROBLEM - MySQL Recent Restart on db53 is CRITICAL: Connection refused by host [21:26:26] PROBLEM - MySQL Idle Transactions on db53 is CRITICAL: Connection refused by host [21:26:35] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: Connection refused by host [21:26:53] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: Connection refused by host [21:29:06] woosters: whats going on with http://rt.wikimedia.org/Ticket/Display.html?id=2826 ? [21:38:16] andrewbogott [21:38:18] look at this one [21:38:29] here is my script [21:38:36] actually, i will pastie [21:39:20] !log created an ops db on all core mysql shards [21:39:22] Logged the message, Master [21:39:31] https://gist.github.com/2571642 [21:40:10] it looks like it is quoting the the vars individually?? [21:40:18] -F '"%l' %n %t '%{Varnish:time_firstbyte}x' %h '%{Varnish:handling}x/%s' %b %m 'http://%{Host}i%U%q' - '%{Content-Type}o' '%{Referer}i' '%{X-Forwarded-For}i' '%{User-agent}i"' [21:41:14] it does! Lemme see if I can make that happen here [21:42:37] ottomata: I was going to comment that it's good to use sh instead of bash since it's the LCD. 
[21:42:45] Coincidentally, it looks like sh does not do that weird thing [21:42:53] aye ok [21:42:56] So maybe that's a valid, if inexplicable, fix. [21:42:57] yeah, and init.d is using sh too [21:43:04] reading this [21:43:05] http://stackoverflow.com/questions/1661193/start-stop-daemon-quoted-arguments-misinterpreted [21:43:41] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:43:49] aye, using sh does not show weird quoting with set -x [21:43:58] but still, output does not work [21:44:03] still only getting the first arg [21:44:35] PROBLEM - NTP on db53 is CRITICAL: NTP CRITICAL: No response from NTP server [21:44:48] How do you feel about python or perl? An init.d script can be anything, I believe. [21:45:01] And all of your quoting worries will vanish... [21:45:07] ha, yeah [21:45:08] I mean, not that I'm not curious... [21:45:19] i don't want to rewrite the init.d script though for this [21:45:19] or write it as an upstart ;) [21:45:20] Putting things in an array seems smart. [21:45:22] trying to make minimal changes [21:45:28] upstart or supervisord or whatever would be way better :) [21:45:43] upstart, not supervisord [21:45:48] actually, i wanted to talk to mark, because in puppet there seem to be two ways to start a varnishncsa script [21:45:54] (aye, you guys use upstart) [21:45:59] and one of them uses upstart [21:46:08] and my change would be more DRY [21:46:11] if I used the upstart one [21:46:11] but [21:46:18] it isn't being used anywhere currently in puppet [21:46:30] What do you get when you switch to sh? The logged command looks right to me. [21:46:35] Not that I can actually run it... [21:46:37] yeah, it looks good to me too [21:46:42] but running still has the same effect [21:47:07] Oh, yeah, probably the quotes are getting stripped by the shell. 
[21:48:29] you could probably escape them [21:48:40] i have tried all versions of escaping [21:49:04] gimme a way of setting the var and I have probably tried it :p [21:49:22] gwaaahh, ok, i'm going to stop working on this, and then tomorrow think about fixing up the varnish logging puppet stuff [21:49:30] maybe using the upstart and DRYing the cli options [21:49:57] New patchset: Ryan Lane; "Change gerrit manifests to use and require the gerrit package, rather than installing it from puppet." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6312 [21:50:15] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6312 [21:51:02] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 3.689 seconds [21:51:11] ottomata: If you return to this, try replacing start-stop-daemon with a simple tool that just does 'echo $*' [21:51:11] PROBLEM - udp2log log age for emery on emery is CRITICAL: CRITICAL: log files /var/log/squid/teahouse.log, have not been written to in 6 hours [21:51:18] ottomata: So you can see what's actually getting passed. [21:53:53] --start --quiet --pidfile /tmp/varnishncsa.pid --chuid varnishlog --exec /usr/bin/varnishncsa -- -w 127.0.0.1:8420 -m RxRequest:^(?!PURGE$) -F '%l %n %t %{Varnish:time_firstbyte}x %h %{Varnish:handling}x/%s %b %m http://%{Host}i%U%q - %{Content-Type}o %{Referer}i %{X-Forwarded-For}i %{User-agent}i' [21:54:05] looks right [21:54:06] growl. [21:54:08] yeah :( [21:54:12] welp, thanks for the tips [21:54:16] set -x is awesome [21:54:19] glad to have learned that [21:54:30] sorry that our endeavor was ultimately doomed. 
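One likely reason the `echo $*` output "looks right" is that joining the arguments with spaces hides where one argument ends and the next begins. The sketch below, assuming a POSIX sh, prints each argument on its own line instead. The `show_joined` and `show_args` helpers and the shortened `FORMAT` value are hypothetical and not part of the real init.d script, and the two workarounds at the end are common shell idioms, not necessarily what was eventually deployed.

```shell
#!/bin/sh
# Hypothetical stand-in for the 'echo $*' tool: joins all arguments with spaces.
# (printf is used instead of echo so a leading option-like word is never eaten.)
show_joined() {
    printf '%s\n' "$*"
}

# Hypothetical stand-in for start-stop-daemon: one bracketed argument per line.
show_args() {
    for arg in "$@"; do
        printf '<%s>\n' "$arg"
    done
}

# Shortened, illustrative values; the format string in the log is much longer.
FORMAT='%l %n %t'
DAEMON_OPTS="-w 127.0.0.1:8420 -F '${FORMAT}'"

# Looks correct, because the argument boundaries are invisible here.
show_joined ${DAEMON_OPTS}

# Reveals six separate arguments, with the quote characters still attached.
show_args ${DAEMON_OPTS}

# Workaround 1: eval re-parses the string, so the stored quotes become syntax.
# Caveat: everything is re-parsed, so shell metacharacters elsewhere in the
# value (such as the parentheses in the -m regex from the log) would also
# need quoting.
eval "show_args ${DAEMON_OPTS}"

# Workaround 2: keep the format in its own variable and quote it at the call site.
show_args -w 127.0.0.1:8420 -F "${FORMAT}"
```

The second workaround avoids `eval` entirely, at the cost of no longer keeping every option in a single variable.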
[21:54:37] s'ok learned some stuff [21:54:48] actually, ha, whenever I've had to deal with quoted variables in shell scripts [21:54:55] i think this is usually my experience [21:55:32] New patchset: Ryan Lane; "Change gerrit manifests to use and require the gerrit package, rather than installing it from puppet." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6312 [21:55:50] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6312 [21:57:25] New patchset: Ryan Lane; "Change gerrit manifests to use and require the gerrit package, rather than installing it from puppet." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6312 [21:57:43] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6312 [21:58:12] New review: Ryan Lane; "Need to wait for gerrit upgrade for this change." [operations/puppet] (production); V: 0 C: -1; - https://gerrit.wikimedia.org/r/6312 [22:11:36] !log upgraded percona-toolkit on coredbs to 2.1.1 - now with the potential to run online schema changes on tables without single column unique keys!! [22:11:38] Logged the message, Master [22:20:00] oooo fancy [22:24:02] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:28:21] Ryan_Lane: we have no way of deleting those wikis ? [22:28:27] no [22:28:31] we have no way to delete any wiki [22:28:34] wow [22:28:38] i had no idea [22:28:38] yeah [22:29:23] otherwise I would have deleted those wikis ages ago :) [22:31:14] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 6.290 seconds [22:31:20] New patchset: Asher; "lowering slow query threshold to 450ms" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6335 [22:31:37] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6335 [22:31:56] New review: Asher; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/6335 [22:31:59] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6335 [22:34:05] New patchset: Lcarr; "pushing new firewall builder to streber" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6336 [22:34:23] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6336 [22:35:10] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/6336 [22:35:13] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6336 [22:37:40] I haven't seen us use gerrit for actual reviewing once since I came here [22:37:47] I wonder why we don't just push to a branch and be done with it :) [22:37:49] Ops don't so much [22:38:11] hehe [22:38:23] what's the worst that could happen? [22:38:23] and use gerrit for when, I don't know, want to /review/ things? :-) [22:38:37] there's lots of other sites on the internet :) [22:49:24] New patchset: Lcarr; "fixing hostname" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6338 [22:49:32] paravoid: will you review my changeset ? :) [22:49:41] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6338 [22:49:42] hahaha [22:49:49] that wasn't the point [22:54:50] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/6338 [22:54:53] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6338 [22:59:17] New patchset: Lcarr; "adding in ssl" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6341 [22:59:34] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6341 [23:02:07] LeslieCarr: do you (or anyone else) have any idea how many page SMS do we get per month? [23:02:20] do we *send* that is, not get per individual [23:02:37] let me make a very rough count ... [23:02:49] (number of txt pages this month * number of peeps without the sms gateways) [23:05:26] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:07:59] RECOVERY - MySQL Replication Heartbeat on db24 is OK: OK replication delay 0 seconds [23:08:24] paravoid: i'd guess around 400-500 (we've had a bad paging month this month [23:08:35] RECOVERY - MySQL Slave Delay on db24 is OK: OK replication delay 0 seconds [23:10:24] Ryan_Lane: do we trust apache host restrictions for actual file security ? [23:10:34] what do you mean? [23:10:45] i want to put a firewall creation file on a host, but limit it to be viewed by only a certain set of people [23:10:48] not set of people [23:10:50] set of ip's [23:10:57] basically just the DC [23:11:25] well, as long as you are sure of the authentication, i don;t see the problem [23:11:36] well it would have to be http, not https [23:11:57] why? [23:12:00] so it would be a "allow from blah/24" or something [23:12:03] juniper can't do logins [23:12:08] it can only do simple file pulling [23:12:25] so, how does apache come into play again? [23:12:31] oh [23:12:38] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 7.165 seconds [23:12:39] the juniper is pulling from apache [23:12:42] yes :) [23:12:46] make it an internal apache [23:12:50] should have mentioned that important bit of info ;) [23:12:59] and proxy to it from another apache [23:13:08] where that proxy requires auth [23:13:36] so the one apache server would auth to the other one ? 
[23:13:43] so, the juniper would access the internal one, and users would access the proxy [23:13:47] ahha [23:15:04] LeslieCarr: and we want sms for the US, Netherlands, Germany and Greece? [23:16:14] austrailia would be a bonus [23:16:20] ah right [23:16:37] but those 4 are the ones we need [23:20:54] Ryan_Lane: (and anyone else here) want me to switch your paging from 24x7 to PDT hours ? (i believe it gives us midnight to 8am to sleep) [23:21:03] yes please [23:21:22] it's been driving me *insane* [23:21:36] thanks [23:21:36] shit, we can't have any more of that [23:21:40] :D [23:23:20] paravoid: actually we have 1 us person in my sms count, so something more like 200-300 a month should be more accurate, and since we'll hopefully get rid of teh false pages caused by rack saturation, i'm hoping it will go down to 100 total via sms [23:23:38] well i hope it goes to 0, but i'm a pessimist [23:40:27] New patchset: Lcarr; "new firewall site plus fixing firewall builder cron job" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6353 [23:40:45] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6353 [23:41:45] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/6341 [23:41:47] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6341 [23:41:54] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/6353 [23:41:56] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6353 [23:49:59] New patchset: Lcarr; "fixing up files according to package requirements" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6355 [23:50:17] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6355 [23:50:38] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/6355 [23:50:41] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6355