[00:25:33] anyone got a minute to check out my borked exported resources ?
[00:25:33] New patchset: Asher; "fix package dependency order issue, remove hardy bits" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6360
[00:25:49] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." [operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/6360
[00:30:55] New patchset: Asher; "fix package dependency order issue, remove hardy bits" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6360
[00:31:11] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." [operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/6360
[00:42:40] New patchset: Asher; "fix package dependency order issue, remove hardy bits" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6360
[00:42:59] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6360
[00:43:47] finally!
[00:51:40] New review: Asher; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/6360
[00:51:43] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6360
[01:01:15] starting innobackupex from db57 to db53 for new s2 slave for the one zillionth time
[01:01:20] !log starting innobackupex from db57 to db53 for new s2 slave for the one zillionth time
[01:01:23] Logged the message, notpeter
[08:20:26] !log cherry-picked ae12df0 commit to 1.20wmf2 since there are mobilefrontend commits pending.
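When gerrit2 says "send an amended patchset", the usual cycle is to fix the lint problem and amend the existing commit rather than stack a new one, then push it back for review. A minimal sketch of that cycle on a throwaway repo; the repo name is invented, and `refs/for/production` follows Gerrit's standard push convention for the production branch:

```shell
# Throwaway repo standing in for a checkout of operations/puppet.
git init -q lintfix && cd lintfix
git -c user.name=demo -c user.email=demo@example.org \
    commit -q --allow-empty -m "fix package dependency order issue, remove hardy bits"
# ...fix whatever the lint check complained about, then amend the same
# commit (keeping its Change-Id footer) instead of adding a second one:
git add -A
git -c user.name=demo -c user.email=demo@example.org \
    commit -q --amend --allow-empty -m "fix package dependency order issue, remove hardy bits"
git log --oneline | wc -l   # still a single commit
# resubmit with: git push origin HEAD:refs/for/production
```

Amending keeps the Change-Id, so Gerrit files the new upload as patchset 2 (then 3, ...) of the same change, which is exactly the progression visible in the log above.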
[08:20:33] Logged the message, Master
[08:25:20] New patchset: ArielGlenn; "job status update done in main todo queue so deletes work" [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/6369
[08:29:18] Change abandoned: ArielGlenn; "someone (cough) commited this change later on their own" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5887
[08:30:19] New review: ArielGlenn; "(no comment)" [operations/dumps] (ariel); V: 1 C: 2; - https://gerrit.wikimedia.org/r/6369
[08:30:21] Change merged: ArielGlenn; [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/6369
[08:43:20] !log kaulen: installing various upgrades (apache,mysql,cron,php-wikidiff2,...)
[08:43:22] Logged the message, Master
[09:20:10] !log upgrading bugzilla to 4.0.6
[09:20:14] Logged the message, Master
[09:29:41] New patchset: Mark Bergsma; "Revert "Revert "git::clone define improvements.""" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6372
[09:29:58] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6372
[09:43:54] New patchset: Mark Bergsma; "git::clone define improvements." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6372
[09:44:12] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." [operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/6372
[09:51:59] New patchset: Mark Bergsma; "git::clone define improvements." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6372
[09:52:17] New review: gerrit2; "Lint check passed."
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6372
[09:52:27] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/6372
[09:52:30] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6372
[09:58:18] New patchset: Mark Bergsma; "Remove requirement for removed file resource" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6374
[09:58:36] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6374
[09:59:16] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/6374
[09:59:20] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6374
[10:09:47] !log Started distribution upgrade of server sockpuppet from Lucid to Precise
[10:09:50] Logged the message, Master
[10:37:59] New patchset: Mark Bergsma; "Don't install dashboard on sockpuppet, it scales very badly" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6375
[10:38:17] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6375
[10:38:45] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/6375
[10:38:48] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6375
[10:40:59] !log refreshLinks.php - started it once again in a screen on hume, just for s1. last cron failed with "mwscript command not found"?? well now it is there again and running
[10:41:02] Logged the message, Master
[11:05:50] mutante, what can I do now to help you with wikistats?
[11:06:08] check that list of wikis you published on https://bugzilla.wikimedia.org/show_bug.cgi?id=36268 ?
[11:06:29] * Nemo_bis hopes the answers is not "learn git and commit patches"
[11:08:17] Nemo_bis: well.. both :) hehe. checking that list would be cool. using git would be awesome, but you don't have to
[11:08:53] putting stuff in bugzilla tickets is a first step
[11:09:43] Nemo_bis: i would like those URLs replaced with links to api.php or Special:Statistics?action=raw
[11:10:01] could script it, but we imported thousands and these are just 56 special cases left
[11:10:27] and you never now about special installations having weird URL rewrites and stuff
[11:24:01] New patchset: Mark Bergsma; "Add the Dell R300 platform to base::platform" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6378
[11:24:18] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6378
[11:24:42] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/6378
[11:24:45] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6378
[11:35:32] New patchset: Mark Bergsma; "Dell R300 uses serial speed 57600" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6379
[11:35:50] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/6379
[11:35:50] New review: gerrit2; "Lint check passed."
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6379
[11:36:01] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/6379
[11:36:04] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6379
[11:45:17] i'm going to dist upgrade stafford in a bit
[11:45:23] so puppet will be temporarily unavailable
[11:46:52] k
[11:52:49] !log Started distribution upgrade of server stafford from Lucid to Precise
[11:52:52] Logged the message, Master
[11:57:59] mutante, those URL are something I could fix with your shell script, right?
[12:02:07] Nemo_bis: eh no, the URLs in that bug above are those left the script does not get, because they are nonstandard. so either manually because it is a really low percentage and seems like a one-time job, or enhance script
[12:03:03] the script works if you either have an "api.php" or a "Special:Statistics", but not for stuff like "Spezial:" "Speciaal" , language thing
[12:03:18] mutante, I meant the script to add an API URL at a time
[12:04:00] true
[12:04:18] it had a "change URL" feature
[12:05:25] these are old additions, you could say we don't accept any non-standard ones like this if somebody wants to add
[12:06:47] New patchset: Pyoungmeister; "correecting ian baker's pubkey." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6380
[12:07:05] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6380
[12:07:38] New review: Pyoungmeister; "safe to be pushed out any time. when stafford is back up." [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/6380
[12:07:41] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6380
[12:08:11] which is almost ;)
[12:10:26] and how do I know that is safe to be pushed out? ;)
[12:12:09] notpeter: where is that RT ticket?
[12:15:09] New patchset: Mark Bergsma; "Revert "correecting ian baker's pubkey." No idea how I'm supposed to verify whether this is genuine." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6381
[12:15:18] :-]]
[12:15:26] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6381
[12:15:40] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/6381
[12:15:43] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6381
[12:15:54] mark: we had a long discussion about updating ssh keys with ^demon and Ryan
[12:16:03] not sure we wrote a process though :-(
[12:17:40] http://rt.wikimedia.org/Ticket/Display.html?id=1329
[12:17:49] http://rt.wikimedia.org/Ticket/Display.html?id=2835
[12:17:54] roan fucked up his key
[12:18:02] might be nice to mention those in the commits
[12:18:08] I couldn't find them with dumb rt search
[12:18:16] feel free to revert my revert
[12:18:21] but I'm just making a point :)
[12:18:25] did you look at the diff?
[12:18:29] yes I did
[12:18:32] the difference was == at the end
[12:18:35] I noticed
[12:19:04] ok I wasn't making a point, I was destroying two = ;-)
[12:19:30] I'm going to go drink coffee.
[12:25:51] drop table; http://pinterest.com/pin/11610911509612074/
[13:04:17] hashar: how is gallium doing with the query log?
[13:04:51] I havent checked
[13:04:59] wait
[13:05:07] aren't logs written in /var/log/mysql/ ? :D
[13:05:13] if so I am 100% sure I can't read them
[13:05:44] oh, yeah, you did not specify a different log location for the slow query log right
[13:06:17] i commented about that, because i did not see anything in mysql log, at least not right away after the config was changed. looking..
[13:06:52] /var/log/mysql just nothing but error.log and error.log-old
[13:07:05] maybe that is logged somewhere else
[13:07:15] or there are no slow queries (doubtfull)
[13:07:26] yeah, but expected default, there did not seem to be a path in that config
[13:08:16] what should i look for besides "slow"
[13:10:28] mutante: looks like the default is hostname-slow.log
[13:10:35] aka: gallium-slow.log
[13:10:44] in the data directory
[13:11:03] ah, correct:)
[13:11:10] so should be: /var/lib/mysql/gallium-slow.log
[13:11:14] you got them
[13:11:26] who wants to look at them
[13:11:36] me / ^demon I guess
[13:12:27] maybe we could write a basic shell scripts wich would allow us to : sudo cat /var/lib/mysql/gallium-slow.log
[13:12:47] or have the file written somewhere where we can read from
[13:13:08] yeah, didnt you want it in /home/ somewhere
[13:15:45] hashar: it does not exist, but /home/wikidev would make sense i guess, with you guys being in that group
[13:17:32] hashar: copied gallium-slow.log to your home
[13:18:07] danke schoen
[13:18:12] de rien
[13:18:50] oh Google gave me a link to Ferris Bueller's soundtrack: "Danke Schoen" http://www.youtube.com/watch?v=ru-wvqo1SFY
[13:19:06] Ferris Bueller's Day Off is an amazing movie btw
[13:19:23] hehe, yea, it is on the list of classics i guess
[13:19:40] even a cult one
[13:20:40] ^demon: you also got a copy of gallium-slow.log in gallium home
[13:21:02] <^demon> Just what I wanted? :)
[13:21:18] ^demon: that is a slow query log for Krinkle
[13:21:25] he wants to enhance testswarm queries
[13:21:45] <^demon> Yeah I knew what it was for, just didn't know that I had asked for a copy :p
[13:21:54] oh
[13:22:06] I have been bold and figured out you could use one
[13:22:18] so I can tell Timo : "no time, ask Chad he got a copy"
[13:22:21] :-]]]]]]]]]]]]]]]]]]
[13:22:46] * mutante hides ands lets you forward it to krinkle
[13:23:49] <^demon> And I'll say to Timo: "Hashar's crazy if he thinks I'm doing his work."
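The exchange above pins down MySQL's behaviour: with the slow query log enabled but no path configured, the server writes `<hostname>-slow.log` into the data directory, hence `/var/lib/mysql/gallium-slow.log`. A hedged my.cnf fragment showing how an explicit, readable location could be set instead; the path and threshold below are illustrative, not the config actually deployed on gallium:

```ini
[mysqld]
# Without slow_query_log_file, the default is <hostname>-slow.log in
# datadir (here that worked out to /var/lib/mysql/gallium-slow.log).
slow_query_log      = 1
slow_query_log_file = /var/log/mysql/gallium-slow.log
long_query_time     = 1
```

Pointing the log somewhere group-readable would avoid the manual `sudo cat` / copy-to-home workaround discussed above.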
;-)
[13:23:55] (is not a user on gallium)
[13:24:01] :))))))))))
[13:24:37] New patchset: Mark Bergsma; "Add precise to the default list of pbuilder distributions" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6385
[13:24:48] laughed so loud that most people at the coworking place stared at me
[13:24:54] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6385
[13:25:11] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/6385
[13:25:14] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6385
[13:28:36] is there a way to configure gerrit to only show some projects without having to modify your search for it all the time? :)
[13:28:43] I don't want to see all the mediawiki crap ;-)
[13:33:27] <^demon> mark: Upstream feature request to "save searches"
[13:34:08] awesome
[13:36:15] or we could disallow you access to mediawiki projects :-]
[13:37:50] <^demon> That could work too ;-)
[13:38:24] that's a good idea
[13:38:27] I'll remove my access
[13:39:00] <^demon> Well we'd have to make a new group, put mark in it and then set DENY -> READ on mediawiki/*
[13:41:11] group could be named something like `pureops` `top-of-food-chain`
[13:42:02] <^demon> `people-bothered-by-clutter`
[13:42:18] people-who-hate-web-scripts
[13:42:28] ;)
[13:43:54] heyaaaaa
[13:44:08] mark, if you got a sec, could you tell me about the different varnish::logging stuff in puppet?
[13:44:13] there seem to be two ways to set things up right now
[13:44:17] but one isn't really used
[13:44:18] yeah
[13:44:34] one is dirty, one is cleaner but wasn't working
[13:44:37] then notpeter did some work on it
[13:44:41] and right now i'm not sure which one is used
[13:45:01] varnish::logging is used
[13:45:04] i think it is the old one
[13:45:06] using the init.d script
[13:45:16] varnishncsa is the new one, using upstart
[13:45:23] just modify the format string in both
[13:45:25] and you're safe
[13:45:48] well, i'm having trouble modifying it in the init script, due to really annoying start-stop-daemon argument quoting
[13:45:48] upstart was a bit problematic in lucid
[13:45:55] but i'm just now working on getting varnish on precise
[13:45:58] so it migth work better then
[13:46:09] i feel like i'm doing something dumb, but andrewbogartt and I looked at it for a while then, and had so much trouble
[13:46:16] I can do it in an in elegant way
[13:46:24] and make a special variable in the init script
[13:46:28] that I can set in /etc/defaults
[13:46:43] hm, ok
[13:46:52] if the upstart stuff doesn't really work
[13:46:55] <^demon> mark: The other good way is to avoid using All -> Changes and use My -> Changes and My -> Watched Changes more. If you add operations/puppet to your watchlist in your settings, they'll show up in Watched Changes.
[13:47:06] alright
[13:47:20] i guess i'll do my inelegant way, with the hope that varnish puppet will be cleaned up later
[13:47:26] i was gonna ask if I should work on making it clean now
[13:47:32] buuuut, if upstart doesn't work right..meh?
[13:47:51] you can try ;)
[13:47:53] or you can wait
[13:49:09] well, i'd try if upstart worked right in lucid
[13:49:11] dont' want to fight that
[13:49:31] yup but at least those varnish boxes won't be on lucid for much longer
[14:03:56] Could someone put http://noc.wikimedia.org/~reedy/upload-1.19.0.tar on dataset2:/data/xmldatadumps/public/mediawiki and then extract it please?
[14:05:48] <^demon> That really needs a new home.
[14:22:13] Indeeeed
[14:23:57] sometimes I still don't get git at all.
[14:26:48] Reedy: check that it looks ok please
[14:29:31] apergos: looks good. Thankyou
[14:29:38] sure
[14:35:19] we should get jenkins to build deb packages for us
[14:35:51] <^demon> That's doable.
[14:36:04] <^demon> Jenkins can do pretty much anything if you can script the process.
[14:36:40] the script is called git-buildpackage ;)
[14:37:56] <^demon> Yeah, all we'd need is a build.xml for ant to know what to do.
[14:40:10] ant is not required
[14:40:21] we could use rake / a shell script / a c program :-]
[14:40:23] and some standard on what distro to build it for based on branch name
[14:40:39] and/or releases vs snapshot builds
[14:40:49] the good thing with ant is that Jenkins as additionals and nice supports for it
[14:40:51] although I guess the latter would ideally stay outside of that
[14:42:09] <^demon> https://wiki.jenkins-ci.org/display/JENKINS/Plugins has so many toys we can look at too.
[14:42:21] <^demon> Might be some people who've done the heavy lifting already and plugin-ized it.
[14:43:46] !log Built varnish for precise as 3.0.2-2wm5 and imported it into APT repository precise-wikimedia
[14:43:50] Logged the message, Master
[14:43:52] <^demon> There's also a couple of publish/deploy plugins that are kind of cool.
[14:44:22] and the winner might be : https://github.com/mika/jenkins-debian-glue#readme
[14:45:01] * mark adds it to his list as a berlin hackathon topic
[14:45:14] <^demon> hashar: Ooh, that looks cool.
[14:45:15] bug report it ?
:-D
[14:45:30] that would mean I'd need to figure out how bugzilla works
[14:45:41] Wait, bugzilla works?
[14:45:48] no clue
[14:46:03] ohhhhhhh
[14:46:57] since the author is offering consulting services, we could just have him set it up for us ;-]]
[14:47:59] I could just have you set it up for us.
[14:48:06] :-)))
[14:48:09] notpeter: Good morning. So what's the RT ticket situation exactly?
[14:48:12] probably going to do that
[14:50:51] mark: bug request is https://bugzilla.wikimedia.org/show_bug.cgi?id=36443
[14:51:15] CCed you
[14:51:49] my internet is sllowwwww
[14:51:52] still loading
[14:52:21] ty :)
[14:56:01] New patchset: Mark Bergsma; "varnish (3.0.2-2wm5) precise-wikimedia; urgency=low" [operations/debs/varnish] (master) - https://gerrit.wikimedia.org/r/6391
[14:56:26] New patchset: Ottomata; "RT 2255 - adding Content-Type response header to varnish log lines." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6392
[14:56:44] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6392
[14:56:54] mark, could you check that one for me? I don't like how I did it but I think it is good enough for now
[14:57:37] New review: Mark Bergsma; "(no comment)" [operations/debs/varnish] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/6391
[14:57:39] i gotta run for 15 mins, be back in just a bit, leave comments in gerrit
[14:57:39] Change merged: Mark Bergsma; [operations/debs/varnish] (master) - https://gerrit.wikimedia.org/r/6391
[14:58:33] OH
[14:58:34] mark!
[14:58:38] oh nm
[14:58:44] that is varnish, not varnishncsa
[14:58:53] if you are rebuilding varnishncsa for precise
[14:58:54] can we make a change to the default log line?
[14:59:05] ack, brb
[14:59:48] New review: Mark Bergsma; "Has this been tested?"
[operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/6392
[15:00:00] varnishncsa is part of varnish
[15:00:26] we could, but this should be configurable anyway
[15:00:36] we don't want to roll a new varnish build every time a new log format is needed
[15:01:06] New patchset: Catrope; "Resubmit "correecting ian baker's pubkey."" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6393
[15:01:23] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6393
[15:01:27] robh: can you please take Bellin out of rotation for hardware test
[15:01:33] !rt 2872
[15:01:33] https://rt.wikimedia.org/Ticket/Display.html?id=2872
[15:02:29] cmjohnson1: ok, so notpeter put in this ticket
[15:02:29] robh: also mw64
[15:02:36] !rt 1890
[15:02:36] https://rt.wikimedia.org/Ticket/Display.html?id=1890
[15:02:41] so when its something of his, feel free to ping him if he is aobut
[15:02:46] ok
[15:02:56] mostly cuz he would know if its still doing work or is out of cluster
[15:03:08] but i am checking, cuz he may not be responsive at the moment
[15:03:27] it looks to be offline
[15:03:56] !log bellin crashed, unresponsive to ssh or serial console
[15:03:58] Logged the message, RobH
[15:04:06] cmjohnson1: so bellin is crashed, offline, you can do what you need to with it
[15:04:56] !log shutting down mw64 for hw test per rt 1890
[15:04:58] Logged the message, RobH
[15:05:05] robh: thx
[15:05:10] cmjohnson1: ok, all yours
[15:07:17] cmjohnson1: thank you!
[15:08:30] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/6393
[15:08:33] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6393
[15:09:20] cmjohnson1: after a lot of head-scratching, I realized that it couldn't be done over console redirection
[15:09:30] cmjohnson1: and yeah, ask me anything you like!
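The varnishncsa exchange above (RT 2255, adding the Content-Type response header to the log lines) hinges on the fact that varnishncsa takes an Apache-style format string rather than needing a rebuild. A hedged sketch of what such an invocation could look like under varnish 3.x; the exact format deployed in the change is not shown in the log, so the fields below are illustrative:

```
# varnishncsa 3.x takes the format via -F; %{Header}i reads a request
# header, %{Header}o a response header, so adding Content-Type is a
# config change, not a new varnish build. Daemon/output flags omitted.
varnishncsa -F '%h %l %u %t "%r" %s %b "%{Referer}i" "%{User-agent}i" "%{Content-Type}o"'
```

This is also why mark insists the format "should be configurable anyway": the string lives in the init/upstart configuration, not in the package.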
[15:12:42] RobH: are you in the dc?
[15:13:19] nope, but plan to be later this week
[15:13:20] hmm actually it doesn't matter nm, these hosts aren't there
[15:13:23] need something tested?
[15:13:34] mark: could you potentially review https://gerrit.wikimedia.org/r/#change,5813 ? It is to let the labs people send Apache syslog to a specific non-production server
[15:13:40] no, we probably need someone to look at a few hosts and see whats up with them
[15:14:07] notpeter: i see where you tried
[15:15:07] heh
[15:16:37] hmm maybe someone got to them already
[15:16:38] nice
[15:21:23] mark: thanks!
[15:24:07] New review: Ottomata; "Mark, yes, on my local VM. I tried so many different ways of making this more elegant, and this is ..." [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/6392
[15:24:51] is bits.wikimedia.org have problems again?
[15:25:23] Jeff_Green: ^^
[15:25:54] It is not loading on:
[15:25:55] https://www.mediawiki.org/wiki/Special:Code/Wikimedia/1662
[15:26:51] jpostlethwaite: this is the first I've heard of anything
[15:27:04] i loaded that once and it came up without the bits, then with bits the second time
[15:27:22] It just started for me
[15:27:29] ok
[15:27:32] bits is loading
[15:27:34] for me
[15:27:48] thanks :)
[15:27:55] huh
[15:28:06] lemme know if you notice it again
[15:28:20] notpeter or apergos, who wants to help me figure out why I can't write to the pagecounts-ez rsync module on dataset2 from stat1?
[15:28:45] did youget someone (e.g. me) to create a writeable directory over there for testing?
[15:28:50] I bet you forgot to do this
[15:29:05] needs to be writeable by the user specified in the module
[15:29:24] hm, ok
[15:29:37] can you check then? the module is running as backup i think
[15:30:04] yes it is
[15:30:09] /data/xmldatadumps/public/other/pagecounts-ez
[15:30:11] is the directory
[15:30:18] there is now a directory "testing" in there that you can write to for...
testing, duh
[15:30:20] what are perms/ownership on that?
[15:30:24] ok
[15:30:28] it's all owned by ezachte
[15:30:35] because he writes in there directly right now
[15:30:37] Reports of bits issues in -tech
[15:31:01] when you're ready to move, as I mentipned earlier, we'll switch perms on the entire dir recursively to the right user
[15:31:07] !log no nagios bot, kicking nagios on spence
[15:31:09] Logged the message, notpeter
[15:31:17] cool!, that works, hm ok
[15:31:24] happy script-writing
[15:31:43] ezachte is going to be rsyncing from stat1…if we make rsync module gid=wikidev and chmod -R g+w on that dir
[15:31:48] would that be good enough?
[15:32:03] maybe chgrp -R wikidev as well?
[15:32:14] Reedy: bits was out for a few minutes for me and came back up
[15:32:24] because uh, we are ready to move :)
[15:32:27] I'd rather just make the owner/group be backup on the dir contents
[15:32:31] i could not load code review
[15:32:34] just this and a DNS change and stat1 will be finished
[15:32:40] that's fine with me
[15:32:43] so if he wants to do that now I'll do the chmod right this second
[15:32:47] it appears to be working now
[15:32:52] aye beacuse he might be using nfs from bayes
[15:32:58] hm, can we backup:wikidev it?
[15:33:07] g+w, just in case?
[15:33:31] ottomata: do you want to upgrade stat1 to precise?
[15:33:34] I thought we want to not do nfs any more
[15:33:37] um, sure?
[15:33:41] right we don't
[15:33:45] while you're still working on it
[15:33:46] but since we are not 100% using stat1 yet
[15:33:50] and today is report card metrics meeting day
[15:33:52] saves everyone a later upgrade
[15:33:58] he might be using bayes and writing there
[15:33:59] i doubt it though
[15:34:05] but if we make it g+w :wikidev
[15:34:15] then at least he can still write on bayes (which will be deprecated once stat1 is finished)
[15:34:33] when can we decommission bayes you think?
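The `chmod -R g+w` / `chgrp -R wikidev` plan above can be sketched on a scratch directory. The real path is `/data/xmldatadumps/public/other/pagecounts-ez` and the real group is `wikidev`; here a temp dir stands in for both, and the setgid bit is an assumption on my part (the log only mentions g+w), added so new files inherit the directory's group:

```shell
# Scratch-dir stand-in for the pagecounts-ez directory on dataset2.
scratch=$(mktemp -d)
mkdir -m 755 "$scratch/pagecounts-ez"
chmod -R g+w "$scratch/pagecounts-ez"   # group members may now write (775)
chmod g+s "$scratch/pagecounts-ez"      # new entries inherit the dir's group
stat -c '%a' "$scratch/pagecounts-ez"   # prints 2775
```

On the real host this would be paired with `chgrp -R wikidev` (omitted here since the group doesn't exist in a scratch environment), which is exactly the combination apergos and ottomata settle on.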
[15:35:12] I don't want to leave it that way
[15:35:28] when do we expect him to be using stat1 exclusively?
[15:37:29] let's wait with upgrading to precise until friday
[15:37:38] for stat1?
[15:37:40] yes
[15:37:42] I thought it wasn't used yet?
[15:37:47] it is
[15:38:08] as of last week, erik z and andre engels are running their scripts on stat1
[15:38:10] I'm hoping for a reinstall instead of an upgrade btw
[15:38:16] because that's also a good test for the puppet manifests
[15:38:29] fine either way
[15:38:37] but it can wait until friday of course
[15:38:41] ottomata?
[15:38:42] no rush
[15:38:53] just saying, now would be a good time to upgrade while it's all fresh on everyone's mind
[15:38:57] instead of a year from now or so ;)
[15:39:25] grrr
[15:41:27] apergos: (about using stat1 exclusively) and I assume you are referring to Erik Z: as soon as we know 100% that stat1 has all the functionalities, libs, data etc that bayes has and that erik can generate everything without problems
[15:41:40] is there any eta?
[15:42:04] in other words, should I ask again in e.g. a week? 2 weeks?
[15:42:05] i would think within 2 weeks,
[15:42:08] ok great
[15:42:09] or less
[15:42:10] * apergos makes anote
[15:42:27] just keep poking me and erik z, i will also keep reminding him that we really want to move away from abyes
[15:42:59] yes indeed
[15:43:45] awesome
[15:44:00] so after friday we can reinstall stat1
[15:44:09] then it can start to be used
[15:44:18] and then after a week or two, we can decommission bayes
[15:44:20] right?
[15:44:25] if all goes well of course
[15:44:27] yes
[15:44:31] cool
[15:44:32] (if all goes well)
[15:44:52] maybe we should email this to both erik z, andre engels, me, andrew, and apergos
[15:44:57] so everybody is in the loop
[15:45:06] yes
[15:45:11] fine by me
[15:45:20] and since that's your excellent idea, i'll let you have it ;-)
[15:45:39] love it
[15:46:55] mark: I have made the $syslog_server determined in realm.pp and base::remote-syslog a class with param https://gerrit.wikimedia.org/r/#change,5813
[15:48:25] there are multiple errors in there
[15:48:30] first, you can't reassign variables in puppet
[15:48:40] second, you're mixing a class parameter with a global variable
[15:48:50] i'm not sure what's that supposed to do :)
[15:49:38] ah I did not know we could not reassign variables :(
[15:49:55] about mixing, do you mean $syslog_server and $::syslog_server ?
[15:53:12] Reports of bits 503s in #wikimedia-tech
[15:57:24] notpeter: can we change the warning for inactive filters from 6 to 18 or 24 hours?
[15:57:33] right now we are getting way too many false warnings
[15:58:31] drdee_: sure
[15:58:39] thx
[15:58:59] ag, internet at this cafe is pooopy
[15:59:09] missed everything since :34 of this hour
[15:59:38] so apergos, if you responded to me I missed it
[16:00:15] it's because your response to me never made it here
[16:00:22] what you missed was:
[16:00:27] mark: updated with at patchset3 https://gerrit.wikimedia.org/r/#change,5813
[16:00:30] we expect him to be off of bayes in a couple weeks
[16:00:41] mark: though I need to run out of coworking space right now :/ Will follow up later tonight
[16:00:44] an email will go around summarizing stuff like the install timeframe etc
[16:01:05] ?
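Mark's two review points above (no variable reassignment; class parameter vs. global variable) can be illustrated with a small Puppet sketch. The names `$syslog_server` and `base::remote-syslog` come from the change under review; the bodies and the hostname are invented for illustration, not taken from the actual patchset:

```puppet
# Puppet variables are immutable: this is the reassignment mark objects to.
$syslog_server = 'syslog1.example.net'
$syslog_server = 'syslog2.example.net'   # error: cannot reassign a variable

# A parameterised class keeps the two scopes apart instead. Inside the
# class, $syslog_server is the parameter; $::syslog_server explicitly
# names the top-scope variable set in realm.pp, so the two never mix.
class base::remote-syslog($syslog_server = $::syslog_server) {
    # ...configure rsyslog to forward to $syslog_server here...
}
```

Hashar's follow-up question about `$syslog_server` vs `$::syslog_server` is precisely this distinction: the bare name resolves in local scope, the `$::` form always reaches top scope.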
[16:01:07] that's not puppet syntax hashar ;)
[16:01:10] i said:
[16:01:10] he might be using bayes and writing there
[16:01:10] i doubt it though
[16:01:10] but if we make it g+w :wikidev
[16:01:11] then at least he can still write on bayes (which will be deprecated once stat1 is finished)
[16:01:15] mark: :-]]]]]]]]]]]]]]]
[16:03:47] (06:35:12 μμ) apergos: I don't want to leave it that way
[16:03:48] (06:35:28 μμ) apergos: when do we expect him to be using stat1 exclusively?
[16:04:08] since you were gone dr dee and mark and I had the rest of the conversation ... sorry about that
[16:04:22] mark: updated again :D https://gerrit.wikimedia.org/r/#change,5813
[16:04:47] basically I've done the chgrp/chmod and you can addbackup to the wikidev group but I want to undo all that once he's off bayes
[16:06:11] New patchset: Pyoungmeister; "decreasing sensitivy on udp2log stale logs files" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6401
[16:06:16] oh ok, cool
[16:06:20] that's fine
[16:06:23] lemme try to write to regular dir
[16:06:29] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6401
[16:06:53] back later tonight
[16:07:26] apergos, still have permission probs when writing to pagecounts-ez dir
[16:07:31] rsync: mkstemp "/.test_file.bWBB18" (in pagecounts-ez) failed: Permission denied (13)
[16:07:41] because backup isn't in the wikidev group
[16:08:01] which is why I said you would want to add it (in puppet)
[16:08:22] oh. hmm
[16:08:29] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/6401
[16:08:32] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6401
[16:08:38] you mean as ezachte you can't write
[16:09:08] ?
[16:09:58] or do you mean that the rsync doesn't work?
if it's the rsync it's the reason I gave above
[16:10:33] i mean rsync, but ah right
[16:10:39] should I make the rsync module gid=wikidev?
[16:10:43] or add backup to the wikidev group?
[16:10:59] for right now why not make the rsync module gid wikidev
[16:11:07] k
[16:11:09] thanks
[16:11:42] !lastlog mw61
[16:13:14] New patchset: Ottomata; "Making pagecounts-ez rsync module on dataset gid = wikidev." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6402
[16:13:32] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6402
[16:14:03] apergos: approvy woovy?
[16:15:33] New review: ArielGlenn; "temp workaround til stats stuff is moved off of bayes" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/6402
[16:15:36] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6402
[16:16:11] !log starting innobackupex from db1040 to db1022 for new s6 snapshot slave
[16:16:14] Logged the message, notpeter
[16:16:17] cool, thanks, how long does it usually take for puppet to run and put the change in place?
[16:16:26] half hour
[16:16:29] but for you...
[16:16:31] mk
[16:19:18] for you, 5 min (done)
[16:19:24] (did a manual run)
[16:19:57] woo, danke
[16:20:25] yay, it works
[16:21:13] * RoanKattouw stabs Tim-away
[16:21:21] "set -e in scap so that scap fails if mergeMessageFileList.php fails"
[16:22:49] <^demon> Stab him in his sleep?
[16:22:52] <^demon> ;-)
[16:24:05] New patchset: Mark Bergsma; "Cache bits 4xx for 1m" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6403
[16:24:22] New review: gerrit2; "Lint check passed."
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6403 [16:24:32] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/6403 [16:24:34] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6403 [16:25:08] ^demon: He caused the "scap mysteriously stops running" bug [16:25:16] haha [16:26:09] Interestingly robla said "isn't there that flag that tells bash to stop when there's an error code", which I think he relayed from someone else [16:26:15] But that led me to find it pretty quickly [16:26:37] nope, I've added that to a script myself when I needed it (not here) [16:27:07] bash has a flag for damn near everything [16:27:28] OK [16:27:54] robla: Then you are now officially credited with finding the cause of a problem that I couldn't find myself ;) [16:28:02] * RoanKattouw credits robla in commit msg [16:32:28] mark, can you point me to some docs about how backups work? or maybe something in puppet? or just tell me what I should do on stat1? [16:32:55] reading backups.pp now... [16:32:59] not now, outage [16:33:03] oh [16:33:04] k. [16:35:39] New patchset: Catrope; "Revert "set -e in scap"" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6404 [16:35:56] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6404 [16:41:12] New patchset: Catrope; "Revert "set -e in scap"" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6406 [16:41:29] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6406 [16:42:36] RoanKattouw: could you wrap commit msgs? idk if there's a policy established yet but i think it's 1) generally a good idea and 2) makes gerrit much more friendly [16:42:48] jeremyb: I just did? [16:43:02] jeremyb: See also convo between me and ^de mon in #wikimedia-dev [16:43:08] idk... 
/me relooks ;) [16:43:12] oh, *that* channel [16:47:23] Change abandoned: Catrope; "Tried to amend this, but amending reverts is broken in Gerrit. Abandoning in favor of https://gerrit..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6404 [17:02:07] Hey opsen, could someone review&merge https://gerrit.wikimedia.org/r/#change,6406 please? It unbreaks our deployment scripts, which would be nice to do before I teach Ian how to deploy code at 3pm [17:03:28] baker? [17:04:46] New patchset: preilly; "Partner IP Live testing Wednesday, May 2nd, 10am - 12pm PST" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6407 [17:05:04] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6407 [17:05:16] Yes, Ian baker [17:13:14] is leslie around today? [17:13:22] She just walked in [17:13:31] RoanKattouw: okay, thanks [17:14:13] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/5439 [17:14:16] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/5439 [17:17:03] preilly: ^ [17:17:17] jeremyb: ? [17:17:22] jeremyb: oh, thanks! [17:17:25] heh [17:17:38] LeslieCarr: are you in a meeting already? [17:17:45] preilly: yes [17:18:04] LeslieCarr: I was hoping I could get you to push this https://gerrit.wikimedia.org/r/6407 [17:18:12] not right now :-/ [17:18:19] nick woosters [17:18:29] ha ha ha [17:18:36] ;-P [17:18:45] ctwoo: I think you're missing a / [17:18:51] ya ... [17:19:06] nick to release woosters ... [17:19:21] need to release it, not tnick [17:24:29] binasher: ping [17:24:41] hmm? [17:24:59] binasher: can you push https://gerrit.wikimedia.org/r/6407 live for me? [17:26:33] preilly: digi is using the zero domain and all the other carriers are just sticking with m [17:26:34] ? [17:26:49] binasher: that is correct [17:26:52] damn [17:27:02] can digi be persuaded to change?
[17:27:14] binasher: it is in the contract [17:28:04] its kinda funny… all that work for dot zero and all other carriers said no thanks [17:28:20] New review: Asher; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/6407 [17:28:23] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6407 [17:28:38] binasher: yeah [17:32:41] binasher: can you let me know when it's live [17:33:39] preilly: its merged on the puppet server too [17:33:49] the rest will take a bit longer [17:34:11] binasher: the rest? [17:34:50] preilly: the puppet installs [17:35:10] binasher: can you force them? [17:35:40] you would go there.. [17:36:18] heh [17:37:10] varnish is loading the new vcl without complaints, should be everywhere in 5 minutes [17:43:29] New patchset: Ryan Lane; "Change pam configuration to only require pam_access for sshd" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6419 [17:43:48] New patchset: Ryan Lane; "Change gerrit manifests to use and require the gerrit package, rather than installing it from puppet." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6312 [17:44:06] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6419 [17:44:06] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6312 [17:44:32] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: -1; - https://gerrit.wikimedia.org/r/6312 [17:46:17] New patchset: Ryan Lane; "Change pam configuration to only require pam_access for sshd" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6419 [17:46:35] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6419 [17:47:33] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/6419 [17:47:36] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6419 [17:55:24] notpeter: regarding bellin...right now it is looking like dimm error but it is both dimm in one bank (b2 and b3) so could be dimm or bank or nothing at all [17:55:50] may just be an old error that wasn't cleared...same errors logged back in Jan [17:57:20] cmjohnson1: yeah, it dies intermittently :/ [17:57:28] like every couple of weeks/months [17:57:51] the most obnoxious kind of error to troubleshoot... [17:58:49] yep...i am running a DSET report for Dell, will clear the errors and let you have it back and see if you get the error again. [18:00:46] hi, got Qs about squid and puppet and volatile, is this a bad time to ask? [18:01:24] cmjohnson1: ok! cool. did you replace dimms? [18:02:16] er, I'm confused as to the current state of the machine... [18:02:29] notpeter: no I have not [18:03:03] okie dokie [18:05:03] cmjohnson1: ok! and if you have time to work on rt 2798, that'd be cool [18:05:04] notpeter: the current state is borked....i have to call Dell and get them to send me new DIMM. [18:05:19] we're at a point where I could use them, but not a super rush, or anything [18:05:23] cmjohnson1: cool cool [18:05:42] are you ready for those? last time we talked you were thinking end of month [18:06:12] cmjohnson1: yeah, I could use them. mar_k was very very fast with getting ubuntu 12.04 installable [18:06:24] so yeah, when you got time, I can use'em [18:06:59] are we swapping one at a time or do you need them up and running first [18:08:45] cmjohnson1: the whole search cluster is not active currently [18:09:01] is there room/power enough to add one of the new ones that I can use for testing? [18:09:09] or is a direct switch out needed?
[18:09:31] if i can shut down and remove a current search then yes [18:09:41] gotcha [18:10:06] eh, whatevs, just go for it. feel free to swap out all of them [18:11:47] notpeter: I will need mgmt IP information to setup DRAC [18:12:52] cmjohnson1: ah, ok [18:13:44] i'll handle that, going to put in here what i do for ya chris [18:13:54] so when you see it soon enough it won't be the first ya hear of it [18:14:18] so we handle our dns files via svn to commit changes, then via a script to sync those changes out [18:14:35] so the svn kept files are edited on sockpuppet in /root/pdns [18:15:06] then on there, for mgmt i edit the reverse dns zone file to find an available IP [18:15:51] we have most of these already allocated into subnets, so if its not the first server of its kind, there is usually something there [18:16:05] like now we have search#.mgmt.pmtpa.wmnet [18:16:11] 1-20 [18:16:21] cmjohnson1: how many are you adding again? 16? [18:17:03] there are only 13 to replace [18:17:30] ? [18:17:32] but I have 16 total...so yep....3 more [18:17:52] right, but i need to know the total number of new, we aren't going to name the new servers old names [18:18:13] yes...13 [18:18:16] sorry 16 [18:18:26] I am replacing 13 adding 3 [18:18:31] for a total of 16 [18:18:47] so i am going to add search21-36 [18:19:42] robh: wouldn't you just want to rename it search1-12 since we are replacing [18:19:47] nope [18:19:51] okay [18:20:12] someday we may start doing that, when things tie to a more asset tagged system outside of racktables [18:20:36] but for now we don't rename old server names, since we usually have those said clusters on internal ips [18:20:45] caching being the exemption but meh [18:20:55] (they have a mix of internal and external ips) [18:21:29] notpeter: with this new information I will be able to get one up for you to test first...(once i get ip info) [18:22:06] cmjohnson1: do you have an rt # i can paste the ip info into ?
[18:22:24] robh: 2798 [18:22:53] cmjohnson1: cool! [18:23:27] ok, so when we are working here later i will show you the changes screens for the dns stuff [18:23:38] but i added that ip info and committed the change [18:23:52] so now i can login to ns0.wikimedia.org (dobson) [18:24:06] oh, with passing along our ssh key, since we need to run a sync script [18:24:17] and run that script on it, which sends it to all the other nameservers [18:24:21] !log updating dns [18:24:24] Logged the message, RobH [18:25:39] cmjohnson1: the most important thing being before committing to svn, ensuring you have no typos [18:25:54] a bad typo in dns can pretty much take us into multi hour downtime [18:26:09] got it! [18:26:28] since i just tested my changes, i can now safely continue to say [18:26:33] i have yet to cause an outage due to dns [18:26:48] i wasnt going to say that in mid dns update [18:26:50] don't jinx yourself [18:26:58] but it tested ok, so i can say it now [18:27:07] updated the ticket with the ip info as well [18:28:14] cool thanks. [18:29:41] RobH: are you in the dc ? [18:29:56] nope, playing catchup [18:30:07] whatcha need? [18:32:56] LeslieCarr: can you do me a favor, i am adding new search boxes in sdtpa but want to get one up and running for notpeter today for testing. I will be connecting to port 34 on asw-b3-sdtpa [18:33:20] the linecard might be returned to sender today.. i'll open a remote hands ticket to put it in the cage [18:33:27] it's neighbor is ms-be4 ...and I will add it to the network ticket for all of them [18:33:33] cmjohnson1: cool [18:33:40] thx [18:34:34] i'm in a meeting now until noon, then another noon meeting but i think i can do this during the noon meeting [18:34:59] awesome..thx [18:35:14] LeslieCarr: pls do, sorry about that [18:48:49] !log creating a blobs_cluster23 ES shard table for all active projects [18:48:51] Logged the message, Master [18:56:04] ok aaaand i'm back! 
[18:56:09] squid questions abound [18:56:16] shoot [18:56:26] aaaahhh drdee_ you don't have the answers! [18:56:38] you never know [18:56:42] the squid conf files seem to be in the volatile source [18:56:51] i'm pretty sure I need access to them to change log format [18:56:56] but they are not checked into puppet [18:56:58] so wah wah [18:56:59] :( [19:00:20] poking time: hmmm [19:00:31] maybeeeeee binasher knows [19:00:41] or maybe maplebed? [19:00:47] or maybe notpeter [19:00:51] hm? [19:00:55] ohai! [19:00:55] haha, hi [19:01:18] looking for squid conf files to add more info to access log lines [19:01:27] but they are in puppet:///volatile [19:01:28] they're on fenari in /home/w/common/conf/squid or something like that. [19:01:38] right [19:01:45] but i don't have access to that [19:01:47] http://wikitech.wikimedia.org/view/Squid#Configuration [19:02:12] http://rt.wikimedia.org/Ticket/Display.html?id=2745 [19:02:23] i've got nginx and varnish down [19:02:28] need to do the same for squid [19:02:38] what's the process for making a change to a squid conf? [19:02:44] maybe throw a patch into the RT? [19:02:45] are they checked into a repository somewhere? [19:02:51] I don't think so. [19:02:53] can you get me the originals? [19:03:00] you don't have read access to them? [19:03:08] i don't have access to fenari [19:03:09] afaik [19:03:11] ottomata: there's a squid conf generating php script [19:03:21] maplebed: they're locally checked into.. wait for it… wait for it…. RCS! [19:03:23] that copies to the volatile puppet dir [19:03:28] haha, what's RCS? [19:03:30] binasher: that rocks. [19:03:50] ottomata: I can get you an original. [19:03:56] ottomata: it stores and diffs punch cards [19:03:58] binasher: can you revert the varnish change [19:04:06] preilly: uh? [19:04:11] did you break the site? [19:04:23] binasher: no it's only for an hour long test [19:04:23] ottomata: you do have access to fenari. [19:04:27] I do!? [19:04:34] so says puppet.
[19:04:44] /home/w/conf/squid/generated/frontend.conf [19:04:44] well how about that! [19:04:45] preilly: did you revert the change in git? what actually needs to happen? [19:04:51] never logged in there before [19:04:52] cool [19:04:54] ottomata: that is where the frontend ones are [19:05:03] which is where the logging happens [19:05:10] ok, cool [19:05:12] binasher: I didn't revert it [19:05:17] awesome [19:05:21] binasher: in the past LeslieCarr has been doing these [19:05:22] so, this generator.php thing though [19:05:24] preilly: revert it [19:05:29] binasher: and she just reverts it [19:05:32] when I make changes, should I submit a patch for frontend.conf or for that? [19:06:01] oh whoa [19:06:03] lots of files [19:06:07] New patchset: preilly; "Revert "Partner IP Live testing Wednesday, May 2nd, 10am - 12pm PST"" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6431 [19:06:08] ottomata: changes go into /home/w/conf/squid/frontend.conf.php [19:06:24] ottomata: yeah, the generator script uses the mentioned template [19:06:24] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6431 [19:06:27] and makes one for each [19:06:38] and then pushes to the volatile puppet repo [19:06:42] cmjohnson1: i don't see the ticket ? [19:06:46] binasher: reverted in https://gerrit.wikimedia.org/r/#change,6431 [19:06:51] awesome, got it [19:07:03] sooooooo, use with extreme caution.
maybe talk to mar_k about any changes you're going to make [19:07:16] but, for investigation, you now can see all the confs [19:07:18] will do, i mean [19:07:19] New review: Asher; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/6431 [19:07:20] i won't make the change myself [19:07:22] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6431 [19:07:24] kk [19:07:26] i will submit it as a patch along with my gerrit change [19:07:37] for nginx and varnishncsa [19:07:44] preilly: look at you, pressing buttons in gerrit like a big boy! [19:07:54] kk [19:07:54] ugh, i'm sounding like you [19:08:13] binasher: fuck you [19:08:18] binasher: with love [19:08:34] the first part was ok, but then it got creepy [19:08:41] welcome to the wmf tech department, ladies and gentlemen [19:08:54] notpeter: ha ha [19:26:03] notpeter: is it possible to configure different timeout thresholds per udp filter? [19:26:21] notpeter: once lesliecarr gets a chance she'll setup network for search34 and then you should have free rein [19:28:00] drdee_: anything is possible... but that would be non-trivial [19:28:05] cmjohnson1: wooo! [19:28:06] thank you! [19:28:58] notpeter: okay, maybe we need to make a distinction then between filters that we have to monitor and filters that we don't need to filter, currently the teahouse filter is generating way too many false alarms [19:29:20] filters that we don't need to filter ==> filters that we don't need to monitor [19:29:39] wait, what? [19:30:00] this sounds promising [19:31:00] notpeter: i'll rephrase: can we enable monitoring for individual filters? [19:32:35] drdee_: yes. but anything that's not all or nothing will require some doing to do properly [19:34:01] notpeter: okay, what would you suggest we do to reduce the number of false warnings?
[19:35:14] drdee_: so, I did increase the warning and crit threshold on the monitoring 4-fold [19:35:17] that will probably help [19:35:41] but, I guess, what are the filters doing if they are not writing to logs for 24 hours at a time? [19:35:48] yes: but the downside is that it will take 24 hours before we notice that important filters are failing [19:36:03] some filters are just very specific and don't get many hits [19:37:34] drdee_: so, the monitoring of whether a filter proc is missing from the process list is still as sensitive [19:37:44] so if filters are crashing, you'll still know that within minutes [19:38:16] ok [19:38:30] the only failure mode, at this point, is if a filter is running, and showing up in the process list, but isn't outputting anything [19:38:34] this is... possible [19:38:45] but would only happen with a misconfigured filter of some type [19:38:54] which should be checked for when a filter is first deployed [19:39:00] agree :) [19:39:08] okay, let's see how this goes [19:39:18] the likelihood that something will all of a sudden keep running but stop outputting seems... low. [19:39:21] cool [19:39:26] if this is a significant issue [19:39:29] I will work on it [19:39:35] and we can get per-filter monitoring working [19:39:41] but that would be a day's worth of work [19:39:45] not a couple of minutes [19:40:11] oh [19:40:13] although [19:40:36] oh, no, that won't work. damn [19:40:46] never mind. I thought I was clever for a second there [19:43:12] Hey opsen, could someone merge https://gerrit.wikimedia.org/r/#change,6406 please? [19:43:44] I'm supposed to be guiding Ian on his first deploy later today so it'd be nice to unbreak our primary deployment script before then [19:45:51] RoanKattouw: that's legit. I'll +2 and merge on sockpuppet [19:46:07] who set it to begin with? what's the history of this flag?
[19:46:49] Tim did [19:47:02] To make scap fail when mergeMessageFileList.php faile [19:47:04] d [19:47:13] But I don't think he fully realized what he was doing, per my commit msg [19:48:02] alright, that's surprising, but sure, I shall mergificate [19:48:08] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/6406 [19:48:11] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6406 [19:48:31] Thanks for having enough faith in me to revert Tim's changes for me :D [19:49:38] I mean, this might turn into a revert war [19:49:41] but whatever [19:49:46] not my war ;) [19:50:04] binasher: do you have stuff in the deploy queue? [19:50:24] Ryan_Lane: you too [19:50:32] binasher Ryan_Lane is that stuff good to go? [19:50:43] yah [19:50:44] mine [19:50:50] not sure about ryans [19:53:25] meh, it's not dns. should be ok ;) [19:53:43] RoanKattouw: ok, let your revert war begin! [20:02:34] New patchset: Pyoungmeister; "making db53 a snapshot host" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6437 [20:02:51] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6437 [20:06:36] New patchset: Pyoungmeister; "making db53 a snapshot host" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6437 [20:06:56] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6437 [20:07:21] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/6437 [20:07:24] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6437 [20:12:44] binasher: d e r p [20:18:13] Ryan_Lane: http://blog.bofh.it/debian/id_413 [20:21:21] LXC is pretty awesome though [20:35:06] * binasher is afk for lunch  [20:39:55] gerrit Q for you guys [20:40:01] I have a commit in gerrit currently [20:40:01] https://gerrit.wikimedia.org/r/#change,6392 [20:40:10] I want to commit something else that will be dependent on this commit [20:40:56] Should I: [20:40:56] A. wait for it to be approved [20:40:56] B. commit to the same topic branch [20:40:56] C. Create a new topic branch from this topic branch and commit there? [20:41:09] Ryan_Lane might be able to help me with that one [20:42:24] why are they dependent on each other? [20:42:28] ottomata: If it's dependent, then any of those three is a reasonable thing to do IMO [20:42:55] meaning, are they legitimately dependent, or just because of how they were submitted? 
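For what it's worth, options B and C above both amount to stacking the new commit on top of the unmerged one; when both are pushed for review, Gerrit tracks the second change as dependent on the first. A throwaway local sketch (repo, identity, and commit messages are illustrative only, not the actual workflow):

```shell
# Stack a dependent commit on top of an unmerged one in a temp repo.
cd "$(mktemp -d)"
git init -q .
git config user.email you@example.org   # placeholder identity
git config user.name "You"
git commit -q --allow-empty -m "change 1: still pending review"
git commit -q --allow-empty -m "change 2: builds on change 1"
git rev-list --count HEAD               # two stacked commits
# Pushing both for review against a real Gerrit remote would then be:
#   git push origin HEAD:refs/for/production
```

If change 1 later gets amended, change 2 has to be rebased on top of the new patchset, which is the babysitting cost ottomata mentions.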
um [20:44:19] well, in order to change the log format [20:44:27] for RT 2255 [20:44:38] i had to change the way the init.d script read in the format for varnishncsa [20:44:47] now I have another ticket, that asks for more changes to log format [20:44:57] RT 2745 [20:45:33] if RT 2255 was approved and merged, no problem, I could topic branch from production and do new commit [20:45:46] but since that current commit is not yet merged in [20:45:49] it only exists on my local branch [20:46:26] so the change requests are legitimately dependent [20:46:42] but one is only a change for varnishncsa [20:46:56] the other changes I have coming in will affect varnishncsa, nginx, squid and udp-filter [20:47:05] got a question - in manifests/misc/firewall.pp i'm using exported resources to create tiny files for each host that needs an open port. however, in the file, when created it sees the hostname as proper but the ip address as the ip address of the server which is pulling the files. anyone have an idea why this is ? (starting line 31) [20:47:06] and I'd rather keep them separate, since they will need more babysitting [20:48:27] LeslieCarr: a guess: [20:48:30] define exported_acl_rule($hostname=$::hostname, $ip_address=$::ipaddress, [20:48:37] but the file definition in that [20:48:41] uses $ipaddress [20:48:47] instead of the passed in $ip_address [20:48:51] ahha [20:48:53] thanks ottomata [20:49:16] yup! [20:49:30] New patchset: Lcarr; "fixed ip_address field" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6438 [20:49:49] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6438 [20:50:01] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/6438 [20:50:05] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6438 [20:53:27] Ryan_Lane, so advice for committing new dependent stuff?
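The bug diagnosed above can be sketched like this. This is a paraphrase, not the actual firewall.pp contents: only the define's signature comes from the log, and the file path and content are hypothetical:

```puppet
define exported_acl_rule($hostname=$::hostname, $ip_address=$::ipaddress) {
    @@file { "/etc/firewall/rules/${hostname}":   # path is illustrative
        # Bug (per the log above): the resource body interpolated the
        # bare fact $ipaddress instead of the $ip_address parameter,
        # so generated files carried the IP of the node pulling the
        # exported files rather than the exporter's. The fix in
        # change 6438 is to use the passed-in parameter:
        content => "${hostname} ${ip_address}\n",
    }
}
```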
[20:54:29] no idea [20:56:00] haha, k thanks :p [20:56:22] i wonder who I can get to approve my change then [20:56:27] cause that would be my #1 choice [20:56:29] let's seeeeeeee [20:59:17] hmmm, binasher kinda knows about this ticket [20:59:19] mayyyybe? [20:59:31] i guess this change needs to be babysat too, which makes it annoying [20:59:38] since it does change running varnishncsa instances [20:59:52] aaand I bet ops peeps are all busy bees [21:00:14] maybe notpeter? he's a helpful chap. [21:00:58] New patchset: Demon; "Changing gerrit to use a raw jdbc url for connecting to MySQL (bug 35626)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6439 [21:01:14] ottomata: to say i'm busy is an understatement ;) [21:01:15] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6439 [21:07:15] cmjohnson1: ports completed [21:07:41] lesliecarr: awesome! [21:08:00] lesliecarr: did you see my email a couple weeks back about wireless here? [21:08:49] yes [21:09:05] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/6439 [21:09:08] which means that i need to fix it up -- sorry i haven't yet :-/ [21:09:08] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6439 [21:09:12] me fail [21:10:18] there are more important things going on....but whenever you get a moment [21:18:25] what's the name of the python library that does the easy making parameters (like -f foo) ? [21:18:34] maplebed: ^^ ? [21:18:48] optparse? [21:19:04] yes, thank you [21:19:34] though eventually we'll have to start using argparse instead of optparse. [22:33:43] gerrit=503? 
yes, >90% is 503 [22:35:36] for me [22:35:44] <^demon|away> Known, sorry [22:35:57] oh, i see now [22:37:24] (fixed now) [22:40:11] New patchset: Ryan Lane; "Change gerrit manifests to use and require the gerrit package, rather than installing it from puppet." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6312 [22:40:25] -_- [22:40:31] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6312 [22:40:42] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: -1; - https://gerrit.wikimedia.org/r/6312 [22:42:18] New patchset: Lcarr; "backporting argparse due to python 2.6! yay!" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6448 [22:42:37] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6448 [22:42:55] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/6448 [22:42:57] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6448 [22:46:50] New patchset: Ryan Lane; "Ensure the gerrit backend doesn't get removed on restart" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6451 [22:47:09] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/6451 [23:12:48] hrm [23:13:13] nagios bot has been dead [23:13:17] !log restarting nagios bot [23:13:19] Logged the message, Mistress of the network gear. [23:13:24] I'm breaking swift atm. [23:13:32] (not intentionally) [23:13:35] oh ok [23:13:37] good to know [23:14:45] RECOVERY - BGP status on csw2-esams is OK: OK: host 91.198.174.244, sessions up: 4, down: 0, shutdown: 0 [23:14:46] RECOVERY - BGP status on cr1-eqiad is OK: OK: host 208.80.154.196, sessions up: 10, down: 0, shutdown: 0 [23:14:56] I don't actually know what's broken yet though.
[23:16:24] RECOVERY - BGP status on csw1-esams is OK: OK: host 91.198.174.247, sessions up: 5, down: 0, shutdown: 0 [23:17:27] PROBLEM - BGP status on cr2-pmtpa is CRITICAL: CRITICAL: No response from remote host 208.80.152.197, [23:17:47] restarting swift-proxy on ms-fe1 [23:18:39] RECOVERY - BGP status on cr2-pmtpa is OK: OK: host 208.80.152.197, sessions up: 7, down: 0, shutdown: 0 [23:18:57] RECOVERY - check_all_memcacheds on spence is OK: MEMCACHED OK - All memcacheds are online [23:20:45] RECOVERY - Swift HTTP on ms-fe2 is OK: HTTP OK HTTP/1.1 200 OK - 366 bytes in 0.009 seconds [23:20:45] RECOVERY - LVS HTTP on ms-fe.pmtpa.wmnet is OK: HTTP OK HTTP/1.1 200 OK - 366 bytes in 0.083 seconds [23:20:46] RECOVERY - BGP status on cr2-eqiad is OK: OK: host 208.80.154.197, sessions up: 25, down: 0, shutdown: 1 [23:20:53] fyi the rack spence is on is overloaded [23:20:54] RECOVERY - Swift HTTP on ms-fe1 is OK: HTTP OK HTTP/1.1 200 OK - 366 bytes in 0.009 seconds [23:20:56] that's why the paging [23:21:08] :( [23:21:14] it's a 1g uplink [23:21:22] i'm going to up it to 2g while we wait for the new module [23:21:29] woosters and RobH ^^ [23:21:54] =[ [23:22:51] !log swift is recovered; ~20 minutes of impaired service. cause unknown, but the swiftcleaner looks likely. [23:22:54] Logged the message, Master [23:24:54] maplebed: Sorry that was me [23:25:09] I saturated the network uplink due to a bug in the sync scripts [23:26:49] maybe use a dedicated port # just for deploys and rate limit that port? [23:27:42] It's usually not an issue [23:27:50] Unless you try to push 2GB through it [23:27:59] Which shouldn't have happened if we hadn't had a bug in the scripts [23:28:26] well unless you try to push 1g through it [23:28:34] i can't up it to 2g until tomorrow [23:28:35] well maybe i'm just confusing unrelated events as being the same thing. 
but i thought we had a similar problem in the last week or so [23:28:43] Yes we did [23:28:56] !log update - roan takes the blame [23:28:58] Logged the message, Master [23:29:01] hahaha [23:29:03] ;) [23:29:05] * RoanKattouw_away passes blame to Aaron [23:29:56] anyway, the point is there will be some time in the future when we make a new top level het path. e.g. 1.20wmf3. what will stop this from happening again then? [23:30:51] A lower forklimit maybe [23:33:46] jeremyb: oh we have had this problem many times [23:33:49] it keeps happening [23:33:58] every single deploy [23:34:18] can you lower the forklimit while we don't have the proper resources RoanKattouw_away ? [23:34:35] LeslieCarr: I would have done that if I knew it was going to push 2.2GB of data [23:34:42] But I expected it to push about 1/1000th of that [23:34:46] ah [23:38:00] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:40:59] !log started swift old-object-deleter on ms-be3 [23:41:02] Logged the message, Master [23:43:51] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 8.145 seconds