[00:45:38] PROBLEM - Puppet freshness on virt1005 is CRITICAL: No successful Puppet run in the last 10 hours [01:05:09] New review: MZMcBride; "Hi Ryan." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/57649 [01:35:03] New patchset: MZMcBride; "noc: Refactor highlight.php to be simpler and less more secure" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59034 [01:35:30] New review: MZMcBride; "This was a wonderful commit message to read. Thanks for working on this, Krinkle. :-)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59034 [01:37:30] TimStarling: If you have a minute, I'd like you to do a sanity check on g 59034 (in case you haven't already). [01:38:28] * jeremyb_ has just starred 59034 [01:38:54] +1 for the same reasons as Susan but i haven't looked at the actual changes [01:39:48] wow, that's a real essay on g 57649 [01:42:03] https://gerrit.wikimedia.org/r/#/c/57649/ [01:42:16] Gerrit isn't conducive to essays. [01:42:20] But I don't let that stop me. [01:42:53] Susan: contact and contact-url probably come from the ContactPage extension [01:43:39] Contact-url isn't defined. [01:43:42] But Contact, probably. [01:43:51] Though that'd be pretty nasty for an extension to not prefix its messages. [01:44:18] morebots still missing. [01:44:19] Grrrr. [01:44:41] Seems someone updated the log, though. [01:44:46] That makes me happy. [01:49:15] "attempt to write a readonly database" [01:49:20] So that's why jenkins is failing [01:49:37] https://integration.wikimedia.org/ci/job/mediawiki-core-qunit/2091/console [01:49:42] PleaseStand is wonderful. [01:50:44] I let him know. [01:51:29] Susan: Nice work on publishing your bots on github [01:51:33] I didn't know toolserver/reba was yours. [01:56:31] Thanks. :-) [01:56:36] Yeah, it's a really simple/stupid bot. [01:56:42] It just polls Gmail every minute or something. [01:57:02] There's a much smarter way to do e-mail --> IRC involving a mail filter or something, I think, but this was simpler for me at the time. [02:03:47] Krinkle: doing it now [02:16:20] PROBLEM - Puppet freshness on db1058 is CRITICAL: No successful Puppet run in the last 10 hours [02:16:55] Krinkle: maybe it should just redirect to gitweb, since we have the gitweb URL there already [02:16:59] !log LocalisationUpdate completed (1.22wmf1) at Tue Apr 16 02:16:58 UTC 2013 [02:17:17] Susan: i'm surprised you havn't just rewritten a moresbot replacement yet [02:17:20] that would make it somewhat easier to review [02:18:28] p858snake: Yeah... I suppose I could. [02:18:44] It wouldn't tweet, but it could post to the wiki page. [02:19:17] TimStarling: That doesnt have syntaxhighlighting though, and the inteface doesn't have a blame url in the same interface. [02:19:18] Susan: I don't think many people would pay that much attention to the twitter feed most likely [02:19:23] Perhaps redirect to github? [02:19:40] why does it need those things? [02:20:24] well, it doesn't need anything. The information is already public in the repos. the whole portal has become redundant. [02:20:50] Susan: Actually instead of posting directly to the page, You could have it go to a subpage (eg: SAL/Year-Month) which wouuld resolve the issue of the page having thousands of revs and easier to archive the logs away (but just removing their trunsucation from the SAL page) [02:20:50] but the way it is, it is easier to use. [02:20:58] having those things adds to that. [02:21:07] I personally use noc regularly. 
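An aside on the toolserver/reba bot Susan describes above: "polls Gmail every minute" really is about all there is to it. The sketch below shows the shape of that loop using Python's imaplib; the account, password and relay step are placeholders, and a real bot would hand each subject line to IRC rather than print it. It illustrates the approach, not the bot's actual code.

    import imaplib
    import time

    def unseen_subjects(user, password):
        # Connect, pull the Subject header of every unread message
        # (fetching the header also marks the message as seen).
        conn = imaplib.IMAP4_SSL('imap.gmail.com')
        conn.login(user, password)
        conn.select('INBOX')
        typ, data = conn.search(None, 'UNSEEN')
        subjects = []
        for num in data[0].split():
            typ, msg = conn.fetch(num, '(BODY[HEADER.FIELDS (SUBJECT)])')
            subjects.append(msg[0][1].strip())
        conn.logout()
        return subjects

    while True:
        for subject in unseen_subjects('log-relay@example.org', 'app-password'):
            print(subject)  # a real bot would post this to the IRC channel
        time.sleep(60)      # "polls Gmail every minute or something"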
[02:21:14] Particular the raw text files. [02:21:25] Syntax highlighting with a large file is really slow to load and search. [02:21:32] Change merged: Tim Starling; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59034 [02:21:51] I'm not sure if gitweb supports in-browser raw file viewing. [02:21:53] Susan: What if noc.wikimedia.org/conf/ would redirect to https://github.com/wikimedia/operations-mediawiki-config [02:21:53] GitHub might. [02:22:07] I merged it, but I think it has outlived its usefulness [02:22:11] github does, their raw. subdomain enforces text/plain [02:22:23] And then click wmf-config? Maaaaaaaybe. [02:22:29] TimStarling: I noticed that the cdb file with .txt still triggers a download [02:22:35] TimStarling: Any idea what's up with that? [02:22:52] those arguments about extra features in highlight.php could apply to any of our code [02:23:06] Sure [02:23:07] the reason highlight.php is limited to configuration is because the rest of the code was in version control [02:23:15] Click wmf-config, click InitialiseSettings.php, then maybe go to the raw version. [02:23:20] I agree, we should phase it out. [02:23:59] I'll start on a patch that reroutes it to version control viewers. And then we can make the individual redirects for the symlinks to the raw view of whatever viewer we use (not the ones via highlight.php) [02:24:06] what URL is triggering a download? [02:24:15] Some people also rely on the .txt files as an API of sorts. [02:24:17] the "raw text" for the cdb files in noc/conf [02:24:37] I used to have a tool to search CommonSettings.php/InitialiseSettings.php. [02:24:48] I still might. [02:24:49] https://toolserver.org/~mzmcbride/splarka/ [02:24:53] you mean http://noc.wikimedia.org/conf/interwiki.cdb.txt ? [02:24:57] Yes [02:25:00] Heh, this was before the .txt versions existed. [02:25:05] binary data [02:25:08] ha [02:25:09] I used lynx to strip the syntax highlighting. ;-) [02:25:36] it doesn't trigger a download for me [02:25:45] it just shows the binary garbage in the browser [02:25:51] Triggers a download for me. [02:25:57] In Chrome/OS X. [02:26:18] that is not surprising, I guess [02:26:39] sub.Popen('lynx -dump -width=99999 -display_charset=utf-8 %s%s.php' % (url, file) [02:26:43] :D [02:26:45] so highlight.php would redirect to the 'blob' action for gitweb or github ( https://github.com/wikimedia/operations-mediawiki-config/blob/master/wmf-config/StartProfiler.php ) and the raw one'd direct to the plain/raw action https://raw.github.com/wikimedia/operations-mediawiki-config/master/wmf-config/StartProfiler.php [02:26:46] you know IE's FindMimeFromData(), which I disassembled and wrote a blog post about [02:26:53] and the portal / index can go to the tree view [02:27:08] that would ignore text/plain if there was binary data in the file [02:27:36] and we knew at the time that safari emulated IE's behaviour in some ways [02:27:55] so it's not surprising that other webkit clones would do it also [02:28:11] http://tstarling.com/blog/2008/12/secure-web-uploads/ [02:28:14] Susan: Also a download for this one? https://raw.github.com/wikimedia/operations-mediawiki-config/master/wmf-config/interwiki.cdb [02:28:23] Yes. [02:28:29] Not here [02:28:30] Triggers a download of interwiki.cdb. [02:28:36] As opposed to interwiki.cdb.txt. 
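The conf/*.txt files come up above as "an API of sorts", and that is how tools tend to use them: fetch the raw text directly rather than scrape highlight.php output through lynx the way Susan's old search tool did. A rough sketch of that; the exact filename is an assumption based on the .txt mirrors discussed here, and the setting name is just an example.

    import urllib2

    # Assumed URL of the plain-text mirror of InitialiseSettings.php on noc.
    URL = 'https://noc.wikimedia.org/conf/InitialiseSettings.php.txt'

    def grep_config(needle):
        # Fetch the raw file once and return every line mentioning the setting.
        text = urllib2.urlopen(URL).read()
        return [line for line in text.splitlines() if needle in line]

    for line in grep_config('wgServer'):
        print(line)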
[02:28:45] Looks like github is using something that chrome makes it happy [02:28:51] Susan: Sure [02:28:56] it uses the file name [02:29:08] https://noc.wikimedia.org/conf/interwiki.cdb.txt and https://noc.wikimedia.org/conf/interwiki.cdb [02:29:11] respectively [02:29:35] Right. [02:29:58] TimStarling: The one thing it does add it centralise apache-config.git puppet:////lucene and mediawiki-config together [02:30:05] though that's a biased subset of course [02:30:39] I added repo links for those as well a few months back [02:30:48] I'm not sure the maintenance burden of noc/conf outweighs the cost of deprecating it. [02:31:34] brb later [02:31:34] index.php can do that, with deep links to gitweb or whatever [02:35:00] !log LocalisationUpdate completed (1.22wmf2) at Tue Apr 16 02:34:59 UTC 2013 [02:35:55] wow is morebots still dead? :( [02:48:53] hmmm [02:57:09] closedmouth: Yes, it seems my bug was ignored. And a number of people apparently performed actions today without logging them. [02:57:23] :(( [02:57:26] Which you'd think would upset more people. Though I don't think many people noticed. [03:04:23] New patchset: Tim Starling; "Enable terbium cron jobs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59370 [03:06:18] I can't fix it anymore [03:07:14] Oh, because it's on the old wikitech? [03:07:15] I'm nearly done with a clone. [03:07:32] Kind of insane that it wasn't moved to the new host when the wiki was moved... [03:07:38] Though I imagine there was a reason. [03:07:39] no, it's on a new linode [03:07:54] hang on, I found the root password for it [03:07:58] morebots is on wikitech-static. [03:08:17] (I'm not sure what host that is at this point.) [03:08:32] wikitech-static is the new linode [03:08:43] Okay. [03:09:41] just setting it up properly... [03:16:15] except: [03:16:15] logging.warning(sys.exc_info) [03:16:20] WARNING:root: [03:17:15] not very helpful [03:17:37] python? [03:17:40] where is that code? [03:17:45] ori-l: morebots. [03:18:05] haven't found its version control yet [03:18:07] https://wikitech.wikimedia.org/wiki/Morebots is where it should be... [03:18:08] logging.exception('An error occurred') will automatically attach exception info / traceback to the log output [03:18:30] I feel like it was in SVN... may be moved to Git. [03:18:41] provided it is invoked from within an 'except:' block [03:19:21] https://github.com/search?q=morebots&ref=cmdform&type=Code [03:19:28] of course just adding parens to that line, so that it reads 'logging.warning(sys.exc_info())' will do the trick [03:19:31] I see a puppet file... [03:19:33] but it's not very pythonic [03:19:51] trunk/tools/adminlogbot [03:20:08] https://svn.wikimedia.org/viewvc/mediawiki/trunk/tools/adminlogbot/ [03:20:24] ok, fixing [03:20:26] no adminlogbot project in git [03:20:50] ori-l: I doubt the version in SVN is up-to-date. [03:20:57] Tim can probably pastebin the current version. [03:21:12] this looks like the latest, no: https://gerrit.wikimedia.org/r/gitweb?p=operations/debs/adminbot.git;a=tree [03:21:25] Oh, nice. [03:22:57] https://wikitech.wikimedia.org/wiki/Morebots <-- I added that link there. Thanks for dinging that. [03:23:00] Finding. [03:23:07] It's time for a break from typing. :-) [03:23:16] * legoktm finished his clone [03:23:22] !log test [03:23:32] oh [03:23:38] is anonymous editing disabled on wikitech? [03:23:42] Yes. [03:23:45] blah [03:23:53] as on all labsconsole wikis [03:23:54] I think account creation is also restricted. 
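The logging problem Tim and ori-l are looking at above is a tiny class of bug: morebots hands the sys.exc_info function object to logging.warning instead of calling it, so the log record carries no information about the failure. A minimal reproduction of the three variants discussed (risky_operation is a stand-in for whatever the bot was doing):

    import logging
    import sys

    def risky_operation():
        raise RuntimeError('boom')  # placeholder failure

    try:
        risky_operation()
    except Exception:
        # What the bot does now: logs the repr of a function object, not the error.
        logging.warning(sys.exc_info)
        # Adding parens at least logs the (type, value, traceback) tuple...
        logging.warning(sys.exc_info())
        # ...but logging.exception() is the idiomatic fix: message plus full traceback.
        logging.exception('An error occurred')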
[03:23:58] I have an account [03:24:28] no_morebots [03:24:38] account creation isn't [03:24:44] but it requires many fields to be filled in [03:24:50] and $wgConfirmEmailToEdit is true [03:25:10] It's rejecting my password... [03:25:19] Via index.php or api.php? [03:25:42] index.php works in my browser, but wikitools (api.php) isn't. [03:25:53] What'd you set as the API URL? [03:25:59] I don't think a bot on nightshade is really a great replacement [03:26:03] >>> site = wikitools.Wiki('https://wikitech.wikimedia.org/w/api.php') [03:26:04] that's not going to be easier to restart, is it? [03:26:25] TimStarling: It's a single python script, I can just as easily move it to labs [03:26:31] legoktm: You know, wikitools logging in broke on the TS wiki ages ago. I never figured out why. [03:26:36] I just edit anonymously there. [03:27:12] Well thats not an option here :/ [03:27:16] TimStarling: With my IRC bots, I usually have code in them that pings the server every minute and then the bot will kill itself (and restart through some other process) if it doesn't get a pong. [03:27:34] It can help with netsplits and other funkiness. [03:27:42] And my bots are just a copy of Susan's :P [03:27:52] Frightening. [03:28:17] legoktm: Are you getting an error? Like WrongPass? [03:28:30] yeah [03:28:32] thats the error [03:28:34] You can print the raw API output as well. [03:28:43] in wikitools? [03:28:43] Yeah... it may be the same issue as the Toolserver wiki. [03:28:51] With wikitools. [03:28:54] You may have to hack it up a bit [03:28:55] . [03:28:59] But it's pretty simple. [03:30:37] {u'login': {u'result': u'WrongPass'}} [03:30:41] legoktm: Yeah, I'm getting WrongPass/False. [03:30:58] When I try to execute login(). [03:31:31] If you pass a made-up username, you get WrongPluginPass. [03:31:34] Pywikipediabot is complaining that it was getting a non-json response, but that has so many random quirks. [03:31:53] I feel like the login code is probably a little broken. [03:32:18] I bet Pywikipedia still has the screen-scraping ability somewhere. [03:32:32] Heh. [03:32:37] * legoktm looks [03:32:44] ori-l: are you going to submit any patches against it? [03:33:54] anyways, here's my morebots code http://dpaste.de/6QQj4/raw/ [03:34:54] legoktm: What would it take to get you to figure out why logging in doesn't work? :-) [03:35:11] I'm already working on that :P [03:35:13] I can mail you a prize. [03:36:47] erhmmm [03:36:58] My custom script can log in. [03:37:18] this is the library it's using: https://bitbucket.org/jaraco/irc/src/5ddfc868d51e096435088b59e22df0c94abea839/?at=version_0_4_8 [03:39:19] Lets see if it can edit... [03:40:04] you see that library has irclib and ircbot [03:40:17] ircbot has automatic reconnection [03:40:25] irclib does not, and morebots uses irclib [03:40:39] bleh. it didnt actually work. [03:40:42] so it just connects once and then goes into an infinite loop of doing nothing [03:42:16] Susan: I can't figure out what could be wrong. [03:42:26] so if someone could change it to use ircbot instead, then it would just work [03:42:29] I think its a bug in one of the login extensions wikitech is using. [03:43:16] I would be happy to review such a patch [03:43:38] legoktm: You should fix the real® morebots first. :-) [03:43:42] but obviously I'm not going to let a second bot be in here, writing duplicate messages to the SAL [03:43:57] And perhaps file a bug about wikitools + Labs. 
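For reference, the change Tim is asking for, moving morebots from irclib to the ircbot half of the same library so reconnection is handled automatically, is small in shape. A skeleton along those lines; the server, channel and class internals are placeholders rather than the bot's real configuration, and it assumes the ircbot module from the 0.4.8 library linked above.

    import ircbot

    class LogBot(ircbot.SingleServerIRCBot):
        """Sketch of an ircbot-based morebots: SingleServerIRCBot handles
        reconnection itself, which is what is lost by using irclib directly."""

        def __init__(self):
            ircbot.SingleServerIRCBot.__init__(
                self, [('irc.example.net', 6667)], 'no_morebots', 'adminlogbot')

        def on_welcome(self, connection, event):
            # Runs on every (re)connect, so the bot rejoins after a netsplit.
            connection.join('#wikimedia-operations')

        def on_pubmsg(self, connection, event):
            pass  # the existing !log handling would slot in here

    if __name__ == '__main__':
        LogBot().start()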
[03:44:07] TimStarling: That makes sense, but right now neither is here :P [03:44:32] easily fixed [03:44:45] but that is just temporary, until it next gets disconnected [03:45:18] :) [03:45:28] its https://gerrit.wikimedia.org/r/gitweb?p=operations/debs/adminbot.git;a=blob;f=adminlogbot.py;h=3bea8a53f2157f8441a0ef5f4e905ebbfede93d8;hb=HEAD right? [03:45:59] yes [03:49:00] eww tabs :( [03:50:03] Let me PEP8-ify it first, then I'll try and convert it to ircbot [03:55:17] I'm filling a bug for this [03:55:21] New patchset: Legoktm; "Convert all tabs to 4 spaces, and other minor PEP8 fixes." [operations/debs/adminbot] (master) - https://gerrit.wikimedia.org/r/59371 [03:55:26] mostly for somewhere to put my own notes [03:55:31] I'll assign it to you [03:56:20] sure [03:56:56] New patchset: Ori.livneh; "Make logging to file correct and configurable" [operations/debs/adminbot] (master) - https://gerrit.wikimedia.org/r/59372 [03:57:32] TimStarling: oh, sorry, I missed the rest of the conversation [03:58:38] ori-l: should I just re-do my tabs-->spaces commit on top of yours? [03:59:23] legoktm: no, that seems unfair to you. if you're going to do the irc library change, I can just abandon my change and you can just pick out the things you want from it (if anything) [03:59:48] or I can re-base my change on top of your change, if you prefer [04:00:38] Well I'm about to fix a few other files in that director so the PEP8 test will pass [04:00:53] either way works for me, since its one click in my IDE to fix the tabs [04:00:58] filing, directory * [04:01:17] thanks Susan [04:01:17] :P [04:01:42] No problem. [04:02:04] legoktm: I think this should be your project :P [04:02:14] it sounds up your alley [04:02:19] so I'm withdrawing my change [04:02:30] You'd be a good reviewer, though. :-) [04:02:46] ok :) [04:02:51] Change abandoned: Ori.livneh; "(no reason)" [operations/debs/adminbot] (master) - https://gerrit.wikimedia.org/r/59372 [04:23:10] Is officewiki https only? And if so, where is that set? [04:23:52] Yes, and it's here: https://noc.wikimedia.org/conf/remnant.conf [04:28:47] PROBLEM - Puppet freshness on cp3003 is CRITICAL: No successful Puppet run in the last 10 hours [04:39:22] * legoktm pokes ori-l  [04:39:33] How do you split this comment into 2 lines? [04:39:34] # user_query = "[[IRC Nick::+]] or [[IRC Cloak::+]]|?IRC Nick|?IRC Cloak|limit=500|format=json" [04:39:42] That's the only non-pep8 thing left in the repo. [04:41:06] Well, it's a comment, so it's really up to you, but if you want the comment to remain valid python code for some reason, I'd go with: [04:41:12] # user_query = ("[[IRC Nick::+]] or [[IRC Cloak::+]]|?IRC Nick|?IRC Cloak" [04:41:12] # "|limit=500|format=json") [04:42:04] Does the pep8 unit test thing check against comments? Mine does, but I haven't touched any of the settings [04:43:22] yes, it does (as it should, since pep8 includes guidelines about comment style) [04:50:43] New patchset: Legoktm; "Convert all tabs to 4 spaces, and other minor PEP8 fixes. All PEP8 tests should pass now." 
[operations/debs/adminbot] (master) - https://gerrit.wikimedia.org/r/59371 [05:01:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:02:16] Change merged: Tim Starling; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59370 [05:02:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.164 second response time [05:04:07] !log moving most cron jobs from hume to terbium [05:04:15] Logged the message, Master [05:28:21] TimStarling: it was already broken before today, but are the moves to terbium what broke https://ganglia.wikimedia.org/latest/graph_all_periods.php?c=Miscellaneous%20pmtpa&h=hume.wikimedia.org&v=823574&m=Global_JobQueue_length&r=hour&z=default&jr=&js=&st=1365625056&z=large ? [05:29:12] probably not [05:41:44] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours [05:41:44] New patchset: Tim Starling; "Move broken cron jobs back to hume and note what is wrong" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59374 [05:43:09] Change merged: Tim Starling; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59374 [05:51:49] Nemo_bis: I don't think wmf1 has c3b315b8405f0626bcf6cf0ae612d01135a21e50 [05:51:57] so I'd expect a flat graph [05:52:24] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:52:28] I guess that will work wen [05:52:40] Wednesday that is [05:53:14] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time [06:26:36] PROBLEM - Host search23 is DOWN: PING CRITICAL - Packet loss = 100% [06:26:52] hmmmmmm [06:28:30] 4.5 days since that box was reinstalled [06:29:16] RECOVERY - Host search23 is UP: PING OK - Packet loss = 0%, RTA = 26.50 ms [06:31:26] PROBLEM - search indices - check lucene status page on search23 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:32:26] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:33:16] RECOVERY - search indices - check lucene status page on search23 is OK: HTTP OK: HTTP/1.1 200 OK - 269 bytes in 0.055 second response time [06:33:16] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time [06:36:47] Aaron|home: I didn't find c3b315b8405f0626bcf6cf0ae612d01135a21e50 but thank you, I'll wait hopeful :) [06:43:54] PROBLEM - NTP on search23 is CRITICAL: NTP CRITICAL: Offset unknown [06:47:54] RECOVERY - NTP on search23 is OK: NTP OK: Offset 1.835823059e-05 secs [07:19:42] PROBLEM - Puppet freshness on ms-fe3001 is CRITICAL: No successful Puppet run in the last 10 hours [07:31:22] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:32:12] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.137 second response time [07:57:29] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:58:19] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.134 second response time [08:02:29] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:03:19] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 
second response time [08:03:49] mark: hey :]  Will you will be available this afternoon to review some varnish changes I have made for beta? [08:06:22] i can review them now, this afternoon will be spotty [08:06:32] that works for me :-] [08:06:35] ok [08:06:49] * hashar moves the pile of accounting related papers to another desk. [08:07:03] :) [08:07:13] any particular preference/order? :) [08:07:28] I have rebased them yesterday [08:07:41] hopefully under a common topic [08:07:54] https://gerrit.wikimedia.org/r/#/q/status:open+project:operations/puppet+branch:production+topic:54864,n,z [08:08:26] the first one would be https://gerrit.wikimedia.org/r/#/c/54864/ which make it so the mobile class uses role::cache::configuration instead of lvs::configuration::lvs_service_ips [08:08:37] since we do not have LVS in beta [08:08:52] that made the varnish mobile cache to points to some pre reserved IP address which simply does not serve any traffic :- ] [08:09:23] the trouble I have is finding out whether it will cause any madness on production :(( I should probably create a fake production varnish instance on my laptop to test that out [08:09:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:10:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time [08:11:47] New patchset: Mark Bergsma; "mobile always uses role::cache::configuration" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54864 [08:13:03] New review: Mark Bergsma; "Note that the hierarchy is different for production and labs, and also for the new backends you adde..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54864 [08:14:13] New review: Mark Bergsma; "(2 comments)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54864 [08:14:19] almost merged it hashar [08:14:25] but have a few comments :) [08:14:31] goood :-] [08:15:07] * mark watches his car being serviced [08:15:07] oh site [08:15:17] are you working in a garage right now? [08:15:18] remote/mobile working is awesome [08:15:21] yes [08:15:28] I love it every single day [08:15:45] whenever it is sunny outside, I got to the park, sit next to a tree and enjoy a fresh coke while coding [08:15:51] yep [08:15:57] some people come to me asking what I am doing: "I am working for Wikipedia" :-D [08:16:05] hehehe [08:16:13] look on their face: priceless [08:16:13] and then they think you're writing it [08:16:18] yeah [08:16:22] that is a common misconception [08:16:37] that is a common trick at my coworking place [08:16:46] I am presented as the guy who wrote the french encyclopedia [08:16:49] ;] [08:16:52] i'm sure you are ;) [08:17:52] go forth and live up to the expectation? [08:21:43] mark: mind if I had the $::site level to backends in a different patch. That would require me to change the bits class too [08:21:51] or I can do it first and rebase my changes on that [08:23:27] yeah please rebase [08:23:38] for upload we also need to find a different way [08:23:44] because it's still special casing labs [08:24:26] okk [08:25:55] probably also working through role::cache::configuration more [08:26:02] i'm looking again at the mount stuff [08:26:14] so does every instance have a /dev/vdb available that can be used this way? 
[08:26:30] yes [08:26:54] the distrib is mounted on /dev/vda and /dev/vdb has most of the available disk space which is mounted on /mnt by default [08:27:05] I guess /dev/vda is usually 10GB [08:27:17] if you create a 320G instance, /dev/vdb would have 310G [08:27:36] ok [08:31:28] New patchset: Hashar; "beta: adds ::site to bits_appservers backends" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59380 [08:31:49] New review: Hashar; "(2 comments)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54864 [08:33:44] New patchset: Mark Bergsma; "(bug 44041) adapt role::cache::mobile for beta" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44709 [08:36:50] New patchset: Mark Bergsma; "(bug 44041) adapt role::cache::mobile for beta" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44709 [08:37:41] damn I am lost [08:38:02] you're in france [08:38:20] you can actually get people to help you there ;p [08:40:44] Change abandoned: Mark Bergsma; "(no reason)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59136 [08:41:21] New patchset: Hashar; "mobile always uses role::cache::configuration" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54864 [08:41:53] heh [08:42:09] so https://gerrit.wikimedia.org/r/59380 adds ::site to the labs backends [08:42:25] I guess the bits server is broken right now [08:42:28] (on labs) [08:42:51] New review: Mark Bergsma; "(1 comment)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54864 [08:42:53] alllmost [08:43:13] hmm [08:43:35] ah yeah if you want :-] [08:44:07] sorry, should have spotted that earlier ;) [08:44:13] otherwise it can be a bit confusing [08:44:41] New review: Hashar; "(1 comment)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54864 [08:44:48] New patchset: Hashar; "mobile always uses role::cache::configuration" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54864 [08:45:20] mark: I fully agree that makes more sense. I have originally cut and paste the line without actually thinking about it [08:45:20] :D [08:45:35] me too [08:45:41] (in my head ;) [08:45:58] brb [08:47:39] back [08:47:43] let's see if we can merge htis... [08:50:12] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59380 [08:50:31] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54864 [08:52:23] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:53:20] hashar: if you rebase 44709 I think it can go in too [08:53:52] ah needs another upload change first [08:54:02] I am rebasing another change [08:54:07] https://gerrit.wikimedia.org/r/#/c/54863/ [08:54:10] for the upload cache [08:54:13] ok [08:54:28] Fatal error: require() [function.require]: Failed opening required '/usr/local/apache/common-local/php-1.22wmf2/redirect.php' (include_path='.:/usr/share/php:/usr/local/apache/common/php') in /usr/local/apache/common-local/w/redirect.php on line 3 [08:54:40] New patchset: Hashar; "upload cache in labs now uses role::cache::configuration" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54863 [08:54:58] Did someone delete redirect.php? [08:55:33] seems so :-D [08:55:36] Looks to be missing [08:55:40] Who/why/where/when/how? 
[08:56:13] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.146 second response time [08:56:20] Reedy: use your git skills : git lg -- redirect.php --stat [08:56:21] hashar: let's just move "backend" into role::cache::configuration as "swift" [08:56:25] Reedy: * f9e5457 - Remove completely unused $wgRedirectScript/redirect.php (2 weeks ago) [08:56:25] varnish doesn't exist in tampa (and won't) [08:56:47] mark: it does in tampa, on the beta cluster :-] [08:56:54] hashar: I'm currently getting ready heading out of the door ;) [08:56:57] mark: we don't have swift either yet. [08:57:08] right [08:57:17] we should pick a more generic name [08:57:20] there will be ceph too [08:57:24] "media_storage" perhaps [08:57:30] was about to propose that [08:59:41] ideally we get rid of that entire case statement in role::cache::upload [08:59:49] for the directors [09:01:19] New patchset: Hashar; "(bug 44041) adapt role::cache::mobile for beta" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44709 [09:01:23] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:01:39] trivial rebase :-] [09:01:54] Reedy: will look at it. Not sure why we use redirect.php still [09:02:04] Reedy: but common settings has $wgRedirectScript = $wgScriptPath . '/redirect.php'; [09:02:13] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [09:04:40] hashar: yeah get the changes merged that you can, right ;p [09:04:52] actually that one is now at patchset 32 [09:04:55] i feel it could do with a few more [09:04:57] what do you say [09:05:09] lets bring it to the next round number 64 [09:05:16] we will have to be creative though [09:05:17] yes [09:05:23] why don't you work on that while I do something else ;) [09:05:38] like driving back home? :D [09:05:42] not yet [09:07:25] New patchset: Hashar; "Varnish rules for Beta cluster" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47567 [09:10:07] New patchset: Mark Bergsma; "(bug 44041) adapt role::cache::mobile for beta" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44709 [09:10:20] 33! [09:10:38] i reordered so it doesn't depend on the upload changes [09:11:15] ah yeah [09:11:20] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/44709 [09:11:22] there we go ;) [09:11:25] I think I originally did both mobile and upload at the sametime [09:11:27] \O/ [09:11:42] New patchset: Hashar; "Varnish rules for Beta cluster" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47567 [09:11:50] mark: can you get https://gerrit.wikimedia.org/r/#/c/47567/ in as well? [09:11:56] MaxSem had to tweak some VCL for beta [09:12:01] related to how the URL are presented [09:12:10] i'll look [09:12:20] the patch works in beta and is safeguarded [09:14:05] whoops [09:14:20] need to fix a problem first [09:15:35] New patchset: Mark Bergsma; "include lvs::configuration in role::cache::configuration" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59381 [09:16:50] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59381 [09:18:04] I really need to work on unit & integration tests for puppet [09:21:00] i'll need to work on max's changes [09:22:46] i think my car is ready [09:22:54] good [09:23:04] have a safe ride and thx for the reviews/merges ! 
[09:23:19] thanks [09:24:52] New patchset: Hashar; "Varnish rules for Beta cluster" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47567 [09:59:27] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours [09:59:27] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours [09:59:27] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours [10:45:45] PROBLEM - Puppet freshness on virt1005 is CRITICAL: No successful Puppet run in the last 10 hours [11:27:11] PROBLEM - RAID on ms-be3 is CRITICAL: Timeout while attempting connection [11:27:11] PROBLEM - RAID on stafford is CRITICAL: Timeout while attempting connection [11:27:11] PROBLEM - RAID on db9 is CRITICAL: Timeout while attempting connection [11:27:37] hashar: I saw your zuul pypi change [11:27:44] hashar: when can I get rid of that altogether? [11:28:06] paravoid: get rid? [11:28:16] paravoid: like merging them? We can do that right now if you want :-] [11:28:18] of git pull/setup.py [11:28:39] ohh [11:28:50] need to package Zuul first [11:30:32] and it turns out I packaged the wrong statsd package :D [11:33:11] New patchset: Danny B.; "cswiktionary: Set AbuseFilter notifications" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59385 [12:17:19] PROBLEM - Puppet freshness on db1058 is CRITICAL: No successful Puppet run in the last 10 hours [12:19:39] lol [12:19:46] (sorry, was in a call) [12:34:08] New patchset: Hashar; "(bug 47203) beta: disable filebackend for math" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59386 [12:35:01] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59386 [12:35:55] holly f** [12:35:59] timo had a live hack again [12:36:51] New review: Hashar; "please please merge on the fenari local repo :-]" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59034 [12:46:42] paravoid: and I tried out cowbuilder, that is definitely faster than pbuilder [12:46:49] paravoid: it does not need to unpack a huge tar.gz :-] [12:46:50] BUT [12:47:08] that needs root access. So I can't really get Jenkins to build packages for us using that [12:54:41] hashar: do people then ask you about DCRI if they thought you wrote the encyclopedia? :-) [12:54:57] na none asked [12:55:04] huh [12:55:12] that was mostly a wikimedia only drama [12:55:17] that barely made the news [12:55:46] that's what you think [12:56:09] 1) it was on the front page of frwiki for a substantial period of time [12:56:21] 2) there's still new developments today [12:57:06] (parliament asking the Ministry to answer questions. i think the questions are public) [13:00:30] possibly [13:00:37] let me check the french news [13:02:03] so our deputy can ask questions to the government. And they are indeed published online ( http://questions.assemblee-nationale.fr/q14/14-23897QE.htm ) [13:02:24] right [13:02:42] roughly: a deputy is warning the ministry in charge of the DCRI about the deletion [13:03:31] and another deputy asked about it http://www.christianpaul.fr/Pressions-de-la-DCRI-sur-Wikimedia :) [13:03:33] \O/ [13:04:37] so the deputies are now asking the gov on which legal basis the DCRI has acted. 
Someone is in a deep shit :-] [13:04:51] we shall see [13:05:05] I would have taken the 48hours jailling [13:05:22] then piece of all my journalist contacts to get invited at the main news event :-] [13:05:50] but not everyone is versed in the law and media relations [13:05:55] hashar: so what's missing to have a zuul package? [13:06:17] hashar: hah. but he may have had worse repurcussions than a typical citizen... [13:06:28] paravoid: right now I am looking for a way to phase out operations/debs/python-statsd :D [13:06:36] phase out? [13:06:58] paravoid: that is the pypi module 'python-statsd' zuul require the 'statsd' one [13:08:20] ? [13:08:37] we just phased it in?! :p [13:08:58] yeah [13:09:01] I made a mistake :D [13:09:07] https://crate.io/?has_releases=on&q=statsd [13:09:21] hashar: you has mail [13:09:37] oh hahaha [13:10:02] oh jesus [13:10:12] well we don't need to 'phase out' the python- one, right? now we just need the other one? [13:10:30] jeremyb_: thx :-) [13:10:35] ottomata: exactly [13:10:38] no, this will clash in Debian [13:10:40] also [13:10:44] it's bad enough that pypi is like that [13:10:52] ottomata: but I would like to rename the repo from operations/debs/python-statsd to operations/debs/python-python-statsd [13:10:57] lol [13:10:57] ah [13:10:59] no [13:11:00] don't :) [13:11:00] i see [13:11:05] python-statsd is fine [13:11:05] or I can just phase it out [13:11:13] as a name for python-statsd [13:11:14] and reuse the existing repo [13:11:28] yeah delete it and recreate it from scratch with the same name [13:11:40] cause Debian is not going to accept both python module anyway: both of them exposes themselves as 'statsd' [13:11:46] oh they clash in python too [13:11:46] yeah lets do that [13:11:48] not just deb naming [13:11:48] also I'd suggest mailing these two people and telling them that it's completely ridiculous [13:12:01] I have mailed them both [13:12:03] and got replies [13:12:10] oh? [13:12:13] both python modules are called 'statsd' [13:12:15] import statsd [13:12:16] for both [13:12:27] what did they say? [13:12:32] I should just have pushed everything under /var/lib/jenkins and uses PYTHONPATH :-D [13:13:18] * paravoid stabs hashar [13:14:30] hashar: what did those folks say? [13:14:55] paravoid: pushed you the emails [13:15:06] basically both are stables and uses in a few projects :-D [13:16:22] ridiculous [13:17:18] welp, ha, glad to help with whatever I can when you figure out what needs to be done :p [13:17:25] zooko's triangle :-) [13:17:31] paravoid, i got those kafka .deb qs for you too :D [13:19:51] oh [13:20:11] I don't see it on gerrit? [13:20:32] no, i haven't pushed yet [13:20:38] cause i'm trying to resolve the one about the .sh scripts [13:20:48] either we don't fix it, or I have to patch them all [13:20:50] which I can do [13:20:58] but i'm not sure of the best way (once again, so many ways!) [13:22:04] the scripts do things like this: [13:22:04] base_dir=$(dirname $0)/.. [13:22:21] and then construct java CLASSPATH from $base_dir [13:22:44] so I can't just symlink or install .sh scripts in /usr/sbin, /usr/bin, whatever [13:22:56] how many scripts? [13:22:58] what do they do? [13:23:37] well, i really only need one, kafka-server-start.sh. 
but I am using kakfa-console-producer.sh with udp2log to send webrequest logs to kafka…so I guess that one too [13:23:47] there are a bunch more that are useful for debugging and setting stuff up [13:24:04] there are 2 producer scripts, 2 consumer scripts, and the server-start and server-stop script [13:24:05] but [13:24:13] each one of these all use a script called kafka-run-class.sh [13:24:44] which is the one responsible for setting up the classpath and running the appropriate class [13:24:56] e.g. kafka-server-start.sh: [13:24:57] did you see how tomcat does it? [13:24:59] … Faidon says not to use stdeb, but it creates a decent debian/ directory ... [13:25:00] does [13:25:00] $(dirname $0)/kafka-run-class.sh kafka.Kafka $@ [13:25:04] or other java packages? [13:25:08] (no it does not) [13:30:09] now I have to figure out what https://github.com/jsocol/pystatsd/blob/master/LICENSE license is :D [13:32:13] paravoid, i'm looking at tomcat, i don't see anything useful. i just extracted both tomcat6 and tomcat7 .debs and nothing is installed into /usr/sbin or /usr/bin [13:32:30] the initscript shells out to a script somewhere in /usr/share/tomcat... [13:33:57] hashar: MIT [13:34:15] paravoid: yeah that is what I thought. But that should be named 'Expat' according to http://www.debian.org/doc/packaging-manuals/copyright-format/1.0/#license-specification (just found out a sec ago) [13:34:17] (expat) [13:34:18] yes [13:34:22] ""There are many versions of the MIT license. Please use Expat instead, when it matches."" [13:34:27] I luuuve debian-legals [13:36:47] I need mosh too [13:38:09] New review: Krinkle; "Jenkins was stuck yesterday so I didn't know when (or if) it was merged. Deploying now.." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59034 [13:40:45] New review: Krinkle; "Was already deployed. From what I recall Tim did deploy it on fenari right after he merged it. It di..." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59034 [13:41:29] ottomata: so, what do these scripts do? [13:41:35] apart from setting up JAVA_HOME, CLASSPATH etc.? [13:43:07] that's it, then they do java $CLASSPATH $@ blabls classname [13:43:10] okay [13:43:22] so, i'm saying [13:43:25] wanna read /usr/share/doc/javahelper/tutorial.txt.gz ? [13:43:28] i can patch them so they'd work in /usr/sbin or whever [13:44:05] looking [13:44:17] I think jarwrapper does all this [13:44:21] so [13:44:29] i'll read it, but fyi most of this is scala [13:44:31] not java [13:44:34] so the build is a bit different [13:44:41] it creates jars [13:44:47] but it doesn't use maven or poms [13:45:33] jarwrapper would work though [13:45:40] hmk, reading... 
[13:46:17] I don't think javahelper has anything to do with maven [13:46:23] maybe jh_makepkg does [13:46:31] but that's optional [13:46:43] (I've never used those, but they look reasonable) [13:47:01] Pick One: [13:47:01] -t --ant: Builds with ant [13:47:01] -M --maven: Builds with maven2 [13:47:01] -k --makefiles: Builds with make [13:47:01] -n --none: Create own buildsystem [13:47:44] aye, but [13:47:51] i don't know much about scala, don't think it uses javac [13:47:56] which i assume these all do [13:48:00] at somepoint [13:48:12] The scalac command compiles one (or more) Scala source file(s) and generates Java bytecode which can be executed on any http://java.sun.com/docs/books/jvms/; the Scala compiler works similarly to http://java.sun.com/j2se/1.5.0/docs/tooldocs/solaris/javac.html, the Java compiler of the http://java.sun.com/javase/. [13:48:36] the kafka project uses sbt build tool [13:48:39] http://www.scala-sbt.org/ [13:49:22] you don't have to use jh_build [13:49:28] Now running lintian... [13:49:31] Finished running lintian. [13:49:32] but look at the running apps part [13:49:35] yeahhh thanks ottomata for the tutorial :-] [13:49:48] "Runtime support" [13:49:59] hashar: please tell me you didn't use stdeb :) [13:50:15] na I used fpm [13:50:22] (HAHA, uh oh!) [13:50:23] * hashar whistles [13:50:28] paravoid, waaass so bad about stdeb if we change debian/ to be ok? [13:50:43] so what I did is that I reused the debian directory from the old python-statsd and edited it [13:50:43] that should be fine [13:50:51] it makes a nice control file at least, with stuff grabbed from the python setuptools stuff [13:50:54] ok, that should be fine, yes [13:51:03] now I have an interesting issue. Zuul requires statsd=1.0.0 and upstream is currently 2.0.1 [13:51:13] so I was wondering if out of one debian dir I could package boths :-] [13:51:15] grumbles [13:51:19] no you can't [13:51:20] python-statsd1 and python-statsd2 [13:51:24] are they incompatible? [13:51:27] debian sucks [13:51:28] ask zuul people to fix it? [13:51:29] yeah [13:51:30] fix it yourself? [13:51:37] shouldn't be too hard :) [13:51:38] :-( [13:51:40] New patchset: Mark Bergsma; "Setting X-Analytics header in vcl_deliver." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59166 [13:51:49] shouldn't be too hard to make the statsd dependency optional too [13:52:07] yup I got a patch to make it optional [13:52:11] but upstream rejected it :-] [13:52:11] I mean, this effort is all futile if you come to think about it [13:52:14] why?! [13:52:32] cause that would need some changes to their packaging infrastructure and how they handle python modules dependencies [13:52:48] ok [13:52:54] we can patch it ourselves in debian/patches/ [13:52:56] and be done with it [13:53:04] heh, mark,, i like that better too [13:53:06] upstreams aren't always sane :) [13:53:09] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59166 [13:53:19] but since the previous conditional was approved like that, i thought I should just keep it the same [13:53:22] ottomata: had to fix it to work anyway, note beresp vs resp [13:53:38] oh hm, ok [13:53:54] New patchset: Jgreen; "switching to stock drupal default.settings.php for test instance" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59391 [13:54:05] thanks! [13:54:19] btw [13:54:20] mark: have you merged this morning changes on sock puppet? 
My instance (deployment-cache-mobile01) bails out when trying to mount sda3 and sdd3 :/ [13:54:23] puppet-merge works nicely [13:54:35] but could you make it work when outside the right dir too? [13:54:41] so you don't have to go to /root/puppet [13:55:00] hashar: yes [13:55:02] then after that let's move it out of /root/ :-) [13:55:05] not the upload ones, but all the mobile ones [13:56:05] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59391 [13:56:57] mark I got : err: /Stage[main]/Role::Cache::Mobile/Varnish::Setup_filesystem[sda3]/Mount[/srv/sda3]: Could not evaluate: Execution of '/bin/mount -o noatime,nodiratime,nobarrier,logbufs=8 /srv/sda3' returned 32: mount: special device /dev/sda3 does not exist [13:56:57] err: /Stage[main]/Role::Cache::Mobile/Varnish::Setup_filesystem[sdb3]/Mount[/srv/sdb3]: Could not evaluate: Execution of '/bin/mount -o noatime,nodiratime,nobarrier,logbufs=8 /srv/sdb3' returned 32: mount: special device /dev/sdb3 does not exist [13:57:20] mark: which does not make sense to me since when $::realm is labs the storages should be [ 'vdb' ] not ['sda3','sdb3'] [13:57:48] yeah [13:57:51] will look in a bit [13:58:37] yeah, cool, mark we can do that [13:58:45] i had that at somepoint, since $1 takes a path [13:58:51] i can make it default to /root/puppet [13:59:19] ottomata: so? [13:59:25] so yeah [13:59:36] i'm looking, i don't fully understand, but i mean, the classpath is pretty long for all the .jars that are generated [13:59:37] hashar: I think patching zuul in debian/patches to make statsd optional is much more worth the effort than packaging statsd and all that [13:59:38] this [13:59:39] https://gist.github.com/ottomata/5396094 [13:59:41] is what it does [13:59:54] buncha for loops to find all of the compile kafka jars [13:59:58] then starts java with the passed in class name [14:00:10] paravoid: yup I even got a patch to do that in puppet (while still using setup.py though). [14:00:23] paravoid: also upstream waits for python-statsd to be packaged to get zuul packaged :-] [14:00:36] tell them to forward port it to latest statsd then :-) [14:00:53] if they're waiting for your work, they might just as well help you accomplish it [14:02:12] hashar: btw, swift supports statsd, but apparently they're embedding a StatsdClient class inside their source [14:02:47] New patchset: QChris; "Move connection limiting from gerrit's Jetty to Apache" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/50591 [14:03:06] brb [14:08:40] Change abandoned: Aude; "tim has moved these and other cron jobs over to terbium" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/58708 [14:10:21] paravoid, i'm not so sure jarwrapper will work, it looks like everything relevant is in a single jar [14:10:24] /usr/share/kafka/core/target/scala_2.8.0/kafka-0.7.2.jar [14:10:42] which means the jar Manifest won't have a single Main-Class [14:13:03] ottomata: can you add me as an owner to operations/debs/python-statsd https://gerrit.wikimedia.org/r/#/admin/projects/operations/debs/python-statsd,access [14:13:13] ottomata: I need to clean up the branches :-] [14:13:29] i don't think it lets me add individuals [14:13:31] is there a group I can add? [14:13:35] integration [14:13:37] that would work [14:13:41] k [14:13:46] one [14:13:48] done [14:14:01] New patchset: Mark Bergsma; "Varnish rules for Beta cluster" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47567 [14:14:34] am I missing something here? 
:) https://gerrit.wikimedia.org/r/#/c/47567/9..10/templates/varnish/mobile-frontend.inc.vcl.erb [14:17:48] New patchset: Mark Bergsma; "Varnish rules for Beta cluster" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/47567 [14:18:43] back [14:19:36] New patchset: Hashar; ".gitreview file" [operations/debs/python-statsd] (master) - https://gerrit.wikimedia.org/r/55028 [14:19:36] New patchset: Hashar; "packaging `statsd` python module" [operations/debs/python-statsd] (master) - https://gerrit.wikimedia.org/r/59397 [14:19:41] yeahhh [14:20:26] DIEDERIK! [14:20:41] what's the maximum number of patchsets for all gerrit changes? [14:20:43] :) [14:20:57] dunno [14:21:07] ottomata: I removed the `integration` group from operations/debs/python-statsd . Thank you! [14:21:08] go get it [14:21:14] this is an official request from ops to analytics [14:21:20] :) [14:21:36] need some more context :) [14:21:51] I was just curious, earlier, when I merged one of hashar's changes which had 33 patchsets [14:21:56] I wonder what the record is [14:22:02] I've definitely seen more [14:22:03] like 60 [14:22:04] I have seen a change with like 56 ps [14:22:05] yeah [14:22:19] New patchset: Jgreen; "modifying perms for fundraising civicrm & -dev code tree" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59398 [14:22:34] New patchset: Aude; "Adjust wikibase dispatch batch size" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59399 [14:22:48] i think i've seen 50 [14:22:52] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59398 [14:23:18] qchris: any idea how many patchsets are allowed on a change? I guess the limit is set by the field size in the DB :D [14:23:42] hashar: I think so too. But I never tried :-) [14:24:43] hashar: int(11) in MySQL. Do we need more? :-D [14:25:08] qchris: na I think a limit of int(11) to the # of patchset would be enough for mark and drdee [14:25:40] ottomata: could give me the contents of the other kafka-*sh? [14:25:55] you can recreate those wrapper scripts of course [14:26:14] New patchset: Hashar; "packaging `statsd` python module" [operations/debs/python-statsd] (master) - https://gerrit.wikimedia.org/r/59397 [14:26:14] or not :P [14:26:41] New review: Hashar; "PS2: ignores .gitreview in sourcefile." [operations/debs/python-statsd] (master) - https://gerrit.wikimedia.org/r/59397 [14:27:26] New patchset: Mark Bergsma; "Rm special-casing for root URLs of mobile sites" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/58729 [14:27:57] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/58729 [14:29:01] PROBLEM - Puppet freshness on cp3003 is CRITICAL: No successful Puppet run in the last 10 hours [14:29:18] New patchset: Hashar; "packaging `statsd` python module" [operations/debs/python-statsd] (master) - https://gerrit.wikimedia.org/r/59397 [14:29:50] that is the new statsd package [14:30:07] now I am going to hack Zuul to make it supports statsd 2 or later [14:31:40] <^demon> Changes with most patchsets: 24264, 54986, 26441, 51333 and 11137. [14:31:50] <^demon> With 56, 54, 45, 43 and 41 each [14:32:00] across all repos? [14:32:02] <^demon> Yep [14:32:07] \O/ [14:32:34] ^demon: I asked you yesterday about renaming some repo. I have simply dished all its content using force push :-] Issue solved. 
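The "hack Zuul to make it supports statsd 2 or later" step hashar mentions usually amounts to an optional import plus a feature check, which is also the kind of change that could live in debian/patches as suggested earlier. A sketch of that pattern; the class, host and metric names are illustrative rather than Zuul's actual code, and the StatsClient check assumes (per the two projects' docs) that only jsocol's module exposes that class.

    try:
        import statsd        # optional: absent, old, or the wrong module is fine
    except ImportError:
        statsd = None

    class Scheduler(object):
        def __init__(self):
            if statsd and hasattr(statsd, 'StatsClient'):
                # jsocol's API; the clashing python-statsd package also
                # imports as `statsd` but exposes different classes.
                self.statsd = statsd.StatsClient('localhost', 8125)
            else:
                self.statsd = None   # run without metrics

        def report_event(self):
            if self.statsd:
                self.statsd.incr('zuul.gerrit_events')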
[14:32:37] amazingly hashar was correct [14:32:38] we should have kept going, hashar [14:32:56] glad I still have a good memory [14:35:31] PROBLEM - LVS HTTP IPv4 on m.wikimedia.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 301 Moved Permanently - pattern not found - 944 bytes in 1.226 second response time [14:36:16] New patchset: Mark Bergsma; "Revert "Rm special-casing for root URLs of mobile sites"" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59400 [14:36:24] paravoid: if you feel brave enough, you can get python-statsd build :-] https://gerrit.wikimedia.org/r/59397 That build properly on labs and dpkg -c seems to gives some useful files. [14:36:44] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59400 [14:38:30] RECOVERY - LVS HTTP IPv4 on m.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 20612 bytes in 0.002 second response time [14:41:30] PROBLEM - LVS HTTP IPv4 on m.wikimedia.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 301 Moved Permanently - pattern not found - 944 bytes in 0.854 second response time [14:42:15] New patchset: Mark Bergsma; "Simplify and cleanup the mobile host rewrites" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59401 [14:43:09] * hashar digs in python [14:43:27] do we have a statsd install by the way beside the one in ceph/swift or whatever? [14:43:30] RECOVERY - LVS HTTP IPv4 on m.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 20624 bytes in 0.004 second response time [14:43:35] I still don't know how to send metrics to our graphite install :( [14:44:49] no [14:45:06] both asher and me were independently thinking of setting up one [14:45:12] but we didn't :) [14:46:36] New patchset: Jgreen; "wikidev is a group, not a user. fail" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59402 [14:47:25] New patchset: Demon; "Remove unused junk from gerrit config" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59403 [14:47:43] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59402 [14:48:30] PROBLEM - LVS HTTP IPv4 on m.wikimedia.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 301 Moved Permanently - pattern not found - 944 bytes in 0.002 second response time [14:48:58] how can I login to icinga again [14:49:30] RECOVERY - LVS HTTP IPv4 on m.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 20617 bytes in 0.008 second response time [14:50:33] found it [14:51:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:51:43] ok apparently i'm not authorized to schedule downtime in icinga ;) [14:51:48] <^demon> mark: Heh, so I hadn't actually looked at what the total on-disk savings for jgit gc was. I think the one week graph shows it best. [14:51:50] <^demon> http://ganglia.wikimedia.org/latest/graph.php?r=week&z=xlarge&c=Miscellaneous+eqiad&h=manganese.wikimedia.org&v=45.2&m=part_max_used&jr=&js=&vl=%25&ti=Maximum+Disk+Space+Used [14:51:53] well then dear colleagues, I'm happy to wake you up ;) [14:52:07] ^demon: heh [14:52:30] PROBLEM - LVS HTTP IPv4 on m.wikimedia.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 301 Moved Permanently - pattern not found - 934 bytes in 0.003 second response time [14:53:05] need anything mark? 
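On hashar's question above about getting metrics into graphite: with jsocol's statsd module (the one just packaged), the client side is a few UDP calls. Everything here is illustrative, the host, port, prefix and metric names are made up, and as paravoid notes there is no statsd daemon set up yet for it to talk to.

    import time
    from statsd import StatsClient

    stats = StatsClient(host='statsd.example.org', port=8125, prefix='zuul')

    stats.incr('gerrit.events')          # bump a counter
    stats.timing('job.runtime', 5320)    # record a timer value, in milliseconds

    with stats.timer('job.wallclock'):   # or time a block of code directly
        time.sleep(0.1)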
[14:53:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.133 second response time [14:53:28] ^ suppressed [14:56:30] RECOVERY - LVS HTTP IPv4 on m.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 20624 bytes in 0.024 second response time [14:59:07] paravoid, i'm back too, any thoughts? [14:59:14] yes [14:59:18] I had a look at the scripts [14:59:27] so the server-start/stop shouldn't exist at all [14:59:32] these belong in an init script [14:59:38] sure [14:59:39] that's how tomcat does it btw [14:59:44] have a look at that [14:59:44] ja, [14:59:46] makes sense [14:59:59] i saw that, does java jar blabla /usr/share/tomcat/... [15:00:10] but [15:00:21] in the init scrip [15:00:24] is it so bad [15:00:24] to do [15:00:35] /usr/share/kafka/bin/kafka-server-start.sh [15:00:36] ? [15:00:39] yes [15:00:39] :) [15:00:43] its just one more shell wrapper [15:00:46] and kafka-run-class.sh [15:00:51] no /usr/share/kafka/bin preferrably [15:01:07] so you want me to reproduce all of kafka-run-class.sh in an init script [15:01:14] I mean, you could do that [15:01:15] so I can do the java jar bit directly in the init? [15:01:18] it wouldn't be super bad [15:01:31] but would it be super good? [15:01:31] :) [15:01:31] but then you'll have to have all the JAVA_HOME logic and everything in the init script [15:01:44] right, exactly [15:01:57] ok, pick whichever seems more sensible to you I'd say [15:02:02] so, is it so bad to do /usr/share/kafka/bin/kafka-run-class Kafka.kafka in the init script? [15:02:25] New patchset: Jgreen; "fix civi ./files directory privs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59406 [15:02:55] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59406 [15:03:05] I'm not terribly excited about the assumptions that this makes [15:03:12] but let's try it if you want [15:03:36] well, i guess my point is [15:03:44] why is kafka-run-class.sh ok and kafka-server-start.sh not? [15:05:47] oh it's the same [15:05:55] I'm not terribly excited about all of these shell scripts :) [15:07:13] sooooooooo then I can use kafka-server-start.sh from the init script? [15:07:29] I guess... [15:07:40] if it were me I'd probably replace all of them with my own shell scripts [15:07:50] but feel free to reuse them [15:07:55] if you prefer that [15:08:24] i prefer to do as little custom stuff as possible, unless its really necessary [15:08:41] but, ok! [15:08:50] what's "custom"? :) [15:08:57] it's just a bunch of badly written shell scripts [15:09:17] custom would be me writing my own [15:09:19] half of Debian is "custom" like that :) [15:09:33] i agree they are bad, and I don't mind patching them if I have to I guess [15:09:40] but let's not get philosophical [15:09:42] i could make them better, read env vars, etc. [15:09:46] ahah [15:10:15] /usr/share/kafka/bin is poor taste imho [15:10:30] we can iterate though [15:12:53] hm, oo, paravoid, what should the release version be? [15:13:04] 0.7.2-1~wmf1 ? something like that [15:13:08] precise-wikimedia distribution? 
[15:15:27] New patchset: Ottomata; "Initial debian packaging using git-buildpackage" [operations/debs/kafka] (master) - https://gerrit.wikimedia.org/r/53170 [15:15:38] yes [15:15:40] New review: Ottomata; "(14 comments)" [operations/debs/kafka] (master) - https://gerrit.wikimedia.org/r/53170 [15:16:18] New patchset: Ottomata; "Initial debian packaging using git-buildpackage" [operations/debs/kafka] (master) - https://gerrit.wikimedia.org/r/53170 [15:16:23] ok ready. [15:16:53] Change abandoned: Ottomata; "Dunno why I pushed a different Change-Id before." [operations/debs/kafka] (master) - https://gerrit.wikimedia.org/r/59008 [15:26:27] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:27:17] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.134 second response time [15:30:37] New patchset: Demon; "Gerrit, now with 50% more memory" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59411 [15:38:25] New review: Faidon; "(15 comments)" [operations/debs/kafka] (master) C: -1; - https://gerrit.wikimedia.org/r/53170 [15:38:54] ottomata: ^ [15:39:22] reading danke! [15:39:53] paravoid, I am generally knowledgeless about license stuff [15:42:20] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours [15:43:00] PROBLEM - Varnish traffic logger on cp1041 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa [15:50:00] paravoid, about the debian/links file [15:50:05] This links /usr/share/kafka/config -> /etc/kafka. Why is that bad? /etc/kafka has the real files. You said this was ok when we were discussing how the hadoop package does stuff. [15:50:32] why is this needed? [15:50:34] aaaaaand, some of the .sh scripts refer to $(dirname $0)/../config [15:50:37] why can't kafka read from /etc/? [15:50:39] aaargh. [15:50:47] sucky shell scripts as I said [15:50:49] so, the server-start init script stuff [15:51:00] doesn't takes in the config file as an arg [15:51:01] so no prob [15:51:03] but some of them read from that [15:51:10] i know they are sucky, but it would be nice if they worked, even if we aren't using them [15:51:20] why? [15:51:33] some are useful for testing, the shell ones for example [15:51:44] its nice to fire up a producer shell in one screen, and a consumer in another [15:51:52] and just type messages in, [15:51:56] /usr/share/foo/bin isn't for users to run [15:51:58] (or admins) [15:52:00] good for testing failover, etc. [15:52:09] if that's the purpose, then they shouldn't be there [15:52:32] so, should I patch them all then? [15:52:55] i was thiinking of making them all check for KAFKA_HOME and KAFKA_CONFIG, and defaulting to the dirname $0 stuff if those weren't set [15:53:00] then they could live in /usr ... [15:53:08] /usr? [15:53:11] /usr/sbin you mean? [15:53:15] yeah ... [15:53:25] /usr/bin, /usr/sbin [15:53:27] KAFKA_HOME wouldn't be set there either? [15:53:29] i assume producer/consumer in /usr/bin [15:53:49] KAFKA_HOME could be set wherever? [15:54:20] or I could patch them [15:54:22] I don't understand :) [15:54:26] and make them default to /usr/share/kafka [15:54:29] either way [15:55:00] i guess the question is: [15:55:35] do I have to patch the shell scripts to make them workable from /usr/sbin,/usr/bin. [15:55:35] if so which ones: do I install them all? what about kafka-run-class.sh? [15:55:35] if so, what patch system should I use (and learn about)? 
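Two of the pieces referenced above, as hedged sketches rather than the eventual change. First, a dh_link-style debian/links entry for the config symlink under discussion (link target first, link name second); second, the env-var defaulting proposed for the wrapper scripts, keeping upstream's $(dirname $0) behaviour as the fallback.

    # debian/links -- makes /usr/share/kafka/config a symlink to /etc/kafka
    etc/kafka usr/share/kafka/config

    # proposed tweak to each wrapper script: honour KAFKA_HOME / KAFKA_CONFIG
    # when set, otherwise fall back to the upstream relative-path lookup
    KAFKA_HOME="${KAFKA_HOME:-$(dirname "$0")/..}"
    KAFKA_CONFIG="${KAFKA_CONFIG:-$KAFKA_HOME/config}"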
[15:55:43] heh, question*S* [15:56:06] that's an easy question :) [15:56:12] mkdir debian/patches [15:56:19] put patches in debian/patches/foo.diff [15:56:23] that's quilt? [15:56:28] and list them, one per line in debian/patches/series [15:56:32] kind of [15:56:34] it is quilt [15:56:42] but it's now reimplented/embedded into the debian source package [15:56:49] (this is what debian/source/format is about) [15:56:52] k, what about the other gbp related patch stuff? [15:56:55] so you don't need anything special in debian/rules [15:57:14] http://honk.sigxcpu.org/projects/git-buildpackage/manual-html/gbp.building.html#GBP.BUILDING.PATCH [15:57:34] and [15:57:34] https://honk.sigxcpu.org/piki/development/debian_packages_in_git/ [15:57:36] that's gbp-pq [15:57:45] that's for creating the patches in git, then exporting them to debian/patches [15:58:00] I see little point tbh, I've tried using it and it confused me [15:58:47] I'd probably rewrite the shell scripts instead of patching them btw [15:59:11] rewrite and do what? [15:59:15] commit to our repo? [15:59:34] make new ones that are saved in debian/ dir? with custom rules to install? [16:00:15] yes [16:00:25] hmmmm [16:00:28] !g 47567 [16:00:29] https://gerrit.wikimedia.org/r/#q,47567,n,z [16:00:49] but don't be afraid of debian/patches, it's fairly common [16:00:52] (and easy) [16:01:14] ja, i started looking into it yesterday, seemed not bad [16:01:46] sigh, i mean, sigh, i would like to rewrite these too, since they are pretty dumb [16:02:01] but then again, there are 15 of them [16:02:02] New review: Hashar; "I have deployed the change on the beta cluster. http://en.m.wikipedia.beta.wmflabs.org/" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59401 [16:02:11] i probably would only like to have 5 or 6 maybe [16:02:51] its just, yargh, i mean, i guess cost vs. benifit, right? [16:02:58] how much do we really gain from putting all that work in? [16:03:36] it works as is, symlinking config -> /etc/kafka isn't that baaaad, is it? occasionally running scripts from /usr/share/kafka isn't that bad, is it? the .jars are stored in /usr/share/kafka anyway, why is it so bad to have wrapper scripts there? [16:04:52] RECOVERY - Varnish traffic logger on cp1041 is OK: PROCS OK: 3 processes with command name varnishncsa [16:04:54] sigh but i see your point [16:04:56] yargh [16:05:18] are we really having this discussion for 10 3 to 5 line shell scripts? :) [16:05:34] or 15 or whatever [16:05:57] yeah, its more than just the lines, its figuring out the best way to do things…shoudl I make a kafka-env.sh? what dir should the scripts default to using? what about /etc/default? [16:06:27] ok [16:06:36] how about: [16:06:49] i patch all the scripts so that they use the env vars, which default to /usr/share/kafka and /etc/kafka [16:07:20] and I install the relevant scripts to /usr/s?bin [16:07:51] should I not install /usr/share/kafka/bin then? [16:07:58] correct [16:08:06] ergh, yeah, and kafka-run-class.sh, that's ok to go in /usr/bin? [16:08:16] no :) [16:08:20] where should that go? [16:08:24] nowhere? :P [16:08:31] copy paste it all into each script? [16:08:46] I'd probably create a single script that would case based on $0 [16:08:56] or something like that [16:09:08] kafka-run-class finds the .jars and adds them to CLASSPATH [16:10:06] oh you mean a single script that has subcommands for each [16:10:08] eehhhhhhhhhh [16:10:10] i dunno [16:10:37] hmmmmmmm, hmmmm [16:10:38] hmmm [16:10:39] hmm [16:10:58] maybe. 
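Spelled out, the debian/patches workflow described above looks like this; the patch file name is a made-up example.

    mkdir -p debian/patches
    cp ../honour-env-vars.diff debian/patches/          # hypothetical patch
    echo honour-env-vars.diff >> debian/patches/series  # applied in listed order
    cat debian/source/format
    # -> 3.0 (quilt)

With the "3.0 (quilt)" source format, dpkg-source applies everything listed in debian/patches/series when the source package is unpacked, so debian/rules needs no extra patch targets.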
[16:11:02] ok, lunchtime [16:11:05] paravoid, thanks, i'll thikn about that [16:11:15] and either do that or come back and say "but but but but!" :) [16:11:18] back in a bit [16:12:10] okay :) [16:13:02] New patchset: Hashar; "attempt to fix role::cache::mobile on labs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59419 [16:13:06] I am out of ideas :5 [16:13:37] mark: so somehow my role::cache::mobile instance selects the wrong storage disks. I noticed a missing trailing commas which might fix it https://gerrit.wikimedia.org/r/#/c/59419/ [16:13:45] mark: though that would really be a nasty puppet bug [16:14:43] hashar: seems weird that that would fix it [16:14:50] have you checked the value of $::realm? [16:14:53] worth trying out ? [16:14:57] yeah the $::realm is correct [16:15:02] looking at /etc/wikimedia-realm [16:20:15] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59419 [16:20:37] merged [16:20:42] looking [16:24:11] mark: that did not fix it :-] I give up for tonight, will look at it again tomorrow [16:24:16] * hashar waves at everyone, good night! [16:31:22] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:32:12] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.128 second response time [16:37:09] !log aaron synchronized php-1.22wmf2/includes/OutputPage.php 'deployed 997ab92458946d10fed96ff4ba41c74eeaee0d20' [16:37:18] Logged the message, Master [16:39:31] !log aaron synchronized php-1.22wmf2/includes/cache/BacklinkCache.php 'deployed 1a94124385df4c452c68fadaa74bc2714a204761' [16:39:39] Logged the message, Master [16:40:38] !log aaron synchronized php-1.22wmf2/includes/db/Database.php 'deployed fa819f8e3579855f6d95552655de98f92ec1789c' [16:40:46] Logged the message, Master [16:42:23] !log aaron synchronized php-1.22wmf2/includes/WikiPage.php 'deployed 38d60cdafe036625a6ec8fa3a9e8a94ef450c85c' [16:42:31] Logged the message, Master [16:56:22] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:57:12] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time [17:09:08] New patchset: Aaron Schulz; "Enabled 1:1 profiling for cli scripts and put "cli" in the profile ID." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59425 [17:11:43] hey paravoid, what is the policy with respect to high-quality PPAs? do we just import the packages into our repo? [17:12:27] there's no clear policy, but yes, this can be okay [17:12:39] we have reprepro set up to support remotes too [17:12:46] reprepro is the apt repo tool [17:12:57] see puppet ./files/misc/reprepro/updates [17:14:04] which PPA would that be? I can have a look [17:14:34] https://launchpad.net/~jtaylor/+archive/ipython [17:14:46] New patchset: Jgreen; "replacing ssh key for jgreen" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59427 [17:14:47] the ppa maintainer (julian taylor) is a debian maintainer [17:15:47] why do you need a newer ipython? 
[17:16:56] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59427 [17:17:02] ipython notebook (which is the primary reason) had its file format changed and its interface redone [17:18:45] i've tried to use the version in apt but it is quite buggy [17:19:59] PROBLEM - Puppet freshness on ms-fe3001 is CRITICAL: No successful Puppet run in the last 10 hours [17:20:01] ideally we'd just add the PPA as a remote, since what I'm building on top of it will take some time to build and I'd like to keep up with development, which is currently proceeding at a rapid clip [17:21:10] which service is this for? [17:21:18] does it exist in labs? [17:22:52] it's a data analysis interface for eventlogging. it does exist in labs, yes. there's an eventlogging project with a script to generate fake but plausible-looking data, to simulate production [17:23:53] the fake data is adequate for building and testing the lower-level stuff, but as far as actually having a data analysis environment, i kind of need to work with the actual data. the idea is that this will never be exposed publicly -- it'll run on vanadium, which has no public interface, and you'll need access to the cluster to reach it. [17:24:17] okay [17:24:23] sounds reasonable [17:24:38] is it also private data? [17:24:46] yep. [17:25:18] ok [17:25:52] so, yeah, we can take the PPA, or the version from raring [17:25:54] and put it up in apt [17:26:13] New review: Kaldari; "You're right. Ori changed it to be enabled on test instead of en.wiki, which is probably a good idea..." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/57649 [17:26:29] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:27:19] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.131 second response time [17:27:36] Hi; I just pushed 2 commits to Gerrit, to deploy (https://gerrit.wikimedia.org/r/#/c/59424/ & https://gerrit.wikimedia.org/r/#/c/59426/), but it looks like jenkins is not kicking in [17:27:45] any suggestions on what to do next? [17:28:09] mlitn: I can see it run [17:28:18] https://integration.wikimedia.org/zuul [17:28:30] The queue is a bit backlogged sometimes [17:28:58] oh, both or still running [17:29:04] Yeah [17:29:19] ok, thanks [17:33:34] New patchset: Kaldari; "Turning on footer contact link for en.wiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59430 [17:36:25] New patchset: Jgreen; "forgot to ensure=>absent the old jgreen ssh key" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59431 [17:36:52] New review: Daniel Kinzler; "yes please, we need that!" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59425 [17:37:06] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59431 [17:40:27] New patchset: Jgreen; "syntax error" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59432 [17:41:04] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59432 [17:44:46] New patchset: Andrew Bogott; "Use ensure => latest for ganglia." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59434 [17:45:20] Who has worked on our ganglia setup? LeslieCarr? [17:45:28] i have [17:45:41] Any thoughts on whether ^^ will break lots of things? [17:45:55] don't use ensure => latest [17:46:07] as a rule don't use it for anything [17:46:08] Heh, I figured. 
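For the PPA route discussed above, a sketch of what a reprepro update rule could look like: the stanza lives in the "updates" file and gets referenced from the matching distribution in conf/distributions. The rule name, the fingerprint placeholder and the lack of filtering are illustrative, not the actual Wikimedia configuration.

    # updates -- hypothetical rule for the ~jtaylor/ipython PPA
    Name: ipython-ppa
    Method: http://ppa.launchpad.net/jtaylor/ipython/ubuntu
    Suite: precise
    Components: main
    Architectures: amd64 i386 source
    VerifyRelease: <fingerprint of the PPA signing key>

    # conf/distributions -- under the precise-wikimedia stanza
    Update: ipython-ppa

Something like "reprepro update precise-wikimedia" then pulls the packages in; in practice you would also add a FilterList or FilterFormula so only the ipython packages are imported instead of the whole PPA.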
[17:46:21] So, how shall I get a working version of ganglia onto new instances? [17:46:27] I can use => 3.5.0 [17:46:40] so on new instances ensure => present will install the latest [17:46:48] why do you need a newer version? [17:46:51] did we upgrade? [17:46:54] I don't think so, because the image already has ganglia-monitor. [17:46:55] Ryan and i last discussed ensure => latest on broadway between 13th and 14th. west side of the street heading south. :-P [17:47:03] So ensure => present does nothing [17:47:17] oh. is this because the new image installs the wrong thing? [17:47:22] if so I'll fix the image [17:47:30] Ryan_Lane: I am assuming (perhaps wrongly) that the new image contains ganglia at the outset, and the old image did not. [17:47:41] New patchset: Pyoungmeister; "adding db51 as a snapshot host for s4" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59436 [17:47:50] Yeah, fixing the image would do it, if that's easy. Just install 3.5 or don't install it at all and let puppet take care of things. [17:47:55] do we have a custom version our repo? [17:48:02] I think so... [17:48:06] and I'm installing the one from ubuntu's? [17:48:55] Um… here's what I know so far: A new image has gmond 3.1.7 which doesn't work. A 'sudo apt-get install ganglia-monitor' gets gmond 3.5.0 which works. [17:49:03] I haven't poked Brewster to see what the story is with 3.5.0 [17:49:17] But… this suggests we use a custom version: https://bugzilla.wikimedia.org/show_bug.cgi?id=37201 [17:50:00] Change abandoned: Andrew Bogott; "Going to install the right version from the outset rather than relying on => latestg" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59434 [17:50:10] New review: Hashar; "and I did the rebase on fenari." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59034 [17:50:27] i'll check out brewster now [17:50:44] thx [17:50:51] any other problems to fix in the image now? [17:50:58] while I'm building a new one? :) [17:51:20] hrm, we have 3.5.0 and 3.3.5-2 in brewster [17:51:24] ok [17:51:26] esy enough [17:51:28] both custom built [17:51:33] I just moved the stage it gets installed [17:51:57] I need to puppetize the vmbuilder stuff I've done [17:52:03] and write some docs [17:52:33] Ryan_Lane: I think that's it. There's some weirdness with puppet certs but I don't think that's related to the image. [17:52:54] with puppetmaster::self, right? [17:53:02] not with normal certs? [17:53:07] Yeah -- I'm talking to otto about that. [17:53:11] New patchset: Ottomata; "Not generating puppetmaster self certificates until puppet.conf is compiled." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59437 [17:53:12] paravoid: cool, should I request that in RT, or is the process in version control (and thus I should submit a patch)? [17:53:13] * Ryan_Lane nods [17:53:19] I don't think that's related [17:53:31] ori-l: RT is fine :) [17:53:43] paravoid: thanks! [17:53:48] ori-l: I'd do it now, but I'm busy with something else now [17:53:54] sorry [17:53:55] I agree. The behavior with normal certs is probably correct… it did confuse otto into thinking that his change wasn't a regression (because the same behavior that he caused happens normally on a young instance.) [17:54:22] hi wha? [17:54:35] oh right. ja [17:54:50] but something is def weird with puppetmaster::self right now, of course my fault since I touched it…on it [17:54:56] heh [17:55:16] i think its the compile puppet.conf exec…i've had weirdness with that... 
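The ensure point from the ganglia exchange above, as a Puppet sketch (illustrative only, not the actual manifest): pin the version that is known to work, or use present and fix the base image, rather than tracking latest.

    package { 'ganglia-monitor':
        # '3.5.0' pins the known-good build; 'present' only installs when missing;
        # 'latest' upgrades on every run, which is what was advised against here
        ensure => '3.5.0',
    }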
[17:55:18] why do we do that, btw? [17:55:25] can't we just render a puppet.conf from a puppet.conf.erb? [17:55:38] because puppet can't do iteration [17:55:52] New patchset: Ottomata; "Not generating puppetmaster self certificates until puppet.conf is compiled." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59437 [17:55:54] and it can't easily modify the same resource from multiple places [17:55:59] hmmm, true [17:56:05] and clients and masters need that file [17:56:05] hm [17:56:27] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:58:16] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time [17:59:02] New patchset: Pyoungmeister; "s4: db31 out, db72 in, db51 new snapshot. s5: db35 out, db73 in" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59438 [18:01:19] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59437 [18:03:55] mh, https://integration.mediawiki.org/ci/job/mediawiki-core-phpcs-HEAD/7855/ is pretty slow, already 16 mins, output is still at 00:00:17.182 [18:04:56] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/58832 [18:05:04] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59436 [18:05:24] gate-and-submit jobs have been running for almost 30min now - is that "normal"? [18:06:27] .. and apparently just resulted in a failed, unrelated to the commit, phpunit-databaseless test (FileBackendTest::testLockCalls PHP_Invoker_TimeoutException: Execution aborted after 30 seconds) [18:07:15] ^demon|away: ^^ [18:07:33] mlitn: well, was going pretty slow for Jeff_Green [18:09:20] mlitn: this is blocking your deploy, yes? [18:09:59] mlitn, with our state of CI, it's safe to ignore tests on deployment branches [18:10:15] greg-g: yes; unless someone can confirm it's safe to ignore jenkins [18:10:15] furthermore, it's a waste of time _not_ to ignore;) [18:10:32] MaxSem: understood, thanks ;) [18:10:48] ugh [18:11:03] chrismcmahon: ^^ in case you have any thoughts on the failing jenkins [18:12:01] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59430 [18:12:31] !log bsitu synchronized wmf-config/CommonSettings.php 'Enable email bundling for test2wiki and mediawikiwiki' [18:12:39] Logged the message, Master [18:13:01] !log bsitu synchronized wmf-config/InitialiseSettings.php 'Enable email bundling for test2wiki and mediawikiwiki' [18:13:08] Logged the message, Master [18:16:34] !log bsitu synchronized wmf-config/InitialiseSettings.php 'Turning on footer contact link for en.wiki' [18:16:42] Logged the message, Master [18:18:13] New patchset: Kaldari; "Adding comment for wmgUseFooterContactLink in config" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59441 [18:24:09] ottomata, ori-l-away: , Maryana and I notice that server-side event logging stopped on April 8, no new events in latest ServerSideAccountCreation and PageContentSaveComplete tables since then. 
[18:24:45] That's the same day as https://gerrit.wikimedia.org/r/#/c/58122/ , $wgEventLoggingFile: emery => vanadium [18:25:30] !log mlitn synchronized php-1.22wmf1/extensions/ArticleFeedbackv5 'Update ArticleFeedbackv5 to master' [18:25:38] Logged the message, Master [18:25:51] !log mlitn synchronized php-1.22wmf2/extensions/ArticleFeedbackv5 'Update ArticleFeedbackv5 to master' [18:25:58] Logged the message, Master [18:30:07] New patchset: Pyoungmeister; "s4: db31 out, db72 in, db51 new snapshot. s5: db35 out, db73 in" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59438 [18:31:23] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:32:13] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.130 second response time [18:33:59] !log aaron synchronized php-1.22wmf2/includes/cache/HTMLCacheUpdate.php 'deployed 344ddf85b6d7b79c40a1b6b9ec7e0a1e05c0705b' [18:34:06] Logged the message, Master [18:35:44] the new contact link on enwiki does not work. It is a plain wikilink instead of a weblink [18:36:35] can somebody pls do mass delete? [18:37:46] I'm planning to run a maintenance script (to purge some memcache) on terbium, if that's ok? [18:38:31] Aaron|home: ^^ [18:38:47] bsitu: ? [18:38:56] how often does job queue run on test2wiki? [18:40:44] bsitu: if it has jobs it has as much as chance of running as enwiki [18:42:39] Aaron|home: I am testing on test2 and a job was supposed to run 10 minutes ago, can I manually trigger on fernari: mwscript runJobs.php test2wiki? [18:43:45] New patchset: Demon; "Redirect redirect.(php|phtml) to index.php" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59443 [18:44:28] New patchset: Asher; "pulling db1058 from s1" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59444 [18:45:18] I don't see any jobs for test2wiki [18:46:29] Danny_B: huh? [18:47:08] Aaron|home: okay, maybe a problem on my end, how do you view a list of jobs? [18:47:27] jeremyb_: what? [18:47:35] 16 18:36:36 < Danny_B> can somebody pls do mass delete? [18:47:39] you can get the totals via mwscript showJobs.php test2wiki --group [18:47:53] if you really need to look at the jobs, you'd need eval.php [18:48:06] jeremyb_: yes, we need to delete nearly 400 pages, and it doesn't make a sense to do it manually obviously [18:48:20] Danny_B: use a bot? [18:48:21] $q = JobQueueGroup::singleton()->get( 'null' ); [18:48:23] foreach ( $q->getAllQueuedJobs() as $k => $v ) { var_dump( $k, $v ); } [18:48:28] Change merged: Asher; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59444 [18:48:30] or something like that [18:48:56] jeremyb_: a) no admin has a bot b) obviously server side deletion is much faster as well as less bandwidth consuming [18:49:13] !log asher synchronized wmf-config/db-eqiad.php 'pulling db1058 from s1' [18:49:20] Logged the message, Master [18:49:25] Danny_B: huh? so run one on an admin account or grant temporary sysop [18:49:39] Aaron|home: thx [18:49:42] Danny_B: it's really unbelievable that you could possibly cite bandwidth as a reason [18:49:59] lol [18:50:22] I deleted 200 pages in a couple minutes with PWB on a wiki a couple days ago [18:50:26] jeremyb_: not everybody is on optic fibre [18:50:36] we have sysops deleting dozen thousands wikis with PWB [18:51:00] 400 pages is really a low number [18:51:33] if u need a bot for that I can make one... 
[18:51:59] petanb|bnc-fu: I don't need a bot, I need someone to delete it with one command on the server; we've already done it this way several times [18:52:07] New patchset: Asher; "reprovisioning db1058 as s5 mariadb" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59447 [18:52:09] Danny_B: uhhhh, use 56k for all i care [18:52:26] bandwidth is still no excuse [18:52:51] Danny_B when was that? [18:52:53] so what if it takes 20 or 30 mins? (which I think is probably way more than it would be.) that's fine [18:53:19] jeremyb_ 400 pages is like 2 minutes for a bot? :P [18:53:35] with addshore bot speed [18:53:52] ? [18:54:39] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59447 [18:55:08] petanb|bnc-fu: whats the ping for? :P [18:55:26] ill do a mass delete ;p [18:56:09] Danny_B you see ^ :D [18:56:14] addshore can delete anything [18:56:27] 400 is nothing :P will take 30 seconds ;p [18:57:06] New patchset: MaxSem; "Enable resource variance on m.mediawiki.org" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59449 [18:57:25] addshore: he was saying not everyone's on fiber. so i countered with i don't care if you're on 56k, bandwidth is still no excuse [18:57:55] Change merged: Kaldari; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59441 [18:59:03] hi mark, in approximately 2 hours we plan to flip mobile caching on mediawiki.org - will you be around? [19:02:20] !log streaming s5 hotbackup from db45 to db1058 [19:02:27] Logged the message, Master [19:04:05] Danny_B, what pages? [19:04:06] hey ^demon; opened bugzilla ticket for gerrit/github custom name mapping: https://bugzilla.wikimedia.org/show_bug.cgi?id=47274 [19:05:00] New patchset: Asher; "pulling db1005 for upgrade" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59450 [19:05:40] Change merged: Asher; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59450 [19:06:53] !log asher synchronized wmf-config/db-eqiad.php 'pulling db1005 from s5 for upgrade' [19:07:00] Logged the message, Master [19:09:24] New patchset: Asher; "upgrading db1005" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59452 [19:14:17] New patchset: Dzahn; "add account abaso and add to mortals (RT-4956)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59453 [19:15:49] New review: Dzahn; "merge after manager approval on RT-4956" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/59453 [19:17:17] PROBLEM - mysqld processes on db1005 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [19:18:03] !log upgrading db1005 to precise + mariadb [19:18:10] Logged the message, Master [19:18:57] MaxSem: I can keep an eye out [19:19:03] New patchset: Dzahn; "do not quote booleans (true/false) or string "false" could actually be true" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59454 [19:19:07] but this is just mediawiki.org right, it probably won't move the needle ;) [19:21:57] New patchset: Matthias Mullie; "Re-enable AFTv5 on enwiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59458 [19:23:17] PROBLEM - DPKG on db1005 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [19:29:57] PROBLEM - Host db1005 is DOWN: PING CRITICAL - Packet loss = 100% [19:30:29] LeslieCarr: ping [19:31:37] RECOVERY - Host db1005 is UP: PING OK - Packet loss = 0%, RTA = 0.34 ms [19:32:17] RECOVERY - DPKG on db1005 is OK: All packages OK [19:32:34] !log
pgehres synchronized php-1.22wmf1/extensions/CentralAuth/maintenance/migrateAccount.php 'Updating CentralAuth/migrateAccount.php' [19:32:41] Logged the message, Master [19:32:54] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59452 [19:38:25] !log pgehres synchronized php-1.22wmf2/extensions/CentralAuth/maintenance/migrateAccount.php 'Updating CentralAuth/migrateAccount.php' [19:38:34] Logged the message, Master [19:41:17] what is the right way to suspend puppet runs on some host for a short while? [19:41:48] I'm trying to debug some problem and I'd like to make config changes by hand and not have puppet overwrite them until I'm sure I know what is going on [19:42:18] 'service puppet stop'? [19:42:32] puppetd --disable [19:42:45] mark: thanks [19:44:24] RECOVERY - mysqld processes on db1005 is OK: PROCS OK: 1 process with command name mysqld [19:47:21] New patchset: Aaron Schulz; "Enabled 1:1 profiling for cli scripts and put "cli" in the profile ID." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59425 [19:47:31] Izbusyz: what's up [19:47:34] PROBLEM - Varnish traffic logger on cp1041 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa [19:48:17] LeslieCarr: PM [19:48:33] Izbusyz: don't see your pm ? [19:48:49] still sending ;) one moment [19:52:31] !log pgehres synchronized php-1.22wmf1/extensions/CentralAuth/maintenance/migrateAccount.php 'Updating CentralAuth/migrateAccount.php' [19:52:55] Logged the message, Master [19:53:36] !log pgehres synchronized php-1.22wmf2/extensions/CentralAuth/maintenance/migrateAccount.php 'Updating CentralAuth/migrateAccount.php' [19:54:03] Logged the message, Master [19:54:44] PROBLEM - Puppetmaster HTTPS on virt0 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:55:34] RECOVERY - Puppetmaster HTTPS on virt0 is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.155 second response time [19:59:10] pgehres, have you done deploying? [19:59:20] MaxSem: all yours [19:59:24] cheers [20:00:04] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours [20:00:04] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours [20:00:04] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours [20:02:34] RECOVERY - Varnish traffic logger on cp1041 is OK: PROCS OK: 3 processes with command name varnishncsa [20:03:44] PROBLEM - Puppetmaster HTTPS on virt0 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:05:12] New patchset: Ottomata; "Subscribing puppetmaster::ssl to compile puppet exec." 
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/59510 [20:05:22] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59510 [20:07:32] RECOVERY - Puppetmaster HTTPS on virt0 is OK: HTTP OK: Status line output matched 400 - 336 bytes in 2.673 second response time [20:10:38] Change merged: MaxSem; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/58551 [20:10:41] PROBLEM - Puppetmaster HTTPS on virt0 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:11:01] RECOVERY - Puppet freshness on db1058 is OK: puppet ran at Tue Apr 16 20:10:51 UTC 2013 [20:11:32] RECOVERY - Puppetmaster HTTPS on virt0 is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.506 second response time [20:14:41] PROBLEM - Puppetmaster HTTPS on virt0 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:23:31] RECOVERY - Puppetmaster HTTPS on virt0 is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.183 second response time [20:41:40] PROBLEM - Puppetmaster HTTPS on virt0 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:43:30] RECOVERY - Puppetmaster HTTPS on virt0 is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.136 second response time [20:46:00] PROBLEM - Puppet freshness on virt1005 is CRITICAL: No successful Puppet run in the last 10 hours [20:47:36] * MaxSem is scapping [20:49:10] New patchset: Ori.livneh; "Split up udp2zmq routers to allow counter to be added to 8421" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59517 [20:49:36] New patchset: Ottomata; "Defaulting to /root/puppet for puppet-merge basedir if $1 not given." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59518 [20:50:03] New patchset: Ori.livneh; "Split up udp2zmq routers to allow counter to be added to 8421" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59517 [20:50:17] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59518 [20:51:02] New patchset: Ottomata; "Fixing comment in puppet-merge" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59519 [20:51:20] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59517 [20:52:57] New review: awjrichards; "Good catch on checking for the URL template!" [operations/mediawiki-config] (master); V: 1 C: 1; - https://gerrit.wikimedia.org/r/59449 [20:57:35] New patchset: Aaron Schulz; "Include job recycling stats on gdash." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59521 [20:58:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:59:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time [21:13:06] !log maxsem Started syncing Wikimedia installation... : Weekly mobile deployment [21:13:15] Logged the message, Master [21:15:14] New patchset: Pyoungmeister; "s4: db31 out, db72 in, db51 new snapshot. 
s5: db35 out, db73 in" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59438 [21:26:47] notpeter: https://gerrit.wikimedia.org/r/#/c/59399/ [21:27:28] Aaron|home: sweet patchset, bro [21:27:35] er, I mean, i can deploy that for you [21:27:54] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59399 [21:37:54] New patchset: Ottomata; "Attempting to solve puppet cert generate issue by using my own exec rather than puppetmaster::ssl's." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59526 [21:38:26] New patchset: Ori.livneh; "MongoDB on vanadium: bind all interfaces" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59527 [21:39:29] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59526 [21:43:01] !log maxsem Finished syncing Wikimedia installation... : Weekly mobile deployment [21:43:09] Logged the message, Master [21:46:08] !log maxsem synchronized php-1.22wmf1/extensions/ZeroRatedMobileAccess/ 'https://gerrit.wikimedia.org/r/#/c/59525/' [21:46:16] Logged the message, Master [21:46:54] Change merged: MaxSem; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59449 [21:49:21] !log maxsem synchronized wmf-config 'https://gerrit.wikimedia.org/r/59449' [21:49:29] Logged the message, Master [21:51:27] mark, the new caching is live on m.mediawiki.org [21:51:51] what is the Vary header that is sent for that now? [21:52:24] New patchset: Aaron Schulz; "Enabled 1:1 profiling for cli scripts and put "cli" in the profile ID." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59425 [21:53:39] mark, Accept-Encoding,X-WAP,Cookie,X-Carrier,X-Subdomain,X-Images for HTML and Accept-Encoding,X-Device for bits [21:54:06] thanks [21:54:24] I see it indeed [21:57:27] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:58:17] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.129 second response time [21:59:37] PROBLEM - Varnish traffic logger on cp1041 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa [22:01:27] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:02:10] !log anomie synchronized php-1.22wmf1/extensions/WikimediaMaintenance 'Update WikimediaMaintenance to master for l10nupdate fix' [22:02:17] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time [22:02:18] Logged the message, Master [22:02:32] !log anomie synchronized php-1.22wmf2/extensions/WikimediaMaintenance 'Update WikimediaMaintenance to master for l10nupdate fix' [22:02:37] RECOVERY - Varnish traffic logger on cp1041 is OK: PROCS OK: 3 processes with command name varnishncsa [22:02:39] Logged the message, Master [22:04:30] TimStarling- I just finished deploying the WikimediaMaintenance change to 1.22wmf1 and 1.22wmf2. Care to merge https://gerrit.wikimedia.org/r/#/c/58911/ now? [22:06:19] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59527 [22:07:40] Or anyone, really. [22:07:56] Izbusyz: do you have some headers from that functionaries mail? in particular i'm wondering if there's an indication of what OS it is [22:08:23] I can look [22:09:06] i.e. i want to tell them to try s_client. but probably that's not a serious option. 
OTOH, if they're using that ISP then who knows [22:09:30] !change 57752 | apergos [22:09:30] apergos: https://gerrit.wikimedia.org/r/#q,57752,n,z [22:09:38] !change 57753 | apergos [22:09:38] apergos: https://gerrit.wikimedia.org/r/#q,57753,n,z [22:09:53] s there an rt or bz number btw/ [22:10:08] ah I see em [22:12:45] jeremyb_: only have the Webmail service version. [22:12:53] Izbusyz: :( [22:13:04] New patchset: Ram; "Bug: 47293 Further reduce noise in log files" [operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/59533 [22:13:14] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57752 [22:14:12] apergos - Help me with https://gerrit.wikimedia.org/r/#/c/58911/ ? [22:14:18] New review: RobH; "Opsen: Please do not merge this patchset until after the 3 day review period for an access request." [operations/puppet] (production) C: -2; - https://gerrit.wikimedia.org/r/59453 [22:15:29] anomie: I'm in the middle of this other bit I'm afraid [22:15:41] (still not done with this two part merge) [22:15:58] jeremyb_: ya, unless it's encoded somewhere I can't see it, there isn't any info on it [22:16:09] ok. Still 44 minutes left in my deployment window. [22:17:09] New review: RobH; "clarification. changeset looks good, just don't push until the wait period is over (friday)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59453 [22:17:40] mwalker: just an fyi that anomie is still working on his 3pm deploy, hopefully will be done by 4. I'll let you two coordinate the switch over. [22:17:42] Izbusyz: yeah, not sure what to do... i guess almost any system can use wireshark (mac/windows/linux) so we could tell him to make a dump while attempting us and while attempting another site that does work [22:18:08] shiney [22:18:22] anomie: just ping me when you're done [22:18:36] ya wireshark might at least hint where it's going. [22:18:46] greg-g, mwalker - At the moment, I'm waiting for someone who can merge and deploy https://gerrit.wikimedia.org/r/#/c/58911/ for me [22:20:09] anomie: someone who is in a better timeezone should do you, because they should stick around after (it's 1 am here) [22:20:27] on it [22:20:32] thanks mwalker [22:20:45] binasher: notpeter: any ideas about RT 4934? [22:21:05] anomie. greg-g: actually; I don't have rights on that repo [22:21:49] Ryan_Lane: want to review anomie's puppet change? [22:22:08] mwalker: what's your wikitech user name? [22:22:16] mwalker [22:22:16] you wanted the ability to give shell access, right? [22:22:18] ok [22:22:25] apergos- I guess I'll have to wait for TimStarling. I pinged him first, but he hasn't responded yet. [22:22:27] I'm at a conference [22:22:34] what gerrit change is this? [22:22:43] and yes; I'll happily take shell review rights [22:22:56] the change is https://gerrit.wikimedia.org/r/#/c/58911/ [22:23:01] added you to the group [22:23:11] ok. sorry about that [22:23:38] you think 58911 is urgent? [22:23:42] I probably shouldn't review this right now [22:23:47] I'm at a conference.... [22:23:50] win 28 [22:24:14] Ryan_Lane: that's cool; tim just joined us [22:24:23] * Ryan_Lane nods [22:24:39] TimStarling - Everyone seems concerned about getting bug 27320 fixed. [22:25:03] ah I see [22:25:12] TimStarling- I just finished deploying the WikimediaMaintenance change to 1.22wmf1 and 1.22wmf2. Care to merge https://gerrit.wikimedia.org/r/#/c/58911/ now? 
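What "try s_client" would amount to for the reporter above: a raw TLS handshake check run from their own machine, compared against a site that works. The host and port are placeholders, since the log never says which service they cannot reach.

    # run from the affected user's machine; <host> is whatever endpoint fails
    openssl s_client -connect <host>:443 -servername <host>
    # a healthy endpoint prints the certificate chain and a session summary;
    # a broken path usually stops at the TCP connect or mid-handshake, which
    # narrows down where the ISP is getting in the way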
[22:25:28] the hyphen caused xchat's highlight to not match [22:25:46] with a space before the hyphen it works [22:26:07] hah [22:26:41] TimStarling: I can't get xchat to insert the space before the hyphen on tab completion. Does a colon ping you? [22:26:45] New patchset: Tim Starling; "l10nupdate: Use refreshMessageBlobs.php instead of clearMessageBlobs.php" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/58911 [22:27:03] yes, colon works [22:27:19] Change merged: Tim Starling; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/58911 [22:27:47] wow, jenkins still can't manage a small mediawiki-config change from hours ago [22:28:04] several rebases too [22:28:45] Aaron|home: maybe 3 day waiting period allows jenkins to catch up? :-) [22:29:18] PROBLEM - mysqld processes on db1005 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [22:30:48] PROBLEM - Host db1005 is DOWN: PING CRITICAL - Packet loss = 100% [22:31:28] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:32:18] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time [22:32:24] TimStarling: How long does it take for the change to show up on fenari? [22:33:13] I ran puppet manually, it's almost done [22:33:14] anomie: there's a world readable /var/log/puppet.log on most boxes (maybe fenari too?) [22:33:16] done [22:33:23] notice: Finished catalog run in 64.20 seconds [22:33:48] should I test it? [22:34:02] TimStarling: I was going to, but if you want to you can. [22:34:18] RECOVERY - Host db1005 is UP: PING OK - Packet loss = 0%, RTA = 0.22 ms [22:35:55] ok, I'm running it [22:37:09] !log running l10nupdate manually on fenari to test change 58911 [22:37:17] Logged the message, Master [22:38:19] RECOVERY - mysqld processes on db1005 is OK: PROCS OK: 1 process with command name mysqld [22:39:30] So the l10nupdate will probably take about half an hour (judging by how the nightly l10nupdates have been running). Then we'll see how the new code does. mwalker, if whatever you're going to do won't be disturbed by l10nupdate running in the background, you can probably go ahead. [22:39:57] awesome, thanks anomie / TimStarling . [22:40:19] Tim did you see that Gerrit patch to move wikibugs from #mediawiki to #wikimedia-dev? There's a request in there for you to transfer channel ops flags too if you can. [22:40:32] well, after a mere hour and 25 minutes, my rebase of a config change is still being reviewed by jenkins [22:40:38] hurm; it might... I was just going to sync-dir an extension; (and wasn't planning on scapping it) but there are updates to the i18n file -- do you think that's going to bork thigns? [22:40:41] yes, I saw that gerrit patch [22:40:47] I love how thorough that jenkins is! [22:40:55] mwalker: it's fine [22:41:24] TimStarling: cool beals; thanks :) [22:44:04] Warning: file_get_contents(/var/lib/l10nupdate/mediawiki/extensions/WikimediaMessages/WikimediaLabsMessages.i18n.php): failed to open stream: No such file or directory in /home/wikipedia/common/php-1.22wmf1/extensions/LocalisationUpdate/LocalisationUpdate.class.php on line 255 [22:44:30] some cross-version issue? [22:45:11] okay, so do you the commands to do that transfer? 
[22:45:19] doesn't matter, I guess [22:50:49] New patchset: Ottomata; "Explicitly defining relationship between cert generate and compile puppet.conf" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59541 [22:51:08] !log LocalisationUpdate completed (1.22wmf1) at Tue Apr 16 22:51:08 UTC 2013 [22:51:16] Logged the message, Master [22:51:44] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59541 [22:53:18] TimStarling, anomie; I'm ready to rebase my changes into the php dirs on fenari -- am I still good to go ahead and do that and sync-dir? [22:54:06] if there are any new messages then you probably want to do scap rather than sync-dir [22:54:23] but yes, go ahead [22:54:50] I'm not really sure what changed in the i18n file -- I'm mostly just trying to get a new fundraiser data source populated... the i18n sync can wait for the train :p [22:54:54] but kk! merging [22:55:54] anomie: so, post l10nupdate completion, are we good? [22:56:42] greg-g: Well, it should be running the update for 1.22wmf2 right now. Then we'll see what happens with the new code. [22:56:49] I killed it, the dsh part failed [22:56:58] ? [22:57:34] not related to MessageBlobStore scripts [22:58:51] !log mwalker synchronized php-1.22wmf1/extensions/ContributionReporting/ [22:58:59] Logged the message, Master [22:59:30] !log mwalker synchronized php-1.22wmf2/extensions/ContributionReporting/ 'Contribution reporting now pulls 2013 data' [22:59:38] Logged the message, Master [23:00:40] how is login of the l10nupdate user meant to work exactly? [23:00:41] New patchset: Ottomata; "Ok, trying to require the 10-self.conf file as well" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59542 [23:00:50] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59542 [23:01:26] TimStarling, anomie, awjr, greg-g: OK; my update is done and I'm out of fenari [23:01:41] thanks mwalker - greg-g still waiting on patchsets to merge [23:01:43] ori-l: let's merge the EventLogging change on the blog? i don't see anything wrong with it if that is just like on wikis [23:01:55] awjr: ugh [23:04:23] awjr: well, you're the last person in the LD queue today, so, unless TimStarling needs to do something weird to kick this l10n issue, go for it when ready. I'm headed out now. [23:04:36] thanks a lot greg-g [23:04:45] np [23:05:01] good luck with jenkins [23:05:07] heh thanks [23:05:11] anomie: have you run l10nupdate manually before? [23:05:26] New patchset: Asher; "returning sqstat to emery" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59543 [23:05:31] btw, is anyone looking into the crappy acting jenkins issue? [23:05:59] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59543 [23:06:19] TimStarling: Yes. Not that I remember much about it, it just runs and takes a long time. [23:06:28] I'll take that as a no and ping ^demon|gone tomorrow about it [23:07:03] I'm making progress now... [23:07:55] greg-g: you should ping hashar first [23:10:07] mutante: fine by me [23:11:17] binasher: did you my gdash patch? [23:11:57] Aaron|home: no.. what/where? [23:12:23] https://gerrit.wikimedia.org/r/#/c/59521/ [23:12:57] <^demon|gone> greg-g: Can you be more specific? [23:13:56] ^demon|gone: jenkins is failing way too often (even the git clone is failing) and also taking forever too [23:14:45] serious issues across at least 2 unrelated repos [23:15:11] <^demon|gone> zuul is confused. 
[23:17:38] TimStarling anomie im about ready to sync some MobileFrontend changes - that ok? [23:18:32] yes, go ahead [23:18:42] I will wait until the rush is over before I try anything disruptive [23:18:56] cool thanks - i should just be a few minutes [23:21:41] <^demon|gone> jeremyb_: So, repos seem to be cloning from /var/lib/zuul/git/*, but I believe that path is wrong. [23:21:48] !log awjrichards synchronized php-1.22wmf1/extensions/MobileFrontend/ [23:21:48] <^demon|gone> I replicate to /var/lib/git/* [23:21:55] Logged the message, Master [23:22:28] !log awjrichards synchronized php-1.22wmf2/extensions/MobileFrontend/ [23:22:35] Logged the message, Master [23:24:15] New patchset: Ottomata; "Reverting back to declaring dependencies between compile puppet.conf and puppetmaster::ssl class." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59552 [23:24:42] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59552 [23:26:04] TimStarling: all done [23:26:17] thanks [23:27:38] Apr 16 23:27:15 mw19 sshd[23389]: Connection closed by 208.80.152.165 [preauth] [23:27:41] what does preauth mean? [23:30:32] I'm pretty sure that the sync part of the LU script has been broken for months [23:30:46] it doesn't show any error message when it fails, so it's not obvious [23:31:04] what I'm not sure of is why when I test it, sometimes it fails one way and sometimes another [23:31:18] but I'm pretty sure there's no way it can possibly work [23:31:49] TimStarling: it does a !log that says if it failed or not. do you think that's inaccurate? [23:32:01] yes, it's definitely inaccurate [23:32:16] \o/ [23:34:34] it's as if the l10nupdate account is sometimes locked and sometimes not [23:34:36] New patchset: MaxSem; "Rm unused feedback stuff" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59553 [23:34:38] but it fails either way [23:35:02] I'm trying to find the definition of locked in the glibc manual [23:35:41] what do you mean locked? [23:36:12] normally i hear lock and i just think scrambled (or prefixed with junk) shadow passwd [23:36:18] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59521 [23:37:20] yes, according to man sshd, if the password hash is prefixed with "!" then it's locked [23:37:49] but maybe that stopped being true at some point because my own account just has "!" as its hash [23:37:59] maybe it needs to be "!!" [23:38:18] sigh [23:38:33] usermod -L prefixes the hash with a "!" [23:39:22] New patchset: Asher; "returning db1005, adding db1058 to s5" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59554 [23:39:30] http://paste.tstarling.com/p/CBxDNt.html [23:39:34] do you see how spooky this is? [23:39:40] I haven't done anything in the meantime [23:40:34] its shell is /bin/false so it doesn't echo anything when authentication succeeds [23:40:45] Change merged: Asher; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59554 [23:40:49] that's not the mystery, the mystery is why it randomly alternates between working and not working [23:41:23] it's a little-known feature of SSH. if you fail authentication four times it assumes you just forgot your key and lets you through. [23:41:34] TimStarling: where is your agent running? 
[23:41:55] AWS [23:42:22] but it shouldn't be using an agent [23:42:42] and /var/log/auth.log says that when it's denied, it's preauth, not based on a key [23:43:18] <^demon|gone> jeremyb_: See gerrit change #59556 [23:43:28] New patchset: Reedy; "cswiktionary: Set AbuseFilter notifications" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59385 [23:43:37] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59385 [23:44:06] <^demon|gone> jeremyb_: I dunno what hashar wants though. I'm going to e-mail him though. Until now, I guess we just ignore jenkins :\ [23:44:07] it consistently doesn't work for me [23:44:21] New patchset: Reedy; "(bug 47166) Enable Extension:Collection on sh.wikipedia" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/58986 [23:44:28] TimStarling: make it "sudo -u l10nupdate ssh -vvv mw19 echo" and pastebin the output on success [23:44:31] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/58986 [23:44:54] New patchset: Reedy; "(bug 47204) Remove zh-mo from $wgDisabledVariants for zhwiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59055 [23:44:59] Change merged: Aaron Schulz; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59425 [23:45:03] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/59055 [23:45:03] ^demon|gone: yeah, ok. will try to read in a bit [23:45:27] when I say randomly, I mean it consistently does one thing for 10 minutes or so, then consistently does the other thing for 10 minutes [23:46:20] !log aaron synchronized wmf-config/StartProfiler.php 'Enabled 1:1 profiling for cli scripts and put "cli" in the profile ID' [23:46:28] Logged the message, Master [23:47:59] !log reedy synchronized wmf-config/abusefilter.php [23:48:06] Logged the message, Master [23:48:38] http://en.wikipedia.org/wiki/Swiss_roll (thank you Special:Random) :) [23:48:58] TimStarling: try now? [23:49:02] !log reedy synchronized wmf-config/InitialiseSettings.php [23:49:09] Logged the message, Master [23:49:17] yes, works at the moment [23:50:11] /home/l10nupdate/.ssh was owned by uid 998, i chown'ed to l10nupdate. something will probably chown it back to 998 soon, based on the behavior you saw [23:50:59] right [23:51:16] makes sense, I already noticed there are two separate classes creating that user in puppet [23:54:26] WTF just happened to fenari [23:54:35] !log aaron cleared profiling data [23:54:43] Logged the message, Master [23:54:45] I copied a 1GB file to my home dir, it seemed to be fine, and now it's just hangin [23:56:46] !log asher synchronized wmf-config/db-eqiad.php 'returning db1005, adding db1058 to s5' [23:56:53] Logged the message, Master [23:57:06] oh, so it's your fault [23:57:22] binasher: https://gdash.wikimedia.org/dashboards/jobq/ purty [23:57:27] there was a spike in system CPU, but it's stopped now [23:57:51] Aaron|home: oooooh [23:58:27] !log asher synchronized wmf-config/db-eqiad.php 'raising db1058 weight' [23:58:34] Logged the message, Master [23:58:48] Change abandoned: Andrew Bogott; "Too hacky!" 
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/59161 [23:58:52] binasher: that tends to measure OOMs/sec ;) [23:58:59] Change abandoned: Andrew Bogott; "Depends on a hack" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59163 [23:59:10] New patchset: Andrew Bogott; "Added role::labs-lamp" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/59561 [23:59:23] Aaron|home: those seem to have gone up a bunch recently [23:59:33] wikidata jobs especially, though i haven't looked lately
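A postscript on the l10nupdate ssh mystery (the ~/.ssh directory owned by uid 998): with sshd's default StrictModes setting, public-key auth is refused whenever the home directory, ~/.ssh or authorized_keys is not owned by the user (or root) or is writable by others, which is consistent with the connections dying at preauth. A generic check and fix, assuming the default authorized_keys location:

    ls -ld /home/l10nupdate /home/l10nupdate/.ssh /home/l10nupdate/.ssh/authorized_keys
    sudo chown -R l10nupdate: /home/l10nupdate/.ssh
    sudo chmod 700 /home/l10nupdate/.ssh
    sudo chmod 600 /home/l10nupdate/.ssh/authorized_keys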