[00:00:28] !log updating OpenStackManager to r114724 on virt0 [00:00:30] Logged the message, Master [00:02:24] hm [00:02:30] well, that surely isn't working [00:05:33] bah. fucking live hacks [00:05:37] I never checked that in [00:12:18] !log updating OpenStackManager to r114726 on virt0 [00:12:20] Logged the message, Master [00:19:31] !log updating OpenStackManager to r114728 on virt0 [00:19:33] Logged the message, Master [00:24:52] !log updating OpenStackManager to r114729 on virt0 [00:24:54] Logged the message, Master [00:33:36] !log updating OpenStackManager to r114730 on virt0 [00:33:38] Logged the message, Master [00:36:50] PROBLEM - Puppet freshness on db59 is CRITICAL: Puppet has not run in the last 10 hours [00:46:53] PROBLEM - Puppet freshness on owa3 is CRITICAL: Puppet has not run in the last 10 hours [00:46:53] PROBLEM - Puppet freshness on amslvs2 is CRITICAL: Puppet has not run in the last 10 hours [01:01:53] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours [01:01:53] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours [02:03:18] PROBLEM - udp2log processes on locke is CRITICAL: CRITICAL: filters absent: /a/squid/fundraising/bi-filter, [02:05:24] RECOVERY - udp2log processes on locke is OK: OK: all filters present [02:18:09] PROBLEM - Puppet freshness on search1022 is CRITICAL: Puppet has not run in the last 10 hours [02:35:06] PROBLEM - Puppet freshness on search1021 is CRITICAL: Puppet has not run in the last 10 hours [02:40:03] PROBLEM - Host lvs5 is DOWN: PING CRITICAL - Packet loss = 100% [02:43:21] PROBLEM - BGP status on cr1-sdtpa is CRITICAL: CRITICAL: host 208.80.152.196, sessions up: 7, down: 1, shutdown: 0BRPeering with AS64600 not established - BR [03:35:46] RECOVERY - Puppet freshness on db9 is OK: puppet ran at Thu Apr 5 03:35:20 UTC 2012 [03:46:25] PROBLEM - Puppet freshness on sq34 is CRITICAL: Puppet has not run in the last 10 hours [04:42:22] PROBLEM - Puppet 
freshness on amslvs4 is CRITICAL: Puppet has not run in the last 10 hours [05:51:14] PROBLEM - MySQL Slave Delay on db24 is CRITICAL: CRIT replication delay 202 seconds [05:51:23] PROBLEM - MySQL Replication Heartbeat on db24 is CRITICAL: CRIT replication delay 206 seconds [05:59:51] RECOVERY - MySQL Replication Heartbeat on db24 is OK: OK replication delay 0 seconds [06:00:09] RECOVERY - MySQL Slave Delay on db24 is OK: OK replication delay 0 seconds [09:19:51] New review: Hashar; "Needs a few more tweaks." [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/3285 [10:35:05] I need a sysadmin that can do some tweaks to OTRS - any suggestions? :) [10:35:13] eeuurghh [10:35:27] so I thought otrs was in the middle of being shuffled around. maybe it's just being talked about [10:35:34] anyways robh I think knows about this [10:35:39] and maybe [10:35:42] * apergos tries to remember [10:35:44] mutante? [10:35:50] what tweaks do you need? [10:36:10] so there's a 1 click spam button apergos on queue view that will move things to the "Junk" queue [10:36:14] yes [10:36:24] I'd like that to be duplicated for "1 click junk" to move to Junk (non spam) [10:36:25] (I have otrs queue from being a volunteer ;-) ) [10:36:35] ah [10:36:45] geez I have no idea how that stuff works. 
whatsoever [10:36:54] and also if possible, put both of those buttons in the message view - it's currently only in the queue view [10:37:06] but getting a button in queue view is more important than the message view one at the moment ;) [10:37:58] right [10:38:41] PROBLEM - Puppet freshness on db59 is CRITICAL: Puppet has not run in the last 10 hours [10:39:25] there's a whole list of improvements at https://otrs-wiki.wikimedia.org/wiki/OTRS_technical_challenges apergos - but I think the 1 click spam is probably going to be the most useful [10:40:49] ah, this is not going to be a five minute task [10:40:50] this is [10:40:56] make a patch similar to [10:40:58] http://svn.wikimedia.org/svnroot/mediawiki/trunk/otrs/patches/50-one-click-spam.patch [10:41:03] build and test package [10:41:04] dpleoy [10:41:27] but in the back of my mind I think that there is a migration to a newer version or a different platform or something in the works [10:41:33] * apergos wishes they had a memory that didn't suck [10:42:06] whenone of rob or mutante shows up we can find out [10:44:09] it would be nice to get enhancement requests like these somewhere a bit more public [10:44:51] there's a component for it in bugzilla [10:45:02] guillom: did you think that the upgrade's been stalled until later this year? 
[10:45:33] I bet that in fact no developer pays any attention to that page [10:46:09] I know Jeff or someone else was working with an OTRS expert to upgrade to 3.0, but I don't know if the upgrade has been explictly postponed [10:46:13] https://bugzilla.wikimedia.org/buglist.cgi?query_format=advanced&list_id=105276&component=OTRS&resolution=---&resolution=FIXED&resolution=INVALID&resolution=WONTFIX&resolution=LATER&resolution=DUPLICATE&resolution=WORKSFORME&product=Wikimedia [10:46:19] these need a walkthrough someday too [10:46:27] I guess they aren't all open, just a sec [10:47:10] https://bugzilla.wikimedia.org/buglist.cgi?query_format=advanced&list_id=105278&bug_status=UNCONFIRMED&bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&bug_status=VERIFIED&component=OTRS&resolution=---&product=Wikimedia [10:47:13] much shorter list [10:47:36] so maybe adding the top few enhancements to the list would be good [10:48:13] apergos: FYI: https://rt.wikimedia.org/Ticket/Display.html?id=452 [10:48:35] PROBLEM - Puppet freshness on amslvs2 is CRITICAL: Puppet has not run in the last 10 hours [10:48:35] PROBLEM - Puppet freshness on owa3 is CRITICAL: Puppet has not run in the last 10 hours [10:49:03] ok, so at least it's somewhere in the pipeline [10:49:31] legal and/or philippe should probably be pinged again, I expect jeff knows what is going on [10:49:58] heh both my guesses about right people were wrong [10:49:59] I believe I'm right in saying Philippe met the OTRS guy in Berlin to discuss it [10:50:04] Thehelpfulone: so apparently they're still working on it, but waiting for an NDA to be signed with the volunteer OTRS expert who's helping us [10:50:17] that was done a while back guillom, apparently [10:50:18] Ah, I haven't heard about that [10:50:43] grrr [10:50:49] if only there were public rt queues [10:51:06] so what I was looking at and you can 't see Thehelpfulone is discussion about the nda [10:51:08] according to philippe, it was in the week beginning Monday 
13th February [10:51:17] last update is Jeff_Green saying the process is stalled, as of mid march [10:51:35] so I would check in with him and see what's going on [10:52:01] okay [10:52:06] heh, break down in communication! [10:52:23] he is a us person so it will be some hours yet before he's online [10:52:46] yep [10:52:50] those americans! ;) [10:52:54] hah [10:53:05] I'm a "those americans", I'm just in a different timezone :-D [10:53:38] ok well I know those were not the tweaks you were looking for, [10:54:03] but anyways hopefully you'll be able to get them scheduled [10:54:29] yep, I'll look through that huge list to see what's most important [10:54:38] ok cool [11:00:57] guillom: is it okay to copy/paste those bugs from https://otrs-wiki.wikimedia.org/wiki/OTRS_technical_challenges - if I remove who said it? [11:01:32] probably; use your judgment :) [11:02:01] will do :) [11:03:44] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours [11:03:44] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours [12:20:08] PROBLEM - Puppet freshness on search1022 is CRITICAL: Puppet has not run in the last 10 hours [12:27:24] !log search1 and search4 seem to be dead. restarting lsearchd [12:27:28] Logged the message, notpeter [12:29:38] bah, I did not see that or I would hav erestarted them [12:29:40] I'm sorry [12:30:05] apergos: it's ok :) [12:30:25] they weren't alerting in nagios, because theonly check we have is a tcp 8123 port check [12:30:32] (imporving this is on my todo list) [12:30:33] I got one of the search hosts yesterday, I forget which one [12:30:42] awesome! [12:30:44] but only cause I happened to see it scroll by in here [12:30:51] but yeah, tcp 8123 up! 
serving no traffic :/ [12:30:56] ok yeah ugh [12:31:16] yeah, I only notice because I start every day by looking at ganglia at this point :) [12:31:27] ;-) [12:31:39] see my warnings come via other channels so I loko at those instead :-D [12:32:44] oh, you mean your other channels actually tell you when there are problems, instead of having to divine them from graphs? [12:32:47] some day.... [12:32:50] I too will be like that [12:34:10] I mean I get emails [12:34:19] RobH: just checking in on ssds for eqiad search? [12:34:24] when things fail it's either a cron job or a job that emails me directly [12:34:33] ah, gotcha [12:35:02] the other thing I do is look at open screen sessions on a couple hosts where I have stuff half done and no notification system yet [12:35:06] (things in testing or whatever) [12:35:27] oh this stuff about new slaves is external cluster [12:35:32] I see [12:35:46] * apergos is reading up on yesterday's irc log and going to get one ready [12:37:01] PROBLEM - Puppet freshness on search1021 is CRITICAL: Puppet has not run in the last 10 hours [12:37:28] I dunno why I thought it was some other type of slave :-D [12:38:11] we're not *that* kind of imperialist institution. we're the good kind! I promise! [12:38:30] riiiggghhht [12:38:56] when have white males ever led people astray on this point? [12:38:57] oh. [12:38:59] wait.... [12:40:54] I see, I'm going to get up to the rsync of the snapshot and then come back in two days [12:40:55] fine [12:43:39] Jeff_Green: you around? [12:43:53] yes, just arriving [12:44:15] sweet. I was gonna say "let the testing continue!" 
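The gap described at 12:30 (TCP port 8123 accepting connections while the daemon serves no traffic) is the difference between a port check and a functional check. A minimal sketch in Python, assuming a caller-supplied query function; the names below are hypothetical, and nothing about lsearchd's actual request or response format is taken from this log:

```python
import socket
from typing import Callable

# Conventional Nagios plugin exit codes.
OK, CRITICAL = 0, 2


def port_check(host: str, port: int, timeout: float = 5.0) -> int:
    """Return OK if a TCP connection opens; says nothing about real service."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return OK
    except OSError:
        return CRITICAL


def functional_check(run_query: Callable[[], str], expect: str) -> int:
    """Return OK only if a real query returns a body containing `expect`.

    `run_query` is a hypothetical callable that issues one search request
    and returns the response body; it may raise on timeout or error.
    """
    try:
        body = run_query()
    except Exception:
        return CRITICAL
    return OK if expect in body else CRITICAL
```

A daemon that is hung but still listening passes `port_check` and fails `functional_check`, which is the case the port-only Nagios check missed.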
[12:44:24] sounds good [12:44:45] I'm going to try to get back to working on pediapress stuff today, but I can help out in parallel [12:44:59] Jeff_Green: sweet [12:45:19] notpeter: so some came in, but not all of them, and the brackets are in route [12:45:40] i will chase down whats goin on with them later today [12:46:09] my basic plan is: retest pool 2/3, just to be safe, point them at eqiad, wait an hour or so, test en, point at eqiad, then prefix host for en, then *, then *.prefix [12:46:15] ok [12:46:31] so if I can just poke you for occasional testing and sanity checking, that would be awesome [12:46:38] sure [12:46:51] RobH: awesome! rough eta until installed? a week? [12:47:05] why don't we just do some full-rig sweep tests and see how it goes? it's not that much more painful than a single-pool run [12:47:16] sure, sounds good! [12:47:16] dunno until i track them down, but whatever day they arrive, i will install them that day or the following ;] [12:47:26] i would hpe less than a week unless they fubar'd the shipment [12:47:32] RobH: great! mostly just curious [12:47:37] kk, cool [12:50:05] notpeter: starting /opt/searchqa/bin/api_sweep_test -t 10 -l -m 100 [12:50:11] sweet [13:11:51] Jeff_Green: those results look good. [13:12:00] the fails look like mostly timeouts in pmtpa [13:12:06] what are the percentages? [13:12:19] it's going to be a while [13:12:42] it's running only 10 threads to keep things mellow, it's about 20% complete [13:12:57] oh! [13:12:57] ok [13:12:57] although lemme see, I can get you a mid-run status now that I think of it [13:13:06] nah, that's cool [13:13:50] the only things I'm seeing are a lot of 500's on 10.2.1.12 [13:13:58] but that's pmtpa just being failful right? [13:14:28] some 500s from 10.2.1.13 too [13:14:45] apart from that things look good [13:18:44] I think based on the early results I wouldn't hesitate to enfire the live testing [13:20:37] sweet. 
I'm going to dump in pool2 and 3 now, as those were known good yesterday and don't look like they've died overnight [13:20:46] k [13:21:50] !log pointing de, fr, ja, es, ru, nl, pl, pt, zh, and sv search at eqiad [13:21:52] Logged the message, notpeter [13:25:47] apergos? es1004 and es1002 are being reslaved. rsync still runnning [13:25:55] uh huh [13:26:05] I'm just starting to set up the snap for sn1003 [13:26:08] es1003 [13:26:14] k [13:26:35] hashar: i'm here .. what was it that should be done on gallium [13:27:24] ton of stuff :-D [13:27:43] oh.. [13:29:46] apergos: oh, about OTRS tweaks, i dont really know [13:29:57] sok jeff should know [13:30:17] alright [13:31:21] notpeter: final results are in [13:31:36] i'll post 'em [13:32:21] hashar: how about i start with a bunch of package upgrades ?;9 [13:32:33] from ubuntu? sure go ahead [13:32:37] yea [13:32:45] notpeter: http://greenspoons.com/sqa_results.txt [13:33:28] !log installing package upgrades on gallium. apache,apt,postgres,php5-*,ruby,...various libs [13:33:29] Logged the message, Master [13:36:04] hashar: postgre AND mysql on same host? [13:36:11] yes [13:36:14] ok [13:36:15] Jeff_Green: any idea why pool3 was so awful? [13:36:16] under my responsability [13:36:31] though I need to poke ops from time to time since I am not root there :-] [13:36:35] pool3 at pmtpa? it just kept timing out [13:36:44] <^demon> mutante: Just fyi, those aren't databases that are stored and need backing up. They're just created on the fly for testing. [13:36:56] hashar: ok, what else did you need [13:37:01] ok, ^demon [13:37:08] maybe there's a crapped-out host at pmtpa? I havne't looked closer [13:37:13] Jeff_Green: oh! that would explain the low match rate... [13:37:18] mutante: phpunit upgrade. Let me find the RT ticket [13:37:32] ah, yeah [13:37:37] Jeff_Green: yeah, I'm a bit worried about search7, tbh [13:37:43] on zhwiki, svwiki, ruwiki, plwiki etc? 
[13:37:48] mutante: RT 2737 https://rt.wikimedia.org/Ticket/Display.html?id=2737 [13:38:06] yeah [13:38:15] that would also explain the long response time [13:38:22] mutante: I think PHPUnit got installed using the Ubuntu package which provide an outdated version of PHPUnit [13:38:43] notpeter: the poor scores were due to timeouts at pmtpa, so as long as the indexes on disk are fresh at eqiad I have far more faith in that side [13:38:56] yeah [13:39:06] mutante: instead we want to use php PEAR to download the latest PHPUnit and thus bypass Ubuntu package :D [13:39:11] I'm going to restart lsearchd on search7 anyway, while it's not getting traffic [13:40:35] k [13:41:07] and 96 match on en seems reasonable [13:41:23] like, the same up to some minor size differences and such [13:41:30] yeah totally, it's all doc size differences so I'm sure it's just due to index timing [13:41:43] yep! [13:41:47] hashar: there has been discussion about installing stuff from PEAR, like it is a third-party repo. But i'll upgrade it based on the fact that it does NOT appear like it was installed from the Ubuntu package before anyways and via pear in the first place [13:42:12] mutante: it was installed with ubuntu package [13:42:27] there is no package phpunit installed though [13:42:33] mutante: but I am not willing to spend two days backporting the PHPUnit package from a recent ubuntu distribution :-D [13:42:36] ohhh [13:43:02] so maybe it was installed with pear :-)))))))))) [13:43:26] !log gallium - upgrading pear [13:43:28] Logged the message, Master [13:43:31] yeahhh [13:44:13] <^demon> We installed it from pear in the beginning. The ubuntu repo copy is always woefully out of date. [13:44:19] <^demon> *sigh* package maintainers. [13:44:31] lets drop ubuntu and use LFS [13:44:50] uff [13:45:16] Jeff_Green: ?? 
[13:45:38] reponding to the upvotes for LFS and bleeding edge [13:45:42] !log gallium - upgraded phpunit and php_codesniffer via pear (have been installed via pear before, distro outdated) [13:45:44] Logged the message, Master [13:45:46] heh [13:45:51] hashar: PHP_CodeSniffer-1.3.3.tgz [13:46:04] mutante: phpunit --version : PHP Fatal error: Call to undefined method PHP_CodeCoverage_Filter::getInstance() in /usr/bin/phpunit on line 39 [13:46:12] * Jeff_Green likes not having to think about the status of package we rely on all day every day,and thanks the ubuntu folks for doing that for us [13:46:16] <^demon> Jeff_Green: I'm not asking for bleeding edge here, just latest stable that ubuntu fails to ship. [13:46:40] they do do that, it's true [13:46:41] ^demon: be glad we're not running debian :-P [13:46:46] mutante: I think we need to have everything updated [13:46:48] sometimes for years! [13:47:08] mutante: so just "pear upgrade" [13:47:29] Jeff_Green: instead we get to wait for the debian maintainers to make a package and for that to get ported to ubuntu! :) [13:47:30] <^demon> Jeff_Green: And there's a difference between running the bleeding edge of everything and upgrading carefully when we need new features. [13:47:35] isn't this where someone is supposed to insert a snide remark about volunteering to integration-test the latest stable? [13:47:44] <^demon> In this case, we actually need 3.6 and there's no way to get it other than PEAR or installing it manually. [13:48:01] you guys could make your own package ;] [13:48:09] OH NO PLEASE [13:48:13] not it! 
[13:48:16] PROBLEM - Puppet freshness on sq34 is CRITICAL: Puppet has not run in the last 10 hours [13:48:42] I would be glad to provide packages if someone find me a nice tutorial / script to easily backport packages from latest ubuntu to whatever version we run currently [13:48:44] and when it's made and we introduce it to the rig, we have to take ownership of maintaining it as long as we use it [13:48:53] !log gallium - upgraded all pear packages [13:48:55] Logged the message, Master [13:49:08] <^demon> Gee, I'd love to write a package. But see this git migration has been keeping me awfully busy. [13:49:13] mutante: great!!! let me run a test [13:49:27] i was just stirring up shit to do it guys ;] [13:49:35] heheh me to, sorta [13:49:50] ;))) [13:50:16] <^demon> All this being said...PEAR is an awful package management system and I wish we didn't have to use it. [13:50:24] mutante: test running https://integration.mediawiki.org/ci/job/MediaWiki-GIT-Fetching/395/console [13:50:31] would it be possible to avoid needing 3.6? [13:50:32] and i was about to say "pear provider for puppet" :P [13:50:41] ^demon: lets migrate from PHP to node.js :-D [13:51:08] <^demon> Jeff_Green: There's a couple of new features we want in 3.6--qchris is writing some dump-related tests that take advantage of it. [13:51:22] <^demon> Testing of command-line scripts, for one. [13:51:32] i see [13:51:42] Jeff_Green: we need PHPUnit 3.6 to be able to test output of command line script :-( [13:51:49] 3.5 does not have anything to handle that [13:52:20] notpeter: i think your drives may have arrived i am going into eqiad now to check it out [13:52:28] will let you know shortly ;] [13:52:31] RobH: woo woo! [13:52:44] mutante: I think you can close https://rt.wikimedia.org/Ticket/Display.html?id=2737 now :) [13:53:00] maybe it's available prepackaged from one of backport repos? [13:53:11] if that's the case we could fetch and drop it in our own repo [13:54:54] that is somehow what I said before. 
We could be backporting Ubuntu packages [13:55:11] ah, i missed backscroll [13:55:12] I have not found any tutorial to do so though :-/ [13:55:26] and overall, I prefer having someone to su; pear upgrade; [13:56:27] that is faster than finding the backport, build it in labs, having the debs sent to subversion, harass someone to publish the package on apt, update puppet, have it merged in production then puppet ran :-] [13:56:31] * Jeff_Green goes to read backscroll [13:56:51] hashar: ok, done [13:56:59] it is totally worth it for the main cluster, but probably not on gallium which is some kind of a special machine. [13:57:11] OR, we could use some PEAR/puppet integration [13:57:14] that would be great [13:57:22] something like: include pear:phpunit :-] [13:57:31] kill does not seem to be doing the trick for mysql on es1003. [13:57:31] but gallium being a labs instance isnt an option? [13:57:58] I hesitate to -9 it even if we are usin a snapshot etc [13:58:04] mutante: eventually we will probably move it to labs yes [13:58:14] apergos: maplebed suggested -9 and thats what i used then [13:58:32] but that was on es1004 which was broken anyways [13:58:45] <^demon> hashar: Even if we don't make it generic, having gallium's pear stuff puppetized should be done. [13:58:53] hashar: moving it to labs sounds like a good way to avoid the third-party repo discussion.. at least until now [13:59:17] mutante: wanna upgrade jenkins ? :D [13:59:22] RT is https://rt.wikimedia.org/Ticket/Display.html?id=2041 [13:59:31] ^demon: https://gist.github.com/305778 [14:00:10] jenkins is build using an apt package though [14:00:19] !log pointing enwiki and enwiki.prefix at eqiad search cluster [14:00:21] http://apt.wikimedia.org/wikimedia/pool/universe/j/jenkins/ [14:00:21] Logged the message, notpeter [14:00:47] hashar: jenkins is already the newest version [14:00:53] <^demon> mutante: Oooh :) [14:01:32] FUD!!!!!!!!! 
[14:01:36] hashar: ok, reading ticket first [14:01:41] :))) [14:01:59] on gallium it uses apt.wm.org which has an outdated package [14:01:59] http://apt.wikimedia.org/wikimedia/pool/universe/j/jenkins/ [14:02:11] so we need to have that .deb updated from upstream [14:02:32] 1.458 currently [14:02:42] I have NO idea how that deb is built though [14:02:55] huh [14:03:05] mysqladmin shutdown did it when kill wouldn't [14:03:07] who knows [14:03:08] most likely it's just a rebuild of an existing ubuntu or debian package with no changes [14:03:36] <^demon> hashar: Who did that last time? [14:03:40] mark: it looks like it is just a copy of upstream deb package [14:03:45] ^demon: was going to ask you :-D [14:03:59] <^demon> I can't remember. [14:04:07] well... that looks fine [14:04:58] !log created labs account for cneubauer [14:05:00] Logged the message, Master [14:06:43] any reason I would get logged out of wikipedia sevrel times today? [14:06:55] PROBLEM - MySQL slave status on es1003 is CRITICAL: CRITICAL: Lost connection to MySQL server at reading initial communication packet, system error: 111 [14:07:02] hold on hashar, i think we just pushed the existing package as suggested [14:08:43] PROBLEM - MySQL replication status on es1003 is CRITICAL: (Return code of 255 is out of bounds) [14:15:55] mutante: so I have downloaded the jenkins package from upstream AND from apt.wm.org [14:16:01] mutante: they both have the same md5 sum [14:16:34] so I guess we should just copy the latest .deb ( http://pkg.jenkins-ci.org/debian/binary/jenkins_1.458_all.deb ) into http://apt.wikimedia.org/wikimedia/pool/universe/j/jenkins/ [14:16:58] it should be rebuilt first [14:28:47] mark: even if it is for "all" platforms anyways? i found the old ticket meanwhile, it was just imported with "reprepro -C universe includedeb lucid-wikimedia" [14:29:54] ah, it's a script? 
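The comparison at 14:15 (same md5 sum for the upstream .deb and the copy on apt.wikimedia.org) can be reproduced with a few lines of Python. A sketch, assuming both files have already been downloaded locally; the function names and the paths in the usage note are placeholders, not anything from the log:

```python
import hashlib


def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_file(path_a: str, path_b: str) -> bool:
    """True if both files have identical MD5 digests."""
    return md5sum(path_a) == md5sum(path_b)
```

Usage would look like `same_file("upstream/jenkins.deb", "mirror/jenkins.deb")`. MD5 is adequate for spotting an accidental difference between two copies; swapping in `hashlib.sha256` is the usual choice when tampering is a concern.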
[14:29:57] then it may not be necessary [14:30:22] yea, i think this is why we ended up just doing it that way last time [14:30:28] php? [14:30:36] java [14:30:43] it is a Java .war [14:30:56] ah right [14:31:04] ok, import it then [14:31:08] yea, that .war file, and only that, i remember now [14:31:11] kk [14:32:28] hhhmmm, searches for "Bhkui wsxbdfdfv'dsfvsrfvdfv.slvsbkfdsv jlvfsd. Ivsfm,hkexfju dwf6!)..).):):):):7374hfflflfdhfjfjfjf$;$;$;$;$;&;&.$4jendncd" are failing on the new enwiki search infrastructure. but I think I'm ok with that ;) [14:33:04] I mean, I know that that cat walking across a keyboard has as much of a right to search on wikipedia as anyone else... but... [14:33:38] <^demon> Searching for that should easter egg to searching for "Cat on keyboard" [14:33:50] heh [14:33:56] as long as there's an article with that title! [14:34:01] !log importing jenkins_1.458_all.deb to wikipedia apt repo and upgrading it on gallium [14:34:02] Logged the message, Master [14:34:13] that would be a sweet error handler [14:34:16] hashar, ^demon , do you want to keep or overwrite your jenkins config? [14:34:30] <^demon> Keep, I assume? [14:34:37] /etc/default/jenkins that is [14:35:05] the diff is: AJP_PORT=-1 and PREFIX=/jenkins [14:35:13] the PREFIX was for that redirect afair [14:35:16] ok, keeping [14:35:48] oh, yea, JENKINS_ARGS also includes --prefix=/ci [14:36:06] <^demon> Yeah, we definitely need that one in JENKINS_ARGS [14:36:17] /ci is indeed our prefix [14:36:18] its upgraded [14:36:31] URL being https://integration.mediawiki.org/ci/ [14:36:43] <^demon> Stacktraces, whee [14:36:48] yep, remember we switched that back and forth in teh beginning [14:38:52] at hudson.plugins.git.GitTool.onLoaded(GitTool.java:74) ... sigh? [14:39:11] looking at /var/log/jenkins [14:39:19] there must be a faulty plugin that need an update [14:41:21] <^demon> Do we know what plugin is busted? 
[14:41:31] should be listed on , but not loading http://gallium.wikimedia.org:8080/pluginManager/installed [14:42:48] <^demon> :8080 is denied except via localhost. [14:43:19] ^demon: which plugin, i think "GitTool" or something, given the message at hudson.plugins.git.GitTool.onLoaded(GitTool.java:74) [14:43:34] <^demon> I'm trying to figure out how to disable it manually. [14:43:36] I think too [14:43:49] PROBLEM - Puppet freshness on amslvs4 is CRITICAL: Puppet has not run in the last 10 hours [14:43:56] bleh, wtf is up with my nick not being available. [14:44:43] ^demon: wild guess: replace "install" with "remove" or similar in: "java -jar jenkins-cli.jar -s http://localhost:8080 install-plugin" [14:44:49] Rob_H: I'm squatting on it [14:44:52] no, but really [14:45:08] its some server side timeout [14:45:16] i guess someone tried to use it a bunch and its locked for awhile [14:45:20] <^demon> Yeah, I'm trying to figure out how to disable the plugin so we can at least get in, then update. [14:45:22] cuz its not online to ghost it [14:45:23] uh, I know that I said "feel free to turn off any search server you like" but, just to be extra super clear, that is not the case today [14:45:45] notpeter: thats directed at me i imagine? [14:45:48] yes [14:45:51] heh [14:45:54] I figured you knew this [14:45:59] well, when would be best to do the ssds? [14:46:05] but..... I really don't want to miscommunicate about it [14:46:08] why did you guys push eqiad live ;p [14:46:14] tomorrow? [14:46:17] we're testing [14:46:21] im not coming down here two days in a row ;] [14:46:23] so monday [14:46:28] sure, sounds good [14:46:36] ^demon: "After stopping Hudson/Jenkins, go to your HUDSON_HOME/plugins directory and remove both the .hpi file and the folder with the same name. " [14:46:37] thank you! [14:46:38] mutante: ^demon I think I found out how to disable plugin [14:46:41] ah are you pounding on ciscos today? 
[14:46:54] only when mark is around to assist [14:46:57] its blocked until then. [14:47:10] mutante: ^demon: touch /var/lib/jenkins/plugins/git.hpi.disabled [14:47:12] it seems silly for me to work with cisco when i have not had someone with more experience than me atleast take a look at it. [14:47:36] mutante: ^demon then I had sudo /etc/init.d/jenkins restart [14:47:55] i had to come onsite because we have a bunch of boxes in shipping [14:48:10] mmm new toys? [14:48:10] hashar: ^demon i did that, and now its down in a different way [14:48:17] and if i dont get them, they will either charge us to store them, or charge us to deliver to cage [14:48:28] and both of those come out to about my take home pay for a day =P [14:49:22] hashar: ^demon , check again :) [14:49:24] mutante: well I had it working at one point [14:49:41] yeahhhhhh [14:49:43] \O/ [14:49:47] hashar: it works now, we just started/restarted at the same time or so, we should stop working on it the same time :) [14:49:52] going to run the plugin upgrade now [14:51:06] !log gallium - disabled incompatible GitTool plugin on jenkins and restarted it [14:51:08] Logged the message, Master [14:51:17] no I need to have the git plugin enabled again [14:51:25] !log pushing search prefix pool live in eqiad [14:51:26] Logged the message, notpeter [14:51:52] <^demon> hashar: Minor bug, "My Views" is broken with stacktrace. [14:52:08] major bug: all jobs are broken cause they do not have git [14:52:10] :D [14:52:24] <^demon> Yeah :p [14:52:51] restarting [14:53:12] I'll stop touching it unless you tell me to avoid conflicts;) [14:55:34] <^demon> hashar: https://issues.jenkins-ci.org/browse/JENKINS-12966 [14:57:21] Prio: Blocker Status: Open .. ouch :/ [14:57:29] :D [14:57:34] <^demon> Try disabling the "Static Analysis Collector Plug-in" [14:57:43] <^demon> Somebody says they conflict and that's a workaround. 
[14:58:26] trying [15:00:40] I have analysis-collector.hpi [15:01:02] enabling greenballs.hpi [15:01:44] enabling git.hpi [15:03:04] https://integration.mediawiki.org/ci/job/MediaWiki-GIT-Fetching/397/ \O/ [15:03:23] mutante: ^demon: disabling the analysis-collector plugin made it [15:04:09] nice, and that one isnt that important to have? [15:05:19] I am not sure what it does [15:05:54] plugin is https://wiki.jenkins-ci.org/display/JENKINS/Analysis+Collector+Plugin [15:06:27] New patchset: RobH; "cosmetic label change" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4331 [15:06:28] and I don't think we need it [15:06:35] if unsure what it does, disabling it wasnt that bad after all [15:06:42] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/4331 [15:07:01] it is supposed to show nice graphs and charts but we probably have no use for them right now [15:07:06] yup [15:07:42] so as a postmortem : plugin are .hpi files in /var/lib/jenkins/plugins/ , they can be disabled by touching a file such as: extension.hpi.disabled [15:07:53] jenkins is restarted from CLI with /etc/init.d/jenkins restart [15:08:02] logs are /var/log/jenkins/jenkins.log [15:08:15] will dump that on wikitech-l [15:08:27] cool [15:08:39] mutante: so far, it seems good for me. We could probably close the RT ticket [15:08:42] well, maybe just wikitech, "Jenkins" [15:08:43] or keep it on hold if you prefer [15:09:09] fine with me, you decide. but i think technically it is resolved 'cause the new version is installed [15:09:17] New review: RobH; "comment change, no big deal" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/4331 [15:09:20] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4331 [15:09:25] mutante: I agree. 
So just close the RT [15:09:29] ok [15:09:30] mutante: https://rt.wikimedia.org/Ticket/Display.html?id=2041 [15:09:45] yep, done [15:13:11] I have marked the bugzilla one fixed ( https://bugzilla.wikimedia.org/show_bug.cgi?id=31877 [15:13:13] \o/ [15:14:11] :) [15:14:13] !log puppet daemon being halted on brewster, i need to make local test changes to dhcp [15:14:15] Logged the message, Master [15:14:51] mutante: so I think we are fine for now :-] [15:14:58] I am taking a coffee break [15:15:06] will then create/fill/update http://wikitech.wikimedia.org/view/Jenkins [15:15:36] ^demon: ^^^ short story: jenkins upgraded :D [15:15:45] perfect, also taking break then [15:38:46] hmm too late, need to do cooking [15:38:50] will be back later tonight [15:38:56] mutante: thanks for the upgrade of jenkins !!!! [15:38:58] \O/ [15:40:04] yw:) [15:40:22] !log pointing search pool4 to eqiad (this is the "smaller languages" shard) [15:40:22] Who can fix the SSL cert for gerrit? https://rt.wikimedia.org/Ticket/Display.html?id=2777 [15:40:24] Logged the message, notpeter [15:42:37] apergos: can you search for something in greek? [15:42:46] tell me what project [15:42:48] and yes I can [15:42:57] can it wait a few minutes? I'm on a call with ct, a 1:1 [15:43:05] after that I can pound on it as you like [15:43:10] wikipedia [15:43:16] sure, just 1 would be fine, tbh [15:47:01] notpeter: who do I talk to about ssl? [15:48:38] * hexmode looks fruitlessly for Ryan_Lane [15:50:17] hexmode: uh...not sure. I'm in the middle of pushing something live, atm [15:50:29] so I don't have much mental resources, atm [15:50:36] notpeter: thanks anyway :) [15:54:39] woosters: when you're off the phone: https://rt.wikimedia.org/Ticket/Display.html?id=2777 [16:01:51] PROBLEM - RAID on searchidx2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[16:03:48] RECOVERY - RAID on searchidx2 is OK: OK: State is Optimal, checked 4 logical device(s) [16:06:17] ......now that i have an installer screen on the cisco, i am too happy to express it. [16:06:31] ok, now to push it all into serial instead [16:07:09] grantede, it has mirror error, but thats cuz i have not updated netboot.cfg to account for the new subnet. [16:08:31] New patchset: Dzahn; "add SSLCACertificate file to gerrit apache site" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4334 [16:08:46] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/4334 [16:08:56] \o/ [16:09:02] hexmode: i think that should be it ^ [16:09:10] new errors make me happy [16:09:15] i was so tired of the old errors. [16:09:21] this is quote material [16:09:32] new errors = progress ;] [16:09:33] mutante: tyvm [16:09:50] hexmode: damn, how is that for service eh? [16:10:01] a+! [16:12:02] damn cisco error was a mix of new hardware pain, config load issues, unexpected daemon behavior, and replication lag of tftpd [16:12:04] and not done yet. [16:12:49] ryan lane better give me mad props to the lab users for this ;] [16:12:59] and mark as well [16:14:30] (also should get props from ryan) [16:14:33] heh [16:17:39] SUCCESS [16:17:43] serial installer working on ciscos. [16:18:00] there goes more days of my life on this than i care to recall. [16:18:28] ok, going to go get some on site work done. atleast to get the minimum out of shipping [16:18:39] i may not stay on site late today, since i have some installer puppetization to do. [16:33:33] notpeter: going to check earch on el pedia now [16:34:37] the initial results look reasonable [16:34:48] is there something in particular you want me to check for? 
[16:35:24] notpeter: so yea, i have all the SSDs for search upgrades here [16:35:35] I will get them prepared sometime before or on monday [16:35:45] I assume we will need to do rolling upgrades, rather than all at once? [16:35:54] New patchset: Jgreen; "adding basic manifest for ocg/mwlib hosts" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4336 [16:36:07] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." [operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/4336 [16:37:28] Jeff_Green: the new payments logging host just arrived, though I cannot rack it up until I finish row C deployment, which includes the payments rack [16:37:30] just fyi [16:37:46] ok--thanks for the update [16:39:04] New patchset: Jgreen; "adding basic manifest for ocg/mwlib hosts typo fix" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4336 [16:39:19] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/4336 [16:39:54] New review: Jgreen; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/4336 [16:39:57] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4336 [16:43:27] apergos: just that it wasn't all borken [16:43:31] thanks! [16:43:33] sure [16:44:01] RobH: no no, at the end of today I'm going to cut search back to pmtpa [16:44:04] this is just a test [16:44:14] on monday you can destroy anything you want except for the indexer [16:44:16] ahh, cool [16:44:19] awesome. 
[16:44:32] i am going to drop a ticket and make you a requestor so you know when its done [16:48:06] bah [16:48:15] reprepro can't handle multiple package versions in the same distro [16:49:24] !log brewster puppet running again, cisco installs wont work again until i finish puppetizing the files later today [16:49:26] Logged the message, RobH [16:49:59] RobH: sounds good! thanks! [16:52:18] notpeter: awww we're not going to move all search to eqiad forever and ever ? [16:55:09] LeslieCarr: we are! [16:55:17] starting... tuesday. [16:55:32] oh yay [16:56:25] once the new nodes have SSDs [17:04:57] ah [17:09:00] ok, headed homeward. [17:09:06] going to stop at store for foodz in route [17:09:15] back online shortly to finish ciscos [17:11:52] New patchset: Jgreen; "ocg1 wouldn't compile, uninformative puppet barf, flailing ..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4338 [17:12:07] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/4338 [17:12:29] New review: Jgreen; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/4338 [17:12:31] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4338 [17:23:56] New patchset: Jgreen; "looks like systemuser may be broken in generic-definitions.pp skipping user creation step for now" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4341 [17:24:11] New review: Jgreen; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/4341 [17:24:11] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/4341 [17:24:13] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4341 [17:26:55] Jeff_Green: what's wrong with systemuser? I was the last person to touch it.... 
[17:27:10] oh I was just going to start looking at git history [17:27:15] err: Could not run Puppet configuration client: Invalid parameter system at /var/lib/git/operations/puppet/manifests/generic-definitions.pp:50 [17:27:25] i'm not even calling it anymore [17:27:45] at least not directly, maybe it's used in one of the base classes? [17:28:54] hhhmmm, what were you using it on? [17:29:20] i'm trying to cherry pick some manifests for ocg boxes from labs [17:29:43] I had a call to use it to create one user, which I just removed and apparently somethign else is still calling it [17:29:57] huh [17:30:06] this confusels me [17:31:04] maybe it's called by standard or admin::roots? I don't know. I'm only calling two other things and they no longer try to create users [17:35:02] can you show me the line that you had in there? [17:35:05] that was failing? [17:35:25] it's still failing with no line [17:36:03] my attempt to use it was: systemuser { "pp": name => "pp", home => "/opt/pp", shell => "/bin/bash" } [17:36:34] which I now see will no longer work anyway because we clob the shell [17:36:43] err clobber [17:37:00] but ocg1's catalog still fails to compile with that removed [17:37:04] so I've got exactly: [17:37:44] include: standard, admin::roots, (class that installs packages), (class that now does nothing at all) [17:37:45] perhaps labs uses a different version of puppet which doesn't support the system param yet [17:38:06] mark: I see other instances with the same syntax though [17:38:19] in production I mean [17:38:29] but you're running this in labs, right? [17:38:46] oh you're cherrypicking FROM labs? [17:38:54] yeah [17:39:24] so for example: [17:39:30] production repo [17:39:49] misc/install-server.pp: systemuser { mirror: name => "mirror", home => "/var/lib/mirror" } [17:40:03] it is possible that I messed this up when I was changing the systemuser definition [17:40:12] when did you change it? [17:40:18] a week ago? [17:40:20] less? 
[17:40:26] let's say a week ago [17:40:33] ic [17:42:22] wait. I never ate lunch. brb [17:43:11] the labs version does not have the "system => true" setting [17:44:00] which is probably wise because iirc we have a conflict between the stock system UID range and LDAP assigned UIDs [17:44:22] that puppet feature was added in 2.6.7 [17:44:26] and in production we run 2.7.6 [17:44:28] so that shouldn't be it [17:45:07] ah [17:46:28] why can't I login to ocg1? [17:46:37] because puppet has never successfully run there [17:47:04] Ironically, I'm trying to image it for the first time so I can use it to test puppet manifests :-) [17:47:21] Installed: 2.6.1-0ubuntu1~ppa1~lucid1 [17:47:23] i'm logged in through sockpuppet [17:47:31] for some reason it installed with an old version of puppet [17:47:34] just upgrade puppet and it'll be fine [17:47:48] strange. trying [17:48:26] ohhh, this could be a problem with our build process [17:48:41] does the first puppet run happen before or after the first dist-upgrade? [17:49:00] the installer normally installs the latest and greatest package version [17:49:07] so no need for a dist-upgrade after an install [17:49:14] unless it was installed a long time ago and sat there of course [17:49:27] which is exactly what happened [17:49:31] that explains [17:49:45] yeah, a dist-upgrade would be a good idea then [17:49:51] yeah done [17:49:59] and puppet's working now, whee! [17:51:46] hurray for not my fault! :) [17:52:17] Yay . . . but my brain still hurts [17:55:28] New patchset: Jgreen; "oh it was puppet that was broken (old version), not manifests" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4343 [17:55:43] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/4343 [17:55:44] New review: Jgreen; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/4343 [17:55:46] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4343 [17:56:17] PROBLEM - Host ocg1 is DOWN: PING CRITICAL - Packet loss = 100% [17:58:23] RECOVERY - Host ocg1 is UP: PING OK - Packet loss = 0%, RTA = 1.15 ms [18:09:57] does Ops deal with OTRS? [18:10:18] config changes like https://bugzilla.wikimedia.org/process_bug.cgi [18:10:22] oops [18:10:43] like this: https://bugzilla.wikimedia.org/35715 [18:16:51] hexmode: i guess yes... but really no one does it seems. (the last changes I remember are 23631 and the move from db9 to db49. but maybe i missed/forgot something) [18:17:02] !b 23631 [18:17:02] https://bugzilla.wikimedia.org/23631 [18:17:28] jeremyb: yeah, I think Tim has been the guy to do it in the past [18:19:24] ^demon: since you've been around longer than me: am I right about OTRS? [18:19:30] !log updating OpenStackManager to r114744 on virt0 [18:19:32] Logged the message, Master [18:19:51] hexmode: personally i'd want some more research about the security issues. or have a content-type whitelist for inline and do attachment for the rest. or use a different domain name for downloads. a la googleusercontent.com [18:20:13] <^demon> hexmode: I know zilch. [18:20:24] sure... but I'm just trying to figure out who does what :) [18:20:32] <^demon> I agree with jeremyb. It's the exact reason we have bugzilla-attachment. [18:20:39] <^demon> Because attachments are dangerous. [18:20:43] ^demon: ty anyway [18:20:58] it isn't just that ticket though [18:21:11] there are several that have been copied over in a row [18:21:36] so it'd be good to have someone with knowlege ;) [18:22:31] hexmode: I think there is some decent prospect of OTRS work happening in the few months but idk what exactly. 
philippe was supposed to meet with the OTRS guy (over beer) but idk if that happened [18:23:00] jeremyb: oh, yeah, thanks for the reminder :) [18:23:32] idk what's happening with Jeff_Green either. are we blocked on getting a physical (not virtualized) DB? or can we just try to use the extra storage we have now? [18:23:55] petan seemed to think the new gluster stuff was giving him problems with mysqld [18:24:58] New patchset: Demon; "Temporary hack to disable extension list cron for the next 24 hours so I can unbreak Translatewiki." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4347 [18:25:13] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/4347 [18:25:45] jeremyb: I'm just waiting for clearance from somebody (philippe and/or legal) to expose our customer data to OTRS Martin [18:25:56] err donor I guess is a better word, not customer [18:26:25] * hexmode rubs his hands together like an evil maniac [18:26:47] we've got LOTS of customers! [18:26:51] well that and being assigned four completely conflicting projects :-P [18:27:09] hexmode: ya, but customer != donor [18:27:13] New patchset: Demon; "Temporary hack to disable extension list cron for the next 24 hours so I can unbreak Translatewiki." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4347 [18:27:16] overlap for sure :-P [18:27:28] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/4347 [18:27:37] oh wait, see now I'm confusing my conflicting projects [18:27:41] we should charge the customers more [18:27:50] "user" [18:28:01] participant!? 
[18:28:08] I FIGHT FOR THE USER [18:28:22] editor [18:28:27] I pit users against customers in fights [18:28:43] even if they've never editted, they're just a click away [18:29:01] well, silliness aside, the last I heard on all of this is what I noted in RT [18:29:09] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/4347 [18:29:11] * jeremyb has no RT :( [18:29:12] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4347 [18:29:30] Jeff_Green: you're thinking about sugar maybe? [18:30:05] jeremyb: not personally, that's beyond my scope [18:30:24] just cause that's where the donors live [18:31:06] oh! another 404 fix up request! [18:31:37] jeremyb: from RT: Wed Mar 14 17:10:46 2012 waiting to hear back from Legal and/or Philippe regarding approving Martin's NDA [18:31:40] Jeff_Green: anyway, what's the plan now? just dump it in his lap and let him fix it or will we work with him or ? [18:31:54] the latter [18:32:46] the general idea is that he will port our patches into the newer version, we work together on testing the db schema changes, we come up with an upgrade plan, and we work together on following that through [18:32:47] Jeff_Green: and then anyone else who wants to help also needs an NDA? [18:33:17] (damn lawyers) [18:33:19] jeremyb: only if their help means they need access to the database [18:33:58] we've talked about setting up a test environment in labs, and actually I started on that [18:34:14] !log updating OpenStackManager to r114746 on virt0 [18:34:16] Logged the message, Master [18:34:58] so there NDA-free folks can work on features and whatnot using fake data [18:35:15] ok, great. do we have fake data prepared yet? [18:35:27] no, nor do we have any version of OTRS installed there [18:35:36] and is it so far all manual setup? ohh [18:35:46] there's very little done there [18:36:01] so, is someone in particular planning to prepare a fake dataset? 
[18:36:09] or even a dump of junk mail? [18:36:21] nope [18:36:39] or would it help to get someone else NDA'd to get the ball rolling on fake data? [18:36:51] I don't think the NDA would even be necessary [18:37:11] i mean, we're just talking about fake data in a standard OTRS schema [18:37:27] there's nothing requiring an NDA in our install of OTRS or patches [18:37:36] that's already in the public [18:38:11] and the only db we have now is in the old schema [18:38:33] hehe, patches with NDA [18:39:23] Jeff_Green: so, then a mysqldump -d dump of prod could be published maybe? [18:40:08] where I forget what -d is, but assume it is schema-only, yeah I think it could [18:40:19] but it would also be no different than the stock schema [18:40:43] I don't think we modified the schema in any way [18:41:47] oh, ok, didn't know it was exactly stock [18:41:55] lemme find you the patches [18:42:05] i think i found those before... [18:42:08] ah ok [18:42:34] yeah, as far as I know those are the only changes from a stock install and usage [18:42:51] well, that and some mail filtering with amavisd or similar [18:42:51] oh, right quilt [18:42:55] http://svn.wikimedia.org/viewvc/mediawiki/trunk/otrs/ [18:43:04] oh yes that's it [18:43:18] I think they're mostly security-related patches [18:43:53] you mean like 60-really-secure-mode.patch? [18:44:19] ha, well that one is for the easter-egg of javascript hopping bunnies [18:46:59] New patchset: Mark Bergsma; "Install (or keep) varnish 3.0.2-2wm2 on the eqiad upload caches" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4351 [18:47:14] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/4351 [18:47:44] Jeff_Green: do we think he'll be using svn or gerrit or what? 
i hope not CVS still [18:47:51] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/4351 [18:47:53] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4351 [18:48:45] i wonder if that was subconscious gerrit vs. git ;P [18:50:03] jeremyb: I don't know [18:58:30] some of these files are old enough that they have an old address for the FSF that I never saw before [18:59:08] heh [18:59:13] our install is pretty old [19:00:22] * jeremyb detects synchronized lunching [19:14:13] New patchset: Mark Bergsma; "We were hitting a Puppet bug. Let's see if it works without the relationship." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4356 [19:14:28] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/4356 [19:14:46] New patchset: Mark Bergsma; "We were hitting a Puppet bug #7422. Let's see if it works without the relationship." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4356 [19:15:01] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/4356 [19:15:04] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/4356 [19:15:07] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4356 [19:26:17] New patchset: Demon; "L10n-bot auto-approvals: capitalization matters" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4357 [19:26:32] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/4357 [19:30:34] PROBLEM - DPKG on cp1001 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [19:30:34] PROBLEM - DPKG on cp1002 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [19:30:43] PROBLEM - DPKG on cp1008 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [19:30:52] PROBLEM - DPKG on cp1003 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [19:31:10] PROBLEM - DPKG on cp1004 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [19:31:28] PROBLEM - DPKG on cp1005 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [19:31:42] notpeter: The search boxen use lucid now right? [19:31:46] PROBLEM - DPKG on cp1006 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [19:32:00] I am cleaning up and puppetizing dhcp stuff, and the old search boxes have specified karmic, but i assume its outdated [19:32:04] PROBLEM - DPKG on cp1007 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [19:32:11] those are karmic [19:32:16] but you can remove that anyway [19:32:22] right, but if they are reinstalled, they would go to lucid i would think [19:32:24] only support hardy and lucid now [19:32:27] yep [19:32:40] RECOVERY - DPKG on cp1001 is OK: All packages OK [19:32:40] RECOVERY - DPKG on cp1002 is OK: All packages OK [19:32:45] cool, also reordering them to be in alphabetical [19:32:49] RECOVERY - DPKG on cp1008 is OK: All packages OK [19:32:50] files are a mess =P [19:32:58] RECOVERY - DPKG on cp1003 is OK: All packages OK [19:33:16] RECOVERY - DPKG on cp1004 is OK: All packages OK [19:33:21] yep! [19:33:26] and precise! [19:33:34] RECOVERY - DPKG on cp1005 is OK: All packages OK [19:33:52] RECOVERY - DPKG on cp1006 is OK: All packages OK [19:34:00] wha? [19:34:06] we are running precise on stuff now? 
[19:34:08] =P [19:34:10] RECOVERY - DPKG on cp1007 is OK: All packages OK [19:35:06] i suppose its lts [19:36:27] not yet [19:36:30] next month [19:36:35] and it's a 5 year lts! [19:36:50] I cna't wait to install it on my laptop :) [19:38:15] what what ? [19:38:22] is there a new Mac OS X version out ? [19:38:31] yep [19:38:35] it's called linux :) [19:38:49] New patchset: Mark Bergsma; "Add a director_type parameter to varnish::instance (default: hash), use it for eqiad upload" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4358 [19:38:51] =) [19:39:04] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/4358 [19:39:56] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/4358 [19:39:59] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4358 [19:45:14] New review: Mark Bergsma; "(no comment)" [operations/debs/varnish] (master); V: 0 C: 2; - https://gerrit.wikimedia.org/r/4162 [19:45:24] New review: Mark Bergsma; "(no comment)" [operations/debs/varnish] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/4162 [19:45:27] Change merged: Mark Bergsma; [operations/debs/varnish] (master) - https://gerrit.wikimedia.org/r/4162 [19:48:30] urgh [19:48:35] so i have msfe and ms-fe [19:48:37] hate. [19:48:38] hate hate hate [19:56:20] wil have to ask ben about reinstall or at minimum hostname change [19:56:25] which is better to reinstall though. [20:21:46] New patchset: Hashar; "get ride of old Testswarm fetching script" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4364 [20:22:01] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/4364 [20:25:01] New review: Krinkle; "(no comment)" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/4364 [20:25:24] hrmm [20:26:04] if i have a file using notify to the service [20:26:04] then there is no reason to also have the service subscribe to the file, right? [20:26:04] hey notpeter you did a bunch of puppet stuff recently, see above ? ^ [20:26:22] though i guess they aren't bi-directional [20:26:35] =/ [20:27:19] or LeslieCarr since ya did a bunch of nagios stuff [20:27:50] I see conflicting standards within our own puppet repo =P [20:27:51] they should be equivalent [20:27:59] well, it's a style thing [20:28:14] so i could just be OCD paranoid and use it in both places [20:28:19] and it shouldn't cause issues right? [20:28:26] uh, I wouldn't [20:28:39] what's the case? [20:28:52] redoing dhcp, i have it notify on the linux host files [20:29:08] I'd subscribe the service to all of them [20:29:09] but the dhcp3-server service also has a subscribe to some of the files (dhcpd.conf) [20:29:13] as that's why they're all in the same place [20:29:16] RobH: hey [20:29:23] LeslieCarr: no worries, notpeter was about [20:29:29] just getting puppet advice [20:29:46] hrmm, well, i guess since mark wrote this to start he had the subscribe in the service entry [20:29:53] so i suppose i will match that for these files [20:30:18] you can also subscribe the service to an array of files, I believe [20:30:28] i'm doing this [20:30:30] subscribe => File["/etc/dhcp3/dhcpd.conf", [20:30:30] "/etc/dhcp3/linux-host-entries.ttyS0-9600", [20:30:31] "/etc/dhcp3/linux-host-entries.ttyS0-115200", [20:30:31] "/etc/dhcp3/linux-host-entries.ttyS1-9600", [20:30:31] "/etc/dhcp3/linux-host-entries.ttyS1-57600", [20:30:31] "/etc/dhcp3/linux-host-entries.ttyS1-115200" ], [20:31:01] I believe that will work [20:31:10] and do what you want it to! 
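[editor's note] The notify/subscribe question above can be sketched in Puppet roughly as follows. This is a minimal illustration, not the actual install-server manifest: the `dhcp3-server` service name and the two file paths come from the log, while the resource bodies (`ensure => file`, `ensure => running`) are assumptions added only so the fragment is complete. As far as I know, either relationship on its own is enough to refresh the service when a file changes, so declaring both would be redundant.

```puppet
# Sketch only: resource bodies are assumptions, not the real manifest.

file { '/etc/dhcp3/dhcpd.conf':
  ensure => file,
  # Style A (per file): uncomment to have this file notify the service.
  # notify => Service['dhcp3-server'],
}

file { '/etc/dhcp3/linux-host-entries.ttyS0-9600':
  ensure => file,
}

# Style B (per service): the service subscribes to every file it depends on,
# which keeps the whole list of triggers in one place.
service { 'dhcp3-server':
  ensure    => running,
  subscribe => File['/etc/dhcp3/dhcpd.conf',
                    '/etc/dhcp3/linux-host-entries.ttyS0-9600'],
}
```

Style B matches what the existing service entry reportedly already does for dhcpd.conf, which is the reason given in the channel for extending it rather than mixing both styles.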
[20:31:14] yea, just wasnt sure which was better or if either was best. [20:31:25] is style [20:31:30] i guess i just match the standard set inwhatever service entry is already in place. [20:31:38] I like having all those things in one place, make it easier to see [20:31:50] yeah, or you can assign a variable that list of files [20:31:51] yea, i can see that [20:31:56] yeah, I mean, they really are both fine [20:32:02] the all in one place [20:32:02] yep, that too [20:32:08] i dont see use of a variable for this yet [20:32:16] but that sounds like it would make it complex for no reason in this case ;P [20:32:35] though I suppose i can understand why it would be preferred in other cases [20:36:46] New patchset: Reedy; "Bug 23004 - noc.wikimedia.org should support HTTPS" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4367 [20:37:01] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/4367 [20:40:28] PROBLEM - Puppet freshness on db59 is CRITICAL: Puppet has not run in the last 10 hours [20:50:00] PROBLEM - Puppet freshness on amslvs2 is CRITICAL: Puppet has not run in the last 10 hours [20:50:00] PROBLEM - Puppet freshness on owa3 is CRITICAL: Puppet has not run in the last 10 hours [20:50:54] PROBLEM - Host ms-be2 is DOWN: PING CRITICAL - Packet loss = 100% [20:51:39] RECOVERY - Host ms-be2 is UP: PING OK - Packet loss = 0%, RTA = 0.72 ms [20:52:24] RECOVERY - MySQL disk space on db59 is OK: DISK OK [20:57:11] hrmm.. [20:57:12] PROBLEM - Host db59 is DOWN: PING CRITICAL - Packet loss = 100% [20:57:32] i wonder what initially populates the files in the pxelinux.cfg directory [20:57:41] they seem to be present for each release on brewster [20:57:53] anyone add a release to brewster or is this usually mark territory? [20:59:47] i would jsut add a newer release to see the intended behavior, but brewster is not exactly spacious. [21:00:00] notpeter: i need percise to exist now. 
[21:00:50] RobH: me too! my laptop could really use the upgrade [21:00:53] why, though? [21:02:07] i wanna add a distro [21:02:21] and see if the tftpboot pxelinux.cfg directory contents are pulled down from it [21:02:27] or if they are a manual process of creation [21:02:34] if they are the latter, then my adding to puppet is no big deal [21:02:42] if they are the former, it could cause issues [21:02:58] i somehow doubt they are, but since it's my assumption, i dislike it. [21:03:30] hrm, gotcha [21:03:32] sooon sooooooooon [21:03:54] doesn't mark realize i need him to lurk in here 24/7 without rest in case i have questions!? [21:05:09] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours [21:05:09] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours [21:06:03] lulz [21:09:39] RECOVERY - Host db59 is UP: PING OK - Packet loss = 0%, RTA = 0.55 ms [21:12:57] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 0 seconds [21:13:15] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 0 seconds [21:13:22] ok, I'm going to conclude search testing. [21:15:39] PROBLEM - MySQL disk space on db59 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[21:23:18] PROBLEM - Apache HTTP on mw64 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:23:36] PROBLEM - Apache HTTP on mw62 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:25:33] PROBLEM - Apache HTTP on mw68 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:25:33] RECOVERY - Apache HTTP on mw62 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 8.397 second response time [21:27:30] PROBLEM - Apache HTTP on mw65 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:27:39] PROBLEM - Apache HTTP on mw66 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:28:51] PROBLEM - Apache HTTP on mw67 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:29:27] RECOVERY - Apache HTTP on mw65 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.442 second response time [21:29:36] RECOVERY - Apache HTTP on mw64 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.498 second response time [21:29:36] RECOVERY - Apache HTTP on mw68 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 6.210 second response time [21:30:48] RECOVERY - Apache HTTP on mw67 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.316 second response time [21:31:42] RECOVERY - Apache HTTP on mw66 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.030 second response time [21:32:00] PROBLEM - Apache HTTP on mw69 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:32:18] !log restarting lsearchd on search18 [21:32:19] Logged the message, notpeter [21:32:45] PROBLEM - Lucene on search18 is CRITICAL: Connection refused [21:33:57] RECOVERY - Apache HTTP on mw69 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.032 second response time [21:34:42] RECOVERY - Lucene on search18 is OK: TCP OK - 0.001 second response time on port 8123 [21:39:18] !log started enwiki.revision sha1 migration on db12 [21:39:20] Logged the message, Master [21:39:39] RECOVERY - SSH on db59 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [21:42:17] yay , go sha1 
[21:45:12] PROBLEM - MySQL Slave Delay on db12 is CRITICAL: CRIT replication delay 366 seconds [21:45:48] PROBLEM - MySQL Replication Heartbeat on db12 is CRITICAL: CRIT replication delay 402 seconds [21:46:03] !log halting db15 for it to await decom [21:46:05] Logged the message, notpeter [21:54:33] New patchset: Pyoungmeister; "decom for db15" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4373 [21:54:48] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/4373 [21:56:03] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/4373 [21:56:05] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4373 [21:57:21] New patchset: RobH; "added in further support for the dhcp server configurations" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4374 [21:57:36] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/4374 [22:10:24] PROBLEM - NTP on db59 is CRITICAL: NTP CRITICAL: No response from NTP server [22:14:54] New review: RobH; "Do not review, I have already spotted some problems to correct." [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/4374 [22:21:19] RECOVERY - MySQL disk space on db59 is OK: DISK OK [22:21:28] RECOVERY - Puppet freshness on db59 is OK: puppet ran at Thu Apr 5 22:21:01 UTC 2012 [22:21:37] New patchset: RobH; "had to correct tabs and spacing added in further support for the dhcp server configurations" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4374 [22:21:52] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/4374 [22:22:04] RECOVERY - RAID on db59 is OK: OK: State is Optimal, checked 2 logical device(s) [22:22:49] RECOVERY - DPKG on db59 is OK: All packages OK [22:22:58] PROBLEM - Puppet freshness on search1022 is CRITICAL: Puppet has not run in the last 10 hours [22:23:07] RECOVERY - Disk space on db59 is OK: DISK OK [22:36:28] RECOVERY - NTP on db59 is OK: NTP OK: Offset 0.04662036896 secs [22:38:17] PROBLEM - Puppet freshness on search1021 is CRITICAL: Puppet has not run in the last 10 hours [22:38:33] New patchset: RobH; "had to correct even more tabs and spacing added in further support for the dhcp server configurations" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4374 [22:38:48] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/4374 [22:41:25] New patchset: RobH; "endless corrections of tabulation and spacing, updating dhcp related files for install server Change-Id: I0a41915af9982819d19766df1747daff6f9f82bd" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4374 [22:41:41] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/4374 [22:42:19] Ok, now my changes are actually decent [22:42:34] Anyone with a vague understanding of our dhcp/pxe services wanna take a look at it? [22:42:50] i take that back, dhcp only, i didnt change pxe stuff yet. [22:44:31] too sleepy [22:44:43] not you apergos its insanely late there! [22:44:46] someone else will jump on it I'm sure [22:44:54] though if no one else does it by tomorrow, help yourself ;] [22:44:55] uh huh almost 2am [22:45:00] I am trying to be good and not self review. [22:45:09] well, not self submit [22:50:40] RobH: if gerrit offers a way to comment on a specific patch set on push without making it part of the commit msg that's great. 
but it looks like you made it part of the commit msg (on that last one 4374). so, --oneline (as has been suggested for generating release notes) will now give a msg for 4374 that has nothing to do with the change that will get merged on submit. it's just about the change between patchsets
[22:51:25] idk how much we care to enforce for puppet but some people may use oneline and I think that wouldn't be accepted for mediawiki core
[22:51:35] i was afraid that it would be wiping my old commit message entirely
[22:51:49] hence i updated it to be all inclusive of changes in the text, but yea, its not desired.
[22:52:10] i don't quite follow what you were afraid of
[22:52:20] it didn't just take exactly what you typed?
[22:52:39] i figured it would replace my entire commit message when i did the checkin
[22:52:51] i thought you were saying thats not desireable?
[22:52:57] desirable even
[22:53:38] i'm not sure what you did or what you thought would happen
[22:53:47] * jeremyb pulls a local copy to inspect in git
[22:53:48] im not sure what you are saying i should do =[
[22:53:51] heh
[22:54:09] you should probably have an unchanged commit msg because you didn't really change anything
[22:54:22] and then just make a self review of +0 that explains what you did
[22:54:30] but that's a guess
[22:54:53] ahhh
[22:54:55] whoa, there's twice as many patch sets as before
[22:55:07] i kept recommitting and forgetting various files =P
[22:55:11] should be 4 sets
[22:55:18] yes
[22:55:30] LeslieCarr can probably review for you ;)
[22:55:37] what ?
[22:55:44] i obviously missed something :)
[22:56:55] nah, i just asked someone who understands our dhcp setup to review my patch
[22:57:02] im trying to be good and not submit them myself.
[22:58:04] is there a spacing linter? ;)
[22:58:49] it would have saved me 3 patch sets.
[22:59:00] as my dumbass kept forgetting to check one file or another by mistake
[23:00:02] luckily gerrit blasts those in red to point out how stupid spacing is
[23:06:46] RobH: link me the change ?
[23:06:51] or is it too late :)
[23:06:57] its not
[23:07:11] its 7pm here, if no one had time today i was just gonna pester mark_ tomorrow ;]
[23:07:12] https://gerrit.wikimedia.org/r/#change,4374,patchset=4
[23:07:29] either way i wouldnt push it live this evening, cuz i wanna do a test install to ensure it doesnt bork things.
[23:07:53] but if you wanted to review, and comment if its ok, i can submit it tomorrow =]
[23:20:15] ok
[23:36:18] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: -1; - https://gerrit.wikimedia.org/r/4374
[23:36:28] RobH: published comments
[23:38:23] ahh, yea mark mentoined that and i had not used it
[23:38:33] i will have to read the puppet guide on use and understand it, thanks!
[23:38:39] (tomorrow!)
[23:48:29] no prob :)
[23:50:16] PROBLEM - Puppet freshness on sq34 is CRITICAL: Puppet has not run in the last 10 hours
[23:54:52] New patchset: Lcarr; "changing spence specific nagios checks to use variable" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4380
[23:55:07] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/4380