[00:02:33] (03PS4) 10Krinkle: Fix duplicate declaration between mediawiki and contint [operations/puppet] - 10https://gerrit.wikimedia.org/r/147681 [00:05:13] (03PS5) 10Krinkle: Fix duplicate declarations between mediawiki and contint [operations/puppet] - 10https://gerrit.wikimedia.org/r/147681 [00:06:10] OK, I'm looking at it now [00:06:38] Krinkle: what is the full set of packages that conflict? [00:06:58] I don't know, I'm finding out as I check out the patch on the local debug master and try to provision it [00:07:02] is there a way to test for this? [00:07:16] there's the puppet catalog compiler, but it's not going to be faster [00:07:22] k [00:07:48] Krinkle: can you give me a bit of context? [00:08:04] when did things break? which change(s)? [00:08:29] (03PS6) 10Krinkle: Fix duplicate declarations between mediawiki and contint [operations/puppet] - 10https://gerrit.wikimedia.org/r/147681 [00:08:45] ori: The mediawiki refactor in May I think [00:08:50] labs instances have not been updating since then [00:09:28] today is the first time I'm logging in to the integration cluster since then to perform maintenance, bug fixes and other things that are semi-broken. As well as a more immediate breakage due to upstream dependencies in different projects dropping support for 0.8 [00:09:42] that's the main event that triggered this today and led me to create a new instance with Trusty [00:09:53] at which point I couldn't apply even the current configuration without any updates because it was alreayd broken [00:10:14] kk. So, before we continue, let me just say, I'm sure we can figure this out and come to a solution that we both like [00:10:43] So I'm just expressing some good faith because we got a bit heated a moment ago :) [00:10:47] Well, I have -200% preference on a solution. I care nothing about puppet or how things are done for I know nothing about puppet. 
[00:11:00] RECOVERY - puppet last run on wtp1004 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [00:11:21] so your advice and personal preference hereby has my full support [00:11:42] awesome. so, what is the the local debug master? [00:11:49] can I ssh to it and have a look too? [00:12:52] (03PS7) 10Krinkle: Fix duplicate declarations between mediawiki and contint [operations/puppet] - 10https://gerrit.wikimedia.org/r/147681 [00:12:59] I want to understand what I did wrong earlier today... I updated /a/common/php-1.24wmf13 on tin and sync-dir'ed... but the version I pushed is not live on meta. [00:13:01] ori: integration-slave1004 is the broken one [00:13:09] ori: integration-puppetmaster is the master [00:13:16] [00:08 UTC] root at integration-puppetmaster.eqiad.wmflabs in /var/lib/git/operations/puppet (rebase-w-upstream) [00:13:36] The SAL update messages were also wrong, maybe that is related?? [00:13:43] I'm continously applying patches there, and then rerunning sudo puppet agent -t on the instance [00:13:44] ... or known bug? [00:14:00] Krinkle: can I take over for a few mins? [00:14:14] ori: Let me finish this one run, then it'll be all yours. [00:14:15] I won't make any sticky change without consulting you [00:14:17] thanks [00:14:44] awight: how were the SAL messages wrong? (what did you see, and what were you expecting?) 
[00:15:07] ori: for example, 20:57 logmsgbot: awight updated /a/common/php-1.24wmf13 to Id3462554b: Made --maxtime a soft limit again [00:15:26] however, I had actually updated *from* that commit, to ade90e0e22492d87e6069db3a359b22ef56401a6 [00:15:32] (03PS8) 10Krinkle: Fix duplicate declarations between mediawiki and contint [operations/puppet] - 10https://gerrit.wikimedia.org/r/147681 [00:15:53] (03PS1) 10Aaron Schulz: Add 1 more runner per job runner server [operations/puppet] - 10https://gerrit.wikimedia.org/r/147691 [00:16:03] awight: since you're a deployer, you can SSH to a random app server and check the state of the code [00:16:09] awight: to see if it got deployed or not [00:16:20] ok, will do. but any idea about the SAL backwardness? [00:16:27] (03PS9) 10Krinkle: Fix duplicate declarations between mediawiki and contint [operations/puppet] - 10https://gerrit.wikimedia.org/r/147681 [00:16:54] ori: done [00:16:55] awight: i don't see ade90e0e22492d87e6069db3a359b22ef56401a6 [00:17:02] Krinkle: k, diving in.. [00:17:35] (03PS10) 10Krinkle: Fix duplicate declarations between mediawiki and contint [operations/puppet] - 10https://gerrit.wikimedia.org/r/147681 [00:19:11] ori: Aren't we supposed to be merging when updating deployment branches on tin? That commit will only exist in the deployment dir. [00:19:19] ori: I am well-aware that the patch is currently massively ugly, and that integration-slave1004 still doesn't provision. 
[00:19:26] It now errors on: Duplicate declaration: Package[djvulibre-bin] [00:19:30] PROBLEM - puppet last run on tin is CRITICAL: CRITICAL: Puppet has 1 failures [00:19:32] (03CR) 10Andrew Bogott: Fix duplicate declarations between mediawiki and contint (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/147681 (owner: 10Krinkle) [00:20:32] (03CR) 10Krinkle: Fix duplicate declarations between mediawiki and contint (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/147681 (owner: 10Krinkle) [00:20:57] (03PS11) 10Krinkle: Fix duplicate declarations between mediawiki and contint [operations/puppet] - 10https://gerrit.wikimedia.org/r/147681 [00:21:31] ori: okay the wrong version is deployed. /me grimaces [00:21:43] (03CR) 10Krinkle: "Applying this revealed another fatal error during provisioning: Duplicate declaration: Package[djvulibre-bin]" [operations/puppet] - 10https://gerrit.wikimedia.org/r/147681 (owner: 10Krinkle) [00:22:09] (03CR) 10Krinkle: [C: 04-1] Fix duplicate declarations between mediawiki and contint [operations/puppet] - 10https://gerrit.wikimedia.org/r/147681 (owner: 10Krinkle) [00:22:29] ori: so I think the git merge confused the logging daemon. I guess I was supposed to use rebase? [00:24:02] greg-g: ^^ is merge appropriate for deploying new changes to the deployment branches, or do I have to use rebase? [00:24:26] I merged earlier today, and it seems to have confused the SAL daemon, and possibly caused the wrong stuff to be deployed :-/ [00:25:02] I ran cherry -v HEAD readonly/wmf/1.24-wmf13, then git merge readonly/wmf/1.24-wmf13 [00:25:48] awight: It this about mediawiki-core or an extension? I that cause submit the cherry-pick to the relevant wmf brach in gerrit and deploy that, on tin you just pull the latest version of that branch once merged in gerrit. [00:26:01] Is this*, In that case* [00:26:52] Krinkle: a submodule update. Argh. I did not sync the extension dir, that's the issue. ARGH. 
[00:27:17] Krinkle: but, I thought you couldn't just checkout the deployment branch cos we need the security patches. [00:27:21] hence rebase or merge. [00:27:30] rebase in that case, yes. [00:27:46] So that the relevant upstream history is in tact, and the hot patches always on top [00:27:51] that should be the default [00:27:56] when doing git pull on tin. [00:28:00] (it does rebase by default) [00:28:03] greg-g: I'm gonna finish my deployment from earlier today, I just realized I did not push the extension update, just the submodule pointer update. [00:28:22] Krinkle: and merge is verboten? [00:28:42] I think pull usually does merge by default. [00:28:46] Krinkle: can i update your patch? [00:28:58] I think it'd make the history messier and easier to miss that there is a security patch somehow buried when looking at git log -n10 [00:29:00] ori: sure [00:29:21] Krinkle: btw, i did: [00:29:22] Krinkle: ok I'll do it that way from now on [00:29:28] awight: afaik everybody does rebase, and to avoid more people from doing merge (or even reset) we made rebase the default in gitconfig even. [00:29:37] grep -Po "'[^']+'" modules/contint/manifests/packages.pp | sort | uniq > contint-packages.txt [00:29:40] so that plain git pull just works [00:29:48] grep -Po "'[^']+'" modules/mediawiki/manifests/packages.pp | sort | uniq > mw-packages.txt [00:29:50] awesome. [00:29:54] perl -ne 'print if ($seen{lc $_} .= @ARGV) =~ /10$/' mw-packages.txt contint-packages.txt [00:30:01] awight: well, git pull + git submodule update extensions/Foo [00:31:55] ori: I didn't bother doing them all at once because they seem to be grouped by some arbitrary sections ("Uninstalled packages", "Math" etc.) and some are ensure -> present, absent or latest. [00:32:13] so I figured I'd just do them a few at a time. [00:32:27] and during the first two I didn't know it was just gonna be this class. [00:32:40] It could potentially conflict with any of 100s of classes. It might still after this. 
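(Editor's note on the grep/perl commands above: they list every single-quoted string that appears in both packages.pp manifests, compared case-insensitively. A minimal Python sketch of the same idea follows. It assumes, as the grep does, that any single-quoted string is a package name, so values such as quoted `ensure` arguments would be picked up too. The function names and the sample manifest strings are hypothetical and are not taken from operations/puppet.)

```python
import re

# Same pattern as grep -Po "'[^']+'", but capturing the text between the quotes.
QUOTED = re.compile(r"'([^']+)'")


def quoted_names(manifest_text):
    """Return the set of single-quoted strings in a manifest, lowercased."""
    return {name.lower() for name in QUOTED.findall(manifest_text)}


def shared_packages(first_text, second_text):
    """Return the sorted names that are quoted in both manifests."""
    return sorted(quoted_names(first_text) & quoted_names(second_text))


if __name__ == "__main__":
    # Hypothetical manifest fragments, for illustration only.
    contint = "package { ['imagemagick', 'djvulibre-bin', 'ocaml-nox']: ensure => present }"
    mediawiki = "package { ['djvulibre-bin', 'imagemagick', 'php-apc']: ensure => present }"
    print(shared_packages(contint, mediawiki))
```

(Each name it reports is a candidate for a "Duplicate declaration: Package[...]" error if both classes end up in the same catalog. It is only a text-level check and does not replace a real catalog compile.)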
[00:32:57] But nice regexes / perl magic [00:33:08] Anyone want to give me the OK to finish my earlier deployment? Or greg-g, you there? [00:33:09] awight: You sent me a contentless ping. This is a contentless pong. Please provide a bit of information about what you want and I will respond when I am around. [00:33:13] aha hi. [00:33:18] or... [00:33:44] * awight rubs eyes [00:34:33] Anyone want to stop me from deploying submodule updates to FundraisingTranslateWorkflow which I incompletely pushed earlier today? [00:34:51] Friday... [00:36:07] !log awight Synchronized php-1.24wmf12/extensions/FundraisingTranslateWorkflow: update FundraisingTranslateWorkflow submodule content (take 3) (duration: 00m 04s) [00:36:12] Logged the message, Master [00:36:15] (03PS12) 10Ori.livneh: Fix duplicate declarations between mediawiki and contint [operations/puppet] - 10https://gerrit.wikimedia.org/r/147681 (owner: 10Krinkle) [00:36:32] !log awight Synchronized php-1.24wmf13/extensions/FundraisingTranslateWorkflow: update FundraisingTranslateWorkflow submodule content (take 3) (duration: 00m 04s) [00:36:37] Logged the message, Master [00:36:55] !log awight Synchronized php-1.24wmf14/extensions/FundraisingTranslateWorkflow: update FundraisingTranslateWorkflow submodule content (take 3) (duration: 00m 04s) [00:36:59] Logged the message, Master [00:37:07] ori: note ensure => latest, for php-luasandbox, and ensure absent for php-apc [00:38:22] (just observing, I didn't write that, though I do think I know why it is there) [00:38:54] ensuring extra packages from mediawiki::packages shouldn't be an issue. 
[00:38:59] I like the approach already [00:40:41] php-luasandbox is latest so we don't need to update them manually (ops is not going to bother updating labs instances of integration, and unless they send us an e-mail, we're not gonna know in time, I'd rather have the tests a version ahead than a version behind with production coming latest) [00:40:57] but I'm OK with dropping that (I think it was hashar's idea) [00:41:01] php-apc is hurting no one by being present [00:41:04] but luasandbox is an issue [00:41:08] php-apc is an issue however [00:41:13] why? [00:41:14] so it's quite the contrary [00:41:31] because APC is horribly broken when used on a server that has its files changing all the time. [00:42:00] Everytime we tried enabling it within a day you'll see countless failures al over the friggin place about classes missing or classes being defined in 10 different files, or files not existing, or files existing that shouldn't [00:42:18] it's cache can't keep up with git checkout between master, wmf, REL1_** etc. [00:42:30] and it's pointless to cache on CI anyway [00:43:03] and neither hashar or I had the capacity and time to investigate further, so we just scrapped it [00:43:20] Perhaps there is a way we can disable it separate from installing? [00:43:31] I guess there's a php.ini way [00:44:56] Note though, while php-apc is the first tangent we might dive into here, this entire thing being a patch in operations/puppet at all is the 3rd tangent of my 2nd tangest. There's a point at which I'll have too many closing parenthesis and exit statements before getting back to reality in this inception ;-) [00:46:53] puppet runs are failing now because libgcc1 and zlib1g are missing [00:47:01] unrelated to the duplicate def'n thing [00:47:07] if you look at that i'll resolve the apc thing [00:48:24] ori: you running puppet currently? [00:48:55] i was, on the slave [00:49:30] did you abort it maybe? 
it seems there's a lock file [00:51:32] I know even less of libgcc1 and zlib1g than puppet. I've disabled apc via php.ini plenty of times on the jenkins slaves, and I know the ci classes where it'd need to be done. Maybe switch? [00:51:48] (03PS13) 10Ori.livneh: Fix duplicate declarations between mediawiki and contint [operations/puppet] - 10https://gerrit.wikimedia.org/r/147681 (owner: 10Krinkle) [00:52:05] OK :) [00:52:35] sorry, didn't see that until now [00:52:43] * andrewbogott pays a brief visit [00:52:51] Are y'all still working on the duplicate-patches thing? [00:53:24] andrewbogott: well, as expected it's only the first in line of more conflicts and errors. [00:53:30] yep [00:53:40] It hasn't been able to successfully run since May, and every error is fatal so it's try-and-find more errors [00:53:43] you can just edit the files directly on the puppetmaster, rather than going via gerrit… if you're impatient :) [00:53:57] in my vain and hubristic assessment https://gerrit.wikimedia.org/r/#/c/147681/ is good to go, even though it doesn't fix everything [00:54:10] it does fix that particular class of problems (duplication with mediawiki::packages) [00:54:13] the other problems are not related [00:54:20] it's too many levels deep for my rmate editor port config, and if I'm using nano/vim, I might as well use gerrit and my local editor for speed. [00:55:06] wow, mediawiki::packages brings in a *lot*, it's still running. [00:55:27] yeah [00:56:19] I hope there aren't any (or too many) things that run in the background as daemons or php extensions that might change things. 
[00:56:21] Krinkle: these packages fail to resolve: libcidr0-dev libdclass-data libdclass-jni libgcc1 libncurses5 libstdc++6 php5-parsekit php5-wmerrors ruby-jsduck zlib1g [00:56:43] libmemcached11 ditto [00:57:04] libmemcached11 and php5-wmerrors can and should be fixed in ::mediawiki [00:57:05] ruby-jsduck is on apt.wikimedia.org and is installed on the other integration-slaves for a while now via the current manifest. [00:57:18] ori, do you mind resubmitting those reverts as separate patches? Easier to review if they're obvious stand-alone reverts [00:57:21] not resolve as it, not found by aptitude? [00:57:37] yeah [00:57:52] libdclass0 ditto [00:58:00] andrewbogott: sure [00:58:00] PROBLEM - graphite.wikimedia.org on tungsten is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 525 bytes in 0.006 second response time [00:58:08] does it run apt-get update by default? [00:58:23] yep [00:58:24] Notice: /Stage[first]/Apt::Update/Exec[/usr/bin/apt-get update]/returns: executed successfully [00:58:30] ori: where did you get those errors btw, if it's still running [00:59:02] and graphite broke [00:59:47] jenkins, i love you! You make me love my job so much. Sometimes I wish you were a real person. So that I can yell at you non stop! https://integration.wikimedia.org/zuul/ [01:01:07] please note - ZeroBanner's 1st patch has long been abandoned [01:01:28] despite sitting in jenkins queue :( [01:01:37] greg-g: I'm deploying one more submodule bump for the stupid FundraisingTranslateWorkflow submodule. [01:01:58] yurikJerusalem: Zuul has a tendency to not clean up old stuff. 
However, as long as it's not interfering with new events a restart is not worth it (which is the only way to clear them from the dashboard), as restarting would drop anything currently in the queue and also make it miss several minutes of new gerrit events (patch sets and +2) [01:02:19] yurikJerusalem: to see if it's taking up resources, look at the sidebar on https://integration.wikimedia.org/ci/ [01:02:36] it's not there, so it's just a harmless
on a page that's hard to ignore; upstream issue otherwise. [01:02:44] Krinkle, https://gerrit.wikimedia.org/r/#/c/147627/ :) [01:02:54] yurikJerusalem: so it's not in the Jenkins queue, only in Zuul. [01:02:57] AaronSchulz: Are you deploying the --maxtime change tonight? [01:03:07] it has been about 5 hrs i think [01:03:35] i'm pretty sure jenkins shoud have merged it by now... right? :) [01:03:43] well, Jenkins is overloaded and I've been working for over 8 hours now trying to get a new instance to pool but it's taking a while. [01:03:52] because a lot of stuff is broken we didn't know about until now [01:03:54] and blocking it [01:04:09] awight: that was already done [01:04:19] Krinkle, not a problem really, no rush on it until monday anyway :) [01:04:33] AaronSchulz: ah ok thx. I thought I saw a new merge on the wmf14 branch. [01:04:36] yurikJerusalem: Unless it needs to be emergency deployed (in which case you wouldn't be submitting to gerrit), send the +2 review and pretend it's merged. If there's any subsequent patches, just make them depend on it. [01:05:06] Krinkle, sounds good, but really, nothing urgent there. btw, do we have an emergency deployment guide anywhere? [01:05:15] AaronSchulz: yep, just confirmed I was wrong. [01:06:03] * yurikJerusalem always wanted to know the magic commands to git review without git review directly from tin... 
[01:06:21] yurikJerusalem: Emergency plan is: Call greg-g or csteipp or someone else and ask for a window, if all else fails use healthy judgement and maybe apply patch and deploy from tin directly (documenting it on a, possibly security-hidden marked, bugzilla ticket and SAL) [01:07:15] !log awight Synchronized php-1.24wmf12/extensions/FundraisingTranslateWorkflow: update FundraisingTranslateWorkflow submodule content (take 4) (duration: 00m 04s) [01:07:38] If you don't know how to apply a patch to tin directly, there's probably a few dependent modules you'd need to install in your brain first to do it responsibly that don't fit in this message :) [01:08:00] Krinkle, not apply to tin, apply to git from tin :) [01:08:05] !log awight Synchronized php-1.24wmf13/extensions/FundraisingTranslateWorkflow: update FundraisingTranslateWorkflow submodule content (take 4) (duration: 00m 04s) [01:08:06] but Reedy would know. [01:08:20] and a few others that can help [01:08:33] i heard Reed-y was a master of that [01:08:45] ori: I'm trying to aptitude search it but aptitude isn't installed. [01:08:46] would be good to document it some day... [01:08:49] ori: I guess it's because of Trusty? [01:08:58] Krinkle: apt-cache search [01:09:21] ori: It's a wmf package that we probably only pushed to precise. [01:09:32] Or is that not how apt repos work? [01:09:37] yurikJerusalem: push from tin to gerrit? [01:09:46] !log awight Synchronized php-1.24wmf14/extensions/FundraisingTranslateWorkflow: update FundraisingTranslateWorkflow submodule content (take 4) (duration: 00m 04s) [01:09:54] Reedy, just curious, i heard it was possible [01:09:59] greg-g: finally done with insane bugfix deployment. [01:10:09] yurikJerusalem: Yeah. Just the same as no git review. git push origin HEAD:refs/for/master [01:10:24] You don't need to be on tin to do that. [01:10:41] It'll require you to forward keys, at which point you can just do it locally. 
And both only work if you have the proper permissions [01:10:42] (03CR) 10Andrew Bogott: "Please submit the reverts as individual patches so I can tell what's what." [operations/puppet] - 10https://gerrit.wikimedia.org/r/147681 (owner: 10Krinkle) [01:10:43] ok, i will really have to go back through this channel to write up a doc at some point [01:10:45] and neither bypasses gerrit [01:11:02] git.wikimedai.org is a mirror, gerrit is the repo. You can bypass review and jenkins within gerrit (which is what that does) [01:12:02] (03PS14) 10Ori.livneh: Contint: add ocaml-nox package; dedupe imagemagick [operations/puppet] - 10https://gerrit.wikimedia.org/r/147681 (owner: 10Krinkle) [01:12:04] (03PS1) 10Ori.livneh: Revert "mediawiki: Fix duplicate imagemagick package definition" [operations/puppet] - 10https://gerrit.wikimedia.org/r/147702 [01:12:06] (03PS1) 10Ori.livneh: Revert "Revert "contint: reduce duplication with mediawiki::packages"" [operations/puppet] - 10https://gerrit.wikimedia.org/r/147703 [01:12:08] (03PS1) 10Ori.livneh: mediawiki::packages places php5-wmerrors and libmemcached11 behind dist guard [operations/puppet] - 10https://gerrit.wikimedia.org/r/147704 [01:12:13] andrewbogott: ^ [01:12:45] (03CR) 10Krinkle: [C: 031] Revert "mediawiki: Fix duplicate imagemagick package definition" [operations/puppet] - 10https://gerrit.wikimedia.org/r/147702 (owner: 10Ori.livneh) [01:13:52] (03CR) 10Andrew Bogott: [C: 032] Revert "mediawiki: Fix duplicate imagemagick package definition" [operations/puppet] - 10https://gerrit.wikimedia.org/r/147702 (owner: 10Ori.livneh) [01:14:10] (03PS2) 10Ori.livneh: mediawiki::packages: place deprecated packages behind lsbdist guard [operations/puppet] - 10https://gerrit.wikimedia.org/r/147704 [01:14:45] Krinkle: it's still running [01:14:46] in my console [01:14:47] (03Abandoned) 10Aaron Schulz: Add 1 more runner per job runner server [operations/puppet] - 10https://gerrit.wikimedia.org/r/147691 (owner: 10Aaron Schulz) [01:14:47] 
heh [01:14:53] libdclass-java can't be found either [01:15:03] (03CR) 10Andrew Bogott: [C: 032] Revert "Revert "contint: reduce duplication with mediawiki::packages"" [operations/puppet] - 10https://gerrit.wikimedia.org/r/147703 (owner: 10Ori.livneh) [01:15:40] ori: What's ocaml-nox for? [01:15:55] It appears to come out of nowhere [01:15:55] don't ask me, you added it [01:16:02] No? [01:16:08] it's almost certainly for building the math package [01:16:22] It's new in this commit https://gerrit.wikimedia.org/r/#/c/147681/14/modules/contint/manifests/packages.pp [01:16:24] which you just wrote [01:16:55] Maybe https://gerrit.wikimedia.org/r/#/c/147703/1/modules/contint/manifests/packages.pp shouldn't removed it? [01:17:12] I guess maybe mediawiki::packages used to have it? [01:17:55] possiblyt [01:18:01] it's a bit funny but it's fine [01:18:04] OK [01:18:10] it got removed by the patch andrew just merged and then reintroduced in the follow-up [01:18:23] (03CR) 10Krinkle: [C: 031] Contint: add ocaml-nox package; dedupe imagemagick [operations/puppet] - 10https://gerrit.wikimedia.org/r/147681 (owner: 10Krinkle) [01:18:49] heh: Notice: Finished catalog run in 2302.47 seconds [01:19:26] (03CR) 10Andrew Bogott: mediawiki::packages: place deprecated packages behind lsbdist guard (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/147704 (owner: 10Ori.livneh) [01:19:58] from https://dpaste.de/3109/raw you can see that puppet didn't break on this host [01:20:00] RECOVERY - graphite.wikimedia.org on tungsten is OK: HTTP OK: HTTP/1.1 200 OK - 1607 bytes in 0.003 second response time [01:20:04] it never ran through initial provisioning in the first place [01:20:32] Yes [01:20:48] andrewbogott: the php5-memcached package at one point inappropriately expressed a dependency on it [01:21:00] The instance was created today and from the get go it ran errors, just the existing instances 1001-1003 are not provisinoing (which did work originally) [01:21:09] 
andrewbogott: it is most likely safe to remove, but i don't want to do that in this change [01:21:11] just like the existing instances* [01:21:24] ori: OK, maybe a more explicit comment? [01:21:34] sure [01:21:43] once this instance is running and working I'll install the jenkins slave and have it be the new slave for jobs that depend on node 0.10 [01:22:11] i suspect you have some long hours ahead of you, there is a host of problems not related to ::mediawiki [01:22:17] but at least we can sort those out [01:23:10] (03CR) 10Andrew Bogott: Contint: add ocaml-nox package; dedupe imagemagick (032 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/147681 (owner: 10Krinkle) [01:23:27] * andrewbogott at 6% battery [01:24:00] (03CR) 10Krinkle: Contint: add ocaml-nox package; dedupe imagemagick (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/147681 (owner: 10Krinkle) [01:24:21] (03PS3) 10Ori.livneh: mediawiki::packages: place deprecated packages behind lsbdist guard [operations/puppet] - 10https://gerrit.wikimedia.org/r/147704 [01:24:25] (03CR) 10Krinkle: Contint: add ocaml-nox package; dedupe imagemagick (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/147681 (owner: 10Krinkle) [01:24:44] (03CR) 10Ori.livneh: Contint: add ocaml-nox package; dedupe imagemagick (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/147681 (owner: 10Krinkle) [01:24:54] andrewbogott: merge merge merge! [01:24:55] (03CR) 10Andrew Bogott: [C: 032] mediawiki::packages: place deprecated packages behind lsbdist guard [operations/puppet] - 10https://gerrit.wikimedia.org/r/147704 (owner: 10Ori.livneh) [01:25:12] Krinkle: did you see my other question about the removal of imagemagick? Is that on purpose? [01:25:23] andrewbogott: I did and both of us replied even :P [01:25:26] andrewbogott: it gets pulled in via ::mediawiki::packages now [01:25:45] Ah, so I see. 
Email delay :) [01:25:59] andrewbogott: thanks for helping us out by the way, much appreciated [01:26:21] (03CR) 10Andrew Bogott: [C: 032] Contint: add ocaml-nox package; dedupe imagemagick [operations/puppet] - 10https://gerrit.wikimedia.org/r/147681 (owner: 10Krinkle) [01:26:48] ori: Before I even bother going forward, tell me operations/puppet (at its very base) is capable of error-free provisioning a Trusty-based system? E.g. creating a new instance in labs with no special stuff works, right? [01:26:59] Is that everything so far? [01:27:11] Krinkle: yes, basic trusty instances can be puppetized. [01:27:16] OK [01:27:26] I don't know what else does and doesn't work though, lots of puppet roles probably need to be trustified. [01:29:19] * ori may steal the word "trustified" [01:29:34] (03PS1) 10Yurik: Moved 426-04 to unified design [operations/puppet] - 10https://gerrit.wikimedia.org/r/147706 [01:30:08] Krinkle: Are you unblocked for the moment? [01:30:16] And/or heading out for drinks in five minutes anyway? [01:30:32] I'm not heading out anywhere, I'm already home. [01:30:42] Ah, ok. [01:30:49] andrewbogott: There's no further patches but I'm not unblocked, but I'm not sure how much more I can do. [01:30:59] i'm going to head out as soon as i confirm the changes apply correctly on the app servers [01:31:03] Well, I'll check in again if/when I approach a power supply. [01:31:05] which i'm doing right now [01:31:21] andrewbogott: feel free to take off, if anything puppet-y breaks i can fix it myself in a pinch [01:31:30] RECOVERY - puppet last run on tin is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [01:31:42] ori: OK. My phone still works, so I'll get emails if something is broken. [01:31:48] andrewbogott: it applied correctly [01:31:51] I mean, if something is broken and you email me about it :) [01:31:52] enjoy your weekend! [01:31:55] cool. 
[01:31:59] OK, g'night all [01:32:28] Krinkle: gotta run, but happy to keep helping later with the trustification stuff [01:33:09] ori: Yeah, as a first step I'll remove androidsdk from this and move it to a separate ci role. That's only used for 1 rare job, and that the old precise instances are plenty for that. [01:33:21] * ori nods [01:33:23] can be migrated later [01:33:44] (the AndroidSdk is where those libcc and stuff are coming from) [01:38:40] PROBLEM - SSH on lvs3001 is CRITICAL: Server answer: [01:39:40] RECOVERY - SSH on lvs3001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.4 (protocol 2.0) [01:44:33] (03PS1) 10Krinkle: contint: Move androidsdk::dependencies from general dependencies to dedicated role [operations/puppet] - 10https://gerrit.wikimedia.org/r/147708 [01:54:30] PROBLEM - puppet last run on tin is CRITICAL: CRITICAL: Puppet has 1 failures [02:00:35] (03PS2) 10Krinkle: contint: Move androidsdk and libdclass from general dependencies to dedicated role [operations/puppet] - 10https://gerrit.wikimedia.org/r/147708 [02:08:25] (03PS3) 10Krinkle: contint: Move androidsdk and libdclass from general dependencies to dedicated role [operations/puppet] - 10https://gerrit.wikimedia.org/r/147708 [02:16:23] !log LocalisationUpdate completed (1.24wmf13) at 2014-07-19 02:15:20+00:00 [02:16:32] Logged the message, Master [02:17:50] (03Abandoned) 10Krinkle: contint: Move androidsdk and libdclass from general dependencies to dedicated role [operations/puppet] - 10https://gerrit.wikimedia.org/r/147708 (owner: 10Krinkle) [02:27:29] !log LocalisationUpdate completed (1.24wmf14) at 2014-07-19 02:26:26+00:00 [02:27:33] Logged the message, Master [02:27:50] (03Abandoned) 10Legoktm: Remove unused wmgUseMarkAsHelpful from InitialiseSettings.php [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/146031 (owner: 10Legoktm) [02:38:16] hi, is anyone here familiar with url-downloader.wikimedia.org? 
[02:38:41] namely, I'm trying to figure out: [02:38:42] legoktm@terbium:~$ curl --proxy url-downloader.wikimedia.org:8080 -i http://extdist.wmflabs.org/ [02:38:42] HTTP/1.0 403 Forbidden [02:39:37] OTOH, I can access any non *.wmflabs.org domain just fine [02:40:03] legoktm@terbium:~$ curl --proxy url-downloader.wikimedia.org:8080 -i http://www.legoktm.com/ [02:40:03] HTTP/1.0 200 OK [02:57:36] !log LocalisationUpdate ResourceLoader cache refresh completed at Sat Jul 19 02:56:30 UTC 2014 (duration 56m 29s) [02:57:41] Logged the message, Master [03:14:30] RECOVERY - puppet last run on tin is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [03:16:57] (03CR) 10Krinkle: "FIXME: Doesn't work anymore or never worked. https://bugzilla.wikimedia.org/show_bug.cgi?id=68260" [operations/puppet] - 10https://gerrit.wikimedia.org/r/115162 (owner: 10Hashar) [03:21:36] (03PS1) 10Krinkle: contint: Remove browsertests class from role::ci::slave::labs [operations/puppet] - 10https://gerrit.wikimedia.org/r/147713 (https://bugzilla.wikimedia.org/68260) [03:23:20] PROBLEM - Disk space on es1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:23:20] PROBLEM - MySQL disk space on es1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:23:20] PROBLEM - check configured eth on es1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:23:20] PROBLEM - mysqld processes on es1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:23:20] PROBLEM - MySQL InnoDB on es1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:23:21] PROBLEM - RAID on es1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:23:31] PROBLEM - DPKG on es1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:23:40] PROBLEM - check if dhclient is running on es1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[03:23:40] PROBLEM - MySQL Recent Restart on es1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:23:40] PROBLEM - puppet disabled on es1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:23:41] PROBLEM - MySQL Processlist on es1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:23:41] PROBLEM - puppet last run on es1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:23:49] can't be good [03:25:30] PROBLEM - SSH on es1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:25:49] !log springle Synchronized wmf-config/db-eqiad.php: depool es1001 (duration: 00m 06s) [03:25:55] Logged the message, Master [03:26:50] ACKNOWLEDGEMENT - DPKG on es1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. Sean Pringle depooled. investigating. [03:26:50] ACKNOWLEDGEMENT - Disk space on es1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. Sean Pringle depooled. investigating. [03:26:50] ACKNOWLEDGEMENT - MySQL InnoDB on es1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. Sean Pringle depooled. investigating. [03:26:50] ACKNOWLEDGEMENT - MySQL Processlist on es1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. Sean Pringle depooled. investigating. [03:26:50] ACKNOWLEDGEMENT - MySQL Recent Restart on es1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. Sean Pringle depooled. investigating. [03:26:51] ACKNOWLEDGEMENT - MySQL disk space on es1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. Sean Pringle depooled. investigating. [03:26:51] ACKNOWLEDGEMENT - RAID on es1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. Sean Pringle depooled. investigating. [03:26:51] ACKNOWLEDGEMENT - SSH on es1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds Sean Pringle depooled. investigating. [03:26:52] ACKNOWLEDGEMENT - check configured eth on es1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. Sean Pringle depooled. 
investigating. [03:26:52] ACKNOWLEDGEMENT - check if dhclient is running on es1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. Sean Pringle depooled. investigating. [03:26:53] ACKNOWLEDGEMENT - mysqld processes on es1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. Sean Pringle depooled. investigating. [03:26:54] ACKNOWLEDGEMENT - puppet disabled on es1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. Sean Pringle depooled. investigating. [03:26:54] ACKNOWLEDGEMENT - puppet last run on es1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. Sean Pringle depooled. investigating. [03:27:10] RECOVERY - Disk space on es1001 is OK: DISK OK [03:27:10] RECOVERY - MySQL disk space on es1001 is OK: DISK OK [03:27:20] RECOVERY - mysqld processes on es1001 is OK: PROCS OK: 1 process with command name mysqld [03:27:20] RECOVERY - check configured eth on es1001 is OK: NRPE: Unable to read output [03:27:20] RECOVERY - MySQL InnoDB on es1001 is OK: OK longest blocking idle transaction sleeps for 0 seconds [03:27:20] RECOVERY - RAID on es1001 is OK: OK: optimal, 1 logical, 2 physical [03:27:20] RECOVERY - SSH on es1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.4 (protocol 2.0) [03:27:21] RECOVERY - DPKG on es1001 is OK: All packages OK [03:27:30] RECOVERY - check if dhclient is running on es1001 is OK: PROCS OK: 0 processes with command name dhclient [03:27:31] RECOVERY - puppet disabled on es1001 is OK: OK [03:27:31] RECOVERY - MySQL Recent Restart on es1001 is OK: OK 18798146 seconds since restart [03:27:31] RECOVERY - puppet last run on es1001 is OK: OK: Puppet is currently enabled, last run 1501 seconds ago with 0 failures [03:27:40] RECOVERY - MySQL Processlist on es1001 is OK: OK 0 unauthenticated, 0 locked, 0 copy to table, 0 statistics [03:33:30] PROBLEM - puppet last run on tin is CRITICAL: CRITICAL: Puppet has 1 failures [03:41:38] (03PS1) 10Springle: Depooled es1001 from tin. Make it stick. 
[operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/147714 [03:42:08] (03CR) 10Springle: [C: 032] Depooled es1001 from tin. Make it stick. [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/147714 (owner: 10Springle) [03:42:15] (03Merged) 10jenkins-bot: Depooled es1001 from tin. Make it stick. [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/147714 (owner: 10Springle) [03:58:20] PROBLEM - Puppet freshness on db1009 is CRITICAL: Last successful Puppet run was Sat 19 Jul 2014 01:57:23 UTC [03:58:40] PROBLEM - puppet last run on amssq33 is CRITICAL: CRITICAL: Puppet has 1 failures [04:15:40] RECOVERY - puppet last run on amssq33 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [04:23:49] (03CR) 1020after4: [C: 031] phab-login screen HTML-replace deprecated HTML [operations/puppet] - 10https://gerrit.wikimedia.org/r/147640 (owner: 10Dzahn) [04:31:06] (03PS1) 10Chmarkine: rt -- update cipher suite list to support PFS [operations/puppet] - 10https://gerrit.wikimedia.org/r/147715 (https://bugzilla.wikimedia.org/53259) [04:32:30] RECOVERY - puppet last run on tin is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [05:02:07] !log Ungracefully restarting Zuul to clear the items stuck in the queue (picked a moment with no real items waiting in the queue). [05:02:11] greg-g: ^ [05:02:13] Logged the message, Master [05:02:14] :) :) [05:02:17] thanks sir [05:03:25] There were over a dozen by now. [05:04:10] wow [05:15:07] Krinkle: oh, just checking bug mail, faidon is mostly MIA.. uhhh, I shouldn't use that acronym [05:15:25] Ah, right. 
I remember [05:15:36] now [05:15:37] I forgot [05:15:38] go.dog is probably your best bet right now [05:17:02] (03CR) 10ArielGlenn: [C: 031] include 'bastionhost' on bastion hosts [operations/puppet] - 10https://gerrit.wikimedia.org/r/122399 (owner: 10Dzahn) [05:17:10] RECOVERY - Puppet freshness on db1009 is OK: puppet ran at Sat Jul 19 05:17:09 UTC 2014 [05:17:30] PROBLEM - puppet last run on tin is CRITICAL: CRITICAL: Puppet has 1 failures [05:18:03] (03CR) 10ArielGlenn: [C: 032] Fix typos in wikidata table descriptions [operations/dumps] (ariel) - 10https://gerrit.wikimedia.org/r/146420 (owner: 10Aude) [05:18:14] Krinkle: thanks for that testing/bug reporting [05:18:50] maybe that email rob.la sent was a bit late in the game? hard to tell when asking for wider help would have been more work or less work. [05:18:58] I was hoping to get nodejs 0.10 working though. That was the reason I started going down this rabbit hole with ci puppet role [05:19:04] :) [05:27:05] (03CR) 10ArielGlenn: [C: 032] "mumble mumble rebase not happening mumble mumble well let's try it anyways" [operations/dumps] (ariel) - 10https://gerrit.wikimedia.org/r/146419 (https://bugzilla.wikimedia.org/68024) (owner: 10Aude) [05:33:03] (03CR) 10ArielGlenn: [C: 031] Add puppet module for a tor relay [operations/puppet] - 10https://gerrit.wikimedia.org/r/140948 (owner: 10Dzahn) [05:34:36] (03PS3) 10ArielGlenn: delete unused download.mediawiki.org Apache site [operations/puppet] - 10https://gerrit.wikimedia.org/r/145497 (owner: 10Dzahn) [05:38:50] (03CR) 10ArielGlenn: [C: 032] "good catch, this was replaced by the module indeed" [operations/puppet] - 10https://gerrit.wikimedia.org/r/145497 (owner: 10Dzahn) [05:44:09] who should I talk to about getting an RT account? [05:44:23] jgage: maybe you? 
^ [05:44:59] legoktm: odd as it sounds, file an rt ticket [05:45:13] heh [05:45:15] you can mail to um [05:45:32] ops-requests@rt.wikimedia.org [05:46:04] access-requests@rt.wikimedia.org [05:46:26] well the other will work too, this just cuts one step out of the procedure [05:46:28] oh? apergos wikitech needs an update then [05:47:21] ah maybe so, annnyways I think we have people mail directly to the access requests queue these days [05:49:31] RECOVERY - puppet last run on tin is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [05:49:43] (03PS3) 10ArielGlenn: Swap from AdminSettings to PrivateSettings for snapshots/dumps [operations/puppet] - 10https://gerrit.wikimedia.org/r/145017 (owner: 10Reedy) [05:50:59] apergos: Ticket creation failed: RT account for Kunal Mehta / legoktm No permission to create tickets in the queue 'access-requests' [05:51:08] mmeeeehhh [05:51:12] well nm me then [05:51:18] RT? [05:51:19] ops-requests it is [05:51:22] ok :P [05:51:22] Russian Today? [05:51:28] Oh [05:51:37] request tracker [05:51:40] Ah. [05:51:57] (03CR) 10Brian Wolff: "Re: "I'm going to add the minimum distance variable as well, because without it there will be noticeable quality degradation for sizes ver" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/145132 (https://bugzilla.wikimedia.org/67525) (owner: 10Gergő Tisza) [05:53:09] apergos: #7930, do I need to get manager approval or anything like that for this? 
[05:53:11] (03CR) 10ArielGlenn: [C: 032] "Reedy, I'm not pushing this live til Monday and it will then take until the all current running dumps have completed before it's in use, s" [operations/puppet] - 10https://gerrit.wikimedia.org/r/145017 (owner: 10Reedy) [05:57:33] it's usually just about an nda, so I think not [06:00:07] (03Abandoned) 10ArielGlenn: 'qualify' openstack_version [operations/puppet] - 10https://gerrit.wikimedia.org/r/97007 (owner: 10ArielGlenn) [06:02:20] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet). [06:02:41] PROBLEM - Unmerged changes on repository puppet on palladium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet). [06:08:11] (03PS3) 10ArielGlenn: add index.html pages for various directories on dataset hosts [operations/puppet] - 10https://gerrit.wikimedia.org/r/144640 [06:14:20] PROBLEM - Puppet freshness on db1006 is CRITICAL: Last successful Puppet run was Sat 19 Jul 2014 04:13:10 UTC [06:14:44] grrr seriously, that was with sudo -i [06:14:47] so [06:15:14] oh, no, I see, this was just that i haven't gotten over there yet for the last +2 I did, whew [06:16:20] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge. [06:16:41] RECOVERY - Unmerged changes on repository puppet on palladium is OK: No changes to merge. 
[06:28:20] PROBLEM - puppet last run on mw1164 is CRITICAL: CRITICAL: Puppet has 1 failures [06:28:40] PROBLEM - puppet last run on mw1100 is CRITICAL: CRITICAL: Puppet has 1 failures [06:28:40] PROBLEM - puppet last run on db1059 is CRITICAL: CRITICAL: Puppet has 1 failures [06:28:41] PROBLEM - puppet last run on mw1061 is CRITICAL: CRITICAL: Puppet has 2 failures [06:29:00] PROBLEM - puppet last run on mw1092 is CRITICAL: CRITICAL: Puppet has 1 failures [06:29:00] PROBLEM - puppet last run on search1010 is CRITICAL: CRITICAL: Puppet has 1 failures [06:29:10] PROBLEM - puppet last run on mw1170 is CRITICAL: CRITICAL: Puppet has 1 failures [06:29:11] PROBLEM - puppet last run on searchidx1001 is CRITICAL: CRITICAL: Puppet has 1 failures [06:29:20] PROBLEM - puppet last run on cp3003 is CRITICAL: CRITICAL: Puppet has 1 failures [06:29:40] PROBLEM - puppet last run on mw1046 is CRITICAL: CRITICAL: Puppet has 1 failures [06:29:41] PROBLEM - puppet last run on cp4008 is CRITICAL: CRITICAL: Puppet has 1 failures [06:30:00] PROBLEM - puppet last run on cp3014 is CRITICAL: CRITICAL: Puppet has 1 failures [06:30:46] it's the 6:30 am restart again [06:30:55] should clear up on the next run [06:45:00] RECOVERY - puppet last run on search1010 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [06:45:20] RECOVERY - puppet last run on mw1164 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [06:45:20] RECOVERY - puppet last run on cp3003 is OK: OK: Puppet is currently enabled, last run 1 seconds ago with 0 failures [06:45:40] RECOVERY - puppet last run on mw1046 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [06:45:40] RECOVERY - puppet last run on mw1100 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [06:45:41] RECOVERY - puppet last run on db1059 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [06:46:00] RECOVERY - puppet 
last run on mw1092 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [06:46:10] RECOVERY - puppet last run on mw1170 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [06:46:11] RECOVERY - puppet last run on searchidx1001 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [06:46:40] RECOVERY - puppet last run on mw1061 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [06:46:41] RECOVERY - puppet last run on cp4008 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [06:47:00] RECOVERY - puppet last run on cp3014 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [07:13:10] RECOVERY - Puppet freshness on db1006 is OK: puppet ran at Sat Jul 19 07:13:03 UTC 2014 [07:57:20] PROBLEM - Puppet freshness on db1009 is CRITICAL: Last successful Puppet run was Sat 19 Jul 2014 05:56:48 UTC [07:57:30] RECOVERY - Puppet freshness on db1009 is OK: puppet ran at Sat Jul 19 07:57:21 UTC 2014 [08:12:38] (03PS1) 10Aaron Schulz: Set cirrusSearchLinksUpdate to low-priority so it strops elongating high-priority periods [operations/puppet] - 10https://gerrit.wikimedia.org/r/147728 [08:13:02] _joe_: ^ [08:14:56] <_joe_> AaronSchulz: breakfast, and I'll take a look [08:14:57] <_joe_> :) [08:16:17] see "+type:runJobs +cirrusSearchLinksUpdate +host:mw1001" and "+type:runJobs +refreshLinks +host:mw1001" on kibana if that helps [08:20:55] <_joe_> oh great [08:20:59] <_joe_> that would help [08:27:40] PROBLEM - puppet last run on mw1065 is CRITICAL: CRITICAL: Puppet has 1 failures [08:46:41] RECOVERY - puppet last run on mw1065 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [09:08:45] (03CR) 10Giuseppe Lavagetto: [C: 032] Set cirrusSearchLinksUpdate to low-priority so it strops elongating high-priority periods [operations/puppet] - 10https://gerrit.wikimedia.org/r/147728 (owner: 10Aaron 
Schulz) [10:20:05] _joe_: https://gerrit.wikimedia.org/r/147732 should help too [10:30:06] <_joe_> AaronSchulz: I'll leave that for monday [10:30:07] <_joe_> :) [10:35:38] wow, someone editing the Template:! page [10:35:42] heh [10:36:05] "This template is used on 2,500,000+ pages." [10:37:04] not even edited, just purged via API [10:37:12] someone must have set the purgelinks option [11:24:40] PROBLEM - puppet last run on mw1072 is CRITICAL: CRITICAL: Puppet has 1 failures [11:41:40] RECOVERY - puppet last run on mw1072 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [12:15:52] (03CR) 10JanZerebecki: [C: 031] Add puppet module for a tor relay [operations/puppet] - 10https://gerrit.wikimedia.org/r/140948 (owner: 10Dzahn) [12:21:41] PROBLEM - puppet last run on neon is CRITICAL: CRITICAL: Puppet has 1 failures [12:27:57] (03CR) 10JanZerebecki: [C: 031] rt -- update cipher suite list to support PFS [operations/puppet] - 10https://gerrit.wikimedia.org/r/147715 (https://bugzilla.wikimedia.org/53259) (owner: 10Chmarkine) [12:39:41] RECOVERY - puppet last run on neon is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [12:57:20] PROBLEM - Puppet freshness on db1009 is CRITICAL: Last successful Puppet run was Sat 19 Jul 2014 10:56:47 UTC [13:20:30] PROBLEM - Puppet freshness on analytics1029 is CRITICAL: Last successful Puppet run was Sat 19 Jul 2014 13:17:53 UTC [13:22:30] PROBLEM - Puppet freshness on analytics1029 is CRITICAL: Last successful Puppet run was Sat 19 Jul 2014 13:17:53 UTC [13:24:30] PROBLEM - Puppet freshness on analytics1029 is CRITICAL: Last successful Puppet run was Sat 19 Jul 2014 13:17:53 UTC [13:26:30] PROBLEM - Puppet freshness on analytics1029 is CRITICAL: Last successful Puppet run was Sat 19 Jul 2014 13:17:53 UTC [13:28:00] PROBLEM - puppet last run on cp3014 is CRITICAL: CRITICAL: Puppet has 1 failures [13:28:30] PROBLEM - Puppet freshness on analytics1029 is CRITICAL: Last successful 
Puppet run was Sat 19 Jul 2014 13:17:53 UTC [13:30:30] PROBLEM - Puppet freshness on analytics1029 is CRITICAL: Last successful Puppet run was Sat 19 Jul 2014 13:17:53 UTC [13:32:30] PROBLEM - Puppet freshness on analytics1029 is CRITICAL: Last successful Puppet run was Sat 19 Jul 2014 13:17:53 UTC [13:34:30] PROBLEM - Puppet freshness on analytics1029 is CRITICAL: Last successful Puppet run was Sat 19 Jul 2014 13:17:53 UTC [13:36:30] PROBLEM - Puppet freshness on analytics1029 is CRITICAL: Last successful Puppet run was Sat 19 Jul 2014 13:17:53 UTC [13:37:11] RECOVERY - Puppet freshness on db1009 is OK: puppet ran at Sat Jul 19 13:37:08 UTC 2014 [13:38:10] RECOVERY - Puppet freshness on analytics1029 is OK: puppet ran at Sat Jul 19 13:38:05 UTC 2014 [13:42:59] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:46:59] RECOVERY - puppet last run on cp3014 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [13:47:49] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 53274 bytes in 0.219 second response time [13:51:59] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:58:49] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 53274 bytes in 0.074 second response time [14:01:59] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:04:49] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 53274 bytes in 1.700 second response time [14:08:59] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:09:49] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 53274 bytes in 0.251 second response time [14:13:59] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:21:49] RECOVERY - 
gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 53274 bytes in 0.289 second response time [14:25:59] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:29:50] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 53274 bytes in 2.593 second response time [14:33:23] lol [14:34:59] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:35:59] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 53274 bytes in 9.658 second response time [14:37:15] (03PS1) 10Chmarkine: blog -- update cipher suite list to support PFS [operations/puppet] - 10https://gerrit.wikimedia.org/r/147739 (https://bugzilla.wikimedia.org/53259) [14:38:59] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:39:50] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 53274 bytes in 3.408 second response time [14:43:59] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:50:50] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 53274 bytes in 0.516 second response time [14:55:00] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:01:50] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 53274 bytes in 3.074 second response time [15:05:59] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:08:59] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 53274 bytes in 7.383 second response time [15:11:15] (03PS1) 10Chmarkine: ishmael -- update cipher suite list to support PFS [operations/puppet] - 10https://gerrit.wikimedia.org/r/147740 (https://bugzilla.wikimedia.org/53259) [15:12:00] PROBLEM - gitblit.wikimedia.org on 
antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:12:59] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 53274 bytes in 8.620 second response time [15:18:59] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:19:50] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 53275 bytes in 0.969 second response time [15:23:00] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:24:49] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 53275 bytes in 0.328 second response time [15:27:49] (03CR) 10BBlack: "If we end up doing this for each carrier one by one, it starts looking pretty ugly in the interim until analytics catches up. I assume we" [operations/puppet] - 10https://gerrit.wikimedia.org/r/147706 (owner: 10Yurik) [15:27:59] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:28:28] (03CR) 10BBlack: "Poor formatting there, let's try again!" [operations/puppet] - 10https://gerrit.wikimedia.org/r/147706 (owner: 10Yurik) [15:32:29] Does gitblit.wikimedia.org needs a restart or so? 
(see icinga; I also currently have recurring problems accessing it) [15:33:39] the check is flapping for nearly 2 hours now [15:35:49] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 53275 bytes in 0.181 second response time [15:36:27] probably [15:40:00] PROBLEM - gitblit.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:42:49] RECOVERY - gitblit.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 53274 bytes in 0.237 second response time [15:43:31] !log restarted gitblit service on antimony [15:43:37] Logged the message, Master [15:45:52] someone should send a patch upstream to the JVM maintainers that makes all java processes re-exec themselves once a day :P [15:53:38] <_joe_> bblack: I had a carefully-orchestrated set of rolling restarts for several java apps, at my last work [15:53:59] <_joe_> we had one in particular that needed to be restarted within 2 hours, or it would explode [15:54:12] <_joe_> "enterprise quality" [15:54:29] yeah I've found java is like windows. 
something looks amiss, first step is reboot it :P [16:15:54] (03PS1) 10BBlack: fix syntax error in check_http_lvs_on_port [operations/puppet] - 10https://gerrit.wikimedia.org/r/147746 [16:16:30] (03CR) 10BBlack: [C: 032 V: 032] fix syntax error in check_http_lvs_on_port [operations/puppet] - 10https://gerrit.wikimedia.org/r/147746 (owner: 10BBlack) [16:20:02] RECOVERY - LVS HTTP IPv4 on ocg.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 238 bytes in 0.004 second response time [16:21:07] ^ thanks for the text message, even though you're still in downtime and the issue was acked :P [16:26:33] <_joe_> bblack: that's how icinga/nagios works :) [16:28:52] PROBLEM - check configured eth on amssq33 is CRITICAL: Timeout while attempting connection [16:32:53] ^ that was me trying to renegotiate its link, didn't work [16:33:08] https://rt.wikimedia.org/Ticket/Display.html?id=7933 [16:33:22] PROBLEM - Puppet freshness on db1006 is CRITICAL: Last successful Puppet run was Sat 19 Jul 2014 14:32:45 UTC [16:42:29] (03CR) 10Yurik: "Yes, we will move one or two carriers, than move the rest at once. But, to continue supporting analytics until they are done, we will have" [operations/puppet] - 10https://gerrit.wikimedia.org/r/147706 (owner: 10Yurik) [16:42:48] bblack, ^ [16:43:52] yurikJerusalem: did you want to merge that now, or wait till monday and/or when you're back? [16:44:10] bblack, you can merge it now - they are not in prod, only testing [16:44:15] ok [16:45:06] (03PS2) 10BBlack: Moved 426-04 to unified design [operations/puppet] - 10https://gerrit.wikimedia.org/r/147706 (owner: 10Yurik) [16:45:23] (03CR) 10BBlack: [C: 032 V: 032] Moved 426-04 to unified design [operations/puppet] - 10https://gerrit.wikimedia.org/r/147706 (owner: 10Yurik) [16:45:55] bblack, thx! 
[16:46:01] np [17:57:23] (03CR) 1001tonythomas: "https://bugzilla.wikimedia.org/show_bug.cgi?id=64962#c7" [operations/puppet] - 10https://gerrit.wikimedia.org/r/141287 (owner: 1001tonythomas) [18:13:02] RECOVERY - Puppet freshness on db1006 is OK: puppet ran at Sat Jul 19 18:12:56 UTC 2014 [19:13:02] PROBLEM - Disk space on analytics1022 is CRITICAL: DISK CRITICAL - free space: / 1067 MB (3% inode=95%): [19:18:58] (03PS1) 10Yuvipanda: [WIP] puppetception: Initial commit [operations/puppet] - 10https://gerrit.wikimedia.org/r/147759 [19:19:35] bd808: ^ initial commit :) [19:25:23] (03CR) 10BryanDavis: [WIP] puppetception: Initial commit (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/147759 (owner: 10Yuvipanda) [19:26:46] (03CR) 10Yuvipanda: [WIP] puppetception: Initial commit (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/147759 (owner: 10Yuvipanda) [19:29:05] (03PS2) 10Yuvipanda: [WIP] puppetception: Initial commit [operations/puppet] - 10https://gerrit.wikimedia.org/r/147759 [19:32:13] (03PS3) 10Yuvipanda: [WIP] puppetception: Initial commit [operations/puppet] - 10https://gerrit.wikimedia.org/r/147759 [19:33:17] (03PS4) 10Yuvipanda: [WIP] puppetception: Initial commit [operations/puppet] - 10https://gerrit.wikimedia.org/r/147759 [19:38:14] (03PS5) 10Yuvipanda: [WIP] puppetception: Initial commit [operations/puppet] - 10https://gerrit.wikimedia.org/r/147759 [19:39:47] (03PS6) 10Yuvipanda: [WIP] puppetception: Initial commit [operations/puppet] - 10https://gerrit.wikimedia.org/r/147759 [19:41:38] (03Restored) 10Aaron Schulz: Add 1 more runner per job runner server [operations/puppet] - 10https://gerrit.wikimedia.org/r/147691 (owner: 10Aaron Schulz) [19:41:40] (03PS7) 10Yuvipanda: [WIP] puppetception: Initial commit [operations/puppet] - 10https://gerrit.wikimedia.org/r/147759 [19:41:51] (03PS2) 10Aaron Schulz: Add 1 more runner per job runner server [operations/puppet] - 10https://gerrit.wikimedia.org/r/147691 [19:44:52] 
(03PS8) 10Yuvipanda: [WIP] puppetception: Initial commit [operations/puppet] - 10https://gerrit.wikimedia.org/r/147759 [19:47:25] (03PS9) 10Yuvipanda: [WIP] puppetception: Initial commit [operations/puppet] - 10https://gerrit.wikimedia.org/r/147759 [19:51:20] (03PS10) 10Yuvipanda: [WIP] puppetception: Initial commit [operations/puppet] - 10https://gerrit.wikimedia.org/r/147759 [19:55:28] (03PS11) 10Yuvipanda: [WIP] puppetception: Initial commit [operations/puppet] - 10https://gerrit.wikimedia.org/r/147759 [19:59:22] (03PS12) 10Yuvipanda: [WIP] puppetception: Initial commit [operations/puppet] - 10https://gerrit.wikimedia.org/r/147759 [20:01:03] (03PS13) 10Yuvipanda: [WIP] puppetception: Initial commit [operations/puppet] - 10https://gerrit.wikimedia.org/r/147759 [20:02:34] (03PS14) 10Yuvipanda: [WIP] puppetception: Initial commit [operations/puppet] - 10https://gerrit.wikimedia.org/r/147759 [20:07:04] (03PS15) 10Yuvipanda: [WIP] puppetception: Initial commit [operations/puppet] - 10https://gerrit.wikimedia.org/r/147759 [20:11:07] (03PS16) 10Yuvipanda: [WIP] puppetception: Initial commit [operations/puppet] - 10https://gerrit.wikimedia.org/r/147759 [20:18:03] bd808: hmm, so I think ^ mostly works, except puppet fails for some reason when running mw-vagrant [20:18:40] YuviPanda: What sort of failure do you get? [20:19:17] composer fails to do its thing because /vagrant/mediawiki doesn't exist yet, because the clones haven't happened yet [20:19:40] /var/lib/dpkg/info/php5-redis.postinst: 22: /var/lib/dpkg/info/php5-redis.postinst: cannot create /etc/php5/conf.d/redis.ini: Directory nonexistent [20:19:42] which is weird [20:20:16] bd808: ah, hmm, this is a precise box, maybe that's why? [20:20:24] * YuviPanda switches it to branch [20:20:30] I thought I'd fixed the composer one... 
[20:20:58] precise and mwv master will not be friends [20:21:01] ah, ok, much better on precise-compat [20:22:00] bd808: once this gets merged, I suppose I'll rewrite labs_vagrant to be role that just uses this module [20:26:22] bd808: https://dpaste.de/dUap [20:26:24] is on precise [20:27:44] bah [20:29:19] hmm, might be permissions again [20:30:46] (03PS17) 10Yuvipanda: [WIP] puppetception: Initial commit [operations/puppet] - 10https://gerrit.wikimedia.org/r/147759 [20:30:50] right, so now after fixing permissions the subclones work [20:35:16] YuviPanda: with puppetception where do these other manifests come from? local edits, another repo? are they meant to be temporary? [20:35:34] bblack: they come from another git repo [20:35:44] similar to current labs_vagrant but more general [20:35:55] are they meant to reference stuff in ops/puppet? [20:36:40] nope [20:36:54] but if they want to, they can do the submodule/subtree + symlink-modules trick [20:37:26] (03PS18) 10Yuvipanda: [WIP] puppetception: Initial commit [operations/puppet] - 10https://gerrit.wikimedia.org/r/147759 [20:37:47] when this is done, labs_vagrant will just be a role that calls to this [20:38:48] YuviPanda: were the permissions problems in mwv or in puppetception? [20:38:51] Yeah I don't have enough context on all of this to really know what to think. In the back of my mind it makes me uncomfortable anytime something moves knowledge about sys/net config/dependencies away from "git grep in ops/puppet" [20:38:57] bd808: in puppetception [20:39:19] bblack: this is solely for independent projects that are run in labs that are currently unpuppetized. won't be in prod or anything [20:40:06] It's really for Labs users to setup things easily/repeatably on their own instances [20:40:07] yeah the fears are about scenarios where, because they do run in our network, they interact with our actual infrastructure. I guess it can't be helped, though, if you want these results. 
[20:40:17] yup [20:40:19] at least there will be somewhere to look at all :) [20:40:24] indeed [20:40:31] utrs, for example, is on labs but completely unpuppetized [20:44:59] (03PS1) 10Ori.livneh: Load nutcracker server list from realm-specific yaml file [operations/puppet] - 10https://gerrit.wikimedia.org/r/147830 [20:50:24] bd808: hmm, apt-get anything from puppet fails, so mwv fails [20:51:09] YuviPanda: Why would apt-get fail? [20:51:29] bd808: because php5-redis fails with [20:51:46] /var/lib/dpkg/info/php5-redis.postinst: 22: /var/lib/dpkg/info/php5-redis.postinst: cannot create /etc/php5/conf.d/redis.ini: Directory nonexistent [20:51:51] manually creating the conf.d directory fixes things [20:52:10] YuviPanda: 14.04 or 12.04? [20:53:06] (03PS1) 10Ori.livneh: mediawiki: only provision ini files for igbinary and wmerrors on precise [operations/puppet] - 10https://gerrit.wikimedia.org/r/147832 [20:53:18] bd808: 12.04 [20:53:22] am on precise-compat [20:53:32] what is precise-compat? [20:53:50] tag for the last version of vagrant that worked on precise [20:54:01] We tagged + branched the last mwv version before the trusty switch [20:54:29] how come? 
[20:54:48] because someone with labs-vagrant on an old host reported how it is now borked [20:55:29] well, ditto standalone vagrants, but we just told people to suck it up and upgrade [20:55:48] and they did, which was a bit of pain, but which helped the overall effort immensely by generating useful bug reports [20:56:02] PROBLEM - puppetmaster backend https on palladium is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:56:53] RECOVERY - puppetmaster backend https on palladium is OK: HTTP OK: Status line output matched 400 - 335 bytes in 0.044 second response time [20:57:02] I'm not personally interested in supporting 12.04 but te tag seemed a easy way to provide a bridge for existing labs users [20:57:08] *the [20:57:17] yeah, there's going to be no updates to the branch [20:57:53] well, i won't quibble; you guys interact with the users more [20:58:02] if there's value in it, i don't mind [20:58:07] for new users we've just been telling them to use trusty [20:59:52] Apparently we have 26 labs-vagrant installs right now -- [21:00:00] oh wow [21:00:14] bd808: have you had a chance to try out https://gerrit.wikimedia.org/r/#/c/147317/ ? [21:00:22] PROBLEM - Puppet freshness on db1007 is CRITICAL: Last successful Puppet run was Sat 19 Jul 2014 19:00:02 UTC [21:00:48] * bd808 crashed his client with a long url again :( [21:01:46] ori: hmm, the box vagrant downloads is a 'clean' box, right? we aren't depending on any specific packages outside of stock ubuntu server being installed on it, right? 
[21:01:56] (03PS2) 10Krinkle: mediawiki: only provision ini files for igbinary and wmerrors on precise [operations/puppet] - 10https://gerrit.wikimedia.org/r/147832 (owner: 10Ori.livneh) [21:02:00] (03CR) 10Krinkle: [C: 031] mediawiki: only provision ini files for igbinary and wmerrors on precise [operations/puppet] - 10https://gerrit.wikimedia.org/r/147832 (owner: 10Ori.livneh) [21:02:09] hey Krinkle :) [21:02:19] YuviPanda: we shouldn't, yeah [21:02:23] ori: Hey [21:02:26] ok [21:02:37] It downloads a "cloud" box, but it seems to work with labs-vagrant (at least for the roles I've tried) [21:02:51] I'm guessing the problem here is that the host is mostly fucked, since this is the 3rd different patch for totally different things I'm trying on this box [21:02:54] it's possible that we assume incorrectly that a certain package is part of the base install [21:03:04] whereas it's really only included in the cloud install [21:03:12] at any rate it never hurts to be explicit [21:03:30] hmm, ok [21:03:33] i.e. a package { 'php5-common': ensure => present, } costs us barely anything if it's already installed [21:03:35] * YuviPanda stops testing on this box, tries on trusty [21:03:40] ori: could you maybe review these optimisations? https://gerrit.wikimedia.org/r/#/q/status:open+project:mediawiki/core+branch:master+topic:cleanup-mwloader,n,z [21:03:50] YuviPanda: Here's a SMW query to list all the labs-vagrant hosts -- http://bit.ly/Ws2rOO [21:03:57] Krinkle: sure [21:06:54] nice, Mount['/srv'] does do the trick :) [21:07:10] (03PS1) 10Ori.livneh: Get rid of deprecated HHVM module [operations/puppet] - 10https://gerrit.wikimedia.org/r/147833 [21:07:47] ori: Sorry, I didn't get to that on Friday. It's on my list for first thing Monday morning. [21:08:07] bd808: btw, composer dependencies is still messed up [21:08:25] tries to set working directory to /vagrant/mediawiki but doesn't depend on the git clone [21:08:52] YuviPanda: Grrr. 
I must have missed one [21:10:11] hmm, there's no mention of that cwd in the composer.pp file [21:10:18] YuviPanda: Exec['install_composer_deps'] seems to require Git::Clone['mediawiki/core'] [21:10:32] Is there another composer exec that needs this? [21:10:55] hmm, so working directory not found errors for all execs, actually [21:11:02] Error: /Stage[main]/Php::Remote_debug/Php::Ini[remote_debug]/Exec[/usr/sbin/php5enmod -s ALL _remote_debug]: Failed to call refresh: Working directory '/vagrant/mediawiki' does not exist [21:11:34] YuviPanda: Nothing setting up the /vagrant symlink in your new setup? [21:11:41] bd808: not yet, I do :) [21:12:01] lrwxrwxrwx 1 root root 22 Jul 19 21:07 vagrant -> /srv/puppetception/git [21:12:08] but /srv/puppetception is vagrant:www-data though [21:12:28] wqat Error: /Stage[main]/Mediawiki/Git::Clone[mediawiki/core]/Exec[git_clone_mediawiki/core]/returns: change from notrun to 0 failed: Working directory '/vagrant/mediawiki' does not exist [21:13:32] (PS1) Ori.livneh: reprepro: stop importing Facebook's HHVM packages [operations/puppet] - https://gerrit.wikimedia.org/r/147836 [21:13:54] bd808: created an empty directory at /vagrant/mediawiki [21:13:55] now it works [21:14:39] YuviPanda: The clone doesn't create the directory? /me is puzzled [21:14:57] bd808: you mean the mwv clone? [21:15:18] I mean Git::Clone[mediawiki/core] [21:15:28] (CR) Ori.livneh: [C: 1] Add 1 more runner per job runner server [operations/puppet] - https://gerrit.wikimedia.org/r/147691 (owner: Aaron Schulz) [21:15:35] bd808: all execs seem to try to set /vagrant/mediawiki as their working directory. this fails because the directory doesn't exist, and the directory creation fails because of git clone fails because it uses an exec at some point and that doesn't execute because the directory does not exist [21:15:54] puppetception! [21:17:23] * bd808 tries to recreate in vagrant [21:17:26] indeed. best module name ever?
:) [21:19:42] YuviPanda: I can recreate in vagrant. I'll try to track the source of that down and fix it. [21:19:49] bd808: cool! [21:20:22] RECOVERY - Puppet freshness on db1007 is OK: puppet ran at Sat Jul 19 21:20:21 UTC 2014 [21:20:37] bd808: i set the mw dir as the default cwd for all execs in that module [21:21:59] wouldn't that be a problem on first installs for all execs before the git clone, including the exec that does the git clone? [21:22:38] bd808: composer install also seems to complete without errors (I think?) but has return code of 1 https://dpaste.de/AEsr [21:23:36] YuviPanda: it's a bit surprising for that default to bubble out to the git module's exec [21:23:48] YuviPanda: anyways, invoke composer without '--quiet' obviously to see what it's doing [21:24:08] in fact we should probably omit 'quiet' in the manifest too [21:24:17] since puppet will capture stdout/stderr unless the command fails anyways [21:24:29] ori: Where did you set the default cwd? I can't find it [21:25:29] bd808: line 62 of mediawiki's init.pp [21:26:05] hmm, so composer fails with a timeout when trying to curl https://packagist.org/p/provider-active$435a0c80857cd873e76fb4ec3bd1c3fd22aedba4bb5940a95d431fd5f308a186.json [21:26:22] and curling that seems to give me something... 
*huge* [21:26:43] 4.5k [21:26:46] not that huge [21:26:46] bah, by huge I mean something 6.5k [21:26:59] I was expecting a smaller file so just curl'd it to stdout :D [21:28:39] (PS1) Ori.livneh: osmium(hhvm test host): remove ::hhvm::dev [operations/puppet] - https://gerrit.wikimedia.org/r/147840 [21:31:21] The "https://packagist.org/p/provider-active$435a0c80857cd873e76fb4ec3bd1c3fd22aedba4bb5940a95d431fd5f308a186.json" file could not be downloaded: Failed to open https://packagist.org/p/provider-active$435a0c80857cd873e76fb4ec3bd1c3fd22aedba4bb5940a95d431fd5f308a186.json (Operation timed out after 4979 milliseconds with 81677 out of 325944 bytes received) [21:31:22] hmm [21:31:35] it definitely does take more than 5s to download [21:31:37] at least on that labs host [21:31:40] should we increase timeout? [21:32:17] Seems reasonable. We give apt more time I think for similar reasons. [21:33:05] Or we stop running that using hhvm [21:33:16] ah, right. that as well [21:33:19] which is what makes the timeouts [21:33:29] I'd think running it with increased timeout would be better [21:33:34] http://vanderveer.be/speed-up-composer-by-using-hhvm-including-a-slowtimer-error-fix/ [21:34:46] (PS1) Ori.livneh: mediawiki::multimedia: install libav-tools instead of ffmpeg on trusty [operations/puppet] - https://gerrit.wikimedia.org/r/147841 [21:35:46] (CR) jenkins-bot: [V: -1] mediawiki::multimedia: install libav-tools instead of ffmpeg on trusty [operations/puppet] - https://gerrit.wikimedia.org/r/147841 (owner: Ori.livneh) [21:36:17] bd808: trying with that now (livehack) [21:38:26] bah [21:38:27] [InvalidArgumentException] [21:38:34] Command "ResourceLimit.SocketDefaultTimeout=30" is not defined.
[21:39:31] of course, I was passing them to composer instead of hhvm [21:39:32] * YuviPanda fixes [21:47:24] bd808: hmm, weird [21:47:25] Notice: /Stage[main]/Mediawiki/Exec[install_composer_deps]/returns: The "https://packagist.org/p/provider-archived$eb4712f935872d8a51789ac9f0cb120fc306457b423326be86941fd5eeded900.json" file could not be downloaded: Failed to open https://packagist.org/p/provider-archived$eb4712f935872d8a51789ac9f0cb120fc306457b423326be86941fd5eeded900.json (Operation timed out after 29580 milliseconds with 589581 out of 618878 bytes r [21:47:25] eceived) [21:48:53] I've increased it to 300s now [21:51:22] increasing it to 300 lets it go! [21:55:14] bd808: ori w00t http://extdist-test.wmflabs.org/wiki/Main_Page via puppetception! [21:55:34] nice [21:56:07] bd808: so, hhvm. should we increase timeout to 5m, or use zend? [21:56:52] ori: ^ [21:56:52] We have switch most (all?) of the other maintenance scripts to php5, but either works for me. [21:57:00] let me increase then [21:58:28] I think I have a fix for the /vagrant/mediawiki dependency too. Just testing again before uploading. 
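A rough Puppet sketch of the two fixes discussed above, offered as an illustration and not as the actual manifest. It assumes the class-wide `Exec { cwd => ... }` default is dropped, because resource defaults are dynamically scoped and would also reach the exec inside the clone define, and that `cwd` is instead set only on execs that run after the clone. It also passes the HHVM setting with `-v` before the script path, so that hhvm receives it and not Composer. The names `sketch_git_clone` and `mediawiki_sketch`, the Composer path, the clone URL and the `COMPOSER_HOME` value are all hypothetical placeholders; only the exec title, the directory, the HHVM option name and the 300 s value come from the log.

```puppet
# Hypothetical stand-in for the real git::clone define (its parameters are
# not shown in the log, so this sketch defines its own).
define sketch_git_clone($directory, $remote) {
    exec { "git_clone_${title}":
        command => "git clone ${remote} ${directory}",
        path    => ['/usr/bin', '/bin'],
        # No cwd here: the target directory does not exist before the clone.
        creates => "${directory}/.git",
        # A full clone can exceed Puppet's default 300 s exec timeout.
        timeout => 1800,
    }
}

class mediawiki_sketch(
    $dir            = '/vagrant/mediawiki',
    $socket_timeout = 300,  # seconds; 30 was still too short in the log
) {
    # Deliberately no `Exec { cwd => $dir }` default in this class.

    sketch_git_clone { 'mediawiki/core':
        directory => $dir,
        remote    => 'https://gerrit.wikimedia.org/r/mediawiki/core',  # assumed URL
    }

    exec { 'install_composer_deps':
        # The -v option precedes the script path, so hhvm consumes it.
        # '--quiet' is left out so that a failure shows Composer's output.
        command     => "hhvm -v ResourceLimit.SocketDefaultTimeout=${socket_timeout} /usr/local/bin/composer install --no-interaction",
        cwd         => $dir,
        path        => ['/usr/local/bin', '/usr/bin', '/bin'],
        environment => 'COMPOSER_HOME=/tmp/composer',  # placeholder value
        creates     => "${dir}/vendor",                # crude idempotency guard
        logoutput   => on_failure,
        require     => Sketch_git_clone['mediawiki/core'],
    }
}

include mediawiki_sketch
```

With the explicit `require`, the Composer exec is only evaluated once the clone has created `$dir`, which avoids the circular "Working directory does not exist" failure described earlier in the log.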
[22:19:03] (PS19) Yuvipanda: puppetception: Initial commit [operations/puppet] - https://gerrit.wikimedia.org/r/147759 [22:19:05] bd808: ori ^ woot, WIP tag removed :) [22:19:18] comments / +1s / -1s appreciated :) [22:25:56] (PS1) Aaron Schulz: Increase the number of parsoid job runners [operations/puppet] - https://gerrit.wikimedia.org/r/147850 [22:27:18] (CR) Aaron Schulz: "Also see https://gerrit.wikimedia.org/r/#/c/147849/ and http://ganglia.wikimedia.org/latest/?c=Parsoid%20eqiad&m=cpu_report&r=hour&s=by%20" [operations/puppet] - https://gerrit.wikimedia.org/r/147850 (owner: Aaron Schulz) [22:27:24] (PS2) Aaron Schulz: Increase the number of parsoid job runners [operations/puppet] - https://gerrit.wikimedia.org/r/147850 [23:08:40] (PS2) Ori.livneh: mediawiki::multimedia: install libav-tools instead of ffmpeg on trusty [operations/puppet] - https://gerrit.wikimedia.org/r/147841 [23:33:10] !log aaron Synchronized php-1.24wmf4/includes: 926e1997b53f563a4e7f3c540e32b45ddb24b3c5 & 017891ba41cc72987bf3cb441004a847d20105b4 (duration: 00m 09s) [23:33:15] Logged the message, Master [23:33:29] !log aaron Synchronized php-1.24wmf4/maintenance: 926e1997b53f563a4e7f3c540e32b45ddb24b3c5 & 017891ba41cc72987bf3cb441004a847d20105b4 (duration: 00m 08s) [23:33:35] Logged the message, Master