[00:00:24] Lightning deploy time, going to deploy VE [00:00:44] I think gwicke wanted to deploy node_modules first? [00:01:29] Parsoid is currently fine without deploying anything [00:01:38] okay, nevermind me [00:01:42] other node services depend on binary npm libs though [00:02:05] so during a node upgrade you need to re-run npm install before restarting the service [00:02:32] best to also rm -r node_modules in my experience [00:03:02] mwalker, oops - that was my commit [00:03:15] MaxSem: ya it was; but I don't understand why it broke things [00:03:31] what's broken? [00:03:53] all writers are rendering to PDF [00:04:03] so bookcmd=render&writer=epub will still render a PDF [00:04:18] who cares about other formats, anyway?:P [00:04:20] but... bookcmd=render&writer=rdf2latex (which is our renderer) works just fine [00:04:25] !log catrope synchronized php-1.23wmf6/extensions/VisualEditor/ 'Update VisualEditor with cherry-picks' [00:04:28] apparently lots of people! [00:04:41] Logged the message, Master [00:05:09] * MaxSem tries to recall how this stuff works [00:05:47] OK I'm done [00:05:53] superm401: You had stuff for the LD window? [00:06:06] RoanKattouw, yeah, about to do it. [00:06:09] Cool [00:07:11] hmm, it seems that parsoid for officewiki is now producing 503s [00:07:43] hmm? [00:07:47] Error: tunneling socket could not be established, cause=Parse Error [00:08:03] is the URL for officewiki an https one? [00:08:05] * paravoid looks [00:08:29] yes it is [00:08:38] s/https/http/ for these 5 URLs [00:08:46] if it's https, the node requests code tries to do a CONNECT [00:08:52] when using a proxy [00:09:02] k, will prepare a patch [00:09:21] it's board, collab, office, wikimaniateam & wikitech [00:09:31] oh [00:09:33] on that note [00:09:38] wikitech won't work as it is :( [00:09:40] shit [00:09:49] I can match against the apiURI [00:09:54] (03CR) 10Qgil: "For what is worth, the logo is based on https://commons.wikimedia.org/wiki/File:Wikimedia_logo_text_RGB.svg + the "inter" string. This fav" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100326 (owner: 10Gerrit Patch Uploader) [00:09:56] wikitech doesn't run on the main cluster [00:10:11] you can't connect to api.svc and expect wikitech to work [00:10:25] hrm, minor complication [00:10:44] lookup structure for special cases? [00:10:57] I guess so [00:11:20] apiProxyURIs with an '*' entry and others by prefix [00:12:53] it's an accident that it worked now [00:13:14] wikitech used to be on linode, for example :) [00:14:06] !log mflaschen synchronized php-1.23wmf6/extensions/Thanks/ 'Deploy Thanks bugfix to 1.23wmf6' [00:14:15] You're up, RoanKattouw. [00:14:22] Logged the message, Master [00:17:07] MaxSem: careful; you might lose brain cells [00:18:12] superm401: I went before you [00:18:15] So I'm already done [00:18:59] Oh, whoops, didn't see that. Thanks. [00:21:07] paravoid: is there another service for wikitech and the other special cases that accepts proxy requests? [00:22:15] I don't think so [00:22:31] MaxSem: do you have shell on pdf[1-3]?
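A minimal sketch of the rebuild step gwicke describes above for node services that depend on binary npm modules; the service name and checkout path are assumptions, not the actual production layout:

    # after a node upgrade, binary npm modules must be rebuilt against the new ABI
    sudo service parsoid stop             # service name is an assumption
    cd /srv/deployment/parsoid/deploy     # hypothetical checkout path
    rm -rf node_modules                   # per the "best to also rm -r node_modules" advice
    npm install --production
    sudo service parsoid start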
[00:22:50] hrm [00:23:48] I'll disable the proxy for now [00:25:01] mwalker, no [00:25:08] :'( [00:25:30] my only thought at this moment is that we're missing something in the metabook format [00:25:34] somehow [00:25:36] !log reverted Parsoid proxy change as officewiki and some other https-only wikis were broken [00:25:52] Logged the message, Master [00:27:08] gwicke: everything but wikitech is easily solvable, as far as I'm aware [00:27:12] by s/https/http/ [00:27:17] paravoid: maybe we should have waited a bit with removing the randomization from the parsoid api uri [00:27:58] yeah, just better to un-break things first [00:29:40] I'll add support for completely disabling the proxy for specific wikis [00:31:45] sorry that my change was broken for wikitech :) [00:31:50] or that I did not notice https in the config [00:32:36] there seems to be a problem with securepoll [00:32:40] someone can help me? [00:34:24] Reedy ^ [00:34:48] mwalker: I just poked him in -staff :P [00:34:54] heh [00:34:55] * Vito waits [00:35:39] (03PS1) 10Ryan Lane: Add a deploy.restart runner and module call [operations/puppet] - 10https://gerrit.wikimedia.org/r/100509 [00:35:57] gwicke: ^^ [00:36:24] so, with that change you can add "parsoid" to the service_name config for the parsoid/Parsoid repo [00:36:25] Vito: who setup your securepoll? the decryption key should be held in escrow by them I think [00:37:08] mwalker: I don't know, I'm scrutineer for en.wiki's ACE2013 [00:37:12] mwalker: it was Reedy I think [00:37:14] and from tin you can call: sudo salt-call -l quiet publish.runner deploy.restart parsoid/Parsoid [00:37:24] I'll make a script to wrap that command [00:37:45] it'll batch the restarts to 5 minions at a time [00:37:57] of course, I still need to test this change in labs before I merge it in [00:38:30] Jamesofur|away: you might also be able to help with Vito's problem [00:42:46] hmm; Vito it looks like everyone who might know anything is currently away -- I would drop a message on [en:User_talk:Philippe_(WMF)] [00:43:51] maybe also to [en:User_talk:Jalexander] [00:45:41] mwalker: I'm not sure if Philippe can help us these days [00:45:49] anyway I think I'll fall asleep [00:46:22] hmm; I thought he was the keeper of all the keys [00:46:24] *shrugs* [00:46:26] Ryan_Lane, awesome, let me add that to wikitech [00:46:42] gwicke: wait a bit :) [00:46:47] I need to test and merge it in [00:46:59] Ryan_Lane, ok ;) [00:47:03] I'd like to make a wrapper for the service restart too, since that command is ugly as sin [00:49:10] (03PS1) 10Aaron Schulz: Include redis on logstash servers [operations/puppet] - 10https://gerrit.wikimedia.org/r/100511 [00:49:22] greg-g: I know what commit is breaking collection now; but I'm not sure the best way of pinning the cluster to the commit before it (i'd rather not have to revert that commit and everything past it) [00:49:48] ideally I'd fix it; but I can't seem to find whatever the offending line is [00:49:57] ... at this particular point in time [00:53:28] Ryan_Lane, btw: https://www.mediawiki.org/wiki/Parsoid/Packaging#Option_2:_deploy_repo_with_code_as_submodule [00:54:05] ori-l: yeah it looks like the buffering is really just on the client, so I can lower that down [00:55:14] (03PS2) 10Aaron Schulz: Include redis on logstash servers [operations/puppet] - 10https://gerrit.wikimedia.org/r/100511 [01:03:22] gwicke: hm. 
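A sketch of the wrapper script Ryan says he'll write around the salt-call quoted above; the script name and argument handling are illustrative, only the inner command comes from the log:

    #!/bin/bash
    # restart a trebuchet-deployed service from tin
    repo=${1:?usage: restart-service <repo>, e.g. parsoid/Parsoid}
    sudo salt-call -l quiet publish.runner deploy.restart "$repo"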
I'm not totally sure I understand option 2's proposal [01:03:36] I also wonder what you mean by puppet can manage the config in option 1 [01:03:49] do you mean parsoid's configuration itself, or the upstart file? [01:03:59] upstart [01:04:15] * Ryan_Lane nods [01:04:15] although potentially also localsettings.js [01:04:38] it's better to deploy localsettings.js via git-deploy [01:04:54] ops doesn't want to touch application deployment or configuration [01:05:36] ops does want to maintain upstarts and such, since they run with root privs [01:06:38] updated option 2 to include localsettings.js [01:06:39] Ryan_Lane: I actually received opposite guidance -- that I needed to template / puppetize my configuration [01:06:53] mwalker: for which application? [01:06:59] also added another option that moves the debianization to a submodule too [01:07:00] the new PDF renderer [01:07:12] bleh. who said that? [01:07:21] ori-l and paravoid [01:07:26] I wonder why [01:07:36] it's an application like anything else [01:07:41] wait, what did I say? [01:07:44] I think it makes sense to manage the config via deployment [01:08:03] it makes sense for the init/upstart script to be managed by puppet [01:08:26] paravoid: I interpreted what you said a while ago; that my configuration file for ocg-collection should be a template in a module [01:08:39] I don't remember saying that [01:08:47] and I don't think that's right? [01:08:50] this was a couple of weeks ago when I was trying to get ori to create me a /operations/config/ocg repo or some such [01:08:58] I wasn't around for that [01:09:00] heh [01:09:06] I'd definitely manage the config via deployment [01:09:08] * mwalker scrounges in log files [01:09:12] deploying localsettings like parsoid is is best imho [01:09:20] *is doing it [01:09:26] otherwise ops needs to be around to make config changes [01:09:27] i.e. git-deploy [01:09:33] which may need to occur during deployments [01:10:04] I'd like to get to the point where apps can generally be maintained by devs first, and ops second [01:10:33] ahhhh [01:11:24] it was jeremyb, ori-l, and MZ [01:11:30] an unlucky combination [01:11:31] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000 [01:12:01] apologies for the slander paravoid [01:12:01] :D [01:12:07] hehe, no worries [01:12:34] mwalker: so, yeah, making a config repo that you deploy will make things way easier in the long run [01:12:44] *nods* that's what I was thinking :) [01:12:49] paravoid: btw: https://gerrit.wikimedia.org/r/#/c/100509/ [01:12:49] my opinion for the record is that you shouldn't involve puppet in the deployment hot path [01:12:57] I had a request in for /operations/ocg-config [01:13:01] both because puppet takes a while to converge, might be broken etc. [01:13:12] how does https://www.mediawiki.org/wiki/Parsoid/Packaging#Option_2:_deploy_repo_with_code_as_submodule look to you? [01:13:14] I still need to test my change, of course [01:13:16] and because access rights for puppet are restricted to a smaller set than deployers [01:13:49] gwicke: ah.
you'd deploy all of that, and parsoid itself would be a submodule [01:13:58] so you'd do config and code at the same time [01:14:11] that's definitely an option [01:14:21] not a bad one either [01:16:41] Ryan_Lane: yup [01:18:06] I've avoided that elsewhere (like mediawiki) due to a large amount of recursive submodules [01:18:15] because it makes deployment somewhat complicated [01:18:31] I think in this case it's probably simpler [01:19:05] doing it the other way around would force us to use even more submodules [01:19:15] * Ryan_Lane nods [01:19:19] that looks like a good approach to me [01:21:14] now if a script copied in a debian dir specced by ops and built a deb from that, there would not be a need for ops involvement [01:21:38] well, for third party use you can use a launchpad ppa [01:21:53] no need for ops involvement with that [01:22:50] sure [01:22:58] I was more thinking about using it for deployment [01:23:14] so that we can get dependencies etc [01:23:32] how would you install the deb? [01:24:06] and how would it be added to the apt repo? [01:24:12] it sounds not impossible to script 'apt-get install parsoid' with a version parameter [01:24:45] when you needed to deal with dependencies, we'd need to be involved [01:24:59] ops would spec the deps in debian/ [01:25:25] so far no one is agreeing with you that this is a good idea ;) [01:25:45] (03PS1) 10Faidon Liambotis: Varnish: set backend_random for POSTs [operations/puppet] - 10https://gerrit.wikimedia.org/r/100516 [01:25:48] upgrade: 'apt-get install parsoid=0.1.33' [01:25:56] gwicke: ^^ [01:25:57] downgrade: 'apt-get install parsoid=0.1.32' [01:26:05] gwicke: what would call that? [01:26:29] and again, how would the package get into the repository? [01:26:42] Ryan_Lane, a dsh or salt script for example [01:27:05] (03CR) 10Faidon Liambotis: [C: 032] Varnish: set backend_random for POSTs [operations/puppet] - 10https://gerrit.wikimedia.org/r/100516 (owner: 10Faidon Liambotis) [01:27:14] what you're describing is a new deployment system. using debs [01:27:35] copying a file built by an ops-controlled script using ops-controlled debian stuff to the repo is relatively simple [01:27:36] and as I mentioned that's a shitload of work for not very much gain [01:27:40] no. it's not [01:27:55] cp a b [01:28:17] 1. it's a reprepro call, and it needs to happen on a specific system [01:28:23] 2. the package needs to be built somewhere [01:28:30] where does 2 occur? [01:28:32] jenkins? [01:28:36] we don't really trust jenkins [01:28:55] I'd think the deployer triggers the build and increments the version [01:28:57] then, a dsh or salt script needs to be created [01:29:05] and it needs to be callable by deployers [01:29:59] this is way more complex than you're thinking it is [01:30:11] and it gains us very little [01:30:29] why don't we look at how we can handle dependencies in the current deployment system, rather than rewriting it from scratch? 
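A rough sketch of option 2 as agreed above, with the code pinned as a submodule of the deploy repo; $PARSOID_GIT and the file names are assumptions for illustration:

    git init parsoid-deploy && cd parsoid-deploy
    git submodule add "$PARSOID_GIT" src     # Parsoid code repo, pinned to one commit
    cp ~/localsettings.js .                  # config travels with the deploy repo
    git add localsettings.js
    git commit -m 'Pin Parsoid revision and config together'
    # a deploy then ships config and the pinned code in one step,
    # and a revert rolls both back together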
[01:30:53] well, we would not have to write another deploy system for others [01:31:04] could use proper dependencies [01:31:17] and generally help third parties [01:31:49] you're taking something that's as simple as deploying source and turning it into something that builds a binary, injects it into an apt repo, then requires apt-get update && apt-get install [01:31:57] the amount of scripting does not sound prohibitive [01:32:08] reprepro doesn't support multiple versions of a package in a repo [01:32:13] so we'd need to replace reprepro too [01:32:50] and you'd still need to deploy the codebase to a system via git first, too :D [01:33:01] most deb repos seem to have multiple versions of the same package, I wonder what they are using [01:33:02] to build the deb [01:33:10] fwiw, as much as I love .debs obviously, I think they're unsuitable for an agile deployment workflow [01:33:38] I just think it's an overcomplicated solution to deployment [01:33:54] salt has the ability to manage packages [01:34:02] we could specify dependencies in the repo config [01:34:24] paravoid, which issue do you see that would make agile hard? [01:34:35] the need to increment the version number? [01:34:39] (and multiple versions of the same package in the same suite for the same architecture is not something that Debian or its tools do in general) [01:34:58] paravoid: ubuntu PPAs do it [01:35:08] the cassandra repo for example has all versions since adam and eve [01:35:18] and they add a new one every two weeks or so [01:35:55] anyway, let's look at what you need and solve the problem using the system we have [01:35:56] building their deb from git is a single line [01:35:59] http://www.apache.org/dist/cassandra/debian/dists/20x/main/binary-amd64/Packages [01:36:02] rather than rearchitecting it [01:36:03] has just one version [01:36:28] a PPA for third parties, and git-deploy for us is a simpler solution [01:36:40] they have different suites for 1.2.x etc. [01:37:05] it might have been the datastax repo that had all of them [01:37:16] anyway, the whole "modify source, rebuild deb, put in apt, upgrade packages" workflow isn't great [01:37:24] it's not just the version [01:37:26] it's too messy [01:37:44] from a security perspective especially [01:37:46] it's also going to be slow, which is going to fuck us if we need to revert quickly [01:38:03] and it's a lot of places for the system to break [01:38:05] but in general, it feels overcomplicated for something that can just be a simple "rsync code" process [01:38:15] s/rsync/git push/ or whatever [01:38:22] "copy files" if you want [01:38:27] * Ryan_Lane nods [01:38:27] http://debian.datastax.com/community/pool/ [01:39:03] gwicke: so, as I asked earlier. what's the problem you're trying to solve?
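For concreteness, the deb-based flow gwicke is proposing above (and which the rest of the channel argues against), sketched under the assumption of a 'parsoid' package with the versions quoted in the discussion; the build and publish steps are hypothetical:

    dpkg-buildpackage -us -uc      # deployer builds the deb from the repo
    # ...inject the package into the apt repo on the repo host (e.g. reprepro includedeb)...
    # then upgrade (or downgrade) the cluster in small batches:
    salt -b 5 -G 'deployment_target:parsoid' cmd.run \
        'apt-get update -qq && apt-get install -y parsoid=0.1.33'

As paravoid and Ryan point out, this adds a build, a repo injection, and an apt round-trip to what git-deploy does with a single code push, which is why the revert path is slower.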
I'm more than happy to add features or make changes to trebuchet [01:39:19] if it's dependencies, that's likely a solvable problem [01:39:38] services won't usually be a single repository [01:40:22] I'm running fever and it's 3:40am, so if you'll excuse me :) [01:40:28] paravoid: yeah, go to bed :D [01:40:33] I'd be happy to discuss this via mail, if you need my opinion, fwiw [01:40:49] ouch, that sounds like bed time [01:41:11] gwicke: right, so make multiple repos and deploy them separately, or have a combined one using submodules [01:41:15] paravoid: k, thx [01:41:50] I still do think .debs are great to publish for use by third parties [01:41:52] Ryan_Lane, I'm a bit wary about duplication of effort [01:42:01] restart the services via the deploy.restart command I just pushed [01:42:04] we'll have to do some proper packaging anyway [01:42:21] so to me it looks attractive to save the effort to do this twice [01:42:23] maybe tag releases every now and then (2 weeks sounds fine, less often even better I think) [01:42:49] gwicke: lots of upstreams make packages for third parties and deploy from git [01:42:54] if I was a third party admin that wanted to just install a wiki and apt showed updates every tuesday and thursday I'd very annoyed [01:42:59] openstack is a great example of this [01:43:00] I'd be* [01:43:10] rackspace and a number of other public clouds use openstack from git [01:43:16] third parties use debs [01:43:17] Ryan_Lane, I guess that is what unstable vs. stable is for [01:43:34] eh? what do you mean? [01:43:55] stable vs. unstable repo [01:43:57] when I say they use git, I mean they don't use debs at all [01:44:13] re third parties only wanting major releases [01:44:18] and there's only one repo. they use branches and tags for releases [01:44:26] err [01:44:29] one repo per project [01:44:45] each project does releases via branches and tags [01:44:51] stable releases are branches [01:45:02] so that security and major bugs can be backported [01:45:11] debs are generated from it [01:46:11] setting up a new service-based MW system using git only won't be very convenient [01:46:17] so IMO we should do nice packaging for that [01:46:28] for third parties it won't be very convenient [01:46:34] for us it is very convenient [01:46:58] you know we moved away from using debs for this kind of stuff, right? [01:47:04] I mentioned this earlier [01:47:10] whether that is also useful for our own service deploys remains to be seen IMO [01:47:35] we've been systematically killing off any "configuration" packaging [01:48:07] I don't think that the distribution mechanism is that important [01:48:40] it matters more for near-atomic deploys [01:49:03] atomicity isn't the key. speed of deploy is [01:49:11] and ability to quickly revert [01:49:18] especially the ability to quickly revert [01:49:46] also, far less people understand how to create debs [01:49:52] so realistically it'll be on ops to do so [01:50:13] yeah, but I don't think that we can get away without packaging anyway [01:50:35] for third party use, yeah [01:51:23] though we've avoided it with MW for ages now [01:52:34] anyway, let me know what you need changed in the deployment system to make this work like you need [01:53:11] unless we decide to ignore third party users [01:53:11] for PHP code I agree that unpacking a huge deb would be too slow [01:53:12] installing something like parsoid from a deb otoh is a different animal [01:53:12] a second maybe?
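A sketch of the branch-and-tag release flow Ryan describes above, as OpenStack-style projects do it; the branch and tag names are illustrative:

    git checkout -b stable/0.1 master    # stable branch, so security/major fixes can be backported
    git tag v0.1.33                      # tagged release; third-party debs/PPA builds come from tags
    git push origin stable/0.1 v0.1.33
    # production deploys track git directly; packages are a by-product for third parties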
[01:53:12] most of the time will be waiting for the restart [01:53:14] seeing as that it took a shitload of political capital to make a new deployment system I think it's unlikely you'll convince anyone to make a debian based one ;) [01:53:39] (03CR) 10GWicke: "Nice!" [operations/puppet] - 10https://gerrit.wikimedia.org/r/100516 (owner: 10Faidon Liambotis) [01:54:26] but really, let me know what's missing from the current system [01:54:45] and I'll see about adding it [01:55:15] if the major issue is dependencies, I'll look at how to handle it [01:55:59] you should give me an example of a dependency that needs to be handled and the way in which it needs to be handled [01:56:11] it looks like we'll handle dependencies between our own code with subrepos, and to me that is fine [01:56:27] right, system level dependencies are another beast, though [01:56:39] coordinating something like node upgrade and node_modules would be nice to include in a deployment system [01:57:18] Selective deployment for testing? [01:57:19] especially when going for automated canary stuff and staggered service restarts [01:57:30] RoanKattouw: yeah, I talked about adding a canary option [01:57:56] and I just pushed in a change for staggered service restarts ;) [01:58:51] ideally the deployment system would also manage pooling/depooling as well [01:59:23] I could manage the LVS files for services. that's not amazingly hard [01:59:59] we could actually have multiple LVS pools and move canaries to another testing pool [02:00:17] Right [02:00:19] which would be a matter of depooling them from one and pooling them in the other [02:00:32] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (200976) [02:00:56] Ryan_Lane, should I document that now btw? [02:01:15] the restart code? [02:01:23] I'll document it on the trebuchet page when I merge it in [02:01:39] I'll probably do so tomorrow [02:02:05] k, added a note about it soon being available at https://wikitech.wikimedia.org/wiki/Parsoid#Misc_stuff [02:02:53] cool [02:03:07] when I merge it in, I can also push in a change to your repos and see if that's what you wanted [02:04:42] we'll do our next deploy on Wed otherwise [02:05:10] * Ryan_Lane nods [02:13:51] mini-dinstall looks like an interesting alternative to reprepro [02:15:52] !log LocalisationUpdate completed (1.23wmf5) at Tue Dec 10 02:15:52 UTC 2013 [02:16:08] Logged the message, Master [02:16:10] paravoid: ^ I think the LocalisationUpdate run just fixed "Spezial:Zentrale_automatische_Anmeldung/createSession".
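A sketch of the canary-plus-staggered-restart idea discussed above, combining the salt calls that appear later in this log; the host name, port, and health URL are assumptions:

    salt 'wtp1001*' parsoid.restart_parsoid parsoid     # restart the canary host first
    curl -sf http://wtp1001.eqiad.wmnet:8000/ >/dev/null &&
        salt -b 1 -G 'deployment_target:parsoid' parsoid.restart_parsoid parsoid
    # a testing LVS pool, as floated above, would additionally move the canary
    # out of the live pool before the check and back in afterwards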
[02:29:36] !log LocalisationUpdate completed (1.23wmf6) at Tue Dec 10 02:29:36 UTC 2013 [02:29:51] Logged the message, Master [02:30:34] (03PS2) 10Ryan Lane: Add a deploy.restart runner and module call [operations/puppet] - 10https://gerrit.wikimedia.org/r/100509 [02:30:35] (03PS1) 10Ryan Lane: Manual restart for parsoid [operations/puppet] - 10https://gerrit.wikimedia.org/r/100526 [02:32:34] gwicke: for my changes to really matter at all, the upstart needs to be available for parsoid [02:32:58] so if you're depending on this change for any reason, you guys may want to look at getting the upstart in :) [02:54:03] (03PS1) 10Dzahn: fix links in uncyclomedia tables [operations/debs/wikistats] - 10https://gerrit.wikimedia.org/r/100528 [02:54:04] (03CR) 10Dzahn: [C: 032] fix links in uncyclomedia tables [operations/debs/wikistats] - 10https://gerrit.wikimedia.org/r/100528 (owner: 10Dzahn) [03:15:08] !log LocalisationUpdate ResourceLoader cache refresh completed at Tue Dec 10 03:15:08 UTC 2013 [03:15:25] Logged the message, Master [03:24:59] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000 [03:30:48] (03PS1) 10Springle: depool db1049 for upgrade [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100529 [03:31:23] (03CR) 10Springle: [C: 032] depool db1049 for upgrade [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100529 (owner: 10Springle) [03:32:22] !log springle synchronized wmf-config/db-eqiad.php 'depool db1049 for upgrade' [03:32:38] Logged the message, Master [03:42:20] (03CR) 10MZMcBride: "Blergh." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/98002 (owner: 10John F. Lewis) [04:15:31] (03CR) 10Mattflaschen: "Please change to 118, per discussion starting at https://bugzilla.wikimedia.org/show_bug.cgi?id=57315#c40 , and also mention it under "Rec" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/97675 (owner: 10MZMcBride) [04:16:19] (03PS1) 10Springle: repool db1049 after upgrade [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100530 [04:16:42] (03CR) 10Springle: [C: 032] repool db1049 after upgrade [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100530 (owner: 10Springle) [04:17:46] !log springle synchronized wmf-config/db-eqiad.php 'repool db1049 after upgrade' [04:18:02] Logged the message, Master [04:35:49] Why am I getting 51+ minute database rep lag messages? [04:47:49] T13|sleeps: Where? [04:57:07] It's resolved itself. [04:57:57] Was on enwiki and I'm assuming it had to do with db1049 repool [05:24:55] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [05:25:21] (03PS1) 10Dzahn: push up to version 2.5 [operations/debs/wikistats] - 10https://gerrit.wikimedia.org/r/100535 [05:28:55] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (206870) [05:29:04] ori-l: mwalker|away: I can't remember exactly what I said about making the repo. 
but I do remember protesting when bd808|BUFFER wanted to put scholarships app conf in apache conf (via env vars) [05:30:49] (03PS2) 10Dzahn: push up to version 2.5 [operations/debs/wikistats] - 10https://gerrit.wikimedia.org/r/100535 [05:32:13] (03CR) 10Dzahn: [C: 032] push up to version 2.5 [operations/debs/wikistats] - 10https://gerrit.wikimedia.org/r/100535 (owner: 10Dzahn) [05:33:46] (03PS1) 10Springle: dedicate db1049 to specific query types as per groupLoadsByDB [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100536 [05:34:16] (03CR) 10Springle: [C: 032] dedicate db1049 to specific query types as per groupLoadsByDB [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100536 (owner: 10Springle) [05:35:34] !log springle synchronized wmf-config/db-eqiad.php 'db1049 to LB=0 except groupLoadsByDB' [05:35:50] Logged the message, Master [05:40:54] (03CR) 10Dzahn: [C: 032] icinga: raise timeout of check_job_queue nrpe command [operations/puppet] - 10https://gerrit.wikimedia.org/r/99411 (owner: 10Hashar) [05:43:16] (03CR) 10Dzahn: [C: 032] remove outdated tesla subnet from dhcpd [operations/puppet] - 10https://gerrit.wikimedia.org/r/96489 (owner: 10Dzahn) [05:44:24] (03CR) 10MZMcBride: "Ready now?" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/97331 (owner: 10Dereckson) [05:46:52] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [05:57:39] (03CR) 10Dzahn: "from neon:" [operations/puppet] - 10https://gerrit.wikimedia.org/r/99410 (owner: 10Hashar) [06:12:35] (03CR) 10Dzahn: "worked. it's 30 now. puppet_services.cfg.. nrpe_check!check_check_job_queue!30" [operations/puppet] - 10https://gerrit.wikimedia.org/r/99411 (owner: 10Hashar) [06:24:01] (03CR) 10Dzahn: "thanks, it's fixed now. keep 'em coming." [operations/debs/wikistats] - 10https://gerrit.wikimedia.org/r/96018 (owner: 10Jack Phoenix) [07:31:24] (03PS1) 10Springle: explicitly direct each updateSpecialPages class by name. QueryPage::recache vslow seems useless? [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100544 [07:32:34] (03CR) 10Springle: [C: 032] explicitly direct each updateSpecialPages class by name. QueryPage::recache vslow seems useless? [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100544 (owner: 10Springle) [07:33:42] !log springle synchronized wmf-config/db-eqiad.php 'explicit LB for each updateSpecialPages job' [07:33:59] Logged the message, Master [07:56:55] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (201563) [07:59:00] springle, there was an enwiki labsdb replication breakage report earlier, haven't double-checked it yet: http://lists.wikimedia.org/pipermail/labs-l/2013-December/001942.html [08:01:07] Eloquence: hmm not sure. enwiki -> labs running and no lag atm [08:01:28] we had dewiki problems recently (but it's ok now, too) [08:02:12] *nod* will ask the user to provide details if it's still an issue. [08:10:03] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [08:12:53] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (204287) [08:37:15] (03CR) 10Expi1: "Yeah, that's what I tried to do, since I could couldn't find a original high resolution image. 
I've been struggling to get the current ico" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100326 (owner: 10Gerrit Patch Uploader) [08:59:55] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [09:03:27] (03CR) 10Odder: "I think that the general idea that lay behind the original favicon is that grey text would not be visible on a tab, which is the most prom" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100326 (owner: 10Gerrit Patch Uploader) [09:03:58] springle: i apologize for the bug i caused yesterday. can you please explain why it happened? [09:05:41] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (207043) [09:19:34] matanya: generic::systemuser appears to require an explicit name => 'blah' [09:19:41] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [09:20:01] springle: this is weird, and i don't understand why [09:21:31] nor i really :) puppet run failed on the boxes with a message about missing array index. so i went looking at other calls to generic::systemuser, and they all listed name => 'blah'. then it worked... [09:25:41] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (201478) [09:27:22] matanya: based on a very cursory scan of puppet docs, declaring the instance of generic::systemuser { 'blah' makes $title=blah, not $name. i guess generic::systemuser needs to default $name to $title somehow [09:27:52] thanks springle i'll try to debug this later [09:28:41] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [09:29:45] (03PS1) 10Dan-nl: beta: gwtoolset-whitelist [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100547 [09:35:21] apergos: morning :-) Do you have any idea how much disk space we have in Swift ? [09:35:29] morning [09:35:39] no. let's see what ganglia says [09:35:44] orrr [09:35:45] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (201302) [09:35:49] there is a GLAM related extension that is going to be deployed this week that would let folks bulk import materials from various museums [09:36:05] I thought it might be worth a mail to ops to warn you guys [09:37:45] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [09:40:44] 4*700GB * 12 allowing for three copies of everything to be kept [09:41:12] how many T do you guess (ballpark)? [09:41:51] that's how much is free, I mean [09:41:55] hashar: [09:43:10] apergos: I have no idea :] [09:43:13] matanya: when code like that gets merged it's a good idea to puppetd --test on one of the affected hosts (or get someone to do it for you if you don't have access) [09:43:21] will ask folks and report back on ops list [09:43:43] hashar: ok, well I think it's worth reporting to ops just so we know, I think we're not going to run out of space in a month [09:43:47] agreed apergos, i'll poke you next time :) [09:44:18] or whoever did the merge may be able to test it too [09:45:37] apergos: we write on both pmtpa and eqiad, is that free space figure for eqiad or both DC ? [09:45:45] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (205048) [09:45:45] eqiad [09:45:55] I'll have a look at tampa [09:46:01] thanks!
[09:46:02] though I expect them to be equivalent [09:46:43] if there is several TB that is fine, if there is only 1TB we are in trouble already :D [09:47:00] oh no we are well above 1T :-D [09:47:29] if we were in 1T land without any spares we would be [09:47:39] up that creek without a paddle, and you know which one :-D [09:48:18] 4*600*12 for pmtpa [09:48:24] and 4*700GB * 12 is the total disk space but we have to divide by 3 because we keep 3 copies ? [09:48:28] no [09:48:34] so [09:48:52] 12 partitions on each host, 600 (or 700) gb free on each partition [09:48:55] 12 hosts total [09:49:02] I already divided by 3 [09:49:14] and that's the free space I'm telling you, not the total space [09:49:19] (03CR) 10Ebrahim: "@Hashar: I followed https://bugzilla.wikimedia.org/show_bug.cgi?id=54826#c8" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/99739 (owner: 10Ebrahim) [09:49:43] awesome thank you! [09:49:47] yw [09:51:41] I will one day have to setup swift for the beta cluster [09:52:41] (03CR) 10Ebrahim: "http://fa.wikipedia.org/wiki/%D9%88%DB%8C%DA%A9%DB%8C%E2%80%8C%D9%BE%D8%AF%DB%8C%D8%A7:%D9%86%D8%B8%D8%B1%D8%AE%D9%88%D8%A7%D9%87%DB%8C_%D" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/99739 (owner: 10Ebrahim) [09:53:10] orilly [10:01:15] ahh mutante merged in my patch to raise check_job_queue timeout [10:01:22] was p**** me off for the last few days [10:01:23] \O/ [10:01:45] (03CR) 10Hashar: "*cheers* Thank you for the verification!" [operations/puppet] - 10https://gerrit.wikimedia.org/r/99411 (owner: 10Hashar) [10:10:05] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [10:13:05] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (205114) [10:16:16] (03CR) 10Hashar: [C: 032] beta: gwtoolset-whitelist [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100547 (owner: 10Dan-nl) [10:16:25] (03Merged) 10jenkins-bot: beta: gwtoolset-whitelist [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100547 (owner: 10Dan-nl) [10:21:05] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [10:24:05] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (200749) [10:26:05] RECOVERY - check_job_queue on terbium is OK: JOBQUEUE OK - all job queues below 200,000 [10:50:05] PROBLEM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , Total (204410) [11:12:25] (03CR) 10Akosiaris: "I should have caught that while reviewing. Really sorry Sean :(" [operations/puppet] - 10https://gerrit.wikimedia.org/r/100357 (owner: 10Matanya) [11:13:18] akosiaris: i apologized to sean already, but you deserve one too. sorry [11:14:47] matanya: that is what code reviews are for. I should have caught that... [11:23:56] that's what testing is for; we'll overlook things, it happens [11:24:18] even testing won't be perfect but it will help [11:27:32] yeah a catalog compilation here would have helped. It would have caught that [11:28:01] I usually do them in big changes, but this seemed innocent enough.
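Working the Swift capacity figures from the exchange above: 12 hosts, 12 partitions each, roughly 700 GB (eqiad) or 600 GB (pmtpa) free per partition, divided by 3 for replication:

    echo $((12 * 12 * 700 / 3))    # eqiad: 33600 GB usable free, ~33.6 TB
    echo $((12 * 12 * 600 / 3))    # pmtpa: 28800 GB usable free, ~28.8 TB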
[11:28:19] :-) [11:31:38] (03PS5) 10Spage: Enable Flow discussions on a few wikis' test pages [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/94106 [11:36:03] (03CR) 10Spage: "PS5 was a rebase; PS6 splits the extension-list change into a separate patch as Benny suggested, and also avoids changing labs to have Flo" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/94106 (owner: 10Spage) [11:48:02] apergos: I think that we should change our access req process [11:48:39] we're kinda sticking to it religiously, and I think we should incorporate common sense into the process [11:50:14] common sense is fine, how would you like the process to be changed? [12:30:40] I feel like crap [12:34:03] maybe you should sleep [12:39:41] !log restarting parsoid across the cluster, 100% CPU on all appservers [12:39:57] Logged the message, Master [12:43:36] PROBLEM - SSH on searchidx1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:44:26] RECOVERY - SSH on searchidx1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [13:59:57] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000 [14:01:31] !log hashar synchronized php-1.23wmf6/extensions/ProofreadPage [14:01:48] Logged the message, Master [14:38:52] (03PS1) 10coren: Tool Labs: Add a couple requested packages [operations/puppet] - 10https://gerrit.wikimedia.org/r/100570 [14:40:22] (03CR) 10coren: [C: 032] "Trivial package additions." [operations/puppet] - 10https://gerrit.wikimedia.org/r/100570 (owner: 10coren) [14:44:34] (03PS1) 10ArielGlenn: remove ryan from pager list [operations/puppet] - 10https://gerrit.wikimedia.org/r/100571 [14:44:49] Ryan_Lane: ^^ [14:44:59] apergos: awesome. thanks [14:45:44] (03CR) 10ArielGlenn: [C: 032] remove ryan from pager list [operations/puppet] - 10https://gerrit.wikimedia.org/r/100571 (owner: 10ArielGlenn) [14:45:51] (03PS3) 10Ryan Lane: Add a deploy.restart runner and module call [operations/puppet] - 10https://gerrit.wikimedia.org/r/100509 [14:46:06] (03PS2) 10Ryan Lane: Manual restart for parsoid [operations/puppet] - 10https://gerrit.wikimedia.org/r/100526 [14:50:12] (03PS1) 10Hashar: fix routing of non-wikipedia on beta cluster [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100573 [14:51:18] (03CR) 10Hashar: "That broke the beta cluster site that are not wikipedia/wikimedia causing bug 58271." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/85165 (owner: 10Reedy) [15:39:13] Coren: do you know much about Varnish? and if so, can I pick your brain for a moment? I have a really unusual situation with Varnish on beta labs. [15:39:52] (03CR) 10Qgil: "Ok, so the plan here is to recreate the current logo by using the Wikimedia svg logo, Gill Sans condensing and stretching the font as in t" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100326 (owner: 10Gerrit Patch Uploader) [15:41:25] (03CR) 10MZMcBride: "The comma alignment in tests/multiversion/MWMultiVersionTest.php seems funky." (031 comment) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100573 (owner: 10Hashar) [15:47:19] chrismcmahon: I have some Varnish skill, but not at the WMF scale. Maybe I can still be of use. Hit me. [15:49:20] Coren: thanks. Since about late November on beta labs we've been seeing 503 errors coming from Varnish. This mostly affects the Mobile team because they're the only ones using Varnish.
The really weird part is that the 503 errors seem to be coming mostly if not completely from Chrome and essentially zero from Firefox. I have no idea why a 503 error from Varnish would be browser-specific. [15:50:09] chrismcmahon: Hm. That also seems odd to me, unless Chrome presents a header your varnish setup has a Vary on? [15:50:32] Have you compared the headers presented by both browsers? [15:51:16] Coren: yeah, I'm just fishing here, looking for a place to start looking. Headers it is! [15:51:32] hey coren would you mind adding djvulibre-bin package to the contint boxes please ? change is straightforward https://gerrit.wikimedia.org/r/#/c/99196/ [15:52:50] Also, "since late November" sounds a lot like "since the Labs DNS has been a bit overloaded and randomly flaky". It's gotten better since, but not perfect; but keep an eye in the logs for possible DNS resolution errors. [15:53:45] (03PS2) 10coren: contint: djvulibre-bin for mw djvu unit tests [operations/puppet] - 10https://gerrit.wikimedia.org/r/99196 (owner: 10Hashar) [15:53:51] thx [15:54:09] chrismcmahon, now everything uses varnish [15:54:29] PROBLEM - Host mw31 is DOWN: PING CRITICAL - Packet loss = 100% [15:54:36] Coren: yes, but afaict the situation with the labs DNS affected all the browsers, and I still see this huge Chrome vs. FF disparity on the 503s today. [15:54:47] (03CR) 10Ragesoss: [C: 031] "Thanks for submitting this patch Ebrahim. Reedy or someone else from ops will probably merge this soon. Everything looks to be in order in" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/99739 (owner: 10Ebrahim) [15:55:09] MaxSem: actually, I think I knew that, but it's still troubling Mobile more than anything else. [15:55:24] Coren, I did a bit of investigation of that issue - one theory was that it was timing out because it was on obama article, but I couldn't find any indications of this in logs [15:55:49] RECOVERY - Host mw31 is UP: PING OK - Packet loss = 0%, RTA = 35.36 ms [15:55:56] also, I discovered that apaches on beta are receiving a lot of SIGTERMs [15:55:58] MaxSem: yes, and we've had sporadic reports of the 503s on pages other than Obama also [15:56:33] making a request while apache is being restarted can result in a 503 [16:01:52] (03CR) 10coren: [C: 032] "Yeay array of exactly one member! (Simple package addition)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/99196 (owner: 10Hashar) [16:02:11] :-) [16:02:49] MaxSem: So would a connection closing while it's being handled. [16:03:24] But the chrome/mozilla disparity seems unlikely enough if there is no causal link; looking at the headers might be instructive about the underlying cause. [16:08:27] gwicke: paravoid -- parsoid talk [16:08:50] is paravoid's post backend randomization patch deployed yet? [16:08:55] oh [16:09:01] I didn't expect you to be around [16:09:09] gwicke: http://ganglia.wikimedia.org/latest/?r=4hr&cs=&ce=&m=cpu_report&s=by+name&c=Parsoid+eqiad&h=&host_regex=&max_graphs=0&tab=m&vn=&hide-hf=false&sh=1&z=small&hc=4 [16:09:16] (03PS8) 10Addshore: Start wikidata puppet module for builder [operations/puppet] - 10https://gerrit.wikimedia.org/r/96552 [16:09:27] oh wow [16:09:32] (that drop is a restart I did before I crashed to bed again [16:09:52] our monitoring sucks.. [16:10:12] of course it does [16:10:15] I guess the simplest fix is to revert yesterday's deployment [16:10:35] which you told me needs a rebuild of node_modules?
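A quick way to chase Coren's Vary theory above: check what the backend varies on, then replay each browser's request headers and see whether only one variant 503s; the beta URL is a placeholder:

    curl -sI 'http://en.m.wikipedia.beta.wmflabs.org/wiki/Main_Page' | grep -i '^vary'
    # then repeat the request with curl -H 'User-Agent: ...' -H 'Accept: ...'
    # once with Chrome's headers and once with Firefox's, and compare status codes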
[16:10:38] I wasn't sure of the details [16:10:59] maybe it does after all [16:11:06] it'd be nice to find out where the leak is too, I googled extensively for 100% cpu leaks but didn't find any relevant bugs [16:11:21] we have been running this code in rt without issues, but I did update node_modules after verifying that it did not crash [16:11:49] the nodes are using a ton of memory, that's not typical [16:12:02] so it might very well be jsdom leaking memory [16:12:25] being close to the 1.7G memory limit makes node use all cpu on GC [16:12:58] paravoid, can you install node 0.10 on tin? [16:13:06] or on bast1001 [16:13:25] so far I have prepared our node_modules on bast1001 and then rsynced it to tin [16:13:26] done on tin [16:13:30] but that is getting a bit old [16:13:37] thanks! [16:13:39] (I'll puppetize it later) [16:14:30] oh, need npm too [16:14:36] sorry [16:14:48] done [16:14:57] I should have thought that myself :) [16:15:24] I installed -dbg packages and perf on wtp1002 [16:15:49] but I'm not familiar with libv8's internals and it's a bit late to dig in a VM engine [16:16:14] it is pretty likely that an old jsdom is leaking that memory, as it is a binary module and integrates rather deeply with v8 [16:16:32] ok, that makes me feel better to have woken you up [16:16:35] we don't really exercise it, but it might still be initialized whenever we create a new html dom [16:17:26] the ramp up is about 1 hour, so we'll know soon if you attempt a fix [16:18:48] with the right proxy settings npm is now making progress [16:21:21] or so it seemed [16:24:27] maybe the npm version is too old [16:25:53] it's 1.1.39 [16:25:58] not very old [16:26:00] I'm at 1.3.10 locally, the installed version is 1.1.39 [16:26:03] not very new either [16:26:50] the trick of 'npm install npm' does not work either [16:27:08] it looks non-trivial to get fresh packages [16:27:17] I'll have a look later, can you do it now like you did yesterday? [16:27:19] mutante: can you create an RT account for mhoover, and give him access to the procurement queue? (Or tell me how? I can't even find the 'create account' link) [16:27:21] via bast1001 or what was it? [16:27:33] we can install the ppa somewhere [16:27:37] it has a new npm too [16:27:53] I didn't update node_modules in prod yesterday [16:28:07] I can't install software from a random ppa on our deployment box which has access everywhere just like that :-) [16:28:17] andrewbogott: make a ticket for the account to be created and i'll do it and put some docs on it how . k? [16:28:29] yeah, let me scp node_modules from labs over [16:29:40] mutante: https://rt.wikimedia.org/Ticket/Display.html?id=6476 [16:29:45] why is everyone in CA up so early? [16:29:54] …not that I'm complaining [16:29:58] gwicke is because we woke him up :) [16:30:03] argh [16:30:03] andrewbogott: taken [16:30:08] no npm on labs any more [16:30:33] gwicke: ? [16:31:04] moving to the ppa on the labs vm to get npm.. [16:32:13] gwicke: I don't understand… was there a deb and it vanished? [16:32:44] andrewbogott, we are in the backporting business now [16:32:59] for node 0.10 [16:33:08] we forgot about npm [16:33:12] gwicke: ok… just trying to verify that I didn't break anything :) [16:33:37] no, not your fault ;) [16:35:27] !log Nuking Rezabot@fawiki's watchlist, requested by operator [16:35:32] ok, the Debian npm package is impossible to backport [16:35:42] Logged the message, Master [16:36:11] it has 30 dependencies, half of which need backporting too and these depend on more packages etc. 
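The staging flow gwicke describes above (build node_modules on bast1001, then ship it to tin), sketched end to end; the proxy URL and directory layout are assumptions:

    # on bast1001: point npm at the internal proxy, then build against the target node version
    npm config set proxy http://webproxy.eqiad.wmnet:8080
    npm config set https-proxy http://webproxy.eqiad.wmnet:8080
    cd ~/parsoid && npm install --production
    # ship the result to the deploy host
    rsync -a node_modules/ tin.eqiad.wmnet:/srv/deployment/parsoid/deploy/node_modules/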
[16:36:30] meh [16:36:39] gwicke: shall I just restart node everywhere to buy us some time? [16:36:51] the cluster is probably out of commission now [16:36:51] paravoid, that would be good [16:37:07] glusterfs is not very fast.. [16:38:04] !log restarting all of parsoid again [16:38:21] Logged the message, Master [16:40:41] (03PS4) 10Ryan Lane: Add a deploy.restart runner and module call [operations/puppet] - 10https://gerrit.wikimedia.org/r/100509 [16:40:49] heh [16:41:21] gwicke: you could switch your project to nfs ;) [16:41:44] Ryan_Lane, we'd love to do that [16:41:55] one sec. I think we have docs somewhere [16:42:41] we have about an hour by my count, jfyi :) [16:42:49] then we can restart again, of course [16:43:51] cp finally finished, now waiting for chown.. [16:46:25] and deploying [16:48:29] !log updated Parsoid node_modules for 0.10 to fix what looks like a memory leak [16:48:43] Logged the message, Master [16:50:21] Ryan_Lane, is the config repo still triggering a restart? [16:51:19] nm, looks like it [16:52:24] gwicke: yes [16:52:31] I'm still working on the restart chaneg [16:52:34] *change [16:52:47] I think I just got it to a point where it can be merged [16:53:07] k [16:53:08] (btw, ryan is doing all that in a volunteer capacity ;)) [16:53:22] indeed [16:53:26] not getting paid for this :D [16:53:37] back in an hour or so [16:53:58] memory is going up again [16:56:36] https://ganglia.wikimedia.org/latest/graph_all_periods.php?c=Parsoid%20eqiad&m=cpu_report&r=hour&s=by%20name&hc=4&mc=2&st=1386694545&g=mem_report&z=large [16:56:55] so that did not solve it [16:57:13] :( [16:57:18] and I have to run into a meeting in 3' [16:57:23] andrewbogott: ok, resolved. reload 6476 and find instructions and screenshot [16:57:26] k [16:58:00] mutante: thank you! [16:58:16] yw [16:58:18] (03CR) 10Matthias Mullie: [C: 031] Add Flow to extension list for message cache. [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100551 (owner: 10Spage) [16:58:37] I'll revert the config repo as we know the previous version to work with both node 0.8 and 0.10 [16:59:18] next step will be to roll back the code, and if that doesn't help either to roll back node 0.10 to 0.8 [16:59:33] nod [16:59:59] valgrind? [17:01:15] maybe [17:02:51] !log downgraded Parsoid to 0ac82a28 and rolled back config update to rule out code changes for memory leak [17:03:07] Logged the message, Master [17:08:48] at first sight it still seems to be leaking [17:15:03] !log running gerrit reviewer-counts cron command manually for bug 52329 [17:15:20] Logged the message, Master [17:15:30] paravoid, we can either try the ppa on a machine or go straight back to 0.8 [17:20:58] (03PS1) 10RobH: RT: 6477 labnet1001 mgmt dns entries [operations/dns] - 10https://gerrit.wikimedia.org/r/100588 [17:22:41] (03CR) 10RobH: [C: 032] RT: 6477 labnet1001 mgmt dns entries [operations/dns] - 10https://gerrit.wikimedia.org/r/100588 (owner: 10RobH) [17:30:26] the ppa is in /home/gwicke/nodejs_0.10.22-1chl1~precise1_amd64.deb on bast1001 in case you'd like to give that a shot [17:32:47] I'm going to re-deploy yesterday's code as that was not the issue [17:33:24] also gives me one restart [17:34:02] (03CR) 10Matthias Mullie: [C: 031] "Looks good to me." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/94106 (owner: 10Spage) [17:38:48] !log updated Parsoid back to 31910075 after verifying that this was not the source of the memory leak [17:39:04] Logged the message, Master [17:45:01] anyone around to help with parsoid?
[17:45:12] I have two meetings ahead [17:45:29] I can guide people through but someone needs to babysit this [17:45:58] any *ops* around, to be clear -- gwicke is already here and helping out [17:47:53] paravoid: i'm around for the next hour [17:49:15] paravoid: do you prefer to go straight back to 0.8 or would you like to test the ppa first? [17:49:54] at this point, I think just going back to 0.8 [17:50:19] *nod* [17:50:29] then we can do tests in labs or on one server [17:51:02] Jeff_Green: we basically need to downgrade nodejs npm and related packages on the parsoid machines [17:51:06] one by one [17:51:15] oh fun, 0.8 was also in our repo [17:51:19] fortunately it's still there [17:52:00] ok [17:52:17] we should maybe move from reprepro to a repo manager that can keep old versions [17:52:26] cp: writing `./nodejs-dbg_0.8.2-1chl1~precise1_amd64.deb': No space left on device [17:52:29] grrr [17:52:36] gwicke ++++ [17:53:09] https://wiki.debian.org/HowToSetupADebianRepository#mini-dinstall looks interesting at first sight [17:53:10] paravoid: it's not in the caches on the individual machines? [17:53:46] no, that's wrong [17:53:46] but anyway, it's salvaged now, by accident [17:55:33] (03CR) 10Ori.livneh: "Where will this run, initially?" (036 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/96552 (owner: 10Addshore) [17:56:19] ok, apt has only 0.8 now [17:57:27] ok, all wtp* boxes have nodejs/nodejs-dev at 0.8 now [17:57:40] I'm about to run in my meeting [17:57:51] Jeff_Green: can you do a rolling restart of parsoid on all wtp10{01..24} ? [17:58:09] do one, confirm with parsoid that it works, etc. [17:58:22] paravoid: I'm happy to, but I've never touched parsoid [17:58:57] confirm with gwicke, that is :) [17:58:58] paravoid: thanks! [17:59:06] Jeff_Green: /etc/init.d/parsoid restart [17:59:15] ah cool [17:59:29] I was assuming it would be a salt thing, which meant I had to finally learn how to use it :-) [17:59:45] you can do it via salt too :P [18:00:00] https://wikitech.wikimedia.org/wiki/Parsoid#Misc_stuff [18:00:14] but in this case manual would be better [18:00:19] ori-l: nod. in this case I'd prefer not to introduce the addl layer since I'm unfamiliar [18:00:20] cool [18:00:20] so that we can check the first [18:00:28] one sec [18:00:43] wtp1001? [18:01:03] gwicke: yes. logging in [18:01:26] k, got the http test ready [18:01:55] nodejs == 0.8.2-1chl1~precise1 [18:02:01] is that as expected? [18:02:14] yes [18:02:20] ok, here goes the restart [18:02:31] done [18:02:40] looks good [18:02:50] ok. next box [18:03:05] I guess we can do the remaining ones with dsh [18:03:22] yeah? [18:03:25] just not in parallel [18:04:26] should we do salt per the wikitech doc? [18:04:32] oh nm [18:04:34] that works too [18:04:47] salt -b 1 -G 'deployment_target:parsoid' parsoid.restart_parsoid parsoid [18:05:04] assuming "-b 1" gets us batches of 1 [18:05:10] sane? [18:05:16] I guess so, yes [18:05:50] really stupid question which highlights how heads-down I've been on fundraising....where do I run salt these days? [18:05:50] parsoid takes ~2-3 seconds to restart, so staggering it is better to avoid all machines going down at once [18:05:52] this would be all at the same time, so don't, but if parsoid.restart_parsoid doesn't work, you can also just run the same command ..
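A sketch of the per-host check used between restarts ("got the http test ready" above); the port and request path are assumptions about the Parsoid HTTP API of the time:

    curl -s -o /dev/null -w '%{http_code}\n' http://wtp1001:8000/enwiki/Main_Page
    # expect 200 before moving on to the next box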
salt 'wtp10*' cmd.run '/etc/init.d/parsoid restart' [18:06:05] cmd.run 'what you did manually' [18:06:10] oh hell, I'm just going to do these by hand on each box [18:06:22] dsh -g parsoid should work too [18:06:59] from tin? [18:07:05] or bast1001 [18:08:04] "dsh -g parsoid /etc/init.d/parsoid restart" as root [18:08:14] ok. here goes [18:08:38] hmm [18:08:44] root@bast1001:~# dsh -g parsoid /etc/init.d/parsoid restart [18:08:44] * Restarting parsoid [18:08:44] ...done. [18:08:59] and the prompt hasn't returned yet [18:09:21] hmm [18:09:32] that might be the sucky init script that we are still using [18:09:53] maybe service parsoid restart fares better? [18:09:54] ok, let's just do them by hand [18:10:36] 1002 done [18:10:53] 1003 done [18:11:50] 1004 [18:11:59] I'd give service parsoid restart a try [18:12:05] there are 24 machines [18:12:28] it worked and returned the prompt on 1006 [18:12:35] this actually fits perfectly with my day [18:12:36] Jeff_Green: so... [18:12:41] Jeff_Green: use salt [18:13:04] my phone decided to go into a boot loop, and until the battery dies I hear the android pyonnnng sound every minute or so [18:13:22] salt -G 'deployment_target:parsoid' parsoid.restart_parsoid 'parsoid' [18:13:31] Ryan_Lane, is that all at once? [18:13:32] Ryan_Lane: you think that will do better than dsh did with a crappy init script? [18:13:34] gwicke: yes [18:13:36] Jeff_Green: oh, ^d had that yesterday when he upgraded to the CM nightly ;) [18:13:44] Ryan_Lane, is there a way to stagger them? [18:13:48] Jeff_Green: it will because I wrote this specifically because of this problem [18:13:51] gwicke: yes [18:13:54] for wtp in $(seq 10 24); do ssh wtp10${wtp} ...; sleep .. ; done [18:13:58] greg-g: interesting. verizon is sending me new phone, apparently they don't know about it [18:14:01] salt -b -G 'deployment_target:parsoid' parsoid.restart_parsoid 'parsoid' [18:14:11] Jeff_Green: ah, so you didn't bork it yourself, good work ;) [18:14:21] gwicke: can you please, please deal with the init script issue? [18:14:36] Ryan_Lane, yes- going to delete it this week [18:14:39] I've been asking for months [18:14:48] replaced with the upstart? [18:14:52] yup [18:14:55] cool [18:14:56] thanks [18:14:58] so that we also get log rotation etc [18:15:03] yeah [18:15:10] we'll be able to get rid of that parsoid module function then too [18:15:16] <^d> Jeff_Green, greg-g: Heh, yeah boot loops aren't fun. [18:15:27] and we'll be able to use: salt -G 'deployment_target:parsoid' service.restart 'parsoid' [18:15:43] can't use that right now [18:16:13] maybe right now isn't the best time to merge in this deployment system change? :) [18:16:41] I guess I need to write a wrapper for service restart first anyway [18:16:47] any feeling on the name of this script? :) [18:16:59] is salt waiting for the restart too? [18:17:01] repo-restart? [18:17:16] gwicke: salt won't, if you use the parsoid.restart_parsoid function call I use [18:17:22] err. I mentioned [18:17:29] then it is basically parallel anyway [18:18:04] is it possible to only start restarting the next node when the previous one is done? [18:18:19] kind of [18:18:26] Ryan_Lane: how about salt -b 1 to do only one at a time?
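A corrected sketch of the ssh loop suggested above: as written, $(seq 10 24) only covers wtp1010-wtp1024, so zero-padded numbering is needed to hit all of wtp1001-wtp1024:

    for n in $(seq -w 1 24); do
        ssh "wtp10${n}" 'service parsoid restart'
        sleep 5    # stagger so the pool is never entirely down
    done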
[18:18:34] that should work [18:18:43] starting [18:18:50] I'm not sure if that actually waits till the service is fully restarted, though [18:18:57] but it'll still stagger it at least some [18:19:16] because it does wait until the minion returns before it starts the next [18:19:28] it's done [18:19:40] I wonder how that would work with service.restart [18:19:44] I know that waits [18:19:51] but upstart might return immediately [18:20:15] I did implement restart via the deploy module, so we could put in some logic there if we wanted [18:20:21] Jeff_Green: looks good, thanks! [18:20:26] https://groups.google.com/d/msg/salt-users/EBjhCb6CuIg/bsaiRzdMEDkJ [18:20:27] deploy.restart calls service.restart [18:20:39] and service.restart on all of the dependency repos [18:21:04] ori-l: ? [18:21:13] some dbus - salt bridge would be awesome [18:21:16] yes [18:21:29] I thought it might have been something relevant to this [18:21:34] but yeah, that would indeed be awesome [18:21:50] * gwicke would already be happy with fixing up dsh [18:22:12] meh. dsh is a piece of crap [18:22:16] :) [18:22:53] i for one would like to see dsh go and all focus go into salt [18:22:54] for rolling restarts it pretty much does the right thing [18:23:07] gwicke: not necessarily [18:23:19] in this case it doesn't because the init script doesn't return [18:23:19] but if salt can do that at some point too, then that would be great of course [18:23:24] salt can do this [18:23:25] having multiple tools that do overlapping things ends up being really confusing for people like me who don't normally focus on production [18:23:26] via batch [18:23:33] the issue is that upstart might return immediately [18:23:37] which would be the same problem with dsh [18:23:44] there's no difference there [18:23:53] Jeff_Green: yeah. dsh is almost always out of date [18:24:11] Ryan_Lane: that too, dsh with it's crufty local config files [18:24:13] salt is up to date via puppet runs [18:24:24] which is glorious [18:24:24] hmm, I have been using upstart for rashomon, and it seemed to wait for the restart [18:24:34] gwicke: if that's the case, then salt will work [18:24:54] when it does a batch run it waits for a return from x number of minions before starting on more [18:25:19] I see, and by default it actually waits for the command to return? [18:25:21] if it isn't the case then neither salt nor dsh will work [18:25:22] yes [18:25:42] I wonder why it does not run into the issue with the hanging init script then [18:25:49] dsh? it does [18:25:53] it just did for jeff [18:25:56] salt [18:26:02] I wrote a function for this [18:26:15] the service.restart function does hang [18:26:16] gwicke: do you have a quick way to confirm that parsoid came up everywhere as expected? [18:26:22] aha [18:26:25] the module that I wrote forks [18:26:35] Jeff_Green, https://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&s=by+name&c=Parsoid%2520eqiad&tab=m&vn= [18:26:45] and I did some testing [18:26:51] ah great. 
thanks [18:26:55] so, technically salt doesn't hang on service.restart [18:27:13] it will indeed restart all of them, but it doesn't work with batch because the minions never return [18:27:33] and if you do a restart of salt, all of the parsoid processes with crash, because their pid is tied to salt [18:27:39] Ryan_Lane: that's great- defult salt will then do what I'm looking for [18:27:43] s/with/will/ [18:27:45] *default [18:27:47] yes [18:27:55] the stuff I just pushed in will work for this [18:28:03] I'm merging it soonish [18:28:12] I just need to write a wrapper for the restart command [18:28:35] so, what to name the command? repo-restart? service-restart? [18:29:00] I don't like service-restart because you need to pass in the repo name, not the service name [18:29:11] it has the same security model as deployment [18:29:11] salt-restart? [18:29:33] hm [18:29:55] service-restart may be ok [18:30:08] if it's run from within the repo it could read the repo name from the git repo [18:30:19] if it's run from outside of the repo it would require the repo name [18:31:11] in the future it could just be: git deploy restart [18:31:20] I can't do that right now, though [18:31:35] or: git deploy restart-service [18:32:14] sounds good [18:32:26] ok, writing the wrapper [18:32:30] then I can merge this [18:33:36] the thing I don't like about putting everything into git-deploy is that breakage there would disable the default restart method [18:33:59] fine as long as there are still fallbacks [18:42:42] gwicke: it doesn't [18:43:01] you can always manually call the salt function, if necessary [18:43:23] (03CR) 10Umherirrender: "No, the core merge is part of 1.23wmf6, where deployment Phase 3 is at Thursday, 12 December 2013." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/97331 (owner: 10Dereckson) [18:43:51] the only major way trebuchet can break is if salt is down, or if someone broke the system with a code update [18:44:09] and the system is modular, so the restart should work even if deployment itself doesn't [18:44:40] I can make the restart stuff a class, so that it can be called via a script or via git deploy, too [18:49:34] hey Aaron|home, today, while testing gwtoolset on beta, we ran into an issue where a 3,000 record metadata file reached the job threshold of 1,000 media files that we placed on the extension. would it be okay to raise that threshold to 10,000? [18:50:08] atm i'm also investigating using the jobReleaseTimestamp [18:51:00] I suppose [18:51:49] thanks, also do you know if it's possible to implement the jobReleaseTimestamp with mysql? [18:52:47] bd808 mentioned that we should be able to use it on beta and production because redis enables the delay, but i was wondering if it's possible to implement locally with mysql [18:53:14] Jeff_Green / gwicke: how are things going? [18:54:11] wtp* are restarted on the 8.2 nodejs [18:54:16] cool, thanks [18:54:24] and we've confirmed they're up [18:54:26] np [19:08:35] !log reedy synchronized php-1.23wmf6/extensions 'I53db62d469a6944cdf24dc209fa55c972f69b73f' [19:08:52] Logged the message, Master [19:09:48] (03PS3) 10Ryan Lane: Manual restart for parsoid [operations/puppet] - 10https://gerrit.wikimedia.org/r/100526 [19:11:17] dan-nl: not with mysql (there is no code for that) [19:12:55] k, it looks like i have to check $wgJobTypeConf and see which class it has set for default and then add the jobReleaseTimestamp if it's using JobQueueRedis [19:13:34] Aaron|home: is that correct?
or is there a better way to check that? [19:15:30] you can do JobQueueGroup::singleton()->get( 'foo' ) and call supportsDelayedJobs() on that [19:15:46] you probably have to do the former anyway [19:16:29] well, I guess not if you did JobQueueGroup::singleton()->push() [19:16:44] anyway, that's how it could be best checked [19:17:22] k, seeing if i can use the latter method without pushing it onto the array … as soon as i do it throws an error locally [19:24:38] (03PS3) 10Dzahn: change bugzilla role classes [operations/puppet] - 10https://gerrit.wikimedia.org/r/99788 [19:27:19] (03PS10) 10Ottomata: [not ready for review] Productionizing Wikimetrics [operations/puppet] - 10https://gerrit.wikimedia.org/r/96042 (owner: 10Milimetric) [19:31:09] (03PS11) 10Ottomata: [not ready for review] Productionizing Wikimetrics [operations/puppet] - 10https://gerrit.wikimedia.org/r/96042 (owner: 10Milimetric) [19:36:21] ugh. bug in salt with batch runs and returners. when doing batches, the returner is ignored [19:36:33] which will make reporting for service restarts a pain [19:36:53] I'll report directly from the master for now [19:39:57] !log reedy updated /a/common to {{Gerrit|Ibceb9638f}}: beta: gwtoolset-whitelist [19:40:02] (03PS1) 10Reedy: Non wikipedias to 1.23wmf6 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100616 [19:40:15] Logged the message, Master [19:40:40] (03CR) 10Reedy: [C: 032] Non wikipedias to 1.23wmf6 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100616 (owner: 10Reedy) [19:41:38] (03Merged) 10jenkins-bot: Non wikipedias to 1.23wmf6 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100616 (owner: 10Reedy) [19:42:39] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Non wikipedias to 1.23wmf6 [19:42:54] Logged the message, Master [19:43:09] hey Aaron|home, committed https://gerrit.wikimedia.org/r/#/c/100617/. there's not too much to it. are you able to take a look now and +1 if you're okay with it? [19:44:21] Reedy: what's broken? [19:44:24] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Revert that, CreditSource broken [19:44:29] CreditSource [19:44:35] I think it's only on Wikivoyage [19:44:36] ah [19:44:40] Logged the message, Master [19:44:50] it is [19:46:05] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Non wikipedias and wikivoyages to 1.23wmf6 [19:46:22] Logged the message, Master [19:52:21] (03PS4) 10Dzahn: change bugzilla role classes [operations/puppet] - 10https://gerrit.wikimedia.org/r/99788 [19:57:40] Hello guys, we (Brazilian Education Team) are creating a lot of pages to restructure the Education Program portal [19:57:52] so, we need to change the redirect http://educacao.wikimedia.org/ to https://pt.wikipedia.org/wiki/Wikipédia:Programa_de_Educação [20:01:29] rodrigopadula: hello, shouldn't be a problem, did that redirect last time. could you just create a bug or ticket for it? bugzilla or RT are both ok [20:01:49] (or if you wanted to, you could create a patch and upload to gerrit) [20:02:02] can you send me the links? [20:03:22] rodrigopadula: bugzilla to create ticket: https://bugzilla.wikimedia.org/ repository that has Apache config: https://gerrit.wikimedia.org/r/#/q/status:merged+project:operations/apache-config,n,z [20:03:45] how to clone it: https://wikitech.wikimedia.org/wiki/Git#Git.2FGerrit_and_the_repositories [20:09:26] !log applying filter on sandbox subnet in eqiad [20:09:43] Logged the message, Mistress of the network gear.
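Once the apache-config patch for the educacao.wikimedia.org redirect is merged and deployed, it can be spot-checked from any shell; this is a sketch, and the exact Location value (including any percent-encoding) may differ.

    # Verify the redirect target after deployment (sketch).
    curl -sI http://educacao.wikimedia.org/ | grep -i '^Location:'
    # Expect something like:
    # Location: https://pt.wikipedia.org/wiki/Wikipédia:Programa_de_Educação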
[20:11:09] !log reedy synchronized php-1.23wmf6/includes/SkinTemplate.php 'I8edbdfe615f848963e3bea47dac99d1abd64c7f7' [20:11:26] Logged the message, Master [20:12:10] Tue Dec 10 7:10:39 UTC 2013 mw1102 enwiki Connection lost and reconnected after 60.744s, query: SELECT /* SpecialHistory::doQuery Jdlrobson */ * FROM `revision` FORCE INDEX (page_timestamp) ORDER BY rev_timestamp DESC LIMIT 51 [20:14:30] looks like that stopped [20:19:01] (03CR) 10Dzahn: [C: 032] change bugzilla role classes [operations/puppet] - 10https://gerrit.wikimedia.org/r/99788 (owner: 10Dzahn) [20:21:20] PROBLEM - SSH on amslvs1 is CRITICAL: Server answer: [20:22:20] RECOVERY - SSH on amslvs1 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [20:23:12] (03PS1) 10Dzahn: remove the system role from the bugzilla module it should be (and is) in the role class [operations/puppet] - 10https://gerrit.wikimedia.org/r/100627 [20:24:05] crap, wrong topic branch and dependency [20:24:19] !log applied a filter to inet6 sandbox subnet in eqiad [20:24:33] Logged the message, Mistress of the network gear. [20:25:46] (03PS5) 10Ryan Lane: Add a deploy.restart runner and module call [operations/puppet] - 10https://gerrit.wikimedia.org/r/100509 [20:27:11] (03PS2) 10Dzahn: remove the system role from the bugzilla module [operations/puppet] - 10https://gerrit.wikimedia.org/r/100627 [20:27:32] well, that change ended up being larger than I imagined [20:27:40] (03PS4) 10Ryan Lane: Manual restart for parsoid [operations/puppet] - 10https://gerrit.wikimedia.org/r/100526 [20:29:15] (03CR) 10Dzahn: [C: 032] remove the system role from the bugzilla module [operations/puppet] - 10https://gerrit.wikimedia.org/r/100627 (owner: 10Dzahn) [20:31:59] Ryan_Lane, that is indeed not such a small diff [20:34:00] quite a bit of it is refactoring [20:34:53] *nod* [20:35:03] ok, doing the last set of testing [20:35:15] there's a salt bug I'm currently having to work around for this [20:35:24] which means the data isn't getting put into redis [20:35:57] so if the command takes longer than 30 seconds on any minion you won't be able to see if it was successful or not [20:36:06] that's likely fine for now [20:37:33] gwicke: default batch size is set at 5. it's adjustable from the service-restart command [20:37:50] Ryan_Lane: sounds good to me [20:38:17] if you have information about the total number of nodes, then a default of 1/10 at a time or the like might also be nice [20:39:13] oh. I can do a percentage, too [20:39:19] is that preferable? [20:39:23] 10%? [20:40:04] 2.4 servers [20:40:10] that captures the impact on the total service quite well [20:40:24] don't take down more than 10% at any time [20:40:39] mutante: heh [20:40:43] easy enough [20:42:56] !log reedy synchronized php-1.23wmf6/extensions/Wikibase [20:43:11] Logged the message, Master [20:44:11] (03CR) 10Reedy: "https://gerrit.wikimedia.org/r/#/c/100418/ either needs backporting, or we wait for the wmf7 cycle..." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/99739 (owner: 10Ebrahim) [20:44:21] (03PS6) 10Ryan Lane: Add a deploy.restart runner and module call [operations/puppet] - 10https://gerrit.wikimedia.org/r/100509 [20:45:21] (03PS5) 10Ryan Lane: Manual restart for parsoid [operations/puppet] - 10https://gerrit.wikimedia.org/r/100526 [20:52:18] (03PS6) 10Reedy: Enable AbuseFilter block option on Wikidata [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/98002 (owner: 10John F.
Lewis) [20:52:24] (03CR) 10Reedy: [C: 032] Enable AbuseFilter block option on Wikidata [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/98002 (owner: 10John F. Lewis) [20:57:01] (03PS7) 10Ryan Lane: Add a deploy.restart runner and module call [operations/puppet] - 10https://gerrit.wikimedia.org/r/100509 [20:58:00] (03PS6) 10Ryan Lane: Manual restart for parsoid [operations/puppet] - 10https://gerrit.wikimedia.org/r/100526 [21:00:19] Jenkins is much more responsive than usual... [21:01:00] Reedy: you all good to let bsitu deploy flow (ie: you done for now?) [21:01:23] I was waiting for https://gerrit.wikimedia.org/r/#/c/98002/ [21:01:27] But jenkins can't be bothered [21:03:11] (03PS1) 10Jgreen: disable bayes_auto_learn on iodine (for otrs) [operations/puppet] - 10https://gerrit.wikimedia.org/r/100673 [21:03:35] Reedy: let me know when you are done [21:04:02] (03Merged) 10jenkins-bot: Enable AbuseFilter block option on Wikidata [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/98002 (owner: 10John F. Lewis) [21:05:20] Over 13 minutes? [21:06:10] "really, jenkins?! over 13 minutes?" [21:08:14] That's over 9000 somethings [21:08:41] gwicke: https://wikitech.wikimedia.org/wiki/Trebuchet#Restarting_a_service [21:08:44] documented [21:08:56] also documented the new config option here: https://wikitech.wikimedia.org/wiki/Trebuchet#Add_the_new_repo.27s_configuration_to_puppet [21:10:02] Reedy: I accidentally +2 this patch: https://gerrit.wikimedia.org/r/#/c/100671/, I guess this doesn't affect your config change. [21:10:33] I'm going to merge this in soon [21:11:07] (03CR) 10Jgreen: [C: 032 V: 032] disable bayes_auto_learn on iodine (for otrs) [operations/puppet] - 10https://gerrit.wikimedia.org/r/100673 (owner: 10Jgreen) [21:12:52] Ryan_Lane: awesome, thanks! [21:12:56] is that live already? [21:13:04] not yet. will be soon [21:13:08] doing some last minute testing [21:13:08] and what are the rights required for this? [21:13:15] no rights [21:13:21] if you can deploy you can do this [21:13:31] awesome [21:13:43] Reedy: did you see vito's mail regarding ACE2013? [21:14:34] Ryan_Lane: if it is easy, a 'service-stop' command as a big red button might be nice too ;) [21:14:42] heh [21:14:47] in case the running service is corrupting stuff [21:14:47] that sounds... dangerous :) [21:14:50] ah. right [21:15:08] well, I've done all the hard work for making that possible [21:15:20] I can add start/stop runners too [21:15:41] and maybe rename service-restart to service-manage, and require a start/stop/restart argument for it [21:17:07] !log reedy synchronized wmf-config/abusefilter.php [21:17:21] bsitu: Should be ok to go now... [21:17:24] Logged the message, Master [21:17:35] Reedy: all right, thx [21:18:31] (03PS1) 10Dzahn: use include in role instead of class declaration [operations/puppet] - 10https://gerrit.wikimedia.org/r/100679 [21:19:00] Ryan_Lane: being able to start the service again would be a nice bonus of course ;) [21:19:08] :D [21:19:09] indeed [21:21:35] (03CR) 10Dzahn: [C: 032] "13:25 < andrewbogott> mutante: confirmed on labs, making that change fixes the error." 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/100679 (owner: 10Dzahn) [21:25:25] (03CR) 10Ryan Lane: [C: 032] Add a deploy.restart runner and module call [operations/puppet] - 10https://gerrit.wikimedia.org/r/100509 (owner: 10Ryan Lane) [21:25:36] gwicke: merging them in now [21:25:47] (03CR) 10Ryan Lane: [C: 032] Manual restart for parsoid [operations/puppet] - 10https://gerrit.wikimedia.org/r/100526 (owner: 10Ryan Lane) [21:35:17] (03CR) 10Bsitu: [C: 032] Add Flow to extension list for message cache. [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100551 (owner: 10Spage) [21:35:46] (03Merged) 10jenkins-bot: Add Flow to extension list for message cache. [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100551 (owner: 10Spage) [21:38:10] (03PS1) 10Ryan Lane: Adjust sudo call used in service-restart [operations/puppet] - 10https://gerrit.wikimedia.org/r/100685 [21:38:32] Ryan_Lane: cool, thanks- will try it tomorrow [21:43:33] (03CR) 10Ryan Lane: [C: 032] Adjust sudo call used in service-restart [operations/puppet] - 10https://gerrit.wikimedia.org/r/100685 (owner: 10Ryan Lane) [21:45:31] !log bsitu updated /a/common to {{Gerrit|Ib34f619e7}}: Enable AbuseFilter block option on Wikidata [21:45:45] Logged the message, Master [21:51:41] (03PS1) 10Ryan Lane: Test service restarts in test/testrepo [operations/puppet] - 10https://gerrit.wikimedia.org/r/100689 [21:54:02] (03CR) 10Ryan Lane: [C: 032] Test service restarts in test/testrepo [operations/puppet] - 10https://gerrit.wikimedia.org/r/100689 (owner: 10Ryan Lane) [22:01:14] Reedy: we're running mergeMessageFileList.php to check sanity of new extension-list. The output is in a different order than ExtensionMessages-1.23wmf5.php but has the same lines except for Flow.i18n.php (good) and 'SpecialCentralAuthAliasesNoTranslate' => "$IP/extensions/CentralAuth/CentralAuth.notranslate-alias.php" (??!) [22:02:05] should we worry? [22:05:22] (03PS1) 10Dzahn: include ::bugzilla instead of bugzilla [operations/puppet] - 10https://gerrit.wikimedia.org/r/100690 [22:07:52] (03CR) 10Dzahn: [C: 032] include ::bugzilla instead of bugzilla [operations/puppet] - 10https://gerrit.wikimedia.org/r/100690 (owner: 10Dzahn) [22:08:12] so SpecialCentralAuthAliasesNoTranslate showed up in wmf6, but running mergeMessageFileList.php on --wiki=enwiki (wmf5) added it. I don't know if this is significant [22:10:41] (03CR) 10Dzahn: "works now on zirconium so far." [operations/puppet] - 10https://gerrit.wikimedia.org/r/100690 (owner: 10Dzahn) [22:12:51] greg-g, RoanKattouw, Reedy: we're following some crufty guidance to check sanity of extension-list (https://wikitech.wikimedia.org/wiki/Configuration_files#extension-list_and_ExtensionMessages-XXX.php), and are confused by the result. Should we scap anyway? [22:13:40] spagewmf: See -tech [22:14:26] saw, thanks [22:15:34] https://bugzilla.wikimedia.org/show_bug.cgi?id=58292 [22:15:43] "Enable HTTPS for download.wikimedia.org (and dumps.wikimedia.org)" [22:15:53] If someone wants to cross-reference RT with that, it'd be helpful. [22:17:37] running scap in a second [22:22:55] !log bsitu started scap: Add Flow extension but not enabled yet [22:23:11] Logged the message, Master [22:27:16] mw1060: rsync: send_files failed to open "/php-1.23wmf6/.git/modules/extensions/Collection/BISECT_ANCESTORS_OK" (in common): Permission denied (13) [22:27:21] bunch of such errors [22:27:39] that's mwalker's bisect [22:28:00] oh crap [22:28:04] did I forget to end the session [22:28:26] or...
did it just leave a bunch of garbage [22:29:29] no; apparently it just leaves a bunch of garbage [22:30:38] (03CR) 10Addshore: "This will run on a single instance on the Wikidata-build project on labs" [operations/puppet] - 10https://gerrit.wikimedia.org/r/96552 (owner: 10Addshore) [22:31:48] bsitu: I did apparently forget to clean up after myself -- it should stop complaining now [22:32:33] !log issued `git reset bisect a4f97e4` in extensions collection in 1.23wmf6 to clean up after yesterday's bisection mess [22:32:49] Logged the message, Master [22:33:00] mwalker: thx, it's still complaining [22:33:07] huuuurm [22:34:01] well; that file no longer exists on tin [22:34:09] I wonder if it exists on the app servers [22:36:26] mwalker, yup the files are there on some servers, e.g. mw1060 -rw-r--r-- 1 mwdeploy mwdeploy 1068 Dec 9 23:53 BISECT_LOG [22:36:42] ls -l /usr/local/apache/common/php-1.23wmf6/.git/modules/extensions/Collection/ [22:37:24] hurm; that should be ok actually [22:37:35] and going back and looking at the error it's a 'Failed to open file' error [22:42:04] mwalker: yeah that'll probably clear up (as my teenage acne Dr. said :) ). In my experience it's stuff in .git/objects/blah that sync/scap scripts can't clean up [22:46:17] greg-g: does scap update terbium (where we're going to run mwscript)? scap has been going for 30 minutes. Would be nice if it updated terbium sooner [22:46:32] !log bsitu finished scap: Add Flow extension but not enabled yet [22:46:48] Logged the message, Master [22:47:37] spagewmf: yeah (pretty sure) [22:48:03] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000 [22:48:15] greg-g scap did, terbium must be one of the last servers [22:48:25] * greg-g nods [22:50:40] mwscript sql.php --wiki=flowdb --cluster=extension1 extensions/Flow/flow.sql [22:50:40] /usr/local/apache/common-local/wikiversions.cdb has no version entry for `flowdb`. [22:50:40] Fatal error: /usr/local/apache/common-local/wikiversions.cdb has no version entry for `flowdb`. [22:50:40] in /usr/local/apache/common-local/multiversion/MWMultiVersion.php on line 376 [22:51:00] ^ that's us pretending 'flowdb' is a wiki to get the SQL to run in the right place [22:53:09] sql.php should have a --db param that can be different than --wiki I guess [22:53:47] Aaron|home: thx, let me try it [22:55:06] bsitu: I'll just commit that [22:55:25] Aaron|home: cool, thx [22:56:15] gwicke: oh. right. you can't do a restart using this script until the upstart is in place and the current init script is gone [22:56:20] because the batch will hang [22:56:57] should I revert the change for parsoid back to an automated restart? [22:57:41] Ryan_Lane: I guess for tomorrow's deploy this is fine [22:57:59] well, it'll mean a restart is impossible [22:58:11] using salt [22:58:23] how do you plan on doing a restart? [22:58:29] ask somebody in ops [22:58:46] we got free restarts during deploys so far, but could not restart otherwise [22:59:02] they need to do: salt -G 'deployment_target:parsoid' parsoid.restart_parsoid 'parsoid' [22:59:38] k, that matches our docs in https://wikitech.wikimedia.org/wiki/Parsoid#Misc_stuff [22:59:40] or, batched: salt -b '10%' -G 'deployment_target:parsoid' parsoid.restart_parsoid 'parsoid' [22:59:59] oh spiffy, that now works generically [23:00:02] Aaron|home: thx. Or could we pretend 'flowdb' is a wiki and add it to 'wikiversions.dat' and rebuild wikiversions.cdb ?
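For reference, the two restart invocations quoted above, as they would be run from the salt master; both commands are verbatim from the channel, and the batched form never takes down more than a tenth of the pool at once.

    # Restart Parsoid on every deployment target in parallel:
    salt -G 'deployment_target:parsoid' parsoid.restart_parsoid 'parsoid'

    # Or staggered, at most 10% of the minions at a time:
    salt -b '10%' -G 'deployment_target:parsoid' parsoid.restart_parsoid 'parsoid'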
[23:00:11] batching is a standard salt feature [23:00:18] the 10% [23:00:24] yeah, that's also standard [23:00:30] you can list a specific number, or a percentage [23:00:34] oh nice [23:00:37] yeah [23:00:51] oh, btw, you can also say "do this on a random set of minions" [23:00:57] heh [23:00:59] tweaked the docs [23:01:26] could make an interesting chaos-monkey like thing with the random feature [23:01:31] :) [23:04:05] greg-g: so we're over our window and still figuring out how to run SQL on an all-wiki centralized flowdb on an external cluster. [23:05:50] spagewmf: how close do you think you are to figuring it out? [23:06:36] Reedy: do you know of any other extension that has a cross-wiki DB? We just need to make mwscript sql.php put it in the right place [23:06:58] Aaron|home: ? [23:07:22] CentralAuth does its own [23:07:44] But it doesn't use sql.php in any shape or form [23:08:01] (03PS1) 10Chad: Fix Commons config for Cirrus [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100710 [23:08:01] I wouldn't use sql.php [23:08:49] Reedy: what do you suggest? :) [23:08:57] Do it manually [23:09:02] Where is the database located? [23:09:16] spagewmf: did you see that commit? [23:09:37] spagewmf: globalusage [23:09:42] Reedy: it's located in extension1 cluster [23:09:46] (03CR) 10Chad: [C: 032] Fix Commons config for Cirrus [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100710 (owner: 10Chad) [23:09:54] (03Merged) 10jenkins-bot: Fix Commons config for Cirrus [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/100710 (owner: 10Chad) [23:09:58] Aaron|home: yes, thanks, seems like it should work but we don't know enough about /maintenance to +2 it [23:10:09] aude: Which doesn't use sql.php and has its own globals like CA does... [23:10:15] right [23:10:39] !log demon synchronized wmf-config/CirrusSearch-labs.php 'No-op, fixing labs config' [23:10:56] Logged the message, Master [23:11:14] spagewmf: ping ^d? [23:11:29] <^d> hrm? [23:11:48] Reedy doing it manually means modifying the sql.php to have the right defaults for charset, user, etc. as I recall. prone to error [23:11:58] No [23:11:59] Don't do that [23:12:27] What exactly are you trying to do? [23:12:35] And why wasn't a way discussed before the deployment window? [23:12:48] Reedy then instead modify Flow/flow.sql? [23:13:01] Reedy: I sent out an email to the engineering list 8 days ago [23:13:09] Right [23:13:10] Reedy we are trying to create database 'flowdb' on the extension1 cluster [23:13:12] But you haven't got an answer [23:13:56] echo "CREATE DATABASE flowdb;" | sql enwiki -h 10.64.16.18 [23:14:13] Reedy: well, the answer from aaron was 'that should work' [23:14:17] the answer was --wiki=flowdb , but turns out that doesn't work without hacking wikiversions [23:14:27] So you didn't test it beforehand? [23:14:55] correct, although i'm completely unfamiliar with deployment and have no clue how it would be tested [23:15:05] (also i'm not doing the deploy, bsitu is for that reason :P ) [23:15:22] Reedy not sure how to test maintenance scripts in the production environment. [23:15:29] reedy@tin:~$ echo "CREATE DATABASE flowdb;" | sql enwiki -h 10.64.16.18 [23:15:29] ERROR 1044 (42000) at line 1: Access denied for user 'wikiadmin'@'10.%' to database 'flowdb' [23:15:36] You'll need a root/dba [23:16:28] Or make a wiki called "flowdb".
;-) [23:16:39] "make a wiki" [23:17:04] why not, we should try to break 1000 sometime soon [23:17:06] greg-g OK, so we have the extension deployed but not enabled, and we'll talk to springle about how best to do this and try again. [23:17:30] spagewmf: alright, so we're ok to sit tight, where we are, the code isn't being called until you flip the switch, right? [23:17:50] where "this" is creating database 'flowdb' on the extension1 cluster. Maybe Aaron|home's patch to add --wikidb will do what we want. [23:17:51] I note that using "sql centralauth" is a specific hack [23:18:04] so the DB is not there yet? [23:18:13] the sql.php change will not resolve that [23:18:26] Aaron|home: correct. greg-g I think so. [23:18:29] that would help with making/changing tables though [23:18:42] so yeah, let's sit tight, maybe enable tomorrow after we get someone to create the db, there's other stuff lined up [23:18:47] sorry spagewmf [23:19:52] https://gerrit.wikimedia.org/r/#/c/100701/ when there is opportunity :) [23:20:02] no worries, thanks y'all we appreciate the help. We knew the DB setup was the tricky bit [23:20:57] spagewmf: my bad, I didn't put 2 and 2 together, should've had you add that as a "schema change" or somesuch for sean to get to before the deploy [23:21:41] Flow, flow, flow your boat. [23:25:14] So... greg-g, is flow done for the day? I'd still like to get a quick centralauth deploy in [23:26:19] greg-g: no worries, we were so focused on how to tell machinery where to CREATE TABLE we skipped over the bit about creating the "where". I'll set up some dependent bugs [23:26:20] csteipp: yeah, sorry, they are [23:26:31] csteipp: We are done! [23:26:37] spagewmf: thanks! [23:28:58] gwicke: regarding yesterday's discussion about parsoid deployment; I missed the end of it; are you guys going for a single deployment repo; with your code being a submodule? [23:29:16] or did y'all end up somewhere else? [23:32:51] Error: 1193 Unknown system variable 'table_type' (10.64.16.18) [23:33:04] * Aaron|home then wonders how addWiki.php works [23:33:09] mwalker, we'll make a final decision tomorrow, but that is where we are headed [23:33:35] I'm still not so sure that debs are so impractical, but for now we want to be compatible with both debs and git-deploy [23:34:04] basically https://www.mediawiki.org/wiki/Parsoid/Packaging#Option_2:_deploy_repo_with_code_as_submodule [23:34:10] yep yep [23:34:28] renamed our node_modules repo to /mediawiki/services/parsoid/deploy [23:35:22] mwalker: we should keep in touch on node packaging [23:35:29] mathoid also needs that treatment [23:35:43] and rashomon, and.. [23:38:32] I'll actually probably just get rid of mine [23:38:38] and use operations/ocg-config [23:38:46] !log csteipp synchronized php-1.23wmf6/extensions/CentralAuth 'Update to master' [23:39:01] though... I may get into trouble there when we integrate mathoid [23:39:02] Logged the message, Master [23:43:08] !log csteipp synchronized php-1.23wmf5/extensions/CentralAuth 'Update to master' [23:43:22] Logged the message, Master [23:43:44] greg-g: I'm done [23:43:50] csteipp: awesome, thanks [23:45:31] werdna: didn't happen. 'we were so focused on how to tell machinery where to CREATE TABLE we skipped over the bit about creating the "where"' [23:45:47] wrong channel [23:49:00] (03PS3) 10Dzahn: role and module structure for ishmael [operations/puppet] - 10https://gerrit.wikimedia.org/r/96403 [23:50:15] is there a gage here yet?
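For reference, the step that was missing in the flowdb saga above would look roughly like this when run by a root/DBA account against the extension1 master. This is a sketch only: the grant scope and privilege level are guesses; the host 10.64.16.18 and the 'wikiadmin'@'10.%' user come from the failed attempt quoted earlier.

    # Hypothetical DBA step to create flowdb on extension1 (sketch).
    mysql -h 10.64.16.18 <<'SQL'
    CREATE DATABASE flowdb;
    -- grant scope is a guess; wikiadmin is the user the failed CREATE ran as
    GRANT ALL PRIVILEGES ON flowdb.* TO 'wikiadmin'@'10.%';
    SQL

After that, the Flow schema could be applied through the usual maintenance tooling once sql.php grows a --db-style parameter, as discussed above.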
[23:50:28] (03CR) 10Dzahn: role and module structure for ishmael (033 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/96403 (owner: 10Dzahn) [23:51:30] ori-l: ^ there, way more parameters for all the stuff that changes between labs and prod [23:52:15] hell, i even made AuthName "WMF Labs (use wiki login name not shell)" one in https://gerrit.wikimedia.org/r/#/c/96403/3/modules/ishmael/manifests/init.pp [23:54:38] (03PS4) 10Dzahn: role and module structure for ishmael [operations/puppet] - 10https://gerrit.wikimedia.org/r/96403 [23:56:31] (03CR) 10Ori.livneh: "Looks good! Could you move Wikimedia-cluster-specific parameter values to the role class and leave the default values in the module unspec" [operations/puppet] - 10https://gerrit.wikimedia.org/r/96403 (owner: 10Dzahn)