[00:00:02] We don't want people to have to run it locally, instead we publish it from doc.wikimedia.org [00:00:11] re-built nightly or postmerge. [00:00:15] just like doxygen. [00:00:20] but for the js code base. [00:00:26] Change merged: Asher; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/54619 [00:00:50] paravoid: any additional comments on https://gerrit.wikimedia.org/r/#/c/54324/ ? [00:01:25] My point is that we need to figure out a way to make it work. I'm not saying it has to be as-is, unacceptable is unacceptable. But we do need to have clear points on what needs to change for it to become acceptable. [00:02:00] To run from shell a few times a day. Note that this doesn't run publicly (I mean, it doesn't serve http requests. Its output is static html/css which is then rsynced to doc.wikimedia.org) [00:02:00] remove the dependency of using a javascript interpreter from within ruby [00:02:40] paravoid: jsduck might be able to improve the dependency mess, but I don't think it makes sense for it to not use v8/esprima. That's the best there is. [00:02:52] It needs to parse the js files in order to extract the documentation. [00:03:12] What would you replace v8 and/or esprima with? [00:03:17] ori-l: see my comments ~1h ago [00:04:36] paravoid: er, which? 
i changed /var/eventlogging -> /srv/eventlogging and parametrized the rsync targets [00:04:46] PROBLEM - Puppet freshness on sq79 is CRITICAL: Puppet has not run in the last 10 hours [00:04:52] 01:03 < paravoid> ori-l: so, we can't have puppet run code from gerrit as root [00:04:56] 01:04 < paravoid> puppet pulling from git is so and so, running setup.py is just not acceptable from a security PoV [00:06:39] then Ryan mentioned something about git-deploy [00:06:43] see above :) [00:06:47] PROBLEM - Puppet freshness on colby is CRITICAL: Puppet has not run in the last 10 hours [00:06:47] RECOVERY - MySQL Slave Delay on db71 is OK: OK replication delay 0 seconds [00:07:06] I just mentioned that if you wanted to add another repo, I'd help add it [00:07:34] add it to what? what is using it currently? and how would deployment look? [00:08:01] !log asher synchronized wmf-config/db-pmtpa.php 'returning servers' [00:08:20] Logged the message, Master [00:08:21] we could also do deb but I'm guessing this would be too slow/impractical [00:08:49] yes [00:10:39] Krinkle: don't use ruby in the first place? :) [00:10:46] RECOVERY - MySQL Slave Delay on db69 is OK: OK replication delay 12 seconds [00:10:47] sorry, I don't have good answers for you [00:11:12] paravoid: I didn't write the software, and unless you're volunteering for a port, it's unlikely to change. [00:11:17] I'm not [00:11:21] :-/ [00:11:31] j/k [00:11:39] New patchset: Dzahn; "turn planet into a puppet module" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54493 [00:11:54] but yeah, it's annoying when you've been using great software and find out it's a reversed pyramid ending in a big shit hole. [00:11:55] paravoid: this is a little crazy. jsduck is not krinkle's hobby-horse; it's what we're using for javascript documentation. [00:12:36] i'm not saying that this is grounds for pushing it through as-is, but " don't use ruby in the first place? 
:)" is a little dismissive [00:12:39] paravoid: So, just to understand, what's the problem with therubyracer? [00:12:46] I said I don't have good answers :) [00:13:14] I know embedding a big upstream sounds wrong (as it'd be installed multiple times potentially in different parts of the filesystem), but what is the problem in practice? [00:13:27] it's not just that [00:13:33] it's non-trivial to package [00:13:34] I tried. [00:13:48] given that we don't fully control the environment and even if we'd fork it, it's non-trivial to refactor such a construction [00:13:57] paravoid: Hm.. How so? [00:14:03] What do you use to package btw? [00:14:09] gem2deb [00:14:14] I assume you did do git submodule update --init, first, right ? [00:14:25] I'm not fetching from git [00:14:45] it also embeds scons mind you [00:14:50] a python-based build system [00:14:57] What are you running gem2deb on for jsduck/therubyracer? [00:15:03] ? [00:15:07] gem2deb jsduck [00:15:08] if not a git checkout [00:15:18] gem2deb runs gem fetch [00:15:23] paravoid: yes, but that doesn't include dependencies, right ? [00:15:27] no [00:15:42] Ryan_Lane: can I pm re: git-deploy? [00:15:42] I see, so then you run it for therubyracer [00:15:46] yes [00:15:47] try it [00:16:18] it needs jeweler btw [00:16:21] so gem2deb that too [00:16:33] then libv8's source [00:16:37] and work around scons [00:17:12] * Krinkle likes npm / node_modules system [00:17:14] that "just" works. [00:17:22] jeweler is also full of require 'rubygems' [00:17:25] and is uniform / standardised. [00:17:35] gem is also uniform/standardized [00:17:39] ori-l: pms are evil [00:17:42] doesn't mean that gems aren't crap [00:17:48] let's discuss in-channel [00:17:49] not all of them [00:17:57] Ryan_Lane: ok, I just thought it would be disruptive to have two concurrent discussions; no other reason. [00:18:11] it's a public channel, we're used to it ;) [00:18:30] ori-l: you're just deploying code, right? no binaries? 
[00:18:30] paravoid: well, apparently it isn't as easy as just "gem install --install-dir ~/gem_install --bindir ~/bin" [00:18:48] paravoid: Or is it? If that includes everything in the local gem_install directory, wouldn't that do it? [00:18:56] Ryan_Lane: executable scripts but no binaries [00:19:01] tar that directory and take it? [00:19:04] that should be fine [00:19:14] ewwwww [00:19:26] ori-l: does it need to restart any services after deployment? [00:19:38] Krinkle: so what's the plan? [00:19:59] paravoid: I don't know, I'm not experienced in this area. [00:20:04] how would jenkins be protected from running untrusted/unverified javascript code? [00:20:16] Ryan_Lane: yes and no. services would need to be restarted, but this requires some advance notification, so i'd rather make that manual [00:20:31] paravoid: First of all, it runs on merge, not on test. So the code is reviewed, passes unit tests and is merged. [00:20:31] ori-l: well, it's possible to do it automatically [00:20:45] ori-l: what systems would this be deployed to? [00:20:53] paravoid: secondly, afaik the only code executed in v8 is esprima, not the mediawiki js code [00:21:05] esprima reads the js files as strings and parses them into a tree, it doesn't execute it. [00:21:13] Ryan_Lane: just vanadium [00:21:32] so wait [00:21:41] It's not as bad as you might think. [00:21:42] this wants to use a javascript program [00:21:48] (esprima) [00:22:02] so the way they do it is by embedding a javascript interpreter inside ruby? [00:22:14] am I the only one that thinks this is completely crazy? [00:22:33] !log adding dns entries for labstore systems in eqiad [00:22:47] Logged the message, Master [00:23:04] The way I'd do it is shell out to node / esprima and read the JSON response from stdout and work with that. That'd be somewhat saner than trying to do it inside ruby [00:23:06] paravoid: it'd be like generating dynamic content within tags in html! [00:23:08] But I didn't write it. 
[00:23:37] paravoid: but point is, it is a very ugly detail, but is it a blocking problem? [00:23:40] ori-l: I'd say it'd be more like bash embedding perl, python & ruby to be able to run perl/python/ruby scripts [00:23:46] paravoid: btw, what issues do you get when you try to package it? [00:24:18] I'm sorry, I just think it's crazy [00:24:26] and I don't want to see that in production [00:24:27] even if it built [00:24:35] Yes, we all do. But let's at least try to work towards a solution. [00:24:36] Ryan_Lane: ok, automated is fine. I just want to establish some kind of baseline [00:24:46] ori-l: ok [00:24:50] it's not required [00:24:52] just available [00:24:57] paravoid: see what precisely in production? [00:25:10] paravoid: v8 itself? [00:25:22] Ryan_Lane: then let's leave it manual. [00:25:24] otherwise you're going to need a root to restart, and if it restarts with deployed code that isn't the same as what's installed, you're just waiting for things to break weirdly [00:25:37] Krinkle: patch lib/jsduck/esprima.rb to shell out to a javascript interpreter [00:25:49] Krinkle: and do that conditionally if the v8 gem isn't found [00:26:11] Ryan_Lane: i have sudo on vanadium, so that's tenable [00:26:17] * Ryan_Lane nods [00:26:17] ok [00:26:26] the v8 stuff is three lines of code from what I can see [00:26:26] paravoid: .. which is when therubyracer is not installed, correct? [00:26:39] I guess [00:26:44] git-deploy only deploys code underneath /srv/deployment/, btw [00:26:47] the first line says 'require "v8"' [00:26:53] ori-l: will that be a problem? [00:26:57] it won't install to random spots [00:27:03] so it's probably inconsistently named [00:27:11] Ryan_Lane: that's totally fine [00:27:13] this is done for sanity reasons [00:27:14] ok [00:27:24] right, require 'v8' is for therubyracer [00:27:29] that's what its docs suggest too [00:27:31] so yes [00:27:37] ori-l: want to do this now? 
[00:27:49] Ryan_Lane: i would looooooove to :) [00:27:59] is your configuration in the same repo? [00:28:04] or do you need two different repos? [00:28:16] config will go in puppet [00:28:18] ok [00:28:22] that makes things easier [00:28:22] paravoid: ack-ing confirms v8 is indeed only used in that one file [00:28:31] and that file is trivial [00:28:37] json = @v8.eval("JSON.stringify(esprima.parse(js, {comment: true, range: true, raw: true}))") [00:28:39] ori-l: does the config change very infrequently? [00:28:40] that's basically it [00:28:44] return Util::Json.parse(json, :max_nesting => false) [00:29:00] so just have this shell out to node or whatever [00:29:22] Ryan_Lane: yeah; it saw a lot of changes until it settled about a month ago and hasn't changed since. i don't expect it to need to change. [00:29:33] often, that is. [00:29:37] ok. we may eventually want to move that to the deployment system [00:29:42] but for now let's ignore it [00:29:58] New patchset: Pyoungmeister; "WIP: first bit of stuff for taming the mysql module and making the SANITARIUM" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53907 [00:30:03] sure [00:30:06] PROBLEM - Varnish traffic logger on cp1028 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa [00:30:11] paravoid: yeah. Basically exec(node esprima/esprima.js), write the contents of string js to stdin, [00:30:13] and read stdout. [00:30:23] ori-l: everything I'm about to do is going to be in puppet [00:30:47] Ryan_Lane: does it make sense to update this https://gerrit.wikimedia.org/r/#/c/54324/ ? [00:31:10] to add what? [00:31:12] Krinkle: node or spidermonkey [00:31:17] paravoid: right [00:31:20] Krinkle: I think execjs abstracts all that [00:31:29] might be too complicated for your case though [00:31:36] and I'd have to try building that too anyway :) [00:31:40] oh, to remove setup.py? [00:31:41] yes [00:31:55] should the initial git::clone still be there? 
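The shell-out approach sketched in the log (feed the source to an external JS interpreter on stdin, read a JSON-serialized parse tree back on stdout) takes only a few lines of Ruby. This is a minimal sketch, not jsduck's actual code; the real command would be something like `node` plus a small driver script around esprima, but since node may not be installed, the runnable demo below substitutes `ruby` itself as a stand-in "parser" so the plumbing can be exercised anywhere.

```ruby
require 'open3'
require 'json'

# Sketch of "shell out instead of embedding V8": write the JS source to the
# interpreter's stdin and parse the JSON it prints on stdout. In the real
# setup `command` would be e.g. ["node", "esprima-driver.js"] (hypothetical
# driver script name, not part of jsduck).
def parse_via_subprocess(command, js_source)
  stdout, status = Open3.capture2(*command, stdin_data: js_source)
  raise "parser exited with #{status.exitstatus}" unless status.success?
  JSON.parse(stdout)
end

# Stand-in "parser" so the sketch runs without node: ruby reads stdin and
# emits a trivial esprima-like tree as JSON.
FAKE_PARSER = ["ruby", "-rjson", "-e",
  'src = $stdin.read; puts({type: "Program", length: src.length}.to_json)']

tree = parse_via_subprocess(FAKE_PARSER, "var x = 1;")
```

The JSON round-trip is the design point: the Ruby side never evaluates any JavaScript, it only deserializes the tree, which is exactly the isolation Krinkle describes.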
[00:31:59] no [00:32:02] paravoid: Perhaps, you'd like to file an issue against jsduck to suggest this? I can do it, but I feel I'm not sure my words will have as much value since I'm not sure how to make sure there is no answer like "just install therubyracer" [00:32:13] this will init a repo as well [00:32:23] Ryan_Lane: ok, i'll update the patch, sec [00:32:31] I'm going to add the changes to manifests/role/deployment.pp [00:32:33] e.g. explain that it is horrible to package and using evil embedding of all of v8 [00:32:36] then run puppet on sockpuppet [00:32:48] then run a state on vanadium [00:33:10] there's some annoying bootstrapping for new repos [00:33:10] paravoid: ah, ExecJS is a ruby lib [00:33:14] I'll walk you through it [00:33:25] and I'll fix it when I start working on the deployment system again :) [00:33:45] Ryan_Lane: sure; I'm up on 6th. would it help to come down and coordinate in person? [00:33:46] PROBLEM - MySQL Replication Heartbeat on db32 is CRITICAL: CRIT replication delay 187 seconds [00:33:53] ori-l: oh. yes [00:33:57] that'll make things easier [00:34:02] ok, i'll be right there [00:34:06] it would be good if we kept notes [00:34:10] so that we can document this [00:34:11] paravoid: looks like ruby-execjs only depends on ruby-multi_json which has no further dependencies. [00:34:16] PROBLEM - MySQL Slave Delay on db32 is CRITICAL: CRIT replication delay 189 seconds [00:35:02] irb(main):005:0> require 'execjs' [00:35:02] => true [00:35:05] I packaged execjs [00:35:18] I'd say provide them a patch to use execjs instead of therubyracer [00:35:19] nice [00:35:27] that would make it more portable [00:35:27] to e.g. jruby [00:35:29] or windows [00:35:47] paravoid: this is why everyone pings you on debian packages. 
because you're awesome at this :) [00:36:07] irb(main):005:0> require 'execjs' [00:36:08] => true [00:36:08] irb(main):006:0> ExecJS.eval "'red yellow blue'.split(' ')" [00:36:08] => ["red", "yellow", "blue"] [00:36:42] source = open("https://evil.github.com/random.js").read [00:36:42] context = ExecJS.compile(source) [00:36:42] :D [00:36:48] nothing is perfect [00:37:04] people will always do stupid things [00:47:24] paravoid: Interesting, I see you've got @debian.org (via GitHub profile). That explains a few things. [00:47:33] haha [00:47:36] Awesome [00:50:14] https://github.com/senchalabs/jsduck/issues/166 [00:50:15] hahaha [00:50:21] it looks like I wasn't the first to ask for that [00:50:27] I love the reply [00:52:05] RECOVERY - Varnish traffic logger on cp1028 is OK: PROCS OK: 3 processes with command name varnishncsa [00:52:25] paravoid: "done" [00:52:26] wtf [00:52:36] I told you upstream is crap [00:52:37] you didn't believe me [00:52:59] Oh, I know, got enough bad experiences with upstream. [00:53:19] I'm not surprised if a pull request or issue report is unanswered for several months [00:53:26] certain projects are just abandoned [00:53:29] not this one though [00:54:18] paravoid: So ahm, you're fairly certain that this will be harmless? https://github.com/Krinkle/jsduck/compare/require-rubygems [00:55:02] New patchset: Dzahn; "doc.mediawiki.org: Redirect to canonical wikimedia.org and fix invalid SSL" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54614 [00:56:06] paravoid: actually [00:56:07] paravoid: https://github.com/senchalabs/jsduck/commit/b49e003be90e986cb1cf91c671ffcb617f22a4b2 [00:56:11] "Switch back from using ExecJS to using v8 directly." [00:56:14] He did do it [00:56:18] hahahaha [00:56:27] awesome [00:56:31] As we no more support JRuby, we can require therubyracer as a dependency. [00:56:41] so you seriously want us to use this crappy piece of software? 
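The patch paravoid proposes, swapping therubyracer's embedded V8 context for ExecJS, only really touches the two lines quoted from lib/jsduck/esprima.rb earlier in the log. Here is an illustrative sketch of what such a wrapper could look like; the class and method names are hypothetical, not jsduck's actual API, and the JS engine is injected so it can be an ExecJS context, a therubyracer context, or a test stub, anything responding to #eval.

```ruby
require 'json'

# Illustrative rework of jsduck's esprima wrapper (names hypothetical):
# the JS engine is passed in rather than hard-coding `require "v8"`.
class EsprimaParser
  def initialize(js_context)
    # e.g. ExecJS.compile(File.read("esprima.js")), or any object with #eval
    @context = js_context
  end

  # Mirrors the one-liner from the log: run esprima inside the JS engine
  # and bring the parse tree across the Ruby/JS boundary as JSON text.
  def parse(js_source)
    json = @context.eval(
      "JSON.stringify(esprima.parse(#{js_source.to_json}, " \
      "{comment: true, range: true, raw: true}))")
    JSON.parse(json, max_nesting: false)
  end
end
```

With ExecJS this would be constructed as `EsprimaParser.new(ExecJS.compile(esprima_source))`, and ExecJS then picks whichever runtime it finds, which is the portability argument made in the log.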
[00:56:55] well, jsduck itself is pretty awesome actually [00:57:16] http://integration.wmflabs.org/mw/extensions/VisualEditor/docs/ [00:57:58] http://integration.wmflabs.org/mw/extensions/VisualEditor/docs/#!/api/ve.init.mw.Platform-method-addMessages [00:57:59] etc. [00:58:46] PROBLEM - Puppet freshness on europium is CRITICAL: Puppet has not run in the last 10 hours [00:59:42] New patchset: Ryan Lane; "Add eventlogging repo" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54622 [01:00:07] \o/ [01:01:13] paravoid: At least it should be fairly simple now to patch locally [01:01:14] https://github.com/senchalabs/jsduck/commit/b49e003be90e986cb1cf91c671ffcb617f22a4b2 [01:01:20] I'm doing it now. [01:01:35] Thanks! [01:03:17] is jenkins not working for the puppet repo right now? [01:05:14] Ryan_Lane: Looks like it's running https://integration.wikimedia.org/zuul/status [01:05:40] but it's not reporting to gerrit yet, it's still in zuul's buffer [01:05:46] weird [01:06:04] I blame jsduck [01:06:04] quite a large buffer, more than it should be [01:06:15] ;) [01:06:15] quack? It's not even in production [01:06:46] Krinkle: I'm just kidding with you [01:07:14] I've scheduled a graceful reload of zuul in case it has a problem [01:07:55] nothing unusual in the log [01:07:59] it's working but not reporting [01:08:03] I blame gerrit :P [01:10:09] ori-l: how do you figure gerrit compromise == puppet private compromise? 
[01:11:12] !@!$@^%!$@$ ruby hell [01:12:08] Krinkle: btw, if making packages is my team's job, then patching jsduck to use execjs is your team's :) [01:12:12] I'm done now [01:12:21] but just saying, don't expect many ops people to go as far as I did :) [01:12:24] New patchset: Reedy; "(bug 39225) Configure $wgBabelCategoryNames and $wgBabelMainCategory for pl.wikisource" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/54144 [01:12:30] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/54144 [01:13:04] paravoid: Sure, I was prepared to take part of tomorrow to patch jsduck to not have rubygem require and use execjs. That I'm prepared to do I think. [01:13:32] paravoid: Thanks a lot and I owe you one :) [01:13:54] (64sid)gearloose:~/wikimedia/gems# jsduck [01:13:55] Error: You should specify some input files, otherwise there's nothing I can do :( [01:14:00] any files to test? [01:14:46] That output looks familiar [01:14:49] yes, coming up [01:15:31] New patchset: Reedy; "Set babel main category for minwiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/54623 [01:15:32] paravoid: Got mediawiki core available? 
jsduck --config=mediawiki/core/maintenance/jsduck/config.json [01:15:38] I don't, not on that box [01:15:56] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/54623 [01:16:29] paravoid: Just create a file in your favourite editor and paste https://raw.github.com/wikimedia/mediawiki-extensions-VisualEditor/master/.docs/external.js into it [01:16:30] (64sid)gearloose:~/wikimedia/gems# ls *deb [01:16:30] ruby-dimensions_1.2.0-1_all.deb ruby-execjs_1.4.0-1_all.deb ruby-jsduck_4.6.2-1_all.deb ruby-parallel_0.6.2-1_all.deb [01:16:34] fwiw [01:16:35] (contents of that url rather) [01:16:37] New patchset: Reedy; "(bug 45911) Set $wgCategoryCollation to 'uca-pt' for the Portuguese Wikipedia and Wikibooks" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/52903 [01:16:40] that'll produce some sample lines [01:16:59] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/52903 [01:17:23] Error: Oh noes! The template directory does not contain extjs/ directory :( [01:17:26] Error: Please copy ExtJS over to template/extjs or create symlink. [01:17:30] damn [01:18:28] !log reedy synchronized wmf-config/InitialiseSettings.php [01:18:41] Logged the message, Master [01:19:02] paravoid: you shouldn't need extjs afaik [01:19:45] [pid 21035] stat("/usr/lib/ruby/template-min/extjs", 0x7fffb1617690) = -1 ENOENT (No such file or directory) [01:19:46] Oh, I see. Yes, the generated html pages use extjs. [01:19:48] !@%!#%@! [01:19:49] seriously? [01:19:53] It's provided though, right? [01:19:56] /usr/lib/ruby/template-min? [01:20:08] Omg, that's horrible. 
[01:20:25] so there's a basic template I see [01:21:06] @root_dir = File.dirname(File.dirname(File.dirname(__FILE__))) [01:21:09] @template_dir = @root_dir + "/template-min" [01:21:22] view-source:http://integration.wmflabs.org/mw/extensions/VisualEditor/docs/ it references ext-all.js there [01:21:25] I wonder where it got it from [01:21:47] PROBLEM - Varnish traffic logger on cp1021 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa [01:22:44] Warning: /root/wikimedia/gems/external.js:1: Unsupported tag: @source [01:22:47] Warning: /root/wikimedia/gems/external.js:6: Unsupported tag: @source [01:22:50] Warning: /root/wikimedia/gems/external.js:11: Unsupported tag: @source [01:22:55] paravoid: That's okay, I was counting on that [01:23:05] okay, now I have to fix the template_dir mess [01:23:09] I gave you a file that doesn't work by default, so I know it parses it properly and triggers warnings for those unknown tags. [01:24:14] paravoid: Ah, found it. It's in the Rakefile [01:24:24] A one-time fetch of a stable version of extjs [01:24:33] I don't need that [01:24:35] it's packaged [01:24:45] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: Puppet has not run in the last 10 hours [01:24:46] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: Puppet has not run in the last 10 hours [01:24:46] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: Puppet has not run in the last 10 hours [01:24:46] PROBLEM - Puppet freshness on msfe1002 is CRITICAL: Puppet has not run in the last 10 hours [01:24:46] Right, I figured you ran rake. [01:24:52] I didn't :) [01:25:00] but it's packaged. [01:25:11] extjs is packaged [01:25:16] libjs-extjs [01:25:20] Or you mean, it'll run this when the Debian package is installed? [01:25:22] Ah, right. [01:25:26] even better. [01:28:38] paravoid: btw, when this package is done. I'll take it apart is a learning course. Probably won't ask any questions, but it'll be very fruitful. 
despite the horrors we encountered, it's a case that seems to have hit pretty much every bump one can reasonably and unreasonably expect in the wild. [01:28:52] s/is a learning curve/as a learning curve [01:28:57] PROBLEM - Varnish traffic logger on cp1034 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa [01:30:46] RECOVERY - Varnish traffic logger on cp1021 is OK: PROCS OK: 3 processes with command name varnishncsa [01:32:49] done [01:33:07] fixed js-classes too while I was there [01:33:29] paravoid: This may be a better example: https://raw.github.com/Krinkle/jquery-visibleText/master/jquery.visibleText.js [01:33:43] When run like `jsduck --output='docs/' --external='jQuery' -- jquery.visibleText.js` it should produce something like http://cl.ly/image/3b0h2m1r1D0U [01:34:25] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 184 seconds [01:35:18] New patchset: Reedy; "(bug 45968) Set $wgCategoryCollation to 'uca-pl' on Polish Wikivoyage" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/54352 [01:35:26] now to push all those to git... [01:35:37] meh, tomorrow [01:35:44] it's getting close to 4am, so... [01:36:03] paravoid: Where are you located? [01:36:06] athens [01:36:06] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 207 seconds [01:36:11] Ah [01:36:12] (greece, not georgia) [01:36:34] Enschede, Netherlands [01:37:11] late for you too then [01:37:59] true, though not for my body, only for my clock. [01:38:07] I pretty much always work the night shift. [01:38:24] start and get up late too. 
[01:38:56] Ryan_Lane: I wonder what you'd do :) [01:39:04] I think you would have puked on your keyboard a dozen times by now [01:39:11] hahahaha [01:39:12] yes [01:39:13] I would have [01:39:24] eventlogging is now deployed via git-deploy [01:39:38] the server piece on vanadium, that is [01:39:38] cool [01:39:42] I brought you a new client :) [01:39:46] heh [01:41:22] I need to put some work into the puppet side of this. a lot of unnecessary configuration [01:41:25] Ryan_Lane: Could you create an empty repo operations/debs/jsduck as placeholder for now? [01:41:39] (assuming that's where it'll go, right paravoid?) [01:41:41] no [01:41:51] ow.. [01:41:54] ruby-jsduck [01:42:03] and ruby-execjs, ruby-parallel, ruby-dimensions [01:42:21] Ah, I see. You're separating them at the package level. that makes sense [01:42:22] I'm trying to debug why it doesn't work with spidermonkey now [01:42:26] it worked with node [01:43:34] paravoid: are you adding a dependency on either on the debian package for jsduck or execjs, or will it remain "detecting" all the way, meaning we need to ensure nodejs on the same servers. [01:43:47] a dependency for nodejs that is [01:43:57] I'll add a recommends: nodejs [01:44:07] I'd like to make that nodejs | spidermonkey-bin though [01:44:20] Cool, debian has recommended dependencies. That's nice. [01:44:38] ah, it's deprecated [01:44:39] paravoid: is there no generic name for that? [01:44:43] for what? [01:44:56] javascript runtime [01:44:59] like java has [01:45:23] there's no generic javascript runtime because there's not a compatible interface in all [01:45:31] ah. right. [01:45:50] execjs tries "nodejs", "node", "js", etc. [01:45:53] /System/Library/Frameworks/JavaScriptCore.framework/Versions/A/Resources/jsc too :) [01:46:01] and "cscript //E:jscript //Nologo //U", [01:46:03] ok. 
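The runtime autodetection described above (ExecJS trying "nodejs", "node", "js", and so on) boils down to scanning PATH for the first candidate that exists. A self-contained sketch follows; note the real ExecJS additionally runs a probe script to verify each runtime actually works, which is omitted here, and "ruby" is appended to the demo list only so it resolves on a machine with no JS runtime at all.

```ruby
# Sketch of ExecJS-style runtime autodetection: return the first candidate
# command found as an executable on PATH. (The real ExecJS also verifies
# each runtime with a small test script before settling on it.)
def find_js_runtime(candidates, path = ENV["PATH"])
  dirs = (path || "").split(File::PATH_SEPARATOR)
  candidates.find do |cmd|
    dirs.any? { |dir| File.executable?(File.join(dir, cmd)) }
  end
end

# "ruby" is only in this list so the demo finds something on machines
# without any JavaScript runtime installed.
runtime = find_js_runtime(%w[nodejs node js ruby])
```

This is also why there is no generic "javascript runtime" dependency name in Debian, as paravoid notes: with no common interface across engines, each consumer has to probe for one itself.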
heading home [01:46:05] * Ryan_Lane waves [01:46:21] yeah, the user simply has to make sure he has at least one of them installed [01:46:28] a case of RTFM :) [01:46:43] it also supports therubyracer, therubyrhino, mustang, johnson [01:46:45] beyond apt-get install [01:46:53] mustang even? [01:46:55] so I think it's very stupid for upstream to just revert to v8 [01:47:19] yep [01:47:22] it's also much cleaner for us with node :) [01:49:23] indeed [01:49:29] actually, nodejs is already ensured on gallium [01:49:35] I'd guess [01:49:50] we use grunt and jshint (both js libraries also) [01:50:46] PROBLEM - Varnish traffic logger on cp1025 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa [01:50:46] I'd suggest to file a bug with jsduck requesting execjs to be readded [01:50:50] only nodejs itself is in puppet though, grunt and jshint are installed locally in the job repository and executed as "node file/path/to/that/lib.js" since they update so often. [01:51:00] paravoid: Already did an hour ago [01:51:00] https://github.com/senchalabs/jsduck/issues/339 [01:51:06] oh cool [01:51:39] I added jruby mention but removed it again since the repo owner doesn't seem to care about it (stopped supporting it) [01:51:57] did do an anti-mention in a second comment though [01:52:46] RECOVERY - Varnish traffic logger on cp1025 is OK: PROCS OK: 3 processes with command name varnishncsa [01:54:49] okay [01:54:50] paravoid: Let's pick this up tomorrow, it is relatively high up my personal todo list, but certainly not within 24 hours. A few days or even a few weeks wouldn't be the end of the world. Thanks a lot for everything so far. [01:54:51] sleep now [01:54:57] heh [01:54:58] nn :) [01:55:06] yeah I'll do the git inits, commits & pushes tomorrow [01:55:09] and apt [01:55:15] hopefully tomorrow [01:55:25] let me know if i can help with anything (testing) [01:55:39] thanks. [01:56:12] taking an hour break myself. 
[01:56:15] bbl [02:00:46] PROBLEM - Puppet freshness on cp1036 is CRITICAL: Puppet has not run in the last 10 hours [02:01:15] PROBLEM - Varnish traffic logger on cp1026 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa [02:04:45] PROBLEM - Puppet freshness on sq73 is CRITICAL: Puppet has not run in the last 10 hours [02:06:46] PROBLEM - Puppet freshness on mc1002 is CRITICAL: Puppet has not run in the last 10 hours [02:06:46] PROBLEM - Puppet freshness on virt1005 is CRITICAL: Puppet has not run in the last 10 hours [02:09:47] RECOVERY - MySQL Replication Heartbeat on db32 is OK: OK replication delay 30 seconds [02:10:25] RECOVERY - MySQL Slave Delay on db32 is OK: OK replication delay 1 seconds [02:11:06] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 181 seconds [02:11:28] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 188 seconds [02:13:26] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 192 seconds [02:15:06] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 197 seconds [02:16:05] PROBLEM - Varnish traffic logger on cp1023 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa [02:18:05] RECOVERY - Varnish traffic logger on cp1023 is OK: PROCS OK: 3 processes with command name varnishncsa [02:20:10] RobH: Were you able to sync the updated redirects.conf? [02:20:27] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 0 seconds [02:21:07] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 0 seconds [02:21:24] Hmm. [02:21:44] Never mind, I see now. 
[02:23:18] RECOVERY - Varnish traffic logger on cp1026 is OK: PROCS OK: 3 processes with command name varnishncsa [02:29:02] !log LocalisationUpdate completed (1.21wmf11) at Tue Mar 19 02:29:02 UTC 2013 [02:29:16] Logged the message, Master [02:33:56] PROBLEM - Varnish traffic logger on cp1025 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa [02:37:45] PROBLEM - Puppet freshness on labstore1 is CRITICAL: Puppet has not run in the last 10 hours [02:37:45] PROBLEM - Puppet freshness on db35 is CRITICAL: Puppet has not run in the last 10 hours [02:38:45] PROBLEM - Puppet freshness on db56 is CRITICAL: Puppet has not run in the last 10 hours [02:50:19] !log LocalisationUpdate completed (1.21wmf12) at Tue Mar 19 02:50:19 UTC 2013 [02:50:38] Logged the message, Master [02:51:55] PROBLEM - MySQL Slave Delay on db66 is CRITICAL: CRIT replication delay 193 seconds [02:52:29] PROBLEM - MySQL Replication Heartbeat on db66 is CRITICAL: CRIT replication delay 223 seconds [02:55:57] RECOVERY - Varnish traffic logger on cp1025 is OK: PROCS OK: 3 processes with command name varnishncsa [03:03:05] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 183 seconds [03:03:26] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 195 seconds [03:32:28] RECOVERY - MySQL Replication Heartbeat on db66 is OK: OK replication delay 0 seconds [03:32:55] RECOVERY - MySQL Slave Delay on db66 is OK: OK replication delay 0 seconds [03:49:44] New patchset: Reedy; "Enable NS_TEMPLATE subpages on wikibooks" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/54630 [03:50:14] !log reedy synchronized wmf-config/InitialiseSettings.php [03:50:37] Logged the message, Master [03:51:10] New patchset: Reedy; "Enable NS_TEMPLATE subpages on wikibooks" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/54630 [03:51:41] Change merged: Reedy; [operations/mediawiki-config] (master) - 
https://gerrit.wikimedia.org/r/54630 [03:51:41] !log reedy synchronized wmf-config/InitialiseSettings.php [03:52:02] Logged the message, Master [04:11:05] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 0 seconds [04:11:25] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 0 seconds [04:30:45] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours [04:38:55] PROBLEM - MySQL Replication Heartbeat on db32 is CRITICAL: CRIT replication delay 199 seconds [04:39:27] PROBLEM - MySQL Slave Delay on db32 is CRITICAL: CRIT replication delay 220 seconds [04:46:06] RECOVERY - MySQL Replication Heartbeat on db32 is OK: OK replication delay 0 seconds [04:46:25] RECOVERY - MySQL Slave Delay on db32 is OK: OK replication delay 0 seconds [05:34:46] RECOVERY - Puppet freshness on mc1008 is OK: puppet ran at Tue Mar 19 05:34:38 UTC 2013 [06:05:46] PROBLEM - Puppet freshness on search26 is CRITICAL: Puppet has not run in the last 10 hours [06:07:45] PROBLEM - Puppet freshness on db43 is CRITICAL: Puppet has not run in the last 10 hours [06:07:45] PROBLEM - Puppet freshness on mw1014 is CRITICAL: Puppet has not run in the last 10 hours [06:07:46] PROBLEM - Puppet freshness on mw1115 is CRITICAL: Puppet has not run in the last 10 hours [06:07:46] PROBLEM - Puppet freshness on mw121 is CRITICAL: Puppet has not run in the last 10 hours [06:16:07] PROBLEM - LVS HTTPS IPv4 on wikibooks-lb.eqiad.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:16:08] PROBLEM - LVS HTTP IPv6 on wikimedia-lb.eqiad.wikimedia.org_ipv6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:16:08] PROBLEM - LVS HTTP IPv6 on foundation-lb.eqiad.wikimedia.org_ipv6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:16:08] PROBLEM - LVS HTTPS IPv4 on wikiquote-lb.eqiad.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:16:38] Yeah, I'm having connection issues alright [06:16:55] 
RECOVERY - LVS HTTPS IPv4 on wikiquote-lb.eqiad.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 61239 bytes in 1.889 second response time [06:16:56] RECOVERY - LVS HTTP IPv6 on foundation-lb.eqiad.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 61239 bytes in 3.032 second response time [06:16:56] RECOVERY - LVS HTTPS IPv4 on wikibooks-lb.eqiad.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 61239 bytes in 3.285 second response time [06:16:56] PROBLEM - LVS HTTP IPv6 on wiktionary-lb.eqiad.wikimedia.org_ipv6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:16:56] PROBLEM - LVS HTTPS IPv4 on mediawiki-lb.eqiad.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:17:25] PROBLEM - LVS HTTP IPv4 on wikimedia-lb.eqiad.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:17:48] PROBLEM - LVS HTTPS IPv4 on wikimedia-lb.eqiad.wikimedia.org is CRITICAL: Connection timed out [06:17:59] RECOVERY - LVS HTTPS IPv4 on mediawiki-lb.eqiad.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 61239 bytes in 3.060 second response time [06:18:10] PROBLEM - LVS HTTPS IPv6 on wikibooks-lb.eqiad.wikimedia.org_ipv6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:18:19] PROBLEM - LVS HTTPS IPv6 on foundation-lb.eqiad.wikimedia.org_ipv6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:18:33] eep [06:18:59] RECOVERY - LVS HTTPS IPv6 on wikibooks-lb.eqiad.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 61239 bytes in 7.462 second response time [06:19:00] and good morning to you too icinga [06:19:09] RECOVERY - LVS HTTPS IPv6 on foundation-lb.eqiad.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 61239 bytes in 0.671 second response time [06:19:18] i'm checking routers [06:19:19] PROBLEM - LVS HTTP IPv6 on wikisource-lb.eqiad.wikimedia.org_ipv6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:19:22] thank you [06:19:57] it looks like pybal lost connection with all of the appservers [06:20:09] 
RECOVERY - LVS HTTP IPv6 on wikisource-lb.eqiad.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 61239 bytes in 0.028 second response time [06:20:20] RECOVERY - LVS HTTP IPv4 on wikimedia-lb.eqiad.wikimedia.org is OK: HTTP OK: HTTP/1.0 200 OK - 95678 bytes in 1.038 second response time [06:20:22] PROBLEM - LVS HTTPS IPv4 on wikisource-lb.eqiad.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:21:01] RECOVERY - LVS HTTP IPv6 on wikimedia-lb.eqiad.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 95684 bytes in 0.042 second response time [06:21:01] RECOVERY - LVS HTTP IPv6 on wiktionary-lb.eqiad.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 61239 bytes in 3.648 second response time [06:21:09] PROBLEM - LVS HTTPS IPv4 on mediawiki-lb.eqiad.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:21:10] PROBLEM - LVS HTTPS IPv4 on wikiversity-lb.eqiad.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:21:10] PROBLEM - LVS HTTPS IPv4 on wiktionary-lb.eqiad.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:21:10] PROBLEM - LVS HTTP IPv6 on wikipedia-lb.eqiad.wikimedia.org_ipv6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:21:47] Ryan_Lane: when? [06:22:00] RECOVERY - LVS HTTPS IPv4 on wiktionary-lb.eqiad.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 61239 bytes in 0.265 second response time [06:22:00] RECOVERY - LVS HTTP IPv6 on wikipedia-lb.eqiad.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 61239 bytes in 1.229 second response time [06:22:21] sec. 
was watching pybal log and saw a fairly large number of idle check reconnections [06:22:31] pybal.log is noisy [06:22:36] RECOVERY - LVS HTTPS IPv4 on mediawiki-lb.eqiad.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 61239 bytes in 1.699 second response time [06:22:37] RECOVERY - LVS HTTPS IPv4 on wikiversity-lb.eqiad.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 61239 bytes in 7.072 second response time [06:22:46] RECOVERY - LVS HTTPS IPv4 on wikisource-lb.eqiad.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 61239 bytes in 0.041 second response time [06:22:47] RECOVERY - LVS HTTPS IPv4 on wikimedia-lb.eqiad.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 95683 bytes in 0.056 second response time [06:22:52] yes, but seeing screens full of reconnections is abnormal [06:22:57] so we have a flapping link that no traffic should be going over [06:23:00] however we have mx80's in tampa [06:23:07] especially when we're not seeing timeouts [06:23:17] it's possible that they're not able to keep up with the flaps and causing routing issues [06:23:32] even though the recalculation shouldn't be too heavy [06:23:34] fucking mx80's [06:23:41] Ryan_Lane: I just wanted to confirm it, trying to find the right grep command [06:24:38] this is curious: http://ganglia.wikimedia.org/latest/graph.php?c=LVS%20loadbalancers%20eqiad&m=cpu_report&r=hour&s=by%20name&hc=4&mc=2&st=1363673827&g=network_report&z=medium&r=hour [06:24:59] yep [06:25:41] spike on lvs1001 [06:27:33] pity it went away already [06:37:46] PROBLEM - Puppet freshness on cp1034 is CRITICAL: Puppet has not run in the last 10 hours [06:39:00] gerrit-wm seems to cycle a lot lately. 
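[Editor's note: the "right grep command" being hunted for in the pybal exchange above might look something like this sketch. It tallies reconnection lines per backend so a screenful of idle-check reconnections becomes a ranked count. The log line format below is invented purely for illustration; real pybal log lines differ.]

```shell
# Build a tiny stand-in for pybal.log; the line format is made up
# purely so the pipeline below has something to chew on.
cat > pybal.log <<'EOF'
2013-03-19 06:22:21 ProxyFetch: reconnecting to mw1017
2013-03-19 06:22:22 ProxyFetch: reconnecting to mw1018
2013-03-19 06:22:25 ProxyFetch: reconnecting to mw1017
2013-03-19 06:22:30 IdleConnection: connection established to mw1018
EOF

# Total reconnect events, then a per-backend ranking: an abnormal
# screenful of reconnections shows up as a few very large counts.
grep -c 'reconnecting' pybal.log
grep 'reconnecting' pybal.log | awk '{print $NF}' | sort | uniq -c | sort -rn
```

The same pipeline pointed at a real log, with the match pattern adjusted to whatever pybal actually emits, would confirm or rule out the "large number of idle check reconnections" impression at a glance.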
[06:40:03] weird http://ganglia.wikimedia.org/latest/graph.php?r=hour&z=xlarge&title=EventLogging&vl=events+%2F+sec&x=&n=&hreg[]=vanadium.eqiad.wmnet&mreg[]=%5E%28client-generated-raw%7Cserver-generated-raw%7Cvalid-events%29%24&gtype=stack&glegend=show&aggregate=1 [06:40:34] ori-l: if people can't get to the site, logs aren't going to hit it, right? [06:41:04] that's a very astute point :) didn't realize we had a full-blown outage. [06:41:26] :) [06:42:36] anyways, mine looks a lot cooler than tim's, which is drab and mathy and lacks that fiery umph [06:43:58] RewriteCond %{HTTP_HOST} \.wikimediafoundation\.org$ [06:44:02] That looks illegal. [06:44:12] Hmmm. [06:44:18] there's no ^ [06:44:44] Right. I'm not sure it's necessary. I see (^|\.) elsewhere in the file. [06:44:57] The underlying bug is that http://bookshelf.wikimedia.org is going to wikipedia.org. [06:45:03] That rule is above it in redirects.conf. [06:45:05] https://noc.wikimedia.org/conf/redirects.conf [06:47:10] ori-l: i think we have a solution to our loggers being overloaded problem ;) [06:47:48] use more than one core on oxygen? [06:47:50] * ori-l ducks. [06:50:39] So it looks like http://bookshelf.wikimedia.org/f works. [06:51:02] I wonder if the plain version ever worked. [06:51:33] okay, no more pages = leslie bedtime [06:51:34] g'night [06:51:50] good night [06:55:05] New patchset: Ori.livneh; "Add 'eventlogging' puppet module" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54324 [06:55:11] https://bugzilla.wikimedia.org/show_bug.cgi?id=46307 [06:58:44] ori-l: 19 01:10:09 < jeremyb_> ori-l: how do you figure gerrit compromise == puppet private compromise? [06:59:09] well, '==' isn't strict, so the type coercion will make it compromise == compromise [07:16:29] ori-l: please explain? [07:16:52] ori-l: are you under the impression that the private repo lives on gerrit? (it doesn't) [07:18:35] it was just a lame joke. 
i'm not interested in speculating out loud about how you could damage infrastructure, though. [07:20:21] ori-l: i was replying to your comment on https://gerrit.wikimedia.org/r/#/c/54324/ [07:20:27] ori-l: that was a joke? i couldn't tell [07:20:49] no, the type coercion thing [07:20:54] right [07:23:12] anyway, i see the setup.py thing was removed [08:06:18] New review: Hashar; "I still have to move all that stuff to the contint module. Meanwhile that will do it :-]" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/54604 [08:08:10] New review: Hashar; "Maybe put a vim modeline at the top to ensure tabs are now expanded by 2 spaces?" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54603 [08:13:37] New review: Hashar; "(1 comment)" [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/54610 [08:18:50] New review: Hashar; "(11 comments)" [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/54611 [08:20:18] and hello :-} [08:28:55] PROBLEM - MySQL disk space on blondel is CRITICAL: DISK CRITICAL - free space: /a 0 MB (0% inode=4%): [08:28:55] PROBLEM - Solr on vanadium is CRITICAL: Average request time is 527.2783 (gt 400) [08:29:04] PROBLEM - carbon-cache.py on professor is CRITICAL: PROCS CRITICAL: 0 processes with command name carbon-cache.py [08:29:04] PROBLEM - swift-account-reaper on ms-be12 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-reaper [08:29:14] PROBLEM - swift-account-reaper on ms-be11 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-reaper [08:36:12] !log zuul : giving results processing priority over enqueued jobs. Cherry picked 263fba9 from upstream. 
That should resolve Zuul being slow to report back to Gerrit {{bug|46176}} [08:36:20] Logged the message, Master [08:37:17] !log gallium: stopped puppet to deploy Zuul whenever the queue is empty [08:37:23] Logged the message, Master [08:38:02] !log gallium: stopped zuul and upgrading [08:38:09] Logged the message, Master [08:38:53] RECOVERY - zuul_service_running on gallium is OK: PROCS OK: 1 process with args zuul-server [08:41:01] !log gallium: restarted puppet and zuul [08:41:08] Logged the message, Master [08:42:55] PROBLEM - zuul_service_running on gallium is CRITICAL: PROCS CRITICAL: 2 processes with args zuul-server [08:43:33] grmblbl [08:43:48] it is lying [08:46:44] !log gallium: killed stalled puppet agents. Locked on apt-get and update-java-alt [08:46:50] Logged the message, Master [08:55:53] PROBLEM - Puppet freshness on ms1004 is CRITICAL: Puppet has not run in the last 10 hours [09:15:55] PROBLEM - MySQL Slave Delay on db33 is CRITICAL: CRIT replication delay 184 seconds [09:15:55] PROBLEM - MySQL Replication Heartbeat on db33 is CRITICAL: CRIT replication delay 184 seconds [09:33:53] PROBLEM - MySQL Replication Heartbeat on db32 is CRITICAL: CRIT replication delay 184 seconds [09:33:53] PROBLEM - MySQL Slave Delay on db32 is CRITICAL: CRIT replication delay 184 seconds [09:35:53] RECOVERY - MySQL Slave Delay on db33 is OK: OK replication delay 0 seconds [09:35:53] RECOVERY - Puppet freshness on colby is OK: puppet ran at Tue Mar 19 09:35:48 UTC 2013 [09:35:53] RECOVERY - MySQL Replication Heartbeat on db33 is OK: OK replication delay 0 seconds [09:36:55] RECOVERY - MySQL Slave Delay on db32 is OK: OK replication delay 14 seconds [09:36:56] RECOVERY - MySQL Replication Heartbeat on db32 is OK: OK replication delay 0 seconds [09:57:53] PROBLEM - Puppet freshness on formey is CRITICAL: Puppet has not run in the last 10 hours [10:04:55] PROBLEM - Puppet freshness on sq79 is CRITICAL: Puppet has not run in the last 10 hours [10:58:52] PROBLEM - Puppet 
freshness on europium is CRITICAL: Puppet has not run in the last 10 hours [11:08:42] whats the current version of TMH on commons? sadly Version no longer links to git commits [11:11:35] somehow it looks like TMH was not updated to master [11:24:55] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: Puppet has not run in the last 10 hours [11:24:56] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: Puppet has not run in the last 10 hours [11:24:56] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: Puppet has not run in the last 10 hours [11:24:56] PROBLEM - Puppet freshness on msfe1002 is CRITICAL: Puppet has not run in the last 10 hours [11:59:46] j^: look at Special:Version on commons :-] [12:00:17] ahhh git version is not there of course :-] [12:00:53] PROBLEM - Puppet freshness on cp1036 is CRITICAL: Puppet has not run in the last 10 hours [12:00:56] so that is 1.21wmf11 which has TMH as a submodule [12:04:54] PROBLEM - Puppet freshness on sq73 is CRITICAL: Puppet has not run in the last 10 hours [12:06:53] PROBLEM - Puppet freshness on mc1002 is CRITICAL: Puppet has not run in the last 10 hours [12:06:53] PROBLEM - Puppet freshness on virt1005 is CRITICAL: Puppet has not run in the last 10 hours [12:34:52] PROBLEM - MySQL Replication Heartbeat on db32 is CRITICAL: CRIT replication delay 184 seconds [12:34:53] PROBLEM - MySQL Slave Delay on db32 is CRITICAL: CRIT replication delay 184 seconds [12:37:52] PROBLEM - Puppet freshness on db35 is CRITICAL: Puppet has not run in the last 10 hours [12:37:53] PROBLEM - Puppet freshness on labstore1 is CRITICAL: Puppet has not run in the last 10 hours [12:38:53] PROBLEM - Puppet freshness on db56 is CRITICAL: Puppet has not run in the last 10 hours [12:42:03] RECOVERY - MySQL Replication Heartbeat on db32 is OK: OK replication delay 18 seconds [12:42:04] RECOVERY - MySQL Slave Delay on db32 is OK: OK replication delay 15 seconds [12:57:56] hashar: what i was looking for was: git ls-tree origin/wmf/1.21wmf11 
extensions/TimedMediaHandler [12:58:14] in core, that lists the sha1 of the extension used in origin/wmf/1.21wmf11 [12:59:32] j^: nice :-] [13:00:57] still discovering a new git command every day [13:03:33] New patchset: Krinkle; "extract2.php: Clean up old code." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/54645 [13:17:23] PROBLEM - Varnish HTTP bits on palladium is CRITICAL: Connection refused [13:21:22] RECOVERY - Varnish HTTP bits on palladium is OK: HTTP OK: HTTP/1.1 200 OK - 637 bytes in 0.001 second response time [13:31:14] New patchset: Mark Bergsma; "Use GEOIP_MMAP_CACHE instead of GEOIP_MEMORY_CACHE" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54651 [13:33:39] New review: Mark Bergsma; "This doesn't really seem like something that should be present on each and every system. Could you f..." [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/50306 [13:49:59] New review: Hashar; "I once did generic::packages::ack-grep https://gerrit.wikimedia.org/r/#/c/6467/4/manifests/generic-d..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/50306 [14:13:26] mark: so? is MMAP_CACHE better? [14:13:38] I was wondering which of the two to use when writing that powerdns geoip backend [14:16:28] hashar: could you review https://gerrit.wikimedia.org/r/#/c/54614/ ? [14:16:36] paravoid: sure [14:16:54] bah [14:16:56] thanks :) [14:17:04] let me rebase that one [14:18:48] New review: Faidon; "Use SSLCACertificatePath /etc/ssl/certs instead, this would be more future proof." [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/54614 [14:19:10] looks good otherwise, but your +1 is useful too [14:20:30] ~. 
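[Editor's note: the `git ls-tree` trick above works because a submodule is stored in the parent tree as a mode-160000 "commit" entry (a gitlink). A throwaway sketch of the same mechanism, using a placeholder SHA1 and an invented path since no MediaWiki checkout is assumed:]

```shell
set -e
# Create a scratch repo and record a gitlink entry directly in the
# index, the way `git submodule` pins a commit; the SHA1 and the
# extensions/Demo path are placeholders, not a real extension.
tmp=$(mktemp -d) && cd "$tmp"
git init -q demo && cd demo
git config user.email demo@example.org
git config user.name demo
git update-index --add --cacheinfo 160000 1111111111111111111111111111111111111111 extensions/Demo
git commit -q -m 'pin submodule commit'

# Same shape as `git ls-tree origin/wmf/1.21wmf11 extensions/TimedMediaHandler`:
# mode 160000, type "commit", then the pinned SHA1.
git ls-tree HEAD extensions/Demo
```

Run against a real core checkout, the same `git ls-tree <branch> <path>` form reads the SHA1 a submodule is pinned to on any branch without checking that branch out.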
[14:20:49] paravoid: dunno [14:22:11] New patchset: Hashar; "doc.mediawiki.org: Redirect to canonical wikimedia.org and fix invalid SSL" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54614 [14:23:12] New review: Hashar; "Rebased on https://gerrit.wikimedia.org/r/#/c/54465/1 which fixes the headers." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54614 [14:24:22] New review: Faidon; "I had a short chat with Tim yesterday. He also doesn't like rotation to be done on API servers, for ..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/52707 [14:25:12] hashar: also see my review [14:25:24] oo [14:26:05] paravoid: guess we want to do that in another change [14:26:10] and update some other virtual hosts too [14:26:24] well, if you want to do that, even better ;-) [14:26:42] PROBLEM - Host arsenic is DOWN: CRITICAL - Host Unreachable (208.80.154.62) [14:26:51] New review: Hashar; "Good to me, that will clean up the doc. virtualhosts :-]" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/54614 [14:27:07] paravoid: I've got a bunch of changes on gallium if you want [14:27:20] https://gerrit.wikimedia.org/r/#/c/54465/1 <--- fix up some puppet::/// links in comment [14:27:36] New review: Faidon; "This is the same as https://gerrit.wikimedia.org/r/#/c/53861/ " [operations/puppet] (production) C: -2; - https://gerrit.wikimedia.org/r/54073 [14:27:52] and there is the infamous zuul.wikimedia.org virtual host which changes a few files :/ [14:28:08] grr [14:28:09] New review: Faidon; "Easy enough :)" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/54465 [14:28:22] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54465 [14:29:26] !log Rebooting arsenic with hyperthreading disabled [14:29:32] Logged the message, Master [14:30:20] zuul.wikimedia.org, ugh [14:30:31] why does it have to be a separate subdomain? 
[14:30:37] rather than under integration? [14:30:42] isn't it an integration thing? [14:30:44] It is under integration isn't it? [14:30:50] ahh [14:30:50] http://integration.mediawiki.org/zuul/ [14:30:53] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours [14:31:03] Damianz: there's a patchset to move it to zuul.wm.org [14:31:06] paravoid: I have probably forgot to add a reason on my change. I will eventually have to split Zuul to another machine. [14:31:21] =/ [14:31:28] who says we'll give you another machine [14:31:52] New patchset: Hashar; "Apache conf for https://zuul.wikimedia.org" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54466 [14:32:22] mark: platform engineering budget draft 2013-2014 (001) ? :-] [14:32:22] RECOVERY - Host arsenic is UP: PING OK - Packet loss = 0%, RTA = 0.50 ms [14:32:56] hahaha we wish [14:33:10] more seriously, I will try to avoid having multiple machines, though I could potentially have a second one over the summer [14:33:23] depends on how many test jobs are going to be added [14:33:35] I'm not sure a second machine justifies a split in service subdomains [14:33:40] it sure is easier [14:33:55] the main reason is zuul.wikimedia.org is nicer :-] [14:34:13] mark has wanted to have a varnish to do all kinds of routing in front of misc services for a while [14:34:16] but let's not block this on that [14:34:27] (correct me if I'm wrong) [14:34:44] I would be MORE than happy to get rid of the public IP on gallium and will happily be the first user of whatever misc varnish you guys set up :-] [14:34:45] well what's the harm [14:34:59] of what? [14:35:04] adding another vhost [14:35:15] another subdomain? [14:35:19] yes? [14:35:25] do you feel it in your paycut? 
:) [14:35:25] nothing, I just don't like fragmenting that much [14:35:37] i don't like untangling it later [14:35:50] if we know it's going to be separate then we might as well separate it now [14:36:22] my point was that all of these fall under the integration project and it'd be nice for all of them to be on the same domain [14:36:41] but I don't have a strong opinion, so, sure. [14:36:57] i don't particularly care either way tbh [14:37:06] just wonder what the "justification" would be ;) [14:37:30] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54651 [14:37:40] or [14:38:16] no matter [14:38:31] I don't think the separate machine discussion is that much related to the service name discussion, other than that it's just easier for the same vhost to be all on the same host. [14:38:47] i think it is [14:38:57] as we currently don't really have a good way of splitting off to multiple machines [14:39:31] we could proxy from gallium's apache to that machine, it's not like it would work without gallium [14:39:42] yuck [14:39:43] (afaik) [14:40:24] apache is proxying to localhost:800N now anyway :-) [14:40:46] both jenkins and zuul are interfaced with mod_proxy [14:41:19] in that case it doesn't matter [14:42:38] ? [14:43:44] in that case, having everything under one hostname might be nicer [14:44:23] hashar: what do you think? [14:44:38] zuul.wikimedia.org is nicer and easier to write in my browser :-] [14:45:44] currently got http://integration.wikimedia.org/zuul/ to proxy to the Zuul daemon [14:45:47] lol [14:45:58] I think that's fine tbh [14:46:00] Zuul good. is not really nice :-] [14:46:03] "Zuul good." [14:46:05] haha [14:46:30] I would like to send some statistics to graphite and build a nice welcome page with graphite graphs and the list of jobs [14:46:34] being run [14:46:50] can still do that at that page by tricking apache a bit. 
But I thought having a dedicated portal would be nicer [14:46:52] err [14:46:55] dedicated vhost [14:47:56] we don't even have jenkins in a separate vhost [14:48:06] yeah that is the next patch :] [14:48:10] hahaha [14:48:34] I don't think we should split every CI tool to its own subdomain [14:49:21] ok ok :-] [14:49:23] up to you :-] [14:49:30] personal preference, this [14:49:34] it is [14:49:45] that's why I don't have a strong opinion [14:49:46] for a long time I took paravoid's stance [14:49:57] and put everything under one domain as much as possible [14:50:04] then sometimes web security becomes a problem ;) [14:50:42] I've thought of XSS too, I don't think it's a factor in this case [14:51:22] if anything, all of them are under wikimedia.org... [14:51:39] but it was one of the reasons I wanted to see ts.wikimedia.org go ;-) [15:00:13] paravoid: at least doc.mediawiki.org can be made live https://gerrit.wikimedia.org/r/#/c/54614/ :-D [15:00:46] for zuul.wikimedia.org let me know on the change https://gerrit.wikimedia.org/r/#/c/54466/ If we prefer not using yet another virtual host, I will amend the change to use /zuul/ [15:01:18] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54614 [15:03:30] and we will have to get that python-voluptuous package :-] [15:03:45] I tried git-buildpackage again this morning but it failed [15:03:52] I guess I can use some training :-] [15:03:53] oh I delegated that to ottomata [15:03:55] you didn't see? [15:03:56] :) [15:04:07] must have skipped that mail hehe [15:04:22] wasn't a mail [15:04:38] haha, [15:04:50] hashar! i'm sooper busy, but am on RT this week so should be able to help [15:04:57] i've learned a wee bit about git-buildpackage [15:05:12] I've glanced at the package too, so don't worry too much [15:05:13] greaat :-] [15:05:24] I just want you to be more involved in that, since I can't just handle everyone's packages :) [15:05:34] I am a complete .deb noob. 
I just pipe everything to paravoid :( [15:05:54] what about organizing some kind of training session for people interested? [15:06:09] like a hangout showing up how to package a python module or a ruby gem [15:06:39] I am sure ^demon would be interested in knowing how to package for Ubuntu [15:07:03] <^demon> Yeah, I'd asked about that before :) [15:08:20] this has been raised a few times, even Erik commented on that [15:08:30] I'm not too fond of the idea tbh [15:08:40] I don't think it'd help much anyway [15:09:25] you prefer being popular in this channel ? :-] [15:09:53] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 203 seconds [15:09:53] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 205 seconds [15:14:53] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 0 seconds [15:15:04] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 0 seconds [15:18:58] New review: Mark Bergsma; "Please use the SHA1 hash all the way for review to merge, to avoid the race conflict. I'm not wild a..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/50452 [15:33:17] hashar: did you see the fun I had with Krinkle last night? [15:33:42] was in conf calls yesterday night and I don't have a bouncer :-D [15:33:49] ah [15:33:51] jsduck [15:33:55] ruby packaging woes [15:33:57] oh [15:34:04] wrote ruby code [15:34:08] zeljkof also has some gem packages :-] [15:34:19] to patch jsduck to use execjs instead of therubyracer [15:34:34] and hence use Node.js, instead of linking with libv8 from ruby [15:34:49] "fun" [15:34:51] hashar, paravoid: I just heard _ruby_ :) [15:34:55] noooo [15:35:14] zeljkof: looks like ops start loving ruby, one of them just wrote a few patches in ruby :-] [15:35:38] paravoid: nice work around, did you get it working on gallium? 
[15:35:45] hashar: :) [15:36:30] paravoid: Hi there [15:36:44] paravoid: Perhaps we can rewrite everything in production in ruby gems :P [15:36:59] Krinkle: sounds good to me :) [15:37:14] * Krinkle has little to no experience with ruby and doesn't actually like or dislike it in any way. [15:37:25] I'd rather go for nodejs in that case, if anything. [15:37:53] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 202 seconds [15:38:03] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 206 seconds [15:38:44] Krinkle, I think ops hated ruby? [15:39:23] Well, we don't love it. Of course not. [15:39:33] Platonides: if poolcounter.py is no more needed, would you mind deleting it from the repo ? [15:39:39] But that's probably more because of consistency not because of ruby itself. [15:40:09] do ops have preference for any language? PHP? python? [15:40:24] you don't wanna have a ton of different systems running through each other. That doesn't make maintenance any easier. [15:40:47] Most things are in PHP due to MediaWiki itself being PHP, so anything that needs to interact with the site is best off in PHP. [15:40:52] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 30 seconds [15:40:59] Secondary: Python, javascript, bash. [15:41:02] That's pretty much it. [15:41:09] Krinkle: makes sense [15:41:23] Anything else is best avoided unless it is a third party thing that is hard to avoid. [15:41:35] In which case it should be tightly packaged through debian [15:41:58] for example, Gerrit and Jenkins run in java. [15:42:01] zeljkof: the main issue is maintaining security patches with apps facing the internet. That is tedious enough with PHP so adding a new language is adding more and more stress to the ops team. 
That is the main reason for avoiding ruby [15:42:03] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 16 seconds [15:42:28] Yeah, I don't think we'd approve ruby apps facing the public. [15:42:46] hashar, Krinkle: there is a ruby implementation on top of the JVM :) jruby [15:42:57] I know [15:43:20] paravoid: btw, did you push the packages yet? [15:43:31] New review: Hashar; "Please note 0.7 is incompatible with Zuul :D So I need 0.6.1 =)" [operations/debs/python-voluptuous] (master) - https://gerrit.wikimedia.org/r/44408 [15:43:34] I'd love to give them a spin, perhaps test drive from jenkins tonight? [15:43:37] and we can have ruby stuff that we need hosted at cloudbees for a while [15:44:03] no [15:44:09] well, running it elsewhere isn't "we can have it run". One can use Travis CI, or Wikimedia Labs for anything too. [15:44:18] arbitrary code on machines we don't care about. [15:44:23] But that's not for anything we rely on. [15:44:27] Only experimental or manual. [15:44:51] Krinkle: browser automation is still in experimental mode, right? [15:44:59] You tell me, you run that :) [15:45:33] Krinkle: well, nobody really uses it except QA, we are still selling it to various teams :) [15:45:51] * zeljkof is back in 5-10 minutes [15:46:06] If and when Selenium tests are run from production Jenkins, it will have to be packaged if in ruby. [15:46:10] And preferably not in ruby. [15:46:22] Which I think is a viable option given that the selenium tests don't rely on Ruby in any way. [15:46:33] webdriver is available in PHP, Java, Nodejs, Python. Lots of languages. [15:46:48] Some are better than others, but there's probably one of them good enough that isn't in ruby. [15:46:59] Krinkle: and a few more languages, but I think only 4 are officially supported by the selenium project [15:47:03] I'd say the nodejs version seems a promising implementation. 
[15:47:14] zeljkof: As long as saucelabs supports it :) [15:48:18] paravoid: Just so I know how to plan the day, when do you expect to do that? (no rush, just trying to schedule things efficiently) [15:51:34] today or tomorrow. [15:53:36] paravoid: and what about the zuul.wm.o virtual host ? https://gerrit.wikimedia.org/r/#/c/54466/ [15:55:49] New review: Faidon; "I'd like to keep the CI tools under a single domain (integration.wikimedia.org) rather than fragment..." [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/54466 [15:56:22] paravoid: thx :-D [15:57:30] New patchset: Ottomata; "Setting up udp2log instance on gadolinium" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54670 [15:59:31] gadolinium? [15:59:38] locke replacement [15:59:39] moar logboxes! [15:59:51] yeah [15:59:52] how many gbe does it have? [16:00:21] locke has gotten up to 70MB/s [16:00:42] I am off for today *wave* [16:00:44] the NIC you mean? [16:00:45] laters! [16:01:56] yes [16:02:31] New review: Sdickson; "Missing files: sha1.c and sha1.h are obtainable from the link supplied in README -- Thanks!" [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/47390 [16:03:28] New patchset: Hashar; "contint: libevent-dev on gallium" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54672 [16:03:34] easy review ^^^^ one more -dev package on gallium [16:04:00] and I disappear for real now [16:04:19] only if you promise to split packages.pp [16:04:33] to separate subclasses per module to test [16:04:38] damn :) [16:05:02] New review: Faidon; "But please split packages.pp to separate subclasses per software you build :)" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/54672 [16:05:15] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54672 [16:05:38] paravoid, any problems with me self merging that gadolinium commit? 
its mostly just rearranging what is already on locke, and adding a role class [16:05:47] there will be more commits to come, but this will let me prep gadolinium [16:05:52] PROBLEM - Puppet freshness on search26 is CRITICAL: Puppet has not run in the last 10 hours [16:07:48] go ahead [16:07:51] no [16:07:52] PROBLEM - Puppet freshness on db43 is CRITICAL: Puppet has not run in the last 10 hours [16:07:53] PROBLEM - Puppet freshness on mw1014 is CRITICAL: Puppet has not run in the last 10 hours [16:07:53] PROBLEM - Puppet freshness on mw1115 is CRITICAL: Puppet has not run in the last 10 hours [16:07:53] PROBLEM - Puppet freshness on mw121 is CRITICAL: Puppet has not run in the last 10 hours [16:07:53] use file mode 0555 [16:07:58] read-only [16:08:01] I didn't even have a look :) [16:08:24] would it be a good time to use /srv instead of /a ? :-) [16:08:30] i think I'm the only one who cares though :) [16:08:56] also a matter of taste [16:08:56] i'd prefer that as well [16:09:49] /a has already been partitioned, i guess I can remount it [16:09:59] why the sudo for yourself? [16:10:14] copy/paste from the other logging nodes [16:10:25] remount it but don't forget to update partman [16:10:25] that's not a valid justification ;) [16:10:28] i guess its leftover from when I had root? [16:10:31] sorry [16:10:33] didn't have root [16:10:36] hehe [16:10:37] i guess so [16:10:52] but i mean, i'd rather use sudo as myself than ssh as root, so is that ok? 
[16:11:06] we're going towards sudo [16:11:11] but until then i'd prefer things to stay consistent [16:11:15] hm, ok, i'll remove it [16:11:16] don't mind [16:11:17] if everyone starts giving themselves sudo this way in puppet [16:11:19] it becomes a bit of a mess [16:11:34] well, heh, I didn't give it to myself (at least on the other nodes), but yeah I know what you mean [16:11:34] ok [16:11:39] yeah :) [16:11:55] New review: Krinkle; "(2 comments)" [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/54466 [16:12:08] New review: Mark Bergsma; "(4 comments)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54670 [16:13:08] paravoid, gadolinium is using the same partman as a bunch of other nodes [16:13:17] i'd rather not change it just for this, if you don't mind [16:13:20] can I leave /a? [16:13:38] why not change the partman recipe [16:13:42] so future installs will also use /srv ? [16:13:46] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/54645 [16:14:01] hmmm, i guess that won't hurt the others, hmmm, as long as we don't have to reinstall the others [16:14:18] what other nodes anyway? [16:14:29] 20ish of them [16:14:43] whoa, no actually more [16:14:51] databases? :) [16:14:52] lvm.cfg [16:15:03] copper|neon|harmon|ssl[1-3]0[0-9][0-9]|ssl[0-9]|zirconium) echo partman/raid1-lvm.cfg ;; \ [16:15:03] barium|caesium|celsus|cerium|colby|constable|europium|gadolinium|kuo|lardner|mexia|neodymium|palladium|promethium|strontium|titanium|tola|xenon|wtp100[1-4]) echo partman/lvm.cfg ;; \ [16:15:06] that's a bit too generic to change I guess [16:15:09] oh sorry [16:15:11] not that first list [16:15:14] thats raid-1-lvm [16:15:17] palladium? huh [16:15:27] palladium is a bits varnish server [16:15:30] strontium too... 
[16:15:31] solr[1-3]|solr100[1-3]) echo partman/lvm.cfg ;; \ [16:15:39] boron|chromium|hydrogen) echo partman/lvm.cfg ;; \ [16:16:20] don't change that I guess [16:16:21] !log krinkle synchronized extract2.php [16:16:22] not worth the trouble [16:16:27] Logged the message, Master [16:18:42] PROBLEM - Host cp3019 is DOWN: PING CRITICAL - Packet loss = 100% [16:18:58] k [16:19:28] New patchset: Ottomata; "Setting up udp2log instance on gadolinium" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54670 [16:19:55] !log Rebooted cp3019 with hyperthreading and DAPC disabled [16:20:02] Logged the message, Master [16:20:17] oh heh [16:20:22] still biting us [16:20:53] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 200 seconds [16:21:01] i'm not gonna replace H310s in bits servers I think [16:21:03] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 203 seconds [16:21:11] not as long as bits is memory-only caching anyway ;) [16:21:33] RECOVERY - Host mw1085 is UP: PING OK - Packet loss = 0%, RTA = 0.23 ms [16:22:17] is mw1085 gonna be the next srv217? [16:22:28] mark: probably [16:22:47] working on it now.... [16:22:54] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 0 seconds [16:23:05] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 0 seconds [16:23:12] RECOVERY - Host cp3019 is UP: PING OK - Packet loss = 0%, RTA = 82.99 ms [16:23:56] mark: I was thinking of writing a fact for the RAID controller and putting it up on servermon [16:24:02] so we can easily see [16:24:16] ok [16:24:20] do you think it's that important? 
:) [16:24:33] PROBLEM - SSH on mw1085 is CRITICAL: Connection refused [16:24:34] PROBLEM - Apache HTTP on mw1085 is CRITICAL: Connection refused [16:24:38] no, that's why I haven't done it yet [16:24:48] same for DAPC [16:24:54] I just think we'll forget half of them [16:26:53] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54670 [16:29:25] New patchset: Ottomata; "Turns out gadolinium is hard to type" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54673 [16:29:32] !log Starting load testing with esams bits servers [16:29:33] haha who would have thought [16:29:38] Logged the message, Master [16:29:50] hashar is keeping a list of typos like pmpta [16:29:54] so that jenkins can V-1 [16:30:47] New patchset: Ottomata; "Turns out gadolinium is hard to type" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54673 [16:30:49] hehe, k, added it to his typos file :p [16:31:28] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54673 [16:32:53] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 188 seconds [16:33:03] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 191 seconds [16:36:32] PROBLEM - NTP on mw1085 is CRITICAL: NTP CRITICAL: No response from NTP server [16:36:32] I just love ruby people [16:36:32] one of the gems I built [16:36:32] no copyright or license statement [16:36:32] anywhere. [16:36:32] I'm pretty sure it'd be illegal to use [16:36:33] RECOVERY - SSH on mw1085 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [16:37:10] :q [16:37:53] PROBLEM - Puppet freshness on cp1034 is CRITICAL: Puppet has not run in the last 10 hours [16:38:09] i have one varnish server doing all esams bits traffic now [16:38:11] 25kreq/s [16:39:22] Krinkle: so, do you want to deal with legal@ ? :-) [16:39:40] paravoid: legal@ what? 
[16:39:47] seriously, if we pick this kind of ridiculously licensed software someone should suffer :) [16:40:04] * Krinkle checks [16:40:17] GPLv3 [16:40:20] what's wrong? [16:40:23] parallel is a dependency of jsduck [16:40:29] no copyright statement or license anywhere [16:40:40] gemfile says "license = MIT" but that's not a proper licensing statement [16:40:43] and MIT is very ambiguous [16:40:57] there are like 5 different MIT licenses [16:41:04] (expat being the most common one) [16:41:22] this effectively means we can't even use it [16:41:41] We're copying something that is already public on GitHub, not modifying or redistributing. [16:41:48] still illegal [16:41:55] sorry [16:42:12] So what do you want me to do. [16:42:18] (we're also redistributing, but that's a different story) [16:42:30] contact the author and ask him to put a copyright & license statement on his work [16:42:37] Right, because our aptitude source is public, not internal. [16:42:40] because we want to use it and we don't have the right to. [16:42:51] it doesn't matter here anyway [16:42:58] "use" is a right we don't have [16:43:01] paravoid: https://github.com/grosser/parallel, right ? [16:43:07] yes [16:44:35] (or replace the use of parallel with your own code) [16:45:34] or ask legal@ if it's okay to use that, then I'll use it :) [16:45:56] (I'm pretty sure it isn't, but I'm not a lawyer, much less the foundation's counsel) [16:50:03] Krinkle: saw your issue, thanks! [16:50:06] well said :) [16:51:18] paravoid: just go ahead and send a message to debian-legal, I'm sure there'll be a useful consensus in no time! 
[16:51:41] hahaha [16:51:41] * greg-g says that as a previous subscriber, low low activity participant, of debian-legal [16:51:48] in this case it would be :) [16:51:56] probably, you're right :) [16:52:05] typical case of people putting up things on github without copyright or license statements [16:52:22] our own new Deputy GC even wrote a blog post or two on that very issue [16:52:37] lvilla, of course [16:52:37] heheh [16:52:40] of course [16:52:57] I had a laugh last week with luis [16:53:08] he sent us a mail about our plans with openstreetmap [16:53:32] to see about the OSM dataset licensing requirements [16:53:48] and he started the mail by basically saying "so, OSM uses ODbL, which I wrote" [16:53:56] fun :) [16:54:49] mark: Do you recall for how long we hosted OSM tilesets (and around what date?) [16:55:00] I have an email from the OSM folks and I want to give CT all the info we have. [16:55:19] what do you mean? [16:55:23] years ago? [16:55:26] Yes [16:55:30] it was for like 90 days right? [16:55:30] no not really [16:55:35] it was in 2009 I think [16:55:42] sometime after the april berlin hackathon [16:56:03] close enough, I just wanted to mention when since the email discussion may reference it. [16:56:21] i don't really know what exactly they did back then [16:57:33] Krinkle: http://tieguy.org/blog/2012/12/03/licensing-confusion-is-great-for-lawyers/ fwiw [16:57:44] and phipps' https://www.infoworld.com/d/open-source-software/github-needs-take-open-source-seriously-208046 of course [16:58:43] paravoid: wait, he wrote ODbL? I thought that was.... what's his name..... Jordan Hatcher (opencontentlawyer.com) [16:59:10] hm, I could be wrong and paraphrasing [16:59:27] notpeter: around? 
[16:59:34] RECOVERY - Apache HTTP on mw1085 is OK: HTTP OK: HTTP/1.1 200 OK - 454 bytes in 0.001 second response time [16:59:51] cmjohnson1: not in office yet at least [16:59:57] paravoid: I found a repository on github that looks surprisingly much like the ruby-parallel code you are packaging. Except this one has a license file. [16:59:58] paravoid: https://github.com/Krinkle/parallel/tree/license [16:59:59] :P [17:00:19] New patchset: Demon; "Link "report bug" link in Gerrit to Bugzilla" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54677 [17:00:21] now you can go to jail [17:00:25] oh..okay..i can't login to mgmt on mc1016 [17:00:35] =/ [17:01:08] It is a pull request, but common sense aside. Technically this is a fork licensed under the MIT-license. [17:01:18] cmjohnson1: hrmm, is system offline or just mgmt? [17:01:19] Which, yep, is a legal mess. [17:01:25] just mgmt [17:01:32] which is odd [17:01:51] hrmm, refusing connections [17:02:12] cmjohnson1: does the web mgmt work? [17:02:49] negative. [17:04:09] cmjohnson1: We should be able to depool and take down a single server without issue, in particular since the rest (15) look good. [17:04:23] though i thought we purchased 18.... the other two new ones still not online =[ [17:06:30] cmjohnson1: So, you wanna do this? 
[17:06:36] i can, but you wanna learn [17:06:42] New patchset: Demon; "Re-introduce CVE links" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54678 [17:07:43] I want to do it [17:08:01] cool [17:08:02] https://wikitech.wikimedia.org/wiki/Memcached [17:08:25] so peter just showed up and i forgot to ask about this [17:08:32] PROBLEM - Apache HTTP on mw1085 is CRITICAL: Connection refused [17:08:34] but meh, we are just taking down one, it'll be fine, but just in case [17:08:49] cmjohnson1: if you look at nagios, you can see that one of the eqiad memcached is puppet freshness failed, but rest are golden [17:08:52] PROBLEM - Host cp3020 is DOWN: PING CRITICAL - Packet loss = 100% [17:09:28] cmjohnson1: ok, talked to peter [17:09:36] <^demon> LeslieCarr: Soo, ever since those new ircecho changes went out, gerrit-wm has been flapping about 2x an hour. I've not seen anything weird on Gerrit's side. [17:09:37] seems memcached may be particularly brittle, but dunno [17:09:50] cmjohnson1: lets wait another hour or so and asher will be about [17:09:55] so, let's not mess with it ...it is not that important [17:10:03] hrm, during the puppet runs perhaps ? [17:10:05] well, i wanna fix it [17:10:12] before it's an emergency [17:10:18] so lets just wait until binasher is about [17:10:20] <^demon> LeslieCarr: Not sure, but that sounds like an easy theory to test. [17:10:25] kk [17:11:05] robh: wanna go over the certs thing? [17:11:13] lets discuss in pm [17:11:32] k [17:12:34] New review: MarkTraceur; "Let's get wikibugs sorted so we can actually use it." [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/53973 [17:13:54] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 0 seconds [17:14:04] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 0 seconds [17:16:43] <^demon> LeslieCarr: Yes, and I now know what's going on--we've hit this before. 
Have to sort the lists before generating config from them :) [17:17:00] ah [17:17:12] RECOVERY - Host cp3020 is UP: PING OK - Packet loss = 0%, RTA = 83.01 ms [17:21:03] !log Ended bits.esams performance testing [17:21:09] Logged the message, Master [17:23:07] notpeter, can you help me deploy some frontend cache logging config changes in an hour or two? [17:23:20] i still don't feel comfortable doing it myself [17:23:32] RECOVERY - NTP on mw1085 is OK: NTP OK: Offset 0.003566622734 secs [17:24:45] New patchset: Demon; "Sort channels and logs before making config lines from them" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54681 [17:25:21] New patchset: Demon; "Sort channels and logs before making config lines from them" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54681 [17:26:36] ottomata: permaybe. ping me then? [17:26:57] <^demon> LeslieCarr: Got a patch up for you and jeremyb to look at ^ [17:27:52] cool [17:28:04] New review: Lcarr; "not submitting until jeremyb approves." [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/54681 [17:28:09] mk [17:28:46] hahaha [17:30:19] ^demon: you've tested? [17:30:24] <^demon> No, I have not. [17:30:27] k [17:30:34] * jeremyb_ tests [17:30:42] <^demon> I was going based off what we roughly did before :) [17:31:14] i don't know ruby well enough. (half of what i know i learned while writing that file) [17:31:28] hi gerrit-wm! [17:32:04] <^demon> I was comparing it to what we did in replication.config.erb, if you want to look at that for reference. [17:32:10] oh, ok [17:32:16] i was going to just test it locally [17:39:34] ^demon: do we know the prod ruby version? 
[17:39:46] New patchset: Mark Bergsma; "Purge buffered data when fflush() doesn't work" [operations/debs/varnish] (testing/3.0.3plus-rc1) - https://gerrit.wikimedia.org/r/54683 [17:40:56] <^demon> jeremyb_: ruby 1.8.7 (2011-06-30 patchlevel 352) [x86_64-linux] [17:44:09] ^demon: how can we get a relevant /quit msg when puppet boots it? [17:44:26] <^demon> I...don't know. [17:44:28] or is that more a LeslieCarr's ryan question? :) [17:44:36] <^demon> Prolly ;-) [17:44:47] * jeremyb_ waits for ruby1.8 to install [17:45:03] when puppet boots what? [17:46:09] gerrit-wm, I assume [17:46:16] Ryan_Lane: ircecho [17:46:31] ^demon: wow, it really jumps around [17:46:37] well, we should see why its doing so [17:47:36] Krinkle: so, I patched jsduck to remove the parallel calls [17:47:41] New patchset: Ottomata; "Adding gadolinium as a webrequest udp2log receiver." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54686 [17:47:41] Ryan_Lane: right, we figured that out. i just want it to have a relevant /quit msg for the future [17:47:44] Krinkle: this will make it slower, possibly much slower [17:47:50] oh cool, i found a batch of tampa apaches sitting without OS. [17:47:51] eh? [17:47:52] =P [17:47:56] Krinkle: but it's a workaround until that bug gets fixed [17:48:00] *why* is it constantly restarting? [17:48:02] paravoid: Oh, wow. I didn't think it'd be that simple. [17:48:06] paravoid: Thanks! [17:48:09] I can't think of any good reason for this [17:48:10] <^demon> Ryan_Lane: Same problem we had with gerrit's replication. Puppet doesn't ensure array/list order between runs. [17:48:18] Krinkle: if you don't get a reply, I'd suggest replacing that part with another gem or code of your own [17:48:21] then let's fix that [17:48:23] ^demon: s/puppet/ruby/ [17:48:26] <^demon> So when you print out data from a run, it's semi-random. So you just sort it. 
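The ordering bug ^demon describes above can be condensed into a few lines: iterating a Puppet-supplied collection in an ERB template carries no guaranteed order, so two identical runs can emit the same entries in different order, and Puppet then sees a "changed" file and bounces the service. A hypothetical sketch (not the actual ircecho template; the hash contents and `config_lines` helper are made up for illustration) of the sort-before-emit fix:

```ruby
# Hypothetical reduction of the fix in change 54681: without the sort,
# two runs over the same data may emit lines in a different order, so
# Puppet sees a "changed" config file and restarts gerrit-wm. Sorting
# before generating the lines makes the output deterministic.
def config_lines(channels)
  channels.sort.map { |chan, log| "#{chan} = #{log}" }
end

a = config_lines('#wikimedia-dev' => 'dev.log', '#wikimedia-operations' => 'ops.log')
b = config_lines('#wikimedia-operations' => 'ops.log', '#wikimedia-dev' => 'dev.log')
puts a == b  # identical output regardless of insertion order
```

The same trick was applied earlier to replication.config.erb, which is why it was the reference point.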
[17:48:28] paravoid: sure [17:48:32] <^demon> (That's what my patch fixes) [17:48:56] notpeter, actually i'm ready to to this now [17:48:57] https://gerrit.wikimedia.org/r/#/c/54686/ [17:48:58] if you are? [17:49:04] <^demon> Just sort before printing, and you're fine. Having ircecho have a more informative /quit is just a feature request :p [17:49:15] or maybe not, in a meeting [17:49:25] ottomata: heh, ok [17:49:48] ^demon: right :) [17:49:50] i wasn't paying attention, but then they just told me to so, bwaaa :) [17:49:53] (still tweaking though) [17:50:41] whoa, there's a ruby 2.0 now?? [17:50:56] Coren: So the issue happens here again [17:51:04] Ryan_Lane insists its not what we think it is ;p [17:51:21] but he offers no alternative, so im totally ignoring him ;] [17:51:39] if you want to blame silly things, that's fine [17:51:44] RobH: Can you add a sleep every nth ssh? [17:51:53] its a dsh command [17:52:02] RobH: Ah. Darn. [17:52:12] yea [17:52:26] so now i just have to run them manually [17:52:31] against smaller dsh node groups [17:52:39] and not follow the procedure everyone else follows =[ [17:52:41] i dislike [17:52:49] on a positive note, i think using salt will fix this ;] [17:52:54] see Ryan_Lane im supporting salt [17:53:08] ^demon: you know the line #s from an erb stack trace are useless :( [17:53:21] <^demon> Yep. [17:55:39] hey notpeter: when you have a few moments (not time critical) can you take a look at db29? replication seems to have paused [17:56:21] ok, so now it happens in both ubuntu and os x [17:56:28] i have no fucking idea what is up with my key [17:58:23] pgehres: sure [17:59:46] pgehres: ok, slaving is started again [17:59:55] although it's 2 million seconds behind... [17:59:56] notpeter: thanks a bunch [17:59:59] https://bugzilla.wikimedia.org/46328 -- is this an OPs issue? 
[18:00:04] haha, fun [18:00:29] well, although it's catching up at about 100k/min [18:00:46] that's like 21 days [18:00:55] must have broken right after install then [18:01:39] !log robh synchronized docroot [18:01:45] Logged the message, Master [18:01:45] for the record, it broke on a timeout, so I just started slave again, and it ran the query no problem [18:02:13] do you have any idea what causes that kind of timeout? [18:02:32] ^demon: around? [18:02:43] <^demon> Yep. [18:02:47] hey [18:02:48] is this something that one can fix with only mysql admin privs and not sudo [18:02:54] I'm trying to create a new project [18:03:02] operations/debs/foo, inheriting rights from operations/debs [18:03:23] <^demon> Mmk. [18:03:26] oh nevermind, I'm an idiot [18:03:40] !log robh synchronized docroot [18:03:46] Logged the message, Master [18:04:32] New patchset: Dzahn; "puppet-lint: fix all "ERROR: tab character found", :retab with 2-space soft tabs per puppet style guide" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54603 [18:05:04] thanks though :) [18:05:11] New patchset: Faidon; "Initial release." [operations/debs/ruby-dimensions] (master) - https://gerrit.wikimedia.org/r/54689 [18:05:30] <^demon> paravoid: Glad I could help ;-) [18:07:24] New patchset: Faidon; "Initial release." [operations/debs/ruby-execjs] (master) - https://gerrit.wikimedia.org/r/54690 [18:07:37] New patchset: Faidon; "Initial release." [operations/debs/ruby-jsduck] (master) - https://gerrit.wikimedia.org/r/54691 [18:07:45] Krinkle: ^^^ [18:08:03] Krinkle: I also have ruby-parallel but putting it up in gerrit wouldn't be very legal :) [18:08:12] Yeah [18:09:21] do you want to wait for the ruby-parallel author? [18:10:04] (note debian/patches on the jsduck one) [18:11:29] Checking now [18:11:29] paravoid: So where is the actual software code? 
[18:11:55] New patchset: Jeremyb; "futureproof for ruby1.9.x (not in use yet)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54692 [18:12:02] gems [18:12:41] New review: Jeremyb; "tested" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/54681 [18:13:12] paravoid: The only url I see is in the ./watch file [18:13:14] Is that it? [18:13:18] ^demon: LeslieCarr: ^ [18:13:33] s/url/relevant url/ [18:13:35] Krinkle: yes [18:13:45] man uscan [18:14:07] How did it end up on http://pkg-ruby-extras.alioth.debian.org/cgi-bin/gemwatch [18:14:10] Did you put it there? [18:14:23] Or do they look for anything on rubygems.org [18:14:41] the latter [18:14:46] k [18:14:56] there's a readme in the footer [18:15:15] Gotcha [18:16:07] paravoid: Wow, there's a lot of different services involved. I take back what I said about gems being almost as easy to package as npm modules. I guess the advantage is maintainability (nothing is duplicated); the downside is that there is a lot of going back and forth before it all comes together [18:16:31] I thought one of the reasons for having our own debian repo is so we aren't subject to network spoofing as everything is local [18:16:49] but it isn't. The debian package is only a few instructions. It still gets everything from a ton of different sites on the net. [18:17:21] Or are they loaded once pushed to ubuntu.wikimedia.org? [18:17:43] the git is the source [18:17:51] i.e. apt-get install on an individual server by puppet doesn't go to http://pkg-ruby-extras.alioth.debian.org/ to fetch it [18:17:51] I'm going to build packages and put them up in apt [18:17:55] of course not [18:18:12] Hm.. [18:18:26] So the debs repo isn't the package itself, it's basically a build script to make the package. [18:18:38] which is then what goes into apt. [18:18:40] the git repo is the debian packaging [18:18:53] the result of the packaging, i.e. 
the .deb(s) go to apt [18:19:07] right [18:19:16] and the .deb file does contain all the source code? [18:19:26] depends [18:20:19] the deb is the "binary" [18:20:33] in interpreted languages binary == source :) [18:20:39] but not the debian/ source [18:22:19] but source isn't necessarily canonical source. could e.g. be minified [18:24:45] New patchset: Faidon; "Initial release." [operations/debs/ruby-jsduck] (master) - https://gerrit.wikimedia.org/r/54691 [18:25:52] Okay :) [18:26:38] paravoid: Can I / Should I test this on a labs instance before it is pushed to apt / ubuntu .wikimedia.or ? [18:27:01] ok, notpeter, i'm heading to a cafe, wanna help me in 30? [18:27:05] i hope so! [18:27:40] paravoid: Not sure how, but I suppose there is a way to (..after I clone the repos and fetch the gerrit change drafts..) to build it locally and tell apt-get to install it from a local path (instead of from the repo by name) [18:28:01] Krinkle: no point, I've tested it already [18:28:09] re intepreted, see http://mail.python.org/pipermail/pydotorg-www/2013-March/002158.html [18:28:14] paravoid: Very well then. [18:28:21] I can merge & push it to apt.wm now, I'm just wondering if we should wait for parallel [18:28:51] interpreted* [18:28:59] paravoid: the user was active 20 minutes ago on github. [18:29:19] paravoid: I'll try and e-mail him [18:29:27] https://github.com/grosser/dotfiles/commit/be57f16d2f381d90dbb4942402a1d117c3bf2ad2 [18:30:11] paravoid: though.. he'd have to make a release too, right? It being merged in the repo wouldn't make it available over gemwatch [18:30:26] indeed. [18:30:50] paravoid: but jsduck doesn't need to be modified right? because we don't use/rely on the gemspec file for dependencies and versions thereof. 
[18:31:08] the gem, no [18:31:11] the package, yes [18:31:29] because a) there's a Depends line, b) there's the disable-parallel patch of mine that needs to be removed [18:35:06] New patchset: Matthias Mullie; "re-enable AFTv5" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/54694 [18:43:59] Change merged: Bsitu; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/54694 [18:44:49] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54603 [18:46:39] !log dist-upgrade on gallium [18:46:45] Logged the message, Master [18:46:46] Does someone know how long it takes for domains acquired by WMF to stop being spamfarms? https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Report,_December_2012#Domains_Obtained [18:46:55] eg wikiartpedia.org [18:47:46] Nemo_bis: for domain in wikiartpedia.biz wikiartpedia.co wikiartpedia.info wikiartpedia.me wikiartpedia.mobi wikiartpedia.net; do echo $domain; whois $domain | grep "Registrant Organization"; done [18:47:47] lesliecarr: where do you want the sfp to rj45's? [18:47:56] paravoid: Merged https://github.com/senchalabs/jsduck/issues/338 [18:48:07] paravoid: Merged https://github.com/grosser/parallel/issues/50 [18:48:16] paravoid: Released parallel 0.6.3 [18:48:26] jsduck won't release yet, so keep that patch. [18:48:43] Nemo_bis: added DNS zones on Feb 18, "All the wikiartpedia domains listed in this ticket now redirect to wikipedia.org" [18:49:30] not .org [18:49:30] Nemo_bis: the .org was never listed on that ticket. re-opening [18:49:34] ok [18:49:39] thanks for reporting [18:52:47] Nemo_bis: registered to us but DNS has not been switched. commenting on ticket. will be taken care of [18:53:06] New review: Krinkle; "ruby-parallel 0.6.3 has just been released which fixed the license. We can drop that patch." 
[operations/debs/ruby-jsduck] (master) C: -1; - https://gerrit.wikimedia.org/r/54691 [18:54:22] paravoid: JSDuck maintainer also replied regarding the issue on therubyracer. [18:54:33] New patchset: Dzahn; "puppet-lint: fix all "double quoted string containing no variables"" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54604 [18:54:35] paravoid: He is hesitant to switch to execjs for ease of installation. [18:54:46] how ironic [18:54:55] paravoid: Though, it's true that he can keep it in the gemspec, right? [18:55:07] notpeter hiho? [18:55:10] paravoid: https://github.com/senchalabs/jsduck/issues/339#issuecomment-15107308 [18:55:17] paravoid: So we'd both get our way. [18:55:36] the answer is "yes" [18:55:38] gem-install'ers can have execjs+therubyracer craziness. And raw fetch just execjs. [18:55:47] paravoid: Thought so :) [18:56:16] New patchset: Faidon; "Depend on parallel, licensing issues were fixed" [operations/debs/ruby-jsduck] (master) - https://gerrit.wikimedia.org/r/54699 [18:56:17] paravoid: Don't wait for it though, jsduck will likely not release yet just for this. [18:56:24] I won't [18:56:31] local patches aren't a problem [18:56:53] PROBLEM - Puppet freshness on ms1004 is CRITICAL: Puppet has not run in the last 10 hours [18:57:03] New patchset: Dzahn; "puppet-lint: fix most "=> on line isn't properly aligned for resource", and all "unquoted file mode" and "ensure found on line but it's not the first attribute"" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54605 [18:58:40] ottomata: sup [18:58:51] New patchset: Faidon; "Initial release." [operations/debs/ruby-parallel] (master) - https://gerrit.wikimedia.org/r/54700 [18:59:09] will you babysit me as I deploy frontend config changes? [18:59:16] https://gerrit.wikimedia.org/r/#/c/54686/ + frontend.conf.php for squids? 
[18:59:35] Change merged: Faidon; [operations/debs/ruby-parallel] (master) - https://gerrit.wikimedia.org/r/54700 [18:59:44] Change merged: Faidon; [operations/debs/ruby-dimensions] (master) - https://gerrit.wikimedia.org/r/54689 [19:00:25] Change merged: Faidon; [operations/debs/ruby-execjs] (master) - https://gerrit.wikimedia.org/r/54690 [19:00:36] Change merged: Faidon; [operations/debs/ruby-jsduck] (master) - https://gerrit.wikimedia.org/r/54691 [19:01:02] soooo, wait [19:01:02] Change merged: Faidon; [operations/debs/ruby-jsduck] (master) - https://gerrit.wikimedia.org/r/54699 [19:01:03] why not just use the multicast relay stream? [19:01:07] it's also in eqiad [19:01:13] along with oxygen [19:01:36] i thought about that too, i heard objections about making oxygen a spof [19:01:50] and this is a replacement for locke, so we intend to remove the locke IP eventually [19:02:31] beh [19:02:43] if we're not going to use the relay [19:02:47] then why do we have it at all? [19:02:54] I mean, we can also just disable it ;) [19:03:00] (I'm being srs) [19:03:39] analytics cluster is using it pretty heavily right now [19:03:41] !log adding ruby-multi-json (universe), ruby-parallel, ruby-dimensions, ruby-execjs, ruby-jsduck (main) to apt [19:03:44] ok, cool [19:03:48] your answer checks out [19:03:48] Logged the message, Master [19:03:50] binasher, can you comment? [19:03:54] Krinkle: try apt-get update; apt-get install ruby-jsduck on labs [19:04:32] notpeter: ottomata speaks troof [19:04:34] Krinkle: see if everything works for you, then submit a puppet changeset to add it to gallium [19:04:53] sweet! [19:04:55] ok, cool [19:05:01] so better to use unicast for galadmadiumomiumin? [19:05:04] I am convinced. sorry for being a stick in the mud [19:05:07] yeah [19:05:08] haha, s'ok [19:05:17] i asked the same question on https://rt.wikimedia.org/Ticket/Display.html?id=4689 [19:05:18] but ok! [19:05:21] unicast it is [19:05:27] paravoid: Will do, thanks again! 
[19:05:39] ok, babysit commencing (?) [19:05:43] ok, the first thing I want you to do is go to an ssl box, stop puppet, and test that change by hand [19:05:56] by editing the config there? [19:06:04] as the last time I added a third logging host, it brought down https [19:06:07] yeah [19:06:10] and then reloading [19:06:12] it should be fixed [19:06:16] but a hand test would be smrt [19:06:34] mk, [19:07:21] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54608 [19:07:29] !log mlitn synchronized php-1.21wmf11/extensions/ArticleFeedbackv5 'Update ArticleFeedbackv5 to master' [19:07:33] ok, doing so on ssl1001 [19:07:36] Logged the message, Master [19:07:55] !log mlitn synchronized php-1.21wmf12/extensions/ArticleFeedbackv5 'Update ArticleFeedbackv5 to master' [19:08:00] !restarting nginx on ssl1001 to test udp2log settings for gadolinium [19:08:01] Logged the message, Master [19:08:12] Krinkle: resolved RT, I'll leave bugzilla to you [19:08:17] k [19:09:27] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54609 [19:10:24] notpeter, looking good on ssl1001, I see logs on gadolinium too [19:11:43] ottomata: ok, cool [19:12:09] New review: Dzahn; "(1 comment)" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/54610 [19:12:22] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54610 [19:13:54] !log mlitn synchronized wmf-config/CommonSettings.php 're-include AFTv5' [19:14:01] Logged the message, Master [19:14:09] ok, so puppet first? should I test varnish too just in case? 
[19:14:17] paravoid: Seems to crash before anything, :-/ [19:14:35] paravoid: https://gist.github.com/Krinkle/0885064154a47fd1295a [19:15:11] ottomata: yeah, everything in puppet looks pretty safe [19:15:16] I'd say go for it [19:15:27] paravoid: Not sure whether I forgot something, but it seems execjs is having issues [19:15:31] run puppet by hand on an ssl box (just to be safe) and a varnish box [19:15:37] ok [19:16:12] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54686 [19:16:24] oh, you're also going to have to edit nrpe::monitor_service { "varnishncsa": [19:16:34] it will start throwing warnings [19:16:48] because there will be 4 varnishncsa procs per host [19:17:05] so the -w 3:3 will need to be -w 4:4 [19:17:23] and the -c 3:6 will need to be, iunno, I guess -c 4:8 [19:17:33] oh [19:17:34] hm [19:17:55] in varnish.pp [19:18:00] no big deal, though [19:18:11] I'd do that once puppet has run on all the varnish boxes [19:18:24] ok…committing now [19:19:50] New patchset: Ottomata; "Upping number of expected varnishncsa instance to monitor (+gadolinium)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54707 [19:20:17] https://gist.github.com/Krinkle/0885064154a47fd1295a?r2 [19:20:20] Looks like execjs wasn't installed [19:20:46] ottomata: cool [19:20:55] feel free to deploy teh squidz now [19:21:03] I have removed the land mine, as well :) [19:22:24] ottomata: just fyi, if needed you can also check procs using a regex on the arguments of a process or a whole commandline, instead of just the process. instead of "-a" that would be like --ereg-argument-array '^/usr/bin/java -jar /usr/share/jenkins/jenkins.war' [19:23:03] me? [19:23:15] is your fyi for me or someone else? [19:23:22] because with -a all you could check would be "java" itself for those cases [19:23:27] cause cool! 
but I didn't know I needed to do that [19:23:34] ohohoho [19:23:37] the varnishncsa thing [19:23:38] ha [19:24:04] cool, i think its not using -a though? [19:24:09] check_procs -w 4:4 -c 4:8 -C varnishncsa [19:24:21] ottomata1: oh, for you or anyone. and you just need it if you need to check for something other than the process name itself [19:24:33] aye cool [19:24:50] ah, yes, true -a already means "check argument" , -C is "check process name" [19:25:04] Change abandoned: Pyoungmeister; "this was done another way." [operations/debs/flask-login] (master) - https://gerrit.wikimedia.org/r/54083 [19:25:19] and --ereg-argument-array is for a complete command line / regex [19:25:42] because i am looking at NRPE commands, and: [19:25:44] 18 command[check_varnishncsa]=/usr/lib/nagios/plugins/check_procs -w 2:2 -c 2:6 -C varnishncsa [19:26:27] and there still is ./templates/icinga AND ./templates/nagios each having nrpe_local.cfg.erb [19:27:00] ottomata1: want to go for it with the squid deploy? [19:27:33] New patchset: Krinkle; "Add missing dependency on ruby-execjs" [operations/debs/ruby-jsduck] (master) - https://gerrit.wikimedia.org/r/54709 [19:27:51] yup [19:27:52] one sec [19:28:04] k [19:35:03] PROBLEM - Packetloss_Average on europium is CRITICAL: CRITICAL: packet_loss_average is 49.8162475 (gt 8.0) [19:35:21] well, that's promising.... [19:35:27] :D [19:35:42] nothing to do with me! 
haha [19:36:04] ok [19:36:06] sorry back [19:37:16] notpeter, i'm making configs on fenari now [19:37:19] New patchset: Dzahn; "puppet-lint: fix "tab character found on line" by using :retab in vim with 2-space soft tabs (tabstop=2,shiftwidth=2,expandtab)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54611 [19:37:21] sweet [19:38:04] I'm also going to merge that nrpe change [19:38:29] k danke [19:38:37] ok, its looking goot [19:38:38] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54707 [19:38:41] good [19:39:14] yeah, all deployed and shit? [19:39:18] not deployed [19:39:24] was waiting for a def go ahead [19:39:25] gonna run [19:39:27] ./deploy all [19:39:34] ja? [19:39:34] ja [19:39:34] diff looks good [19:39:34] k [19:39:37] well, again, can do frontend [19:39:40] but all fine too [19:40:13] oh [19:40:16] right [19:40:27] looks pretty good! [19:40:40] yup data coming in now [19:40:53] yay! [19:41:35] yay! [19:41:37] looks good! [19:41:38] thanks notpeter [19:41:49] !log added gadolinium as a udp2log host on frontend caches [19:41:56] Logged the message, Master [19:43:46] RECOVERY - Puppet freshness on mw121 is OK: puppet ran at Tue Mar 19 19:43:40 UTC 2013 [19:44:27] RECOVERY - Puppet freshness on mw1115 is OK: puppet ran at Tue Mar 19 19:44:21 UTC 2013 [19:44:45] ottomata1: yep! cool! 
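The check_procs thresholds discussed earlier (`-w 4:4 -c 4:8 -C varnishncsa`) are min:max ranges on the number of matching processes. A toy re-implementation in Ruby — an assumption about the range semantics for illustration, not the monitoring-plugins source — of how a process count maps to a state:

```ruby
# Toy model of check_procs-style min:max thresholds: a count outside the
# critical range is CRITICAL, outside the warning range is WARNING,
# otherwise OK. With -w 4:4 -c 4:8, exactly 4 varnishncsa processes is
# the expected steady state (one per udp2log destination).
def procs_state(count, warn, crit)
  in_range = lambda do |spec|
    min, max = spec.split(':').map { |n| Integer(n) }
    count.between?(min, max)
  end
  return 'CRITICAL' unless in_range.call(crit)
  return 'WARNING'  unless in_range.call(warn)
  'OK'
end

puts procs_state(4, '4:4', '4:8')  # one varnishncsa per logging host
```

Under this model the old `-w 3:3 -c 3:6` settings would flag the fourth logger as a warning, which is why change 54707 bumped the ranges.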
[19:52:11] PROBLEM - zuul_service_running on gallium is CRITICAL: PROCS CRITICAL: 2 processes with args zuul-server [19:52:12] PROBLEM - Parsoid on celsus is CRITICAL: Connection refused [19:52:12] PROBLEM - Apache HTTP on mw1085 is CRITICAL: Connection refused [19:52:12] PROBLEM - Memcached on marmontel is CRITICAL: Connection refused [19:52:22] PROBLEM - udp2log log age for locke on europium is CRITICAL: NRPE: Command check_udp2log_log_age-locke not defined [19:52:24] PROBLEM - Varnish traffic logger on cp1034 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa [19:52:24] PROBLEM - mysqld processes on rdb1001 is CRITICAL: NRPE: Command check_mysqld not defined [19:52:24] PROBLEM - Full LVS Snapshot on rdb1001 is CRITICAL: NRPE: Command check_lvs not defined [19:52:24] PROBLEM - MySQL Recent Restart on rdb1001 is CRITICAL: NRPE: Command check_mysql_recent_restart not defined [19:52:24] PROBLEM - Backend Squid HTTP on knsq17 is CRITICAL: Connection refused [19:52:24] RECOVERY - Puppet freshness on mw1014 is OK: puppet ran at Tue Mar 19 19:52:20 UTC 2013 [19:52:32] New review: Ottomata; "Ok, one more nitpicky thing. Is there a good reason to call this /var/www/limn-public-data, and not..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54116 [19:53:03] ^demon: you there? [19:53:59] <^demon> Yep. [19:55:03] <^demon> notpeter: Sup? [19:55:05] the ticket you made about search index problems [19:55:07] sup with them? [19:55:17] they look like they're importing ok [19:55:53] New review: Ottomata; "Hm, instead of calling the key for X-CS value 'zero', would it make more sense to still call it x-cs?" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/52606 [19:56:33] <^demon> notpeter: Complaining like this: http://p.defau.lt/?8ioGS_rUsgXIP_2lO4KOgA [19:56:45] <^demon> Google pointed me at corrupted indices. 
[19:57:51] RECOVERY - Puppet freshness on search26 is OK: puppet ran at Tue Mar 19 19:57:44 UTC 2013 [19:58:14] PROBLEM - Puppet freshness on formey is CRITICAL: Puppet has not run in the last 10 hours [19:58:36] ^demon: soooo, if I had to guess [19:58:39] key here being guess [19:58:47] I would say that that's realtime indexing failing [19:59:04] as we don't update any of the private indexes in realtime, instead just re-create every night [19:59:23] this was done for... some reason that has been lost in the ether at this point [20:05:12] PROBLEM - Puppet freshness on sq79 is CRITICAL: Puppet has not run in the last 10 hours [20:07:12] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54611 [20:07:14] ori-l [20:07:16] you around? [20:07:25] half. what's up? [20:08:04] looking at python-jsonschema :) [20:08:13] i just built a faidon blessed python deb yesterday [20:08:14] :) [20:08:38] ottomata: i would love you forever; possibly longer [20:08:41] Change abandoned: Dzahn; "already done in PS2/3 of the former patch set" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54612 [20:08:58] mind if I start over? [20:09:09] i want to use git-buildpackage for the repo structure [20:09:09] no! [20:09:17] also, is version 1.1.0 ok? [20:09:20] (i'm just googling for source) [20:09:34] https://pypi.python.org/pypi/jsonschema [20:09:42] yes!
[20:09:46] ok cool, danke [20:09:53] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54604 [20:09:55] i'm going to try to document my steps too [20:09:57] i will build an altar in your honor [20:10:01] hashar has similar needs :) [20:10:08] will add to this page [20:10:08] https://wikitech.wikimedia.org/wiki/Debianize_python_package [20:10:25] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54605 [20:10:32] \o/ [20:12:07] Change abandoned: Dzahn; "duplicate of 53861" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54073 [20:16:05] New review: Dzahn; "has been removed from DNS, remove nginx site "ipv6and4"" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/54098 [20:16:18] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54098 [20:17:55] PROBLEM - swift-account-reaper on ms-be11 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-reaper [20:17:55] PROBLEM - MySQL disk space on blondel is CRITICAL: DISK CRITICAL - free space: /a 0 MB (0% inode=4%): [20:17:55] PROBLEM - swift-account-reaper on ms-be12 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-reaper [20:18:55] New patchset: Dzahn; "remove protoproxy::ipv6_labs inclusion from site.pp" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54717 [20:19:54] New patchset: Ori.livneh; "(Bug 44394) Set $wgGrownUpBehavior to 'false'" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/54718 [20:21:18] New review: Dzahn; "this just installed the "ipv6and4" nginx template" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/54717 [20:21:31] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54717 [20:22:35] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - 
https://gerrit.wikimedia.org/r/54718 [20:25:48] PROBLEM - MySQL Replication Heartbeat on db33 is CRITICAL: CRIT replication delay 186 seconds [20:26:44] PROBLEM - MySQL Slave Delay on db33 is CRITICAL: CRIT replication delay 200 seconds [20:27:13] New review: Dzahn; "oh, this wasn't merged. recheck please (jenkins)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53387 [20:34:39] <^demon> notpeter: Sounds like a reasonable assumption, only dewikisource was failing for the same reason :\ [20:34:56] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:35:32] !log mlitn synchronized php-1.21wmf11/extensions/ArticleFeedbackv5/SpecialArticleFeedbackv5.php 'Touched AFT Special page file' [20:35:39] Logged the message, Master [20:35:50] !log mlitn synchronized php-1.21wmf12/extensions/ArticleFeedbackv5/SpecialArticleFeedbackv5.php 'Touched AFT Special page file' [20:35:56] Logged the message, Master [20:36:20] woosters: hey [20:37:44] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.160 second response time [20:43:08] doh - trying to upload a slide deck to wikitech. the deck seems to have uploaded, however when i try to view it, i get: [20:43:08] Error creating thumbnail: [20:43:09] libgomp: Thread creation failed: Resource temporarily unavailable [20:43:21] see for yourself: https://wikitech.wikimedia.org/wiki/File:Mobile_architecture_review,_Q3_2012_2013.pdf [20:45:51] * MaxSem looks around [20:45:58] scapping [20:46:14] * Damianz gives SlightlySadPanda a cookie [20:48:06] <^demon> awjr: Hmm, PdfHandler's installed. Wonder if it's missing a package or misconfigured. [20:48:13] https://wikitech.wikimedia.org/w/index.php?title=File:Gerrit-Discoverable-Projects.pdf&page=3 [20:48:17] it's working for other things [20:48:21] awjr: confirmed .. http://www.mwusers.com/forums/showthread.php?15782-Error-creating-thumbnail-libgomp-Thread-creation-failed... 
[20:48:40] they raised $wgMaxShellMemory [20:48:54] <^demon> Ryan_Lane: Try purging ;-) [20:48:59] tried [20:49:12] ah [20:49:13] lame [20:49:16] https://wikitech.wikimedia.org/wiki/File:Mobile_architecture_review,_Q3_2012_2013.pdf [20:50:02] they also claim "must be added to the END of localsettings.php to work. " [20:51:55] <^demon> Well, that used to be the generic instruction, because people would put it at the top above DefaultSettings' inclusion. [20:52:07] ah,ok [20:52:22] different error now:) [20:52:28] Error creating thumbnail: /bin/bash: line 1: 3246 Done 'gs' -sDEVICE=jpeg -sOutputFile=- -dFirstPage=2 -dLastPage=2 -r150 -dBATCH -dNOPAUSE -q '/srv/org/wikimedia/controller/wikis/slot0/images/5/55/Mobile_architecture_review,_Q3_2012_2013.pdf' [20:52:32] 3247 Segmentation fault | 'convert' -depth 8 -resize 180 - '/tmp/transform_24d58bec54b2-1 [20:52:45] uh, problem scapping [20:52:51] Copying to fenari from 10.0.5.8...rsync: send_files failed to open "/php-1.21wmf11/.git/modules/extensions/FormPreloadPostCache/index.lock" (in common): Permission denied (13) [20:53:19] <^demon> Can ignore that, but should be fixed. [20:55:43] ok. upped the memory limit [20:55:51] seems to be working now [20:55:53] seems to have done the trick, Ryan_Lane [20:55:55] thanks :) [20:55:57] yw [20:55:59] yep, works [20:56:27] and the page that renders says "what we know we need from ops" , heh [20:56:39] heh [20:59:23] paravoid, if you are still around, any suggestions for generating an initial debian directory for a new python package? [20:59:44] PROBLEM - Puppet freshness on europium is CRITICAL: Puppet has not run in the last 10 hours [20:59:47] other than stdeb? 
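[Editor's note] A sketch of the packaging workflow being discussed. The stdeb invocation in the comments is the standard one; the debian/control contents below are invented for illustration (only the package name python-jsonschema and version 1.1.0 come from the log), roughly the kind of file stdeb derives from setup.py and dh_make leaves as a template:

```shell
# With stdeb, the usual invocation against an upstream setup.py is roughly:
#   pip install stdeb
#   python setup.py --command-packages=stdeb.command sdist_dsc
# which writes a deb_dist/<pkg>-<version>/debian/ directory, deriving the
# Depends field from setup.py's install_requires. A hand-written sketch of
# such a control file (field values illustrative, not the real ones):
mkdir -p deb_dist/jsonschema-1.1.0/debian
cat > deb_dist/jsonschema-1.1.0/debian/control <<'EOF'
Source: python-jsonschema
Section: python
Priority: optional
Build-Depends: debhelper (>= 7), python-setuptools
Standards-Version: 3.9.1

Package: python-jsonschema
Architecture: all
Depends: ${misc:Depends}, ${python:Depends}
Description: implementation of JSON Schema validation for Python
EOF
# with dh_make, by contrast, you would fill in the Depends line yourself
grep '^Depends' deb_dist/jsonschema-1.1.0/debian/control
```

From there, git-buildpackage (which ottomata mentions adopting for the repo structure) builds from the git branches rather than a tarball.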
:) [21:05:15] gerrit-wm's right on schedule :) [21:05:17] ottomata: i think dh_make to generate your debian subdir and the control files etc [21:06:17] hmmmmmmm, one thing that is super nice about stdeb, is that it reads setup.py and builds a control file with debs [21:06:20] deps* [21:06:22] dependencies [21:06:42] ottomata: totally still ask him when he's back, but i guess he will refer to Debian policy ..like http://www.debian.org/doc/packaging-manuals/python-policy/ap-packaging_tools.html [21:11:52] <^demon> jeremyb_: Wait, that wasn't merged yet? :p [21:12:42] ^demon: check gerrit :) [21:13:04] 19 18:11:54 <+gerrit-wm> New patchset: Jeremyb; "futureproof for ruby1.9.x (not in use yet)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54692 [21:13:07] 19 18:12:40 <+gerrit-wm> New review: Jeremyb; "tested" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/54681 [21:13:10] 19 18:13:18 < jeremyb_> ^demon: LeslieCarr: ^ [21:13:20] !log maxsem Started syncing Wikimedia installation... : Weekly mobile deployment [21:13:27] Logged the message, Master [21:14:06] how to trigger a jenkins "recheck"? 
[21:14:16] i see hashar doing it every once in a while [21:14:25] just comment with "recheck" [21:14:29] mutante, make a comment saying "recheck" [21:14:43] New review: Jeremyb; "recheck" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53387 [21:14:47] ah:) i said "recheck please (jenkins)" :) [21:14:56] you said more than that [21:14:56] New review: Dzahn; "recheck" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53387 [21:15:24] yes, i didn't even know this was triggered by comment content [21:16:52] New review: Dzahn; "updated bugzilla report" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53387 [21:17:08] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53387 [21:18:16] RECOVERY - Puppet freshness on sq73 is OK: puppet ran at Tue Mar 19 21:18:06 UTC 2013 [21:19:23] !installing package upgrades on sockpuppet [21:22:54] RECOVERY - Puppet freshness on db56 is OK: puppet ran at Tue Mar 19 21:22:49 UTC 2013 [21:24:40] !log installing package upgrades on sockpuppet [21:24:46] Logged the message, Master [21:24:46] !log installing package upgrades on kaulen [21:24:52] Logged the message, Master [21:25:47] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: Puppet has not run in the last 10 hours [21:25:47] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: Puppet has not run in the last 10 hours [21:25:47] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: Puppet has not run in the last 10 hours [21:25:47] PROBLEM - Puppet freshness on msfe1002 is CRITICAL: Puppet has not run in the last 10 hours [21:26:22] New review: Dzahn; "Can't send mail: sendmail process failed with error code 1" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53387 [21:29:34] RECOVERY - Puppet freshness on cp1036 is OK: puppet ran at Tue Mar 19 21:29:29 UTC 2013 [21:30:29] New patchset: Mattflaschen; "Revert "(Bug 44394) Set $wgGrownUpBehavior to 'false'"" [operations/mediawiki-config] 
(master) - https://gerrit.wikimedia.org/r/54725 [21:34:43] New review: Dzahn; "nevermind that, the report still works and it mails out, just the formatting of the new "Most urgent..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53387 [21:36:51] !log maxsem Finished syncing Wikimedia installation... : Weekly mobile deployment [21:36:58] Logged the message, Master [21:40:23] hi ^demon, more gerrit qs for you [21:40:34] same thing as yesterday, except this time the gerrit repo already exists [21:40:42] so I don't think I can push directly to id [21:40:43] it* [21:40:55] I have these branches [21:40:56] * master [21:40:56] master_orig [21:40:56] pristine-tar [21:40:56] upstream [21:40:57] <^demon> Can if you give yourself push on the ref in question :) [21:40:58] paravoid: Ping [21:41:03] oh? [21:41:18] I renamed the old master to master_orig [21:41:27] and created a new master [21:42:59] ^demon: for even more obscure reasons, dewikisource is the only public wiki handled thusly as well [21:43:16] so, the pattern fits of wikis that are imported by cron and not indexed in realtime [21:43:49] <^demon> Gotcha. [21:43:59] ^demon, i'm not sure how or if I can give myself push on things [21:44:27] <^demon> Go to projects -> list -> click on your project -> click "Access" [21:44:39] ja i'm there [21:45:38] add a group? [21:47:28] hmm, ok! [21:48:30] ^demon: is there anything that you'd like me to do on that ticket at this point? or can I close it? or...? [21:48:55] What happened to paravoid? He disappeared everywhere ~ 2.5 hours ago [21:49:17] well, it's after midnight there, right? :) [21:49:32] not as if he ever sleeps [21:49:50] <^demon> notpeter: Go ahead and close it, I'll file something in BZ for us to handle it in lsearchd better. [21:50:32] cool!
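[Editor's note] The access change ^demon walks ottomata through in the Gerrit UI corresponds, in the project's refs/meta/config:project.config, to a stanza like the following; the group name here is made up for illustration:

```ini
# grant a group push (and ref creation, needed for the new branches)
# on all branches of this project
[access "refs/heads/*"]
	push = group ldap/ops
	create = group ldap/ops
```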
[21:50:32] thanks [21:51:20] New patchset: Dzahn; "bugzilla_report.php - make the output of the new "Most urgent issues" section more readable" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54781 [21:51:37] New patchset: Ottomata; "Initial deb packaging" [operations/debs/python-jsonschema] (master) - https://gerrit.wikimedia.org/r/54782 [21:52:12] !log mlitn synchronized php-1.21wmf11/extensions/ArticleFeedbackv5 'Update ArticleFeedbackv5 to master' [21:52:16] New review: Dzahn; "check formatting before and after, i sent you 2 copies via mail, manually" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/54781 [21:52:18] Logged the message, Master [21:52:38] !log mlitn synchronized php-1.21wmf12/extensions/ArticleFeedbackv5 'Update ArticleFeedbackv5 to master' [21:52:45] Logged the message, Master [21:53:18] New review: Dzahn; "follow-up for formatting: https://gerrit.wikimedia.org/r/#/c/54781/" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53387 [22:07:46] PROBLEM - Puppet freshness on mc1002 is CRITICAL: Puppet has not run in the last 10 hours [22:07:46] PROBLEM - Puppet freshness on virt1005 is CRITICAL: Puppet has not run in the last 10 hours [22:13:43] New review: Mattflaschen; "See https://gerrit.wikimedia.org/r/#/c/54723/" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/54725 [22:15:07] AaronSchulz: https://www.mediawiki.org/wiki/User:Bsitu/Echo_Database_Discussion [22:15:31] !log mlitn synchronized php-1.21wmf11/extensions/ArticleFeedbackv5/data/DataModel.php 'Update ArticleFeedbackv5; filter counts are incorrect' [22:15:38] Logged the message, Master [22:16:11] New patchset: Pyoungmeister; "WIP: first bit of stuff for taming the mysql module and making the SANITARIUM" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53907 [22:17:11] !log mlitn synchronized php-1.21wmf11/extensions/ArticleFeedbackv5/data/DataModel.php 'Update ArticleFeedbackv5; filter counts 
are incorrect' [22:17:18] Logged the message, Master [22:28:52] New patchset: Matthias Mullie; "re-enable AFTv5 on dewiki & testwiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/54785 [22:30:40] ori-l, wee, now just need a review from faidon: [22:30:41] https://gerrit.wikimedia.org/r/#/c/54782/ [22:31:59] New patchset: Ryan Lane; "Add bond ip for network node in eqiad" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54786 [22:32:56] !log adding udp-filter_0.3.22 for precise to apt repo (lucid version already was there) [22:33:02] Logged the message, Master [22:35:56] New review: Krinkle; "(1 comment)" [operations/debs/ruby-jsduck] (master) - https://gerrit.wikimedia.org/r/54691 [22:36:37] !log mlitn synchronized php-1.21wmf11/extensions/ArticleFeedbackv5/data/DataModel.php 'Update ArticleFeedbackv5; filter counts are incorrect' [22:36:40] Could someone merge https://gerrit.wikimedia.org/r/54709 and re-build/push that package to ubuntu.wikimedia.org? [22:36:44] Logged the message, Master [22:37:46] !log mlitn synchronized php-1.21wmf11/extensions/ArticleFeedbackv5/data/DataModel.php 'Update ArticleFeedbackv5; filter counts are incorrect' [22:37:53] Logged the message, Master [22:37:54] ottomata: Looks like you're on duty, are comfortable with debs? 
[22:38:46] PROBLEM - Puppet freshness on db35 is CRITICAL: Puppet has not run in the last 10 hours [22:38:46] PROBLEM - Puppet freshness on labstore1 is CRITICAL: Puppet has not run in the last 10 hours [22:38:47] PROBLEM - MySQL Slave Delay on db1047 is CRITICAL: CRIT replication delay 194 seconds [22:38:47] New review: Krinkle; "Resolves fixme I725c0ffc80b9726" [operations/debs/ruby-jsduck] (master) - https://gerrit.wikimedia.org/r/54709 [22:39:02] Krinkle, barely, i'm getting there, currently comfortable with python debs at least :/ [22:39:06] getting comfy with git-buildpackage [22:39:17] but, alas, I am signing out for the day [22:39:21] happy to help you tomorrow though [22:39:27] This particular one only adds a dependency. [22:39:38] He built the packages this morning and pushed them to ubuntu.wikimedia.org [22:40:00] but forgot a depends, so now the code is still broken and I've been waiting for it all day basically (not doing nothing, doing other things) [22:40:09] who is he? (faidon?) [22:40:11] ottomata: Do you know what happened to paravoid? He just disappeared 3 hours ago [22:40:16] Yes [22:40:18] he's in greece, so probably sleeping [22:40:27] at 7 pm? [22:40:32] oh 3 hours ago? [22:40:35] yeah, i dunno [22:40:43] yesterday we were working on it around this time [22:40:46] yeah, he's up late sometimes [22:40:47] dunno [22:40:49] Anyway, will wait another day. [22:40:56] yeah, sorry, its almost 7pm here and i've gotta go [22:41:03] Sure. [22:41:06] what time zone are you? [22:41:11] GMT+2 [22:41:17] GMT+1 * [22:41:30] oh so late for you too [22:41:35] ok well, yeah, ping me tomorrow [22:41:36] Not really, I work nightshifts [22:41:39] ah [22:41:43] I woke up at 3pm. [22:41:45] RECOVERY - Packetloss_Average on europium is OK: OK: packet_loss_average is 0.0 [22:44:02] laters all!
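[Editor's note] The fix Krinkle is waiting for (change 54709, "Add missing dependency on ruby-execjs") amounts to one line in the package's debian/control. A hypothetical excerpt; only the package names ruby-jsduck and ruby-execjs come from the log, the other fields are invented for illustration:

```
Package: ruby-jsduck
Architecture: all
Depends: ${misc:Depends}, ruby | ruby-interpreter, ruby-execjs
Description: API documentation generator for JavaScript
```

Without ruby-execjs in Depends, apt installs ruby-jsduck cleanly but jsduck fails at runtime when it cannot load a JavaScript engine, which is why the code stayed broken until the package was rebuilt.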
[22:45:04] Change merged: Kaldari; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/54785 [22:45:57] !time-greece is http://www.timeanddate.com/worldclock/city.html?n=26 [22:45:58] Key was added [22:47:01] New patchset: Krinkle; "Include package ruby-jsduck on contint server." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54787 [22:47:09] Anyone know why git-review is so slow lately for operations/puppet? [22:47:14] Sometimes takes over a minute [22:47:20] (pushing new patch sets) [22:47:38] it's a bug in zuul [22:47:59] from what i heard it is actually done doing it, but then does not report back to jenkins for a while [22:48:07] New review: Krinkle; "Faidon has published ruby-jsduck, ruby-execjs and other packages on ubuntu.wikimedia.org earlier today." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54787 [22:48:37] mutante: back to gerrit I suppose? [22:48:40] (not jenkins) [22:48:46] PROBLEM - Backend Squid HTTP on knsq17 is CRITICAL: Connection refused [22:48:51] since Jenkins runs asynchronously from Zuul. [22:49:03] I guess Zuul should also be asynchronous from Gerrit [22:49:09] Krinkle: eh, yea, back to Gerrit .. https://bugzilla.wikimedia.org/show_bug.cgi?id=46176 [22:49:13] Thx [22:49:46] mutante: hm.. that might be a separate bug though [22:50:27] mutante: I mean when I create a commit locally and push it to gerrit with "git review", that seems to take gerrit a long time to process it and respond back to the terminal etc. 
[22:50:28] New patchset: Ryan Lane; "Update x509 auth for libvirt" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54788 [22:51:23] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54786 [22:51:41] thanks jeremyb_ :) [22:51:46] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54681 [22:51:51] Krinkle: ah, ok, well i didn't notice that as much, but in general it feels like sometimes it's faster than other times [22:53:18] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54788 [22:57:19] New patchset: Ryan Lane; "Fixing references to fqdn style virt certs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54789 [22:58:02] !g 54692 | LeslieCarr [22:58:02] LeslieCarr: https://gerrit.wikimedia.org/r/#q,54692,n,z [22:58:10] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54781 [23:01:10] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54789 [23:02:04] RECOVERY - MySQL Slave Delay on db1047 is OK: OK replication delay 6 seconds [23:02:32] New patchset: Dzahn; "add wikiartpedia.ORG to the other wikiartpedia redirects (RT-4240)" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/54791 [23:04:06] New patchset: Dzahn; "add wikiartpedia.ORG to the other wikiartpedia redirects (RT-4240)" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/54791 [23:04:59] mutante: you reordered? 
[23:06:46] yes [23:06:54] New patchset: Ryan Lane; "Adjust cert name used in libvirtd conf" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54792 [23:07:16] Change merged: Dzahn; [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/54791 [23:07:45] !log mlitn synchronized wmf-config/InitialiseSettings.php 'Re-enable AFTv5 on dewiki/testwiki' [23:07:52] Logged the message, Master [23:08:41] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54792 [23:11:06] Ryan_Lane: mutante: Since you both seem to be actively in puppet mode, when you're done, perhaps you could check out https://gerrit.wikimedia.org/r/54787 later? It's a simple change installing a package on gallium. [23:11:14] hey [23:11:16] back [23:11:31] paravoid: Oh, you're back. I figured you'd went to sleep. [23:11:44] no, I went to have a life :) [23:12:00] (j/k) [23:12:01] it's your evening, by all means. [23:12:16] New review: Dzahn; "apache-fast-test wikiartpedia.url mw1044" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/54791 [23:12:57] New patchset: Faidon; "Add a Depends on ruby-execjs (doh!)" [operations/debs/ruby-jsduck] (master) - https://gerrit.wikimedia.org/r/54793 [23:13:24] Krinkle: ^ [23:13:25] paravoid: https://gerrit.wikimedia.org/r/#/c/54709/ [23:13:39] Read your gerrit spam :P [23:13:39] ah! [23:14:03] so, installing ruby-execjs and ruby-jsduck on gallium is ok? 
[23:14:08] Change abandoned: Faidon; "Already commited, r54709" [operations/debs/ruby-jsduck] (master) - https://gerrit.wikimedia.org/r/54793 [23:14:25] Change merged: Faidon; [operations/debs/ruby-jsduck] (master) - https://gerrit.wikimedia.org/r/54709 [23:14:35] mutante: hold on paravoid returned, He'll fix the package so only ruby-jsduck is needed [23:14:42] ok, cool [23:14:49] bbiaw [23:14:58] I'd suggest installing it by hand for now, in case you find more bugs that we need to push at the same time [23:15:20] New patchset: Ryan Lane; "Change trusted DN libvirtd config to use site" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54794 [23:16:01] paravoid: I already did that in labs (install them as separate packages) and ran it [23:16:12] http://integration.wmflabs.org/mw/extensions/VisualEditor/docs/ [23:16:14] works perfectly [23:16:15] New patchset: Krinkle; "Include package ruby-jsduck on contint server." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54787 [23:16:21] and btw, I left at ~9:30pm, not 7 :) [23:16:38] paravoid: Right, you're a few hours ahead of me [23:16:44] 1 [23:16:49] I stand corrected. [23:16:56] a few hours ahead of UTC. [23:17:10] 2 for the next week and a half :) [23:17:16] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54787 [23:17:37] Many hours ahead of my biological timezone (which according to James_F is HST, Hawaiian Standard Time) [23:17:51] i.e. Nightshift in CET [23:20:50] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54794 [23:23:16] hooray. virt*..wmnet certs [23:23:27] works perfectly for libvirtd :) [23:24:11] now to test instance resizing [23:25:44] Ryan_Lane: btw, we have a bug in SSL certs [23:25:48] in case you're interested [23:25:56] in the unified ones? 
[23:29:39] dzahn is doing a graceful restart of all apaches [23:30:24] !log dzahn gracefulled all apaches [23:30:31] Logged the message, Master [23:33:24] !log DNS update - add wikiartpedia.org [23:33:28] Krinkle: are you not SF now? [23:33:30] Logged the message, Master [23:33:42] jeremyb_: No, I'm in the netherlands. [23:34:21] huh. i thought you all mostly migrated [23:34:34] except Reedy and chad [23:35:41] Krinkle: :-) [23:37:23] jeremyb_: Nope, I visit once of twice a year, but the rest of the year I work from home in the Netherlands. [23:37:29] Doesn't mark also work from the Netherlands? [23:37:36] and Siebrand [23:37:42] and Hashar (from France) [23:38:03] and Ed (from the UK) [23:38:08] there's quite a few :) [23:38:11] Zjelko (Serbia) [23:38:19] Pau (Valencia) [23:38:25] Niklas (Finland) [23:38:30] Santhosh (India) [23:38:32] Runa (India) [23:38:36] Krinkle: yeah, some people never move. but some (roan, mutante, nike) i thought all moved [23:38:36] and if you include US outside SF, there's even more. [23:38:39] siebrand: no more niklas [23:38:43] Yuvi (http://wheresyuvi.com) [23:39:11] YuviPanda: Why is the site out? [23:39:22] whereswheresyuvi.com [23:39:29] * siebrand nods at Krinkle  [23:39:30] :D [23:39:38] god 'un [23:39:48] good 'un I meant to say. [23:40:06] * siebrand is going to roam the office for the last afternoon. [23:41:09] siebrand: yuvi.in/where.html :P [23:41:22] paravoid: Does puppet not update aptitude before installing? 
[23:41:32] paravoid: I get the require-execjs failure on gallium now in production [23:41:44] I didn't put the new apt package yet [23:41:47] oh [23:42:03] I thought you did, since you merged the change that puts it in production ;-) [23:46:31] New patchset: Krinkle; "contint: Set ensure latest on ruby-jsduck" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54798 [23:46:46] that wasn't the issue [23:47:02] and I'm hoping it won't need to evolve quickly [23:47:06] anyway, upgraded now [23:47:31] I hope this is the end of it, this has taken far more than ops has the capacity for [23:48:22] paravoid: What I mean is, our local build scripts for developers of mediawiki core and VisualEditor always use the latest version of certain packages, among them jshint, nodejs and jsduck. [23:48:42] Krinkle: that "remove doc generation from svn" thing... looks reasonable to me, but doc.wm doesn't really replace that /doc/ stuff on mediawiki, or does it [23:48:49] Meaning when jsduck releases a new version, we'll likely want to update our package as well. Which should be trivial once this initial step is done. [23:48:58] mutante: It does [23:48:58] please don't [23:49:20] mutante: https://doc.wikimedia.org/mediawiki-core/master/php/html/ === https://svn.wikimedia.org/doc/ [23:49:40] paravoid: How so? [23:49:49] Or does this debian package automatically update? [23:49:53] it doesn't [23:49:59] good [23:50:04] this is a time-consuming process for us [23:50:08] Krinkle: aha. gotcha. just no link on doc.wikimedia.org where we just list "puppet" [23:50:14] and I don't see the benefit of latest and greatest [23:50:23] if there's some particular feature you need, sure [23:50:30] but just latest for no reason, no [23:50:48] mutante: Not yet, but will soon be. 
I'm currently moving doc.wikimedia.org into integration/docroot.git [23:50:57] New review: Dzahn; "looks reasonable, and replaced by https://doc.wikimedia.org/mediawiki-core/master/php/html/" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/53954 [23:51:04] paravoid: not latest upstream, latest of wikimedia package [23:51:13] Krinkle: arr, now i merged :p [23:51:16] eh? [23:51:16] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53954 [23:51:27] paravoid: It is indeed time consuming to update the package, rebuild and push to our apt repo [23:51:32] yes [23:51:39] not much time [23:51:47] paravoid: last thing we want is to then have to update all puppet manifest where packages are used and increase the require to use that version [23:51:51] but still, we have better things to do than updating random ruby packages [23:52:04] er? [23:52:05] afaik ensure => present will not upgrade if already installed some other version [23:52:09] it doesn't matter what puppet says [23:52:16] if the package isn't there, it doesn't matter [23:52:31] so ensure => latest will update if and when we build a new version of debian package of jsduck. [23:53:01] when we commit, merge, build, upload to apt, reprepro include a new version we can also do a dist-upgrade [23:53:04] or hashar can [23:53:05] it doesn't matter [23:53:10] Krinkle: i'm still going ahead with this, right. that was just about the link.. [23:53:58] Okay, I don't know much about debian and puppet. All I'm saying is, if 2 months from now jsduck fixes a bug or implements a feature that mw core would like to use, I dont want to update the puppet manifest for gallium and ensure a certain version of jsduck. It is work enough to have to rebuild the package. [23:54:14] you don't have do [23:54:19] you can just do apt-get upgrade on gallium [23:54:30] * chrismcmahon haz popcorn [23:54:31] (after we rebuild the package) [23:54:35] paravoid: ? 
[23:54:59] paravoid: first of all, I don't have generic sudo on machines, secondly, I dont want to force an upgrade of unrelated random stuff [23:55:25] so what is the difference between ensure latest and present if you want me to manually update it? [23:55:35] * Krinkle is trying to learn [23:55:40] it doesnt break in the middle of the night? [23:56:06] it won't unless paravoid builds a new package in the middle of the night [23:56:10] in which case it is expected [23:56:16] it doesn't matter [23:56:19] right? [23:56:23] let's not put latest then [23:56:29] please :) [23:56:34] also, why do you need sudo on gallium? [23:57:18] paravoid: I'm not sure in what context you want me to answer [23:57:39] apt-get obviously requires sudo, but no I will not use apt-get in production. [23:57:53] If I were to be expected to manually upgrade it locally on gallium, then I'd need sudo yes. [23:58:38] paravoid: Assuming ensure=>latest just means it will automatically upgrade it if and when we (wikimedia) publish a new version of the package. [23:58:40] How is that unwanted? [23:58:43] you have an RT request asking for root [23:58:45] why do you need that? [23:59:31] Because contint still has a fair bit of unpuppetised stuff and I run into crap at least once a week having to wait days or weeks to get it fixed by hashar or someone in ops [23:59:51] which is why I think it'd be appropriate at least for the time being for me (like hashar) to have sudo locally. [23:59:53] I've never seen such a request to fix things from you to ops
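[Editor's note] The ensure semantics Krinkle and paravoid debate above, as a minimal puppet sketch. A real manifest would declare only one of these (duplicate resource titles are an error); both are shown here side by side for contrast:

```puppet
# ensure => present: install the package if absent, then never touch it
# again; a newer build in the apt repo is only picked up by a manual
# apt-get upgrade on the host
package { 'ruby-jsduck':
    ensure => present,
}

# ensure => latest: puppet upgrades the package on its own whenever the
# apt repo gains a newer version, i.e. whenever a new build is pushed
# (which is why it can "break in the middle of the night")
package { 'ruby-jsduck':
    ensure => latest,
}
```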