[00:00:50] TimStarling: Do you know regarding the header concern thedj raised on https://gerrit.wikimedia.org/r/#/c/127818/3 ? [00:01:16] I can't find any record of ops concern for squid/varnish or some other part of our stack for why that would justify the meta tag. [00:01:47] (03Abandoned) 10BBlack: Varnish should restart on initscript/defaults changes [operations/puppet] - 10https://gerrit.wikimedia.org/r/115637 (owner: 10BBlack) [00:02:51] who added the meta tag? [00:04:11] it was yaron [00:04:40] https://gerrit.wikimedia.org/r/#/c/103387/ [00:04:52] "Re: doing this as an HTTP header instead; I kinda prefer the meta tag as it's more visible -- headers are black magic you have to dig for when debugging. :)" [00:04:57] says brion [00:05:17] oh, and that was in reply to you [00:05:52] * AaronSchulz likes how gerrit just does not work in iceweasel [00:07:28] TimStarling: Nice find. [00:07:49] the wonders of git gui blame [00:08:49] I usually use github's blame, but it had an outdated index causing it to get stuck between two revisions [00:08:53] AaronSchulz: wfm ? [00:09:29] Krinkle: I can't find any mention of varnish in the linked gerrit/bugz stuff, and I can't think of a good reason why varnish would care about this header [00:09:45] greg-g: it just says "working" all the fucking time so I use chromium for gerrit [00:09:47] but that doesn't necessarily mean there isn't one :) it would be nice if we had a record somewhere of whatever that concern was [00:09:48] the latter works fine [00:09:57] no JS errors show up though [00:10:05] god knows what it's trying to do [00:10:38] work [00:10:44] AaronSchulz: :) it definitely takes longer than chrom(e|ium), but it works for me.
Blame Google devs for not caring about anything other than chrome ;) [00:12:14] greg-g: https://groups.google.com/forum/#!topic/repo-discuss/2JszC4nKdvU not new [00:12:33] adding /1 often helps as per that bug...though it's still slow [00:12:49] * greg-g nods [00:16:48] (03PS1) 10Ori.livneh: webperf/deprecate: use 'meter' rather than 'counter' type [operations/puppet] - 10https://gerrit.wikimedia.org/r/130787 [00:18:14] (03PS2) 10Ori.livneh: webperf/deprecate: use 'meter' rather than 'counter' type [operations/puppet] - 10https://gerrit.wikimedia.org/r/130787 [00:20:32] (03CR) 10Ori.livneh: [C: 032] webperf/deprecate: use 'meter' rather than 'counter' type [operations/puppet] - 10https://gerrit.wikimedia.org/r/130787 (owner: 10Ori.livneh) [00:21:20] <^d> GWT -- "Google (and only Google browsers) Web Toolkit (that's totally for Chrome, duh)" [00:23:47] Gerrit also works in Iceweasel 24.4.0 for me; I don't use Gerrit in Chrome enough to compare the performance. [00:30:20] Can someone take a look at the GettingStarted token patches? [00:30:21] https://gerrit.wikimedia.org/r/#/c/130381/ [00:30:56] This is reinstating the reverted code; the issues (hook, cookie name) are also resolved, and it's changed to only add the token if they're on an edit page (including VE), per paravoid's request. [00:32:11] bblack, if you're comfortable, could you review the meta->header patch (https://gerrit.wikimedia.org/r/#/c/127818/1)? [00:32:29] After I stopped the merge (since I didn't know if the Varnish issue was resolved), I saw your comment above and mentioned that on the Gerrit thread. [00:33:56] well, the only bit that bugs me is thedj saying "We should get someone with varnish knowledge to sign off, I remember vaguely that varnish was the reason we preferred the meta element the last time..."
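An aside on the varnish worry thedj raises above: HTTP caches key an object on the URL plus only the request headers listed in Vary, so a new response header simply rides along with whichever body was cached first. A minimal toy-cache sketch of that behaviour (illustrative only, not WMF's actual varnish configuration; `X-New-Header` is a stand-in name for the patch's header):

```python
# Toy HTTP cache: entries are keyed on the URL (plus Vary'd request
# headers, omitted here for brevity).  A *response* header added by the
# origin never changes the key -- old header-less copies keep being
# served until they expire from the cache.
cache = {}

def fetch(url, origin):
    """Return the cached response for url, consulting origin only on a miss."""
    if url not in cache:
        cache[url] = origin(url)
    return cache[url]

old_origin = lambda url: {"body": "<meta ...>", "headers": {}}
new_origin = lambda url: {"body": "<meta ...>", "headers": {"X-New-Header": "1"}}

before = fetch("/wiki/Foo", old_origin)   # page cached before the deploy
after = fetch("/wiki/Foo", new_origin)    # cache hit: origin never consulted
assert "X-New-Header" not in after["headers"]
assert before is after  # the pre-deploy copy is still the one served
```

This is the mechanism behind the "leave both versions in place for ~30 days" suggestion a few lines below: anonymous page views keep hitting the pre-deploy cached copies until they age out.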
[00:34:35] in general random headers that aren't part of the Vary set shouldn't be an issue [00:35:16] (although keep in mind that caching will probably mean your new header won't show up for many anonymous pages for quite some time, as the old versions without the header are in-cache now) [00:35:28] (perhaps *that* was the varnishy concern when it was initially deployed) [00:36:23] in light of that, you might want to leave both versions of the tag in place for ~30 days for caches to expire, before removing the meta-tag [00:36:44] bblack, what's the scenario where neither the meta nor header would show up? [00:36:47] On a given page. [00:37:08] oh, you're right, I guess they'll get one version or the other either way [00:37:32] Okay, I'm going to step out. [00:38:19] I'll go +1 it on varnish issues anyways [00:43:15] (03PS2) 10JGonera: Enable Compact personal bar beta feature on test wikis [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130266 [00:49:38] superm401: re 130381, what's the relationship between that and 130393 ? They seem overlapping (one does the cookie rename, the other does cookie rename + edit-only)? [00:50:00] can we just skip straight to the edit-only version of it? [01:02:13] (03CR) 10Manybubbles: "Looks sane. I can deploy the plugin tomorrow if you'd like. I'll do another review, I think. I have to push out an update to the highli" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130760 (owner: 10Chad) [01:04:52] (03CR) 10Chad: "Would like Aaron/Faidon to chime in as to whether I've got the Swift config right (was mostly guessing, most unsure about 'key')." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130760 (owner: 10Chad) [01:07:12] (03CR) 10MZMcBride: "Neat. I'm excited to try this out. 
:-)" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130266 (owner: 10JGonera) [01:08:40] (03PS1) 10RobH: adding *.zero.wikipedia.org and zero.wikipedia.org to unified cert SANS [operations/puppet] - 10https://gerrit.wikimedia.org/r/130797 [01:09:13] PROBLEM - Puppet freshness on osmium is CRITICAL: Last successful Puppet run was Wed Apr 30 10:04:02 2014 [01:10:32] (03CR) 10RobH: "This is NOT to go live unless someone is babysitting the change and ensuring it doesn't break EVERYTHING. As this is the certificate in u" [operations/puppet] - 10https://gerrit.wikimedia.org/r/130797 (owner: 10RobH) [01:11:13] all caps in my messages, i want folks to think im yelling at them ;] [01:26:04] (03PS1) 10BBlack: add DNS for lvs300x public-vlan addrs [operations/dns] - 10https://gerrit.wikimedia.org/r/130800 [01:26:13] (03PS1) 10BBlack: add public addrs to lvs300x over 802.1q [operations/puppet] - 10https://gerrit.wikimedia.org/r/130801 [01:28:09] (03CR) 10BBlack: [C: 032 V: 032] add DNS for lvs300x public-vlan addrs [operations/dns] - 10https://gerrit.wikimedia.org/r/130800 (owner: 10BBlack) [01:28:58] (03CR) 10BBlack: [C: 032 V: 032] add public addrs to lvs300x over 802.1q [operations/puppet] - 10https://gerrit.wikimedia.org/r/130801 (owner: 10BBlack) [01:34:32] (03CR) 10Aaron Schulz: Configure Swift-backed elasticsearch backups (032 comments) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130760 (owner: 10Chad) [01:37:29] (03CR) 10Faidon Liambotis: [C: 04-1] "I'd like a separate username/password than MediaWiki, so that we can isolate permissions (fault isolation) as well as being able to track " [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130760 (owner: 10Chad) [01:45:26] (03CR) 10Chad: "Inline comments to Aaron." (032 comments) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130760 (owner: 10Chad) [01:58:36] bblack, 381 is essentially a mini-library for the user tokening/bucketing.
It will have a future use case, in addition to the 'token on edit' case (once the experiment starts). [01:58:56] 393 is just to assign a token on edit, then instrument edits based on that. [01:59:51] bblack, there's no overlap that I see. 393 adds the cookie to the server-side (it doesn't rename it), since the server-side instrumentation needs it there. [02:04:24] yeah ok [02:04:54] superm401: 393's last comment was about "no need for review yet", should I wait on that one or do the same there now? [02:05:56] bblack, no, it's ready. Patch set 4 did the update I mentioned. [02:07:43] PROBLEM - ElasticSearch health check on elastic1004 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2032: active_shards: 6075: relocating_shards: 2: initializing_shards: 1: unassigned_shards: 0 [02:08:42] RECOVERY - ElasticSearch health check on elastic1004 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2034: active_shards: 6077: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [02:12:45] (03PS3) 10Chad: Configure Swift-backed elasticsearch backups [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130760 [02:22:47] !log LocalisationUpdate completed (1.24wmf1) at 2014-05-01 02:21:44+00:00 [02:22:54] Logged the message, Master [02:27:02] PROBLEM - Slow CirrusSearch query rate on fluorine is CRITICAL: CirrusSearch-slow.log_line_rate CRITICAL: 3.90333333333 [02:34:38] !log LocalisationUpdate completed (1.24wmf2) at 2014-05-01 02:33:34+00:00 [02:34:45] Logged the message, Master [02:35:52] PROBLEM - ElasticSearch health check on elastic1009 is CRITICAL: CRITICAL - Could not connect to server 10.64.32.141 [02:43:46] (03PS1) 10Springle: Switch analytics-store back to dbstore1002.
[operations/dns] - 10https://gerrit.wikimedia.org/r/130808 [02:44:47] (03CR) 10Springle: [C: 032] Switch analytics-store back to dbstore1002. [operations/dns] - 10https://gerrit.wikimedia.org/r/130808 (owner: 10Springle) [02:45:33] bleh, its sad, I'm turning off most of the reindex jobs and going to start them one by one [02:45:47] looks like elastic1009 actually crashed [02:45:59] puppet must have restarted elasticsearch, because it is coming back [02:47:02] RECOVERY - Slow CirrusSearch query rate on fluorine is OK: CirrusSearch-slow.log_line_rate OKAY: 0.0 [02:47:23] PROBLEM - ElasticSearch health check on elastic1003 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2034: active_shards: 5855: relocating_shards: 0: initializing_shards: 28: unassigned_shards: 195 [02:48:18] (03PS1) 10Withoutaname: Create 'noratelimit' user group on dewiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130809 (https://bugzilla.wikimedia.org/57819) [02:48:39] now its just complaining about it coming back together.... [03:13:23] RECOVERY - ElasticSearch health check on elastic1003 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2041: active_shards: 6084: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [03:13:42] RECOVERY - ElasticSearch health check on elastic1009 is OK: OK - elasticsearch (production-search-eqiad) is running. 
status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2041: active_shards: 6084: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [03:13:44] ^demon|away: should be all calm now [03:18:46] !log LocalisationUpdate ResourceLoader cache refresh completed at Thu May 1 03:17:39 UTC 2014 (duration 17m 38s) [03:18:52] Logged the message, Master [04:10:12] PROBLEM - Puppet freshness on osmium is CRITICAL: Last successful Puppet run was Wed Apr 30 10:04:02 2014 [07:11:12] PROBLEM - Puppet freshness on osmium is CRITICAL: Last successful Puppet run was Wed Apr 30 10:04:02 2014 [08:15:27] !log switching s1-analytics-slave db1047 enwiki to tokudb [08:15:34] Logged the message, Master [08:20:37] springle: can you please peek at https://gerrit.wikimedia.org/r/#/c/127909/ ? [08:28:15] (03PS1) 10Odder: Correct a domain in wgCopyUploadsDomains [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130815 (https://bugzilla.wikimedia.org/64700) [08:29:24] (03CR) 10Springle: [C: 04-1] "Actually this is obsolete, but admittedly that wasn't obvious from the neglected RT ticket :)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/127909 (owner: 10Matanya) [08:29:41] matanya: sorry, forgot about that one [08:32:56] greg-g: :) [09:24:30] _joe_: as the scoping guy, can you please advise on: https://gerrit.wikimedia.org/r/#/c/111787/2 [09:26:27] <_joe_> matanya: "the scoping guy" sounds terrible :P [09:26:37] :D [09:26:48] (03CR) 10Matanya: "Thanks, does the l10nupdate user use this key?" [operations/puppet] - 10https://gerrit.wikimedia.org/r/116936 (owner: 10Matanya) [09:28:15] <_joe_> btw, today is bank holiday here, I'm working on my other (volunteer-only) projects, so don't expect me to be here 100% of the time [09:29:03] <_joe_> matanya: I agree with the comments I see there.
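The ElasticSearch icinga checks scrolling past above flatten the cluster-health JSON into one ': '-separated line. A small parser written against exactly that output format (a sketch inferred from the quoted lines, not from the check's source) makes the fields easy to alert on:

```python
def parse_es_health(check_output):
    """Turn the icinga ElasticSearch check output into a dict of fields.

    The check flattens cluster health into ': '-separated tokens, e.g.
    'status: green: timed_out: false: number_of_nodes: 16: ...'.
    """
    # Drop the human-readable prefix, keep the flattened key/value tokens.
    _, _, fields = check_output.partition("status: ")
    tokens = ("status: " + fields).split(": ")
    pairs = dict(zip(tokens[0::2], tokens[1::2]))
    # Cast the numeric node/shard counts for easy comparison.
    return {k: int(v) if v.isdigit() else v for k, v in pairs.items()}

line = ("OK - elasticsearch (production-search-eqiad) is running. "
        "status: green: timed_out: false: number_of_nodes: 16: "
        "number_of_data_nodes: 16: active_primary_shards: 2041: "
        "active_shards: 6084: relocating_shards: 2: "
        "initializing_shards: 0: unassigned_shards: 0")
health = parse_es_health(line)
assert health["status"] == "green"
assert health["unassigned_shards"] == 0
```

With the fields as a dict, "red with unassigned shards" and "green" become simple comparisons rather than string matching.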
[09:29:22] sure they are right, wondering how to fix [09:29:35] and have a great holiday :) [09:31:41] <_joe_> matanya: it *is* ok as it is right now [09:32:08] (03Abandoned) 10Matanya: cache: puppet 3 compatibility fix: fully qualify variable [operations/puppet] - 10https://gerrit.wikimedia.org/r/111787 (owner: 10Matanya) [09:32:09] <_joe_> matanya: if $something is declared in the node def, you can reference it in classes only as $something [09:33:35] yeah, stupid patch [09:33:35] thanks anyway [09:34:42] <_joe_> that is because variables that are undefined will be looked up in hiera, and hiera variables are usually node-dependent [09:43:09] (03CR) 10Matanya: "I'm a bit confused here. I see manifests/role/ldap.pp and modules/ldap/manifests/role/server.pp It seems like the latter only holds the op" [operations/puppet] - 10https://gerrit.wikimedia.org/r/117698 (owner: 10Matanya) [09:45:18] (03CR) 10Matanya: webserver: fixing duplicate declaration of apache-mpm (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/112423 (owner: 10Matanya) [10:12:12] PROBLEM - Puppet freshness on osmium is CRITICAL: Last successful Puppet run was Wed Apr 30 10:04:02 2014 [10:13:58] <_joe_> why do people not acknowledge the alarms if they're being handled?
[operations/puppet] - 10https://gerrit.wikimedia.org/r/130822 (owner: 10Springle) [12:15:43] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1% data above the threshold [250.0] [12:46:11] (03CR) 10Jgreen: [C: 031] fix apache-fast-test for use in eqiad [operations/puppet] - 10https://gerrit.wikimedia.org/r/130614 (owner: 10Dzahn) [12:51:48] (03Abandoned) 10Manybubbles: Lower udp2log maxage [operations/puppet] - 10https://gerrit.wikimedia.org/r/125247 (owner: 10Manybubbles) [12:57:57] aude: can you link the patch for today's swat deploy? [12:58:04] on the deployments page, I mean [13:11:01] aude: Please link the actual patch needing deployment in the SWAT deploy on [[wikitech:Deployments]] [13:13:12] PROBLEM - Puppet freshness on osmium is CRITICAL: Last successful Puppet run was Wed Apr 30 10:04:02 2014 [13:23:11] (03PS1) 10coren: Tool Labs: support some non-work prefixes in crontabs [operations/puppet] - 10https://gerrit.wikimedia.org/r/130825 [13:26:26] (03CR) 10coren: [C: 032] Tool Labs: support some non-work prefixes in crontabs [operations/puppet] - 10https://gerrit.wikimedia.org/r/130825 (owner: 10coren) [13:35:56] (03PS1) 10Manybubbles: All remaining wikis get cirrus betafeature [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130826 [13:38:48] (03CR) 10Chad: [C: 031] All remaining wikis get cirrus betafeature [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130826 (owner: 10Manybubbles) [13:38:50] anomie: manybubbles shall have quick as gerrit / jenkins allows [13:39:01] if we are not ready in time, we can do later swat [13:39:27] aude: thanks! [13:39:53] manybubbles: I take it then you're going to handle today's SWAT? Fine with me if you want to. [13:40:04] anomie: either way is fine with me [13:40:37] manybubbles: That's how I feel too, on the days you don't have changes of your own going in [13:40:53] I'll do it then. 
[13:41:00] this should be our last one for a while [13:41:25] I'm cutting a release right now which is intellectually taxing so the distraction shouldn't hurt [13:43:08] being a holiday here, our jenkins should supposedly be faster, less busy [13:44:07] (03PS2) 10Anomie: Correct a domain in wgCopyUploadsDomains [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130815 (https://bugzilla.wikimedia.org/64700) (owner: 10Odder) [13:51:44] (03PS1) 10coren: Tool Labs: further tweaks/bugfixes to crontab [operations/puppet] - 10https://gerrit.wikimedia.org/r/130831 [13:53:18] Is Reedy around? [13:54:01] (03CR) 10coren: [C: 032] Tool Labs: further tweaks/bugfixes to crontab [operations/puppet] - 10https://gerrit.wikimedia.org/r/130831 (owner: 10coren) [13:54:30] anomie: Morning. https://bugzilla.wikimedia.org/show_bug.cgi?id=43737#c11 really ought to have someone from ops poke at it, if you know anyone who has time today. [13:55:37] (03CR) 10Anomie: SVG logos for two non-Wikibooks, non-Wikisource wikis (031 comment) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130821 (https://bugzilla.wikimedia.org/52019) (owner: 10Odder) [13:56:24] Gloria: can take a look [13:56:37] Thanks! [13:56:43] Gloria: I'm looking too [13:56:58] (03PS1) 10Cmjohnson: adding dns entries for db10[64-73] and fixing tab/spaces [operations/dns] - 10https://gerrit.wikimedia.org/r/130832 [13:57:55] It's definitely the RSS code that's triggering the mediawiki.org issue. [13:58:09] Weird... you'd think wikimediafoundation.org would be affected as well. [13:59:11] Probably it's only triggered by some bit of wikitext or something. But Extension:RSS seems likely due to the backtrace I see in the log [14:00:37] RSS indeed [14:01:05] https://www.mediawiki.org/wiki/Template:RSSPost [14:01:06] Hmmm. [14:01:13] \o/ anomie [14:01:31] and a deleted template? [14:01:37] Maybe that does it...
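The diagnosis above (Extension:RSS hands its "template" parameter straight to the content fetcher, and neither the extension nor Article::fetchContent() checks that the page exists or was deleted) reduces to a tiny pattern. A hedged sketch of the bug and the guard, with illustrative names rather than MediaWiki's real API:

```python
# Illustrative only: 'pages' stands in for the wiki's page store, and the
# function names are invented, not MediaWiki's actual API.
def fetch_content(pages, title):
    """Fatal-style behaviour: assumes the page exists."""
    return pages[title]  # KeyError when the template was deleted

def fetch_content_checked(pages, title):
    """The fix: check existence first and degrade gracefully."""
    return pages.get(title)  # None instead of a fatal

wiki = {"Template:Foo": "some wikitext"}
assert fetch_content_checked(wiki, "Template:RSSPost") is None
try:
    fetch_content(wiki, "Template:RSSPost")
except KeyError:
    pass  # the unchecked path is the fatal seen on mediawiki.org
```

Undeleting the template (the quick fix proposed here) removes the trigger; the checked lookup removes the class of bug.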
[14:02:42] Gloria: looking at the code, i think so [14:02:47] Gloria, aude: Looks like the problem is that Extension:RSS doesn't check whether the page passed to its "template" parameter actually exists before trying to get the content, and Article::fetchContent() doesn't check either. [14:02:48] it doesn't check if it's deleted [14:03:12] Quick fix would be to undelete the template. [14:03:23] anomie: want to fix the code? [14:03:24] Or kill template="RSSPost"? [14:03:35] * aude needs to prepare my wikidata patch [14:03:51] aude: Yeah, as soon as I figure out whether the fix should be in core or Extension:RSS. Or maybe both. [14:03:58] ok [14:06:52] (03CR) 10Cmjohnson: [C: 032] adding dns entries for db10[64-73] and fixing tab/spaces [operations/dns] - 10https://gerrit.wikimedia.org/r/130832 (owner: 10Cmjohnson) [14:07:19] (03CR) 10Odder: SVG logos for two non-Wikibooks, non-Wikisource wikis (031 comment) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130821 (https://bugzilla.wikimedia.org/52019) (owner: 10Odder) [14:16:09] manybubbles: If we can get https://gerrit.wikimedia.org/r/#/c/130835/ reviewed in time, it might be good to add backporting that to the SWAT. [14:20:26] (03PS1) 10coren: Tool Labs: Make xcrontab smarter about environ [operations/puppet] - 10https://gerrit.wikimedia.org/r/130836 [14:22:23] (03CR) 10coren: [C: 032] Tool Labs: Make xcrontab smarter about environ [operations/puppet] - 10https://gerrit.wikimedia.org/r/130836 (owner: 10coren) [14:34:39] anomie: your wish, and all that [14:36:47] * anomie adds to SWAT list [14:38:42] Not much point in doing 1.24wmf1, since everything using RSS is already on wmf2. [14:39:31] and wmf1 will be done in a couple of hours anyways [14:40:00] That would've been my reasoning if RSS had been enabled on one of the Wikipedias. 
;) [14:40:27] ha [14:47:43] making core submodule patch now [14:54:59] (03PS1) 10Andrew Bogott: Change UIDs for a bunch of users: [operations/puppet] - 10https://gerrit.wikimedia.org/r/130841 [14:55:28] aude: thanks! [14:55:43] twkozlowski: I'll do your changes first because config changes are quicker. five minute warning [14:56:21] James_F|Away or RoanKattouw_away: are either of you available to verify that your swat change worked once I push it? [14:59:13] manybubbles: linked on the wiki [15:00:36] manybubbles: Yes. [15:00:45] aude: wonderf [15:00:48] wonderful [15:01:04] why did I expect tab to autocomplete wonderful [15:01:13] heh [15:01:14] James_F: k. [15:01:25] aude: i'll do yours first since I haven't heard from twkozlowski [15:01:31] * manybubbles has the conch [15:01:34] ok [15:01:44] * aude has tabs open to verify the fixes [15:03:33] jenkins is merging.... [15:03:41] ok [15:04:01] (03CR) 10Andrew Bogott: [C: 032] Change UIDs for a bunch of users: [operations/puppet] - 10https://gerrit.wikimedia.org/r/130841 (owner: 10Andrew Bogott) [15:07:13] aude: syncing [15:07:35] k [15:07:48] taking a while [15:08:06] James_F: you'll be next once aude verifies her fix [15:08:12] !log manybubbles synchronized php-1.24wmf2/extensions/Wikidata/ 'SWAT update for time parsing and formatting' [15:08:12] Kk. [15:08:16] Logged the message, Master [15:08:21] * aude checks [15:08:37] perfect! [15:08:51] great! [15:09:04] * aude done bug fixing and back to regularly assigned stuff  [15:09:28] aude: feels good [15:10:46] wasn't anything too terrible but not things to leave broken for a couple weeks [15:12:41] Gloria: I am now... [15:17:27] twkozlowski: are you around? [15:18:12] (03CR) 10BryanDavis: "> Thanks, does the l10nupdate user use this key?" [operations/puppet] - 10https://gerrit.wikimedia.org/r/116936 (owner: 10Matanya) [15:19:12] looks like it is clear for a merge [15:26:30] James_F: syncing [15:26:42] Thanks. 
[15:27:34] !log manybubbles synchronized php-1.24wmf2/extensions/VisualEditor/ 'SWAT update for firefox focus' [15:27:40] Logged the message, Master [15:27:54] James_F: there you go, please verify everything is ok [15:28:41] anomie: starting the merge process on your patch [15:28:43] Yup, looks grand. Thank you! [15:28:48] manybubbles: ok [15:28:53] twkozlowski: still looking for you so I can push your patches [15:33:45] anomie: syncing [15:34:39] ottomata: looks like elastic1008 is ready for management! [15:34:47] ohwweeoo [15:34:48] ok [15:34:48] !log manybubbles synchronized php-1.24wmf2/includes/Article.php 'SWAT update to prevent fatal in backwards compatibility method' [15:34:54] manybubbles: Appears to work. Thanks! [15:34:55] Logged the message, Master [15:34:59] sweet [15:35:14] manybubbles: should I do that now? [15:35:27] ottomata: hmmm.... sure! [15:35:49] !log reassigning a ton of UIDs in production; running a couple dozen 'find' commands to chown files [15:35:56] Logged the message, Master [15:35:57] ok, manybubbles, moving shards off [15:36:06] yay [15:36:21] hm, elastic1008 already has fewer shards [15:36:25] any particular reason why? [15:36:28] {"length":170,"node":"elastic1008"} [15:36:29] manybubbles: ohaio [15:36:33] twkozlowski: yay! [15:36:52] ottomata: because it has less disk to hold the shards and I've pushed a bunch of oversized shards recently [15:36:58] I need to find a better way to get ping notifications [15:37:00] I'll be fixing the shard sizes today [15:37:09] irssi + screen doesn't work too well.
[15:37:14] (03CR) 10Manybubbles: [C: 032] SVG logos for two non-Wikibooks, non-Wikisource wikis [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130821 (https://bugzilla.wikimedia.org/52019) (owner: 10Odder) [15:37:17] (03CR) 10Manybubbles: [C: 032] Correct a domain in wgCopyUploadsDomains [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130815 (https://bugzilla.wikimedia.org/64700) (owner: 10Odder) [15:37:24] thanks manybubbles [15:37:30] (03Merged) 10jenkins-bot: SVG logos for two non-Wikibooks, non-Wikisource wikis [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130821 (https://bugzilla.wikimedia.org/52019) (owner: 10Odder) [15:37:30] sorry to have made you wait so long [15:37:35] (03Merged) 10jenkins-bot: Correct a domain in wgCopyUploadsDomains [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130815 (https://bugzilla.wikimedia.org/64700) (owner: 10Odder) [15:38:38] twkozlowski: syncing [15:38:41] twkozlowski: its ok. [15:40:07] !log manybubbles synchronized wmf-config/InitialiseSettings.php 'SWAT fix GWtoolset url and add some more logos' [15:40:13] ottomata: Elasticsearch has disk utilization stoppers that kick in at 85% and 95%, respectively. Rather, I've configured them to do that. at 85% it'll refuse more shards and they'll go elsewhere. at 95% it'll start moving things off. [15:40:17] Logged the message, Master [15:40:24] twkozlowski: there you go, please verify [15:40:31] ahhh [15:40:34] makes sense, cool [15:42:32] (03PS1) 10Manybubbles: Update highlighter [operations/software/elasticsearch/plugins] - 10https://gerrit.wikimedia.org/r/130846 [15:42:44] (03PS1) 10Rush: puppet-lint in labs [operations/puppet] - 10https://gerrit.wikimedia.org/r/130847 [15:43:03] manybubbles: The logos didn't change for me yet, but I expect cache [15:43:31] twkozlowski: makes sense to me. 
I don't know that branch of the code at all, though [15:47:37] * manybubbles puts down the conch for 15 minutes [15:49:39] manybubbles: https://noc.wikimedia.org/conf/highlight.php?file=InitialiseSettings.php [15:49:42] can't see it here. [15:50:30] twkozlowski: my fault, syncing again [15:50:33] * manybubbles has the conch [15:51:37] !log manybubbles synchronized wmf-config/InitialiseSettings.php 'SWAT fix GWtoolset url and add some more logos' [15:51:46] twkozlowski: check again please [15:52:20] (03CR) 10Krinkle: "I've addressed Hoo's concerns and rebased the patches into a single mergable stack. He revoked his -1 (though Gerrit doesn't show that)." [operations/puppet] - 10https://gerrit.wikimedia.org/r/130071 (owner: 10Krinkle) [15:53:33] anyone around who can give me a pointer on where to put something puppet? [15:54:28] chasemp: depends on what it is [15:54:49] fair, so I have a defined thing for cpu monitoring w/ diamond [15:55:04] I want it to be included as a base or default or whatever term [15:55:08] * manybubbles puts down the conch [15:55:11] probably similar to how ganglia does it [15:55:16] no idea :/ [15:55:18] but what what I'm seeing makes no sense to me [15:55:24] :) thanks anyway [15:56:06] ottomata might be your best bet. I bother him with monitoring stuff all the time! [15:56:37] ^demon|away: about ready to turn on betafeature everywhere! [15:56:41] wooooooo [15:57:22] greg-g: is now ok for that? I figured it was the same time as the schedule + one day [15:57:34] manybubbles: sure [15:58:40] (03CR) 10Manybubbles: [C: 032] All remaining wikis get cirrus betafeature [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130826 (owner: 10Manybubbles) [15:58:53] (03Merged) 10jenkins-bot: All remaining wikis get cirrus betafeature [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130826 (owner: 10Manybubbles) [15:58:53] * manybubbles has the conch again [15:59:16] ottomata: thoughts? [15:59:19] ok so wha? 
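The 85%/95% disk "stoppers" manybubbles describes a few lines up are Elasticsearch's disk-allocation watermarks. A sketch of the cluster-settings body that would configure them (setting names from Elasticsearch's cluster-update-settings API; the percentages are the values quoted above, not necessarily the production config):

```python
import json

# At the low watermark Elasticsearch stops allocating new shards to a
# node; at the high watermark it starts relocating shards away from it.
settings = {
    "persistent": {
        "cluster.routing.allocation.disk.threshold_enabled": True,
        "cluster.routing.allocation.disk.watermark.low": "85%",
        "cluster.routing.allocation.disk.watermark.high": "95%",
    }
}

# This body would be PUT to the cluster settings endpoint, roughly:
#   curl -XPUT localhost:9200/_cluster/settings -d '<this JSON>'
body = json.dumps(settings)
assert json.loads(body)["persistent"][
    "cluster.routing.allocation.disk.watermark.high"] == "95%"
```

That is why elastic1008, with less disk, ends up holding fewer shards: allocation stops at the low watermark and eviction starts at the high one.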
[15:59:32] what is it? [15:59:34] python script? [15:59:38] so a module for diamond exists [15:59:55] and there is a defined type to add a monitoring collector -- cpu / mem whatever [16:00:02] I have a cpu one I want to add [16:00:10] where in the heck do I put it, not a role per se [16:00:18] and ganglia stuff seems to spread all over creation [16:00:46] ah, hm [16:01:04] so, you need to define a define somewhere? this is for all nodes or just one? or just a type? [16:01:06] !log manybubbles synchronized wmf-config/InitialiseSettings.php 'Enable cirrus as a betafeature on all wikis which did not already have it.' [16:01:09] <_joe_> chasemp: IMO for base metrics you should create a class (not a role) you include in the standard class [16:01:13] Logged the message, Master [16:01:20] "'Enable cirrus as a betafeature on all wikis which did not already have it.' [16:01:29] * greg-g pops champagne [16:01:39] <_joe_> (and happy mayday to everyone) [16:01:56] (a little early, the world might still end and we'll have to revert, but pre-emptive champagne is always nice) [16:02:03] _joe_: ditto [16:02:12] <_joe_> greg-g: can I ask you what is cirrus based on? [16:02:13] _joe_: so a class in puppet/manifests/diamond.pp or something within which is a parameterized diamond defined type? [16:02:24] and that is included in standard in site.pp? [16:02:34] Yay mayday [16:02:48] manybubbles: Oops, sorry, been closing an RfA. [16:02:55] <_joe_> chasemp: directly the class in modules/$diamond_module/manifests/init.pp [16:02:59] manybubbles: Checked it right now, and it works! \o/ [16:03:04] chasemp: so, this will be included on every node? [16:03:10] twkozlowski: sweet [16:03:15] eventually yes [16:03:17] ^demon|away: we're live on at least jawiki [16:03:23] I checked it because I can kinda read it. [16:03:25] kinda [16:03:31] * manybubbles puts down the conch [16:03:35] is the define different for different nodes?
[16:03:35] <_joe_> chasemp: at start we can include it in the single nodes [16:03:40] or is it always the same? [16:03:42] _joe_: elastic search backend, php mw extension [16:03:47] <_joe_> ottomata: it should be the same [16:03:48] like, different parameters? [16:03:51] <_joe_> greg-g: ok, ES [16:03:53] in this case same [16:04:02] then, yeah, i think a class is right, probably a class in your module that uses it [16:04:09] except of course, i don't want it to hit labs in any way [16:04:12] and then that class included from either class standard...or maybe even from base module [16:04:19] manybubbles: you put the conch down, you good/feel safe? [16:04:20] <_joe_> ottomata: the parameters IMO should be defined at node level and be looked up in hiera eventually [16:04:34] in site.pp you mean? [16:04:50] <_joe_> chasemp: IF you need to define parameter values, yes [16:04:58] I think the 'role' abstraction is meant for parameterizing things to keep it out of site.pp [16:05:09] but in this case it being different it's odd [16:05:19] ok so [16:05:23] <^demon> :) [16:05:28] this would be like diamond::collector { 'CPU': [16:05:29] ... [16:05:30] } [16:05:30] so [16:05:30] yeah [16:05:32] true [16:05:32] probably [16:05:33] <_joe_> chasemp: these are *env* variables [16:05:34] a class in [16:05:44] diamond/manifests/cpu.pp [16:05:46] maybe? [16:05:47] or [16:05:48] <^demon> manybubbles: I'm both here and |away [16:05:49] collector/cpu.pp [16:05:51] then you can just [16:05:53] <^demon> (|away is my bouncer) [16:05:57] include diamond::collector::cpu [16:06:00] I tried that and ori yanked it out saying it should be in base, etc [16:06:05] so I wasn't sure what that means I guess [16:06:05] ? [16:06:18] <_joe_> chasemp: what code review please?
[16:06:22] if you have some generic abstractions that are not wmf specific [16:06:31] i think it's ok to put them in their own classes in the module [16:06:33] so this: include diamond::collector::cpu [16:06:34] <_joe_> chasemp: we didn't do a great job organizing puppet here. [16:06:36] manybubbles: ^demon if you feel like you won't need to revert/are good I'll pass the conch to subbu / gwicke [16:06:41] would be under standard in site.pp in that case? [16:06:43] as long as they are used manually by the users [16:06:49] <_joe_> but, I'm out! bank holiday yay! [16:06:54] chasemp, eventually, i think so yeah [16:06:54] or [16:06:55] _joe_: enjoy :) [16:06:58] more likely in base/init.pp somewhere [16:07:02] <_joe_> chasemp: ping me in pvt if you need [16:07:03] which is included by standard [16:07:07] _joe_: thanks man, have a good one! and I will [16:07:21] I think I see what I can do for now at least [16:07:25] actually [16:07:30] trying to model it on ganglia may have been a mistake in this case [16:07:36] maybe base::monitoring??host [16:07:36] dunno [16:07:39] hmm, have there been any recent changes wrt git-deploy? [16:07:51] none of the nodes seem to be syncing [16:07:59] gwicke: i have one coming in soon, but haven't merged it yet [16:08:03] why aren't they syncing? [16:08:06] does this use submodules? [16:08:07] ottomata: I will do diamond::collector::cpu [16:08:16] and then the inclusion for now can be host level to deploy slowl [16:08:17] ottomata, we do use submodules [16:08:21] slowly I guess [16:08:22] <^demon> greg-g: manybubbles' call, I'm just a spectator in all this today :) [16:08:31] give that a whirl, ottomata thanks man [16:08:41] cool, yw [16:08:50] gwicke: submodules have not been working well for me ever. [16:08:57] gwicke, your id got changed today?
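For context on the diamond::collector::cpu discussion above: Diamond collectors are themselves Python classes that read a source like /proc/stat and publish the values. A self-contained sketch of just the parsing half of such a collector (field names and order per the Linux proc(5) man page; the sample counters here are invented, and a real collector would subclass Diamond's Collector and call publish()):

```python
def parse_proc_stat(stat_text):
    """Parse the aggregate 'cpu' line of /proc/stat into named jiffy counters.

    This is the raw data a Diamond CPU collector would turn into metrics;
    percentages come from deltas between two successive reads.
    """
    fields = ("user", "nice", "system", "idle", "iowait", "irq", "softirq")
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":  # aggregate line, not cpu0/cpu1/...
            return dict(zip(fields, (int(v) for v in parts[1:1 + len(fields)])))
    raise ValueError("no aggregate cpu line found")

sample = ("cpu  4705 150 1120 16250856 2290 127 456 0 0 0\n"
          "cpu0 2352 75 560 8125428 1145 64 228 0 0 0")
counters = parse_proc_stat(sample)
assert counters["user"] == 4705
assert counters["idle"] == 16250856
```

Whether the Puppet side lands in diamond/manifests/cpu.pp or base, the collector it deploys is this kind of read-parse-publish loop.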
[16:09:01] my change will maybe help, not entirely sure yet [16:09:06] andrewbogott, ^ [16:09:06] ottomata, they were buggy initially, but have been working since [16:09:09] gwicke: what is the error code from the sync? [16:09:13] 50? [16:09:17] if you get detailed status [16:09:18] wtp1018.eqiad.wmnet: [16:09:18] fetch status: 0 [started: 1134 mins ago, last-return: 1134 mins ago] [16:09:29] 0 is ok...though, no? [16:09:34] fetch? [16:09:38] hm, i haven't seen that problem [16:09:46] usually fetch works, but checkout doesn't for me [16:09:49] subbu, gwicke: I'm still in the process of chowning files. The actual uid stuff should be changed by now... [16:09:55] 0/24 minions completed fetch [16:09:56] greg-g: I seed my time [16:09:56] logging in might be weird if you turn out to not own your homedir yet [16:10:05] I find it unlikely I'm going to have to revert [16:10:24] andrewbogott, could this affect sudo calls to salt? [16:10:24] gwicke: sad:( [16:10:25] !log reedy updated /a/common to {{Gerrit|I832b45db6}}: Correct a domain in wgCopyUploadsDomains [16:10:31] Logged the message, Master [16:10:38] gwicke: I wouldn't think so [16:10:48] Well, actually... [16:11:04] <^demon> manybubbles: You want https://gerrit.wikimedia.org/r/#/c/130846/? [16:11:10] gwicke, i can try the sync and see what happens. [16:11:12] PROBLEM - Apache HTTP on mw1053 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:11:15] I'm not sure. 
Naturally if your user account is scrambled on the target then sudo would fail [16:11:20] subbu, go ahead [16:11:31] andrewbogott, the sudo only happens on tin [16:11:33] (03PS1) 10Reedy: Add symlinks [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130849 [16:11:35] (03PS1) 10Reedy: testwiki to 1.24wmf3 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130850 [16:11:37] (03PS1) 10Reedy: Wikipedias to 1.24wmf2 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130851 [16:11:39] (03PS1) 10Reedy: group0 wikis to 1.24wmf3 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130852 [16:11:41] from there it's all root via salt [16:11:43] ^demon: I'll merge it before I deploy it. If you'd like to give it a ceremonial +1, that'd be cool [16:11:47] (03CR) 10Reedy: [C: 032] Add symlinks [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130849 (owner: 10Reedy) [16:11:49] gwicke: should be obvious then; if tin is happy then your personal uid shouldn't matter [16:11:53] since everything happens as root after that [16:12:19] gwicke: log into one of the nodes [16:12:21] cd to the deploy dir [16:12:24] run [16:12:30] salt-call deploy.fetch [16:12:38] I don't have the rights to do so [16:12:39] see what it says [16:12:41] oh, can you do that? [16:12:42] (03CR) 10Chad: [C: 031] "Merge ahoy when you be ready." [operations/software/elasticsearch/plugins] - 10https://gerrit.wikimedia.org/r/130846 (owner: 10Manybubbles) [16:12:42] ah ok [16:12:43] i will [16:12:48] (03Merged) 10jenkins-bot: Add symlinks [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130849 (owner: 10Reedy) [16:13:02] gwicke: what is deploy dir? [16:13:05] (03CR) 10Reedy: [C: 032] testwiki to 1.24wmf3 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130850 (owner: 10Reedy) [16:13:06] ottomata: 1008 is just about ready for you. 1 still on it [16:13:13] ottomata, moment.. 
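[editor's note] The `fetch status: 0 [started: 1134 mins ago, ...]` lines being debugged above are per-minion entries from git-deploy's detailed status output; status 0 means that minion's fetch returned cleanly, which is why the "0/24 minions completed fetch" summary looks contradictory. A small, hypothetical parser for such lines (the exact format may vary between trebuchet versions):

```python
import re

# Matches per-minion entries like:
#   fetch status: 0 [started: 1134 mins ago, last-return: 1134 mins ago]
# A status of 0 means the fetch call returned successfully on that minion.
STATUS_RE = re.compile(r'fetch status: (\d+) \[started: (\d+) mins ago')

def parse_fetch_status(line):
    """Return (exit_status, minutes_since_start), or None if the line
    is not a fetch-status entry."""
    m = STATUS_RE.search(line)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))
```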
[16:13:15] (03Merged) 10jenkins-bot: testwiki to 1.24wmf3 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130850 (owner: 10Reedy) [16:13:16] gwicke, yes, fetch completed for me. [16:13:23] ^demon: I'm going to go get some lunch I think [16:13:37] !log reedy Started scap: testwiki to 1.24wmf3 [16:13:40] gwicke, continue or are we debugging this now? [16:13:44] Logged the message, Master [16:13:45] <^demon> manybubbles: Enjoy [16:13:46] andrewbogott, /srv/deployment/parsoid/deploy [16:14:03] RECOVERY - Apache HTTP on mw1053 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 400 bytes in 0.080 second response time [16:14:08] subbu, andrewbogott: lets finish the deploy first [16:14:12] PROBLEM - Puppet freshness on osmium is CRITICAL: Last successful Puppet run was Wed Apr 30 10:04:02 2014 [16:14:15] subbu, can you do the restart too? [16:14:19] will do [16:14:25] thx [16:14:27] gwicke: i don't understand [16:14:29] 0 is a good status? [16:14:34] gwicke: I think you changed Andrews midstream [16:14:34] fetch looks fine to me [16:14:48] andrewbogott, yeah sorry [16:14:48] hehe :) [16:14:58] ;) [16:15:23] ottomata, I'm not sure what the issue is / was [16:15:30] gwicke: what happens if you just go ahead and continue? [16:15:34] to checkout [16:15:35] it worked for subbu, so we are finishing the deploy first [16:15:39] oh ok [16:15:42] ok weird [16:16:37] I would not be surprised if this was some issue with the salt returner stuff [16:16:43] !log reinstalling elastic1008 [16:16:50] Logged the message, Master [16:17:17] waiting for parsoid svc to restart on all nodes .. 22 done. [16:17:34] gwicke, ok, all restarted. now to verify [16:17:46] subbu, don't forget the !log [16:17:58] will do. 
[16:18:22] !log deployed parsoid 5e05c585 (with deploy sha ca2db96d) [16:18:29] Logged the message, Master [16:19:12] PROBLEM - Host elastic1008 is DOWN: PING CRITICAL - Packet loss = 100% [16:24:22] RECOVERY - Host elastic1008 is UP: PING OK - Packet loss = 0%, RTA = 0.40 ms [16:26:22] PROBLEM - Disk space on elastic1008 is CRITICAL: Connection refused by host [16:26:42] PROBLEM - ElasticSearch health check on elastic1008 is CRITICAL: CRITICAL - Could not connect to server 10.64.32.140 [16:27:03] PROBLEM - RAID on elastic1008 is CRITICAL: Connection refused by host [16:27:03] PROBLEM - check configured eth on elastic1008 is CRITICAL: Connection refused by host [16:27:13] PROBLEM - SSH on elastic1008 is CRITICAL: Connection refused [16:27:13] PROBLEM - puppet disabled on elastic1008 is CRITICAL: Connection refused by host [16:27:13] PROBLEM - check if dhclient is running on elastic1008 is CRITICAL: Connection refused by host [16:27:22] PROBLEM - DPKG on elastic1008 is CRITICAL: Connection refused by host [16:27:40] Expected ottomata? [16:27:50] yes [16:27:51] I suspect so [16:27:53] it was logged too :) [16:28:35] Just a reinstall, anyway, carry on [16:28:57] greg-g, thanks. our deploy looks good. we are done. /cc gwicke [16:31:56] anyone ever had debian auto install not able to download preseed.cfg from apt.wikimedia.org? [16:32:15] it looks like it is trying to resolve apt.wikimedia.org to IPv6, and for some reason can't download from that [16:32:21] ^demon: didn't go get lunch - waiting on guy to finish building chicken coop [16:32:42] greg-g: can you ping me when the train is finished today? I have some indexes to start rebuilding.... [16:33:09] manybubbles: yessir [16:33:14] thanks! [16:33:38] <^demon> manybubbles: you have chickens? [16:33:56] milimetric: does too! [16:33:56] ottomata: IPv6 on apt is new IIRC [16:34:10] Reedy: ah, so maybe I'm the first one to have this problem [16:34:12] do you know who set that up? 
[16:34:13] ^demon: had for years, didn't for a few years, did for a few months, now don't due to fox. getting more tonight [16:34:23] ottomata: akosiaris I think [16:34:28] akosiaris: yt? [16:34:40] 08:30 akosiaris: Published carbon's IPv6 address in DNS. apt.wikimedia.org and ubuntu.wikimedia.org are now IPv6 enabled [16:34:41] (03CR) 10Tim Landscheidt: [C: 04-1] "The *Tools* project isn't meant for Puppet development, and the admins can always install puppet-lint when they set up a self-hosted puppe" [operations/puppet] - 10https://gerrit.wikimedia.org/r/130847 (owner: 10Rush) [16:34:50] oh today? [16:34:53] ottomata: take your time, we're doing just fine on 15 servers [16:34:57] nope [16:35:06] 29th [16:35:09] aye ok [16:36:27] ^demon: can we try the swift thing in beta first? [16:36:51] <^demon> Prolly should. [16:38:17] <^demon> manybubbles: Once we get the thing in archiva we'll do that. [16:38:26] oh, yeah, archiva! [16:38:32] PROBLEM - NTP on elastic1008 is CRITICAL: NTP CRITICAL: No response from NTP server [16:38:33] let me get some lunch. guy is leaving [16:38:45] maybe, in an hour? [16:39:04] <^demon> manybubbles: Sounds good, ping me :) [16:43:32] !log reedy Finished scap: testwiki to 1.24wmf3 (duration: 29m 54s) [16:43:37] Logged the message, Master [16:45:47] manybubbles: I obsessively seal my chicken coops with 1/2" hardware cloth all around (even the floor). Raccoons and foxes will dig tunnels to your chickens and even grab them through 1" gaps and eat them through the fence! [16:46:50] milimetric: gah! [16:46:57] milimetric: that is a horrible image! [16:48:43] The foxes in my neighborhood mostly eat house cats. [16:48:44] right?
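[editor's note] The preseed failure ottomata describes above (the installer resolving apt.wikimedia.org to its new AAAA record while the install server only listened on IPv4) points at lighttpd's IPv6 switch. Assuming a stock lighttpd setup, the relevant directive is a one-liner:

```
# lighttpd.conf on the install server: also bind the IPv6 address,
# so installers that prefer the AAAA record can fetch preseed.cfg.
server.use-ipv6 = "enable"
```

This is what the "Setting server.use-ipv6" patch (130856) that follows appears to do.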
[16:57:16] (03PS1) 10Ottomata: Setting server.use-ipv6 = "enable" for install-server lighttpd.conf [operations/puppet] - 10https://gerrit.wikimedia.org/r/130856 [17:04:06] (03PS4) 10BryanDavis: Send Vary header on http to http redirect [operations/apache-config] - 10https://gerrit.wikimedia.org/r/111925 [17:25:53] never seen this in a dmesg before: [17:25:54] May 1 13:53:09 ytterbium kernel: [22236968.437105] CPU11: Package power limit notification (total events = 10305894) [17:26:25] cpu nerd pal says it could be something as simple as shifting from code which stalls the pipeline to code which does not [17:26:36] as well as other stuff like exercising the FPU [17:26:53] That's a new one to me too. [17:27:19] or it simply upped its clocks and then hit its TDP and backed off [17:28:02] PROBLEM - Host elastic1008 is DOWN: PING CRITICAL - Packet loss = 100% [17:29:12] RECOVERY - SSH on elastic1008 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.3 (protocol 2.0) [17:29:22] RECOVERY - Host elastic1008 is UP: PING OK - Packet loss = 0%, RTA = 0.29 ms [17:31:13] <_joe_> jgage: wow fgrep the kernel for that [17:36:01] * jgage downloads the kernel source for the first time in far too long [17:36:42] RECOVERY - ElasticSearch health check on elastic1008 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2041: active_shards: 6084: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [17:37:29] ottomata: oh! [17:37:39] gonna do plugins, I imagine [17:38:21] ottomata: shards are going back to it and I don't _think_ it has the plugins [17:39:20] ottomata: if one of the group0 wikis goes to it then search will break for them [17:40:24] wha, shards shouldn't begoing back to it [17:40:26] i was just running puppet [17:40:46] jgage: so you're even worse than https://www.xkcd.com/979/ ? [17:40:55] manybubbles: are shards going back to it? 
[17:41:15] {"length":5,"node":"elastic1008"} [17:41:22] uh oh [17:41:26] "_ip" : "" [17:41:32] just drop the exclude back on it [17:41:54] moving off [17:41:54] yeah [17:42:03] RECOVERY - RAID on elastic1008 is OK: OK: Active: 6, Working: 6, Failed: 0, Spare: 0 [17:42:03] RECOVERY - check configured eth on elastic1008 is OK: NRPE: Unable to read output [17:42:12] RECOVERY - puppet disabled on elastic1008 is OK: OK [17:42:12] RECOVERY - check if dhclient is running on elastic1008 is OK: PROCS OK: 0 processes with command name dhclient [17:42:22] RECOVERY - DPKG on elastic1008 is OK: All packages OK [17:42:22] RECOVERY - Disk space on elastic1008 is OK: DISK OK [17:44:02] Nemo_bis: haha :) [17:44:55] (03CR) 10MaxSem: "According to my estimations, this change is safe to go." [operations/puppet] - 10https://gerrit.wikimedia.org/r/118249 (owner: 10Faidon Liambotis) [17:44:57] ok manybubbles, plugins deployed [17:45:03] ok to move shards back? [17:45:07] paravoid, ^^ :) [17:45:09] ottomata: so long as you bounce it [17:45:18] boucned :) [17:45:54] (03CR) 10Rush: "ah, I did not understand this would call out tools stuff specifically. But I do disagree with puppet-lint being in labs in general. If e" [operations/puppet] - 10https://gerrit.wikimedia.org/r/130847 (owner: 10Rush) [17:45:56] sweet [17:45:58] looks good [17:46:00] ok to more stuff back [17:46:11] cool, its going [17:48:10] ^demon: want to sneek that hangout in before the metrics meeting? [17:48:33] <^demon> Oh I thought we were just gonna do IRC, bad time for a hangout. [17:51:44] ^demon: can do irc [17:54:18] greg-g: any objections to me pushing another version of that highlighter out now? [17:54:25] Its a somewhat convenient time [17:54:27] for me, at least [17:55:11] Reedy: what's your plan re timing? 
[17:55:22] RECOVERY - NTP on elastic1008 is OK: NTP OK: Offset -0.02268815041 secs [17:59:51] greg-g: my thing doesn't sync code, only a git-deploy and a restart on the elasticsearch nodes [18:00:34] manybubbles: ah, I don't really know what the highlighter is ;) [18:00:37] manybubbles: sure, go for it [18:00:49] greg-g: sorry, its an elasticsearch plugin [18:01:25] (03CR) 10Manybubbles: [C: 032 V: 032] Update highlighter [operations/software/elasticsearch/plugins] - 10https://gerrit.wikimedia.org/r/130846 (owner: 10Manybubbles) [18:07:42] !log upgrading highlighter plugin on elasticsearch machines - the cluster will go yellow for a few hours during the rolling restart [18:07:48] Logged the message, Master [18:24:26] greg-g: now ish [18:26:20] Reedy: cool, was just wondering for nik, but he moved on :) [18:30:42] PROBLEM - ElasticSearch health check on elastic1002 is CRITICAL: CRITICAL - Could not connect to server 10.64.0.109 [18:33:35] hah! now I know manybubbles doesn't read Tech News [18:33:43] ? [18:33:45] * twkozlowski slaps manybubbles around a bit with a large trout [18:33:54] https://meta.wikimedia.org/w/index.php?title=Tech/News/2014/19&diff=next&oldid=8347270 [18:34:09] While in https://meta.wikimedia.org/wiki/Tech/News/2014/18 :-) [18:38:31] (03CR) 10Reedy: [C: 032] Wikipedias to 1.24wmf2 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130851 (owner: 10Reedy) [18:40:49] (03Merged) 10jenkins-bot: Wikipedias to 1.24wmf2 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130851 (owner: 10Reedy) [18:42:31] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Wikipedias to 1.24wmf2 [18:42:36] Logged the message, Master [18:48:18] jgage: any luck w/ that statsd package? [18:49:04] (03CR) 10coren: [C: 031] "I have no fundamental objection to linting being done in labs; this sets up the correct expectations as well."
[operations/puppet] - 10https://gerrit.wikimedia.org/r/130847 (owner: 10Rush) [18:49:34] (03CR) 10coren: "(But that patch only does tools)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/130847 (owner: 10Rush) [18:51:58] (03CR) 10Reedy: [C: 032] group0 wikis to 1.24wmf3 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130852 (owner: 10Reedy) [18:52:07] (03Merged) 10jenkins-bot: group0 wikis to 1.24wmf3 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130852 (owner: 10Reedy) [18:52:42] RECOVERY - ElasticSearch health check on elastic1002 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2041: active_shards: 6084: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [18:52:42] PROBLEM - ElasticSearch health check on elastic1013 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 2040: active_shards: 5674: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 410 [18:52:42] PROBLEM - ElasticSearch health check on elastic1006 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 2040: active_shards: 5674: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 410 [18:52:52] PROBLEM - ElasticSearch health check on elastic1015 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 2040: active_shards: 5674: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 410 [18:52:52] PROBLEM - ElasticSearch health check on elastic1011 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. 
status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 2040: active_shards: 5674: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 410 [18:52:52] PROBLEM - ElasticSearch health check on elastic1007 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 2040: active_shards: 5674: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 410 [18:52:52] PROBLEM - ElasticSearch health check on elastic1014 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 2040: active_shards: 5674: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 410 [18:53:03] PROBLEM - ElasticSearch health check on elastic1010 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. 
status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 2040: active_shards: 5674: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 410 [18:55:31] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.24wmf3 [18:55:37] Logged the message, Master [18:59:31] (03PS5) 10Reedy: Fourth batch of pilot sites for Media Viewer [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/125034 (owner: 10MarkTraceur) [18:59:36] (03CR) 10Reedy: [C: 032] Fourth batch of pilot sites for Media Viewer [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/125034 (owner: 10MarkTraceur) [18:59:47] (03Merged) 10jenkins-bot: Fourth batch of pilot sites for Media Viewer [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/125034 (owner: 10MarkTraceur) [19:00:53] (03PS3) 10Reedy: Enable Compact personal bar beta feature on test wikis [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130266 (owner: 10JGonera) [19:01:02] (03CR) 10Reedy: [C: 032] Enable Compact personal bar beta feature on test wikis [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130266 (owner: 10JGonera) [19:01:10] (03Merged) 10jenkins-bot: Enable Compact personal bar beta feature on test wikis [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130266 (owner: 10JGonera) [19:02:42] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 1.66666666667% of data exceeded the critical threshold [500.0] [19:03:27] James_F: About? [19:03:36] manybubbles: ping: other than random config changes, wikis are now at their new versions now [19:03:43] Oh, nvm [19:03:47] greg-g: thanks! [19:04:23] Reedy: Yes. [19:04:26] Reedy: Why? [19:06:08] (03PS2) 10Reedy: Enable VE language editor Beta Feature in whitelist [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130717 (owner: 10Jforrester) [19:06:26] Reedy: Messy rebase? 
:-( [19:07:09] Just other additions above it [19:07:17] And then whitespace changes for the comments [19:08:05] (03CR) 10Reedy: [C: 032] Enable VE language editor Beta Feature in whitelist [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130717 (owner: 10Jforrester) [19:08:13] (03Merged) 10jenkins-bot: Enable VE language editor Beta Feature in whitelist [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130717 (owner: 10Jforrester) [19:08:33] Reedy: Yeah; maybe I should re-do the comments so the whitespace doesn't change. [19:09:39] (03CR) 10Reedy: [C: 04-1] "Needs rebase" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130718 (owner: 10Jforrester) [19:09:55] Bah. [19:15:12] PROBLEM - Puppet freshness on osmium is CRITICAL: Last successful Puppet run was Wed Apr 30 10:04:02 2014 [19:15:42] PROBLEM - ElasticSearch health check on elastic1005 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 2040: active_shards: 5674: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 410 [19:15:42] PROBLEM - ElasticSearch health check on elastic1012 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 2040: active_shards: 5674: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 410 [19:15:42] PROBLEM - ElasticSearch health check on elastic1002 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. 
status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 2040: active_shards: 5674: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 410 [19:15:42] PROBLEM - ElasticSearch health check on elastic1004 is CRITICAL: CRITICAL - Could not connect to server 10.64.0.111 [19:15:42] PROBLEM - ElasticSearch health check on elastic1006 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 2040: active_shards: 5674: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 410 [19:15:43] PROBLEM - ElasticSearch health check on elastic1008 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 2040: active_shards: 5674: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 410 [19:15:43] PROBLEM - ElasticSearch health check on elastic1013 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 2040: active_shards: 5674: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 410 [19:15:44] PROBLEM - ElasticSearch health check on elastic1009 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 2040: active_shards: 5674: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 410 [19:15:52] PROBLEM - ElasticSearch health check on elastic1007 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. 
status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 2040: active_shards: 5674: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 410 [19:15:52] PROBLEM - ElasticSearch health check on elastic1011 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 2040: active_shards: 5674: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 410 [19:15:52] PROBLEM - ElasticSearch health check on elastic1014 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 2040: active_shards: 5674: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 410 [19:15:52] PROBLEM - ElasticSearch health check on elastic1015 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 2040: active_shards: 5674: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 410 [19:16:03] PROBLEM - ElasticSearch health check on elastic1010 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2040: active_shards: 5674: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 410 [19:17:39] !log reedy synchronized wmf-config/ [19:17:47] Logged the message, Master [19:18:29] andrewbogott: quick question - access requests to cluster for WMF employees (stat1 specifically) - they go to ops-request@rt.wikimedia.org or somewhere else? [19:19:09] HappyPanda: ops-requests or access-requests, either one is fine [19:19:18] hm… access-requests is restricted, maybe you can't create one there; not sure :) [19:19:26] andrewbogott: hah. 
let me put it on ops-request then [19:19:34] But I'll re-file it immediately anyway [19:27:10] andrewbogott: sent! :) [19:27:27] andrewbogott: also, can you tell me how someone new at the WMF is supposed to attach key? attach to RT ticket, or put it on officewiki? [19:27:57] HappyPanda: they can submit a patch to admins.pp, or they can put their key on their userpage on the office wiki [19:39:53] (03CR) 10Ottomata: [C: 032] "Looks good to me! Lemme know when you are back from lunch and we can merge." [operations/puppet] - 10https://gerrit.wikimedia.org/r/130211 (owner: 10BryanDavis) [19:40:42] PROBLEM - ElasticSearch health check on elastic1005 is CRITICAL: CRITICAL - Could not connect to server 10.64.0.112 [19:49:26] andrewbogott: hmm, so am I the only person who can respond? the people cc'd aren't receiving your replies or the auto reply [19:50:11] HappyPanda: I don't see any cc's. [19:50:31] andrewbogott: hmm, I cc'd dbrant@wikimedia.org and tfinc@wikimedia.org (and mentioned it) in the original email [19:50:43] Ah, ok, RT must've ignored that. I'll add them [19:50:48] andrewbogott: ty! [19:53:50] andrewbogott: should I forward your replies and the autoreply back to them? [19:54:00] HappyPanda: sure [19:55:52] RECOVERY - ElasticSearch health check on elastic1011 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2041: active_shards: 6084: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [19:55:52] RECOVERY - ElasticSearch health check on elastic1007 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2041: active_shards: 6084: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [19:55:52] RECOVERY - ElasticSearch health check on elastic1014 is OK: OK - elasticsearch (production-search-eqiad) is running. 
status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2041: active_shards: 6084: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [19:55:52] RECOVERY - ElasticSearch health check on elastic1015 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2041: active_shards: 6084: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [19:56:02] andrewbogott: ty! [19:58:44] (03PS2) 10Jforrester: Remove Nearby BF from whitelist [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130718 [20:00:42] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1% data above the threshold [250.0] [20:04:59] ottomata: I'm around now if you want to merge the trebuchet patch [20:05:53] andrewbogott: any idea if my wikitech username/password (username is dr0ptp4kt, password is different than ssh passphrase) is supposed to work on logstash? i can't seem to log in there [20:06:00] (at logstash web interface) [20:06:04] cool [20:06:17] (03PS2) 10Ottomata: Add scap/scap trebuchet target [operations/puppet] - 10https://gerrit.wikimedia.org/r/130211 (owner: 10BryanDavis) [20:06:22] (03CR) 10Ottomata: [C: 032 V: 032] Add scap/scap trebuchet target [operations/puppet] - 10https://gerrit.wikimedia.org/r/130211 (owner: 10BryanDavis) [20:06:27] dr0ptp4kt: You have to be in the "wmf" ldap group to login to logstash [20:06:48] dr0ptp4kt: I don't know anything about logstash [20:07:03] bd808, andrewbogott, thx, what's the best way to make the request for access? [20:07:17] "Ask ^demon|away" [20:07:42] Chad usually handles adding folks to that group [20:10:22] !log Deployed scap 92ea0e9 via trebuchet (not actively used yet) [20:10:29] Logged the message, Master [20:17:28] ottomata: Looks like it's working. The number of registered minions keeps ticking up as puppet runs around the cluster. 
[20:19:03] ok awesome [20:20:03] Now I need to apply the second part in beta and make sure that works too. [20:21:30] <_joe_> bd808: are you the right person to bug if I want to understand better how we deploy software? [20:22:04] Sure. I think I have a TODO here to talk to you about such things [20:22:20] so many things to do, so little attention span [20:22:23] <_joe_> bd808: oh ok [20:22:26] * matanya is listening to [20:22:28] o [20:23:13] <_joe_> bd808: then I will bug you about this :) [20:23:34] The short answer is that we have 2 different systems: scap and trebuchet [20:23:44] scap is used to deploy MediaWiki [20:24:10] trebuchet is used to deploy parsoid and various other things [20:24:45] <_joe_> trebuchet is something we built internally? [20:24:57] <_joe_> scap, I get it's bash over ssh, right? [20:25:41] trebuchet was made by RyanLane. It has fairly good docs at https://wikitech.wikimedia.org/wiki/Trebuchet [20:25:58] <_joe_> sorry, it's just curiosity, it's pretty late in the evening here and I won't have the attention span to follow everything [20:26:06] He's making it into a real open source project which is cool [20:26:22] scap was mostly bash. Now it's mostly pythong [20:26:26] python [20:26:50] The best docs for it right now are at https://doc.wikimedia.org/mw-tools-scap/docs/_build/html/ [20:28:00] Trebuchet uses salt and scap uses ssh + ssh-agent (yuck) [20:28:02] RECOVERY - ElasticSearch health check on elastic1010 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2041: active_shards: 6084: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [20:28:32] PROBLEM - ElasticSearch health check on elastic1003 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. 
status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 2040: active_shards: 5667: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 417 [20:28:32] PROBLEM - ElasticSearch health check on elastic1001 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 2040: active_shards: 5667: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 417 [20:28:32] PROBLEM - ElasticSearch health check on elastic1016 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 2040: active_shards: 5667: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 417 [20:28:42] PROBLEM - ElasticSearch health check on elastic1002 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 2040: active_shards: 5667: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 417 [20:28:42] PROBLEM - ElasticSearch health check on elastic1012 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 2040: active_shards: 5667: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 417 [20:28:42] PROBLEM - ElasticSearch health check on elastic1005 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. 
status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 2040: active_shards: 5667: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 417 [20:28:42] PROBLEM - ElasticSearch health check on elastic1009 is CRITICAL: CRITICAL - Could not connect to server 10.64.32.141 [20:28:42] PROBLEM - ElasticSearch health check on elastic1013 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 2040: active_shards: 5667: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 417 [20:28:43] PROBLEM - ElasticSearch health check on elastic1004 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 2040: active_shards: 5667: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 417 [20:28:43] PROBLEM - ElasticSearch health check on elastic1008 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 2040: active_shards: 5667: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 417 [20:28:44] PROBLEM - ElasticSearch health check on elastic1006 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 2040: active_shards: 5667: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 417 [20:28:51] _joe_: We should schedule a time to talk that's not in the middle of the night for you. :) [20:28:53] PROBLEM - ElasticSearch health check on elastic1014 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. 
status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 2040: active_shards: 5667: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 417 [20:28:53] PROBLEM - ElasticSearch health check on elastic1007 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 2040: active_shards: 5667: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 417 [20:28:53] PROBLEM - ElasticSearch health check on elastic1011 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 2040: active_shards: 5667: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 417 [20:28:53] PROBLEM - ElasticSearch health check on elastic1015 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 2040: active_shards: 5667: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 417 [20:28:56] <_joe_> this doesn't look good [20:29:08] bleh [20:29:14] its under control.... [20:29:17] stupid thing [20:29:18] <_joe_> is this expected? [20:29:20] <_joe_> ok [20:29:21] <_joe_> :) [20:29:23] (03PS2) 10Rush: puppet-lint in labs [operations/puppet] - 10https://gerrit.wikimedia.org/r/130847 [20:29:24] ES is a pain in butt [20:29:31] rolling restart rolled too far? [20:29:35] its the check that is busted [20:29:46] bd808: some index doesn't have replicas, I imagine. [20:29:51] probably labs [20:29:55] because why listen to me [20:30:31] * bd808 always listens to Nik [20:30:43] <_joe_> bd808: I'll read the docs first, then set up a meeting. I'm pretty head-down in puppet at the moment [20:31:17] _joe_: Sounds good. [20:31:52] <_joe_> what TZ are you in?
[20:33:58] _joe_: MST/MDT. I'm UTC-6 this time of year [20:35:24] <_joe_> ok, noted [20:35:47] <_joe_> I'll try to find a time that's not too uncomfortable for either of us [20:36:24] _joe_: My early morning should match up with your late afternoon I think [20:36:41] <_joe_> yes, it's an 8 hour difference [20:37:04] bd808: its actually that some of the reindexes that I did last night failed to complete and they left indexes with 0 replicas lying around. I should have checked.... [20:37:46] <_joe_> manybubbles: ugh, is ES the pain in the ass I always thought it is then? [20:38:24] _joe_: its a distributed system we don't have all the right tooling around - so yes [20:38:34] <_joe_> It *is* fancier than solr, but it always seemed less solid to me. [20:38:37] _joe_: I'm usually at my keyboard by 15:00Z but can easily be on at 14:00Z or a little earlier with some notice. [20:39:29] Elasticsearch beats the crap out of solr for realtime replication; at least in the last head to head I put them through [20:39:30] manybubbles: when cirrus will be primary search on large weeks ? [20:39:33] <_joe_> bd808: no problem at all, I'm available well after 17 UTC [20:39:43] *wikis [20:39:57] boy am i tired [20:40:47] matanya: on most large wikis? I'm not sure. technically the only thing holding us back from that is worry that the users will hate it. they probably won't. its primary on _one_ large wiki now. [20:40:55] <_joe_> bd808: I never had a problem at the scale of what we have here. and I never considered realtime replication to be as important as solidity in failover and general performance on search [20:41:09] on _all_ large wikis - when I can get it performant enough [20:41:29] which one wiki manybubbles ?
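The alert storm above follows directly from Elasticsearch's `_cluster/health` response: an index left with 0 replicas whose shard copies go missing turns the whole cluster red, and red reads as CRITICAL. A minimal sketch of that classification, assuming a parsed health dict (`check_health()` is a hypothetical helper, not the actual Icinga plugin):

```python
def check_health(health):
    """Classify a parsed _cluster/health response the way the alerts above read."""
    status = health.get("status", "unknown")
    msg = "status: %s: unassigned_shards: %s" % (
        status, health.get("unassigned_shards", "?"))
    if status == "green":
        # all primaries and replicas assigned
        return ("OK", msg)
    if status == "yellow":
        # primaries fine, some replicas unassigned
        return ("WARNING", msg)
    # red (or anything unexpected): primary shards are missing
    return ("CRITICAL", msg)

# The failed-reindex scenario described above: 0-replica indexes whose
# primaries dropped off a restarting node leave unassigned shards behind.
red = {"status": "red", "number_of_nodes": 15, "unassigned_shards": 417}
green = {"status": "green", "number_of_nodes": 16, "unassigned_shards": 0}
```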
[20:41:37] matanya: itwiki [20:41:46] its top 10 or so [20:41:49] <_joe_> oh wow what an honour [20:41:58] enwikisource is pretty big [20:42:03] we're primary there [20:42:06] <_joe_> it's top 5 for size (in number of articles) [20:42:22] manybubbles: and what is the problem with the 5 wikis you mentioned in your mail ? [20:43:03] matanya: these ones: Japanese, Hebrew, Polish, and Chinese [20:43:08] _joe_: The use case can make all the difference. The ES cluster I built was indexing credit card transactions in real-time. Not at the scale we take in new edits here but a few thousand per minute. [20:43:15] yes those manybubbles [20:43:27] _joe_: they are pretty happy with it, for the most part, so I'm happy [20:43:31] <_joe_> bd808: ooh OK [20:43:48] matanya: we have access to better analyzers that we _should_ be able to just plug in and deploy [20:43:59] which should make finding things better in those languages [20:44:00] <_joe_> manybubbles: I'm doing weird searches right now :) (I'm italian, btw) [20:44:06] not just the wikipedias [20:44:15] _joe_: cool! have fun. let me know if it breaks [20:44:46] enwiki is the only one that we can't handle the load for yet - I've been working on it [20:46:21] <_joe_> manybubbles: out of curiosity, why did we decide to move to ES? [20:46:31] manybubbles: can you please elaborate on the better analyzers ? [20:47:28] matanya: better means better able to find the words that users are looking for. or, supposed to be better. anyway, take japanese, for example. [20:47:42] RECOVERY - ElasticSearch health check on elastic1005 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2032: active_shards: 6075: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [20:47:42] RECOVERY - ElasticSearch health check on elastic1012 is OK: OK - elasticsearch (production-search-eqiad) is running.
status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2032: active_shards: 6075: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [20:47:42] RECOVERY - ElasticSearch health check on elastic1002 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2032: active_shards: 6075: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [20:47:42] RECOVERY - ElasticSearch health check on elastic1004 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2032: active_shards: 6075: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [20:47:42] RECOVERY - ElasticSearch health check on elastic1009 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2032: active_shards: 6075: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [20:47:43] RECOVERY - ElasticSearch health check on elastic1008 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2032: active_shards: 6075: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [20:47:43] RECOVERY - ElasticSearch health check on elastic1013 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2032: active_shards: 6075: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [20:47:44] RECOVERY - ElasticSearch health check on elastic1006 is OK: OK - elasticsearch (production-search-eqiad) is running. 
status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2032: active_shards: 6075: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [20:47:54] I'd prefer to take hebrew manybubbles :) [20:47:58] ガリレオは、物体の運動の研究をする時に might look like two words, but is more like 5 [20:48:02] PROBLEM - ElasticSearch health check on elastic1010 is CRITICAL: CRITICAL - Could not connect to server 10.64.32.142 [20:48:07] andrewbogott: another access-request sent. can you re-add the two in cc? [20:48:09] ohh, hebrew then [20:48:22] HappyPanda: I can… did the last one work the way you expected? [20:48:24] andrewbogott: dammit, forgot to edit subject. It is for mhurd, not dbrant [20:48:41] I bet you can edit it in rt [20:48:42] andrewbogott: seems to. tfinc's approval came to me and dbrant [20:48:47] if you created the ticket, I think you can... [20:49:43] andrewbogott: let me log in [20:50:36] any example in hebrew manybubbles ? i'd like to bring this subject to the community, in order to promote cirrus [20:50:44] andrewbogott: can't find an 'edit' button in https://rt.wikimedia.org/Ticket/Display.html?id=7401 [20:51:05] matanya: sorry, trying to find one. [20:51:14] a specific one, that we're not accidentally getting for free [20:51:15] HappyPanda: try under 'Basics'?
[20:51:20] HappyPanda: just mail 7401@rt.wikimedia.org [20:53:07] matanya: found one: this finds stuff: [20:53:08] https://he.wikisource.org/w/index.php?search=%D7%91%D7%A8%D7%95%D7%9A+%D7%A7%D7%95%D7%A8%D7%A6%D7%95%D7%95%D7%99%D7%99%D7%9C&title=%D7%9E%D7%99%D7%95%D7%97%D7%93%3A%D7%97%D7%99%D7%A4%D7%95%D7%A9&go=%D7%9C%D7%93%D7%A3 [20:53:14] but this finds nothing: [20:53:16] https://he.wikisource.org/w/index.php?search=%D7%91%D7%A8%D7%95%D7%9A+%D7%A7%D7%95%D7%A8%D7%A6%D7%95%D7%95%D7%B4%D7%99%D7%9C&title=%D7%9E%D7%99%D7%95%D7%97%D7%93%3A%D7%97%D7%99%D7%A4%D7%95%D7%A9&go=%D7%9C%D7%93%D7%A3 [20:53:27] when it really ought to find the same stuff [20:53:59] the latter should really not find anything [20:54:00] matanya: though, that is an example of something that is getting better in cirrus [20:54:47] matanya: sorry, I'm copy and pasting blindly from a language I can't read. [20:54:57] the example i was trying to find [20:55:16] do you want some help from me? [20:55:33] matanya: sure! I'd love some examples of things that don't work but ought to [20:55:58] can i enable cirrus on some hebrew wiki ? [20:56:06] matanya: its a betafeature [20:56:28] yea, enabled [20:56:33] or you can search and add &srbackend=CirrusSearch to the url [20:56:41] that'll let yo compare them side by side [20:57:01] oh, that isn't good [20:57:15] it doesn't find exact matching [20:58:00] matanya: if you can send me an example I'll look [20:58:09] the thing we're looking to plug in is http://code972.com/hebmorph [20:58:35] https://he.wikipedia.org/w/index.php?search=%D7%91%D7%A8%D7%95%D7%9A+%D7%A7%D7%95%D7%A8%D7%A6%D7%95%D7%95%D7%99%D7%9C&title=%D7%9E%D7%99%D7%95%D7%97%D7%93%3A%D7%97%D7%99%D7%A4%D7%95%D7%A9&go=%D7%9C%D7%A2%D7%A8%D7%9A [20:59:53] matanya: I can't believe it. 
a good friend of mine wrote this [20:59:54] matanya: it is screwing up "קורצוויל" somehow [21:00:20] well, we'll plug it in and if it isn't better you can bother him for me [21:00:43] I sure will :) [21:00:53] PROBLEM - ElasticSearch health check on elastic1011 is CRITICAL: CRITICAL - Could not connect to server 10.64.32.143 [21:01:20] HappyPanda: looks like you got that RT ticket organized the way you want it? [21:01:30] that one is because it is restarting, I'll bet [21:01:42] yup [21:02:11] matanya: let me deploy his plugin to beta and we'll see what it does [21:02:25] ok, great [21:02:30] andrewbogott: yeah, think so [21:02:54] andrewbogott: can you respond with the standard 'ssh key and manager approval please' email? [21:02:54] on the plus side, it finds things i didn't believe it could [21:03:21] HappyPanda: yep, one sec [21:08:40] (03PS1) 10Manybubbles: Add hebrew analyzer [operations/software/elasticsearch/plugins] - 10https://gerrit.wikimedia.org/r/130969 [21:09:02] matanya: ..... that doesn't sound good? [21:09:15] (03CR) 10Manybubbles: [C: 04-1] "-1 until we love it in beta" [operations/software/elasticsearch/plugins] - 10https://gerrit.wikimedia.org/r/130969 (owner: 10Manybubbles) [21:09:35] (03CR) 10Manybubbles: "Deploying to beta" [operations/software/elasticsearch/plugins] - 10https://gerrit.wikimedia.org/r/130969 (owner: 10Manybubbles) [21:12:32] i'm looking manybubbles [21:13:15] matanya: not yet deployed, but if you want to copy some text into http://he.wikipedia.beta.wmflabs.org/ it'll give us something to find [21:13:48] looking at the code, not the wiki [21:14:05] i poked bd808 to get some content [21:14:47] ah, I'm not sure how that part works:) [21:15:26] matanya: Did I miss a question? [21:15:57] about a week ago i asked you how to import some portion of he.wiki into he.wikipedia.beta.wmflabs.org [21:16:03] RECOVERY - ElasticSearch health check on elastic1010 is OK: OK - elasticsearch (production-search-eqiad) is running.
status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2032: active_shards: 6075: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [21:16:28] you said you'll get back to me, i guess it was lost in the gazillion tasks you have [21:16:31] Apparently I did miss that, or just spaced out [21:16:42] PROBLEM - ElasticSearch health check on elastic1012 is CRITICAL: CRITICAL - Could not connect to server 10.64.32.144 [21:17:13] I tried to use my on wiki powers to do so, but failed [21:17:14] matanya: I've rebuilt the index using the new analyzer - anything added ought to show up there pretty soon. [21:17:24] thank you [21:17:40] i'll check it out, or just write a bot to add content [21:17:58] though i really prefer a normal wiki import [21:18:12] My wiki fu is low. The only way I'd know to do it is Special:Export/Import [21:18:41] yeah, Special:Export/Import [21:18:50] i have the export [21:18:53] however, Special:Import usually times out on anything big enough, so perhaps Special:Export [21:18:56] and then import from commandline [21:18:57] but can't import [21:18:58] that sounds more doable [21:19:05] <^d> Everyone has export. [21:19:11] <^d> Import we could easily grant to beta. [21:19:14] got to step away for a bit [21:19:29] that is what i was looking for, an import within shell [21:19:41] i have import rights on beta [21:19:48] but files are too big [21:19:50] yeah. [21:19:59] someone with shell needs to do it [21:19:59] <^d> Then export smaller pages :) [21:20:05] <^d> Or without full history. [21:20:10] <^d> (History's boring for search) [21:20:21] i used dump.wikimedia.org [21:20:38] s [21:21:06] Ah. I could learn how to run the right maintenance script I suppose. It seems like I should know how to do that anyway [21:21:42] <^d> bd808: Easy script to run. [21:21:45] matanya: Is your dump in labs somewhere?
[21:21:51] bd808: https://www.mediawiki.org/wiki/Manual:Importing_XML_dumps [21:21:54] <^d> `mwscript importDump.php --wiki=hewiki ...` [21:22:24] it is mounted on nfs in labs [21:22:52] matanya: Which project? [21:23:02] just grab : https://dumps.wikimedia.org/hewiki/20140415/hewiki-20140415-pages-meta-current.xml.bz2 [21:23:12] I'm away, but I should point out that we don't normally import whole wikis into labs [21:23:13] it is internal lan anyway [21:23:23] * bd808 nods [21:23:23] matanya: all of hewiki? that doesn't sound like the best of ideas, and I don't know if betalabs will live with that [21:23:41] How big is it? [21:23:44] matanya: I remember hashar deleting a full simplewiki dump on betalabs saying it took up too much disk space on the mysql hosts [21:23:59] 1.1 GB [21:24:06] When expanded? [21:24:10] yes [21:24:12] <^d> We don't need so much content just for testing this. [21:24:21] <^d> Why is generating a small partial export not possible? [21:24:24] Export a few big categories? [21:24:31] i don't know how to strip down a dump [21:24:40] <^d> matanya: Use Special:Export, like I said before. [21:24:41] matanya: Special:Export lets you pick categories [21:24:44] ok Reedy good idea [21:25:08] <^d> Grab a category or two full of pages, with their templates, no history. [21:25:19] <^d> Should give us enough to test without overloading things. [21:26:53] RECOVERY - ElasticSearch health check on elastic1011 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2032: active_shards: 6084: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [21:26:53] RECOVERY - ElasticSearch health check on elastic1007 is OK: OK - elasticsearch (production-search-eqiad) is running.
status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2032: active_shards: 6075: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [21:26:53] RECOVERY - ElasticSearch health check on elastic1015 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2032: active_shards: 6075: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [21:26:53] RECOVERY - ElasticSearch health check on elastic1014 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2032: active_shards: 6075: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [21:27:29] ^d: /home/matanya on tools login [21:27:36] file named he.wiki [21:27:46] he.wiki.xml [21:27:48] <^d> I don't have a tools account lolol :p [21:28:00] i can move it to labs [21:28:09] what project you have access to? [21:28:20] puppet? [21:28:34] * bd808 can get it from tools [21:29:35] cd: /home/matanya/: Permission denied [21:29:58] can't help that [21:30:23] matanya: Copy it over to the logstash project. I can get it from there. [21:30:41] bd808: fqdn? [21:30:56] matanya: logstash-dev.eqiad.wmflabs [21:31:29] in /tmp there bd808 [21:32:39] andrewbogott: oh, so the new request (mhurd) was apparently already filed. 7345. Can you close the new one? [21:34:37] matanya: Got it. Now just a few config problems to work through on deployment-bastion... [21:34:45] HappyPanda: looks like we never got a key for mhurd? [21:36:47] matanya: {{done}} (I think) [21:36:55] Thanks! [21:37:40] bd808: nope, don't see new pages [21:38:10] hmmm... [21:38:21] and nothing in import log [21:39:03] matanya: http://he.wikipedia.beta.wmflabs.org/wiki/%D7%A7%D7%98%D7%92%D7%95%D7%A8%D7%99%D7%94:%D7%90%D7%99%D7%A9%D7%99%D7%9D ? 
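The "strip down a dump" step discussed above (a full hewiki dump being too big for beta) can also be done offline before the import. A rough standard-library sketch, assuming a pages whitelist; `filter_dump()` and `local()` are hypothetical helpers, not a maintained tool, and in practice Special:Export or `dumpBackup.php` filters do this better:

```python
import xml.etree.ElementTree as ET

def local(tag):
    # MediaWiki export files carry an xmlns on every element; drop it.
    return tag.rsplit('}', 1)[-1]

def filter_dump(xml_text, keep_titles):
    """Keep only <page> elements whose <title> is in keep_titles."""
    root = ET.fromstring(xml_text)
    for page in list(root):
        if local(page.tag) != 'page':
            continue
        title = next((c.text for c in page if local(c.tag) == 'title'), None)
        if title not in keep_titles:
            root.remove(page)
    return ET.tostring(root, encoding='unicode')

# Tiny stand-in for a real export file (real dumps are streamed, not
# loaded whole; iterparse would be used for a 1.1 GB file).
sample = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/">
  <page><title>Keep me</title><revision><text>hello</text></revision></page>
  <page><title>Drop me</title><revision><text>bye</text></revision></page>
</mediawiki>"""
filtered = filter_dump(sample, {"Keep me"})
```

The filtered file would then go through `mwscript importDump.php --wiki=hewiki` as ^d showed above.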
[21:39:24] oh, sigh [21:39:25] HappyPanda: if you want to speed up the process you can write a puppet patch and link me to it. Make sure that it uses the same username and uid as labs though. (that is: mhurd, 3010) [21:39:41] only the categories, no articles [21:41:07] Are there articles in the dump? If not, just make a new one and I'll slurp it in too [21:42:21] doing bd808 [21:46:47] there bd808 same name, same location [21:46:47] andrewbogott: I'm just going to let it resolve whenever mhurd has the time to resolve it. [21:47:34] HappyPanda: ok [21:49:53] PROBLEM - ElasticSearch health check on elastic1014 is CRITICAL: CRITICAL - Could not connect to server 10.64.48.11 [21:50:12] PROBLEM - Puppet freshness on hafnium is CRITICAL: Last successful Puppet run was Thu May 1 18:49:24 2014 [21:55:22] matanya: Still loading.... 800 so far [21:55:36] hmm, just 10 mb [21:56:05] beta is not fast. :) [21:56:33] it's loading at 2 pages/second [22:05:12] * AaronSchulz grrs at https://github.com/nicolasff/phpredis/issues/440 [22:06:03] manybubbles|away: when you are back, did some tests, some improvement, some worse. [22:07:51] matanya: sad [22:08:19] is the index rebuilt already ? [22:08:53] RECOVERY - ElasticSearch health check on elastic1014 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2032: active_shards: 6075: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [22:08:53] PROBLEM - ElasticSearch health check on elastic1015 is CRITICAL: CRITICAL - Could not connect to server 10.64.48.12 [22:09:05] looks like the blog is down... 
[22:10:00] ping is ok [22:10:06] Error 503 Service Unavailable Service Unavailable Guru Meditation: XID: 76548248 [22:10:22] wfm [22:10:25] back up now [22:10:39] I was getting the 503 for about 3-4 minutes from when I first saw it [22:11:42] it was loading slow for me too, but it did load [22:11:43] ^^ RobH [22:12:06] yeah, back to loading very slow but haven't got the varnish error again yet [22:12:07] is the w3 caching plugin live? [22:12:22] and back to the 503 error [22:12:33] I don't know... I can't login [22:12:46] I thought we had turned it off a while ago.. [22:12:54] matanya: I'll check [22:13:40] manybubbles|away: e.g ?? which is a rabbi, returns any word that contains those two letters [22:14:24] matanya: ? is wildcard syntax in most languages [22:14:29] Jamesofur: well, the plugin is activated in the admin panel [22:14:34] rather, it is normally [22:14:35] but on the other hand ?'???? which is the word jihad in hebrew returns nothing though i know there are articles containing the word [22:14:37] we might have to replace it [22:14:48] ...i meant "live" in the sense of "working" ;) [22:14:50] HaeB: if one of us can get in should we deactivate it? [22:14:57] ssh deployment-bastion.eqiad.wmflabs ' mwscript maintenance/showJobs.php --wiki hewiki --group' [22:14:57] fair [22:15:08] that says there are articles left to be indexed [22:15:14] 2877 of them [22:15:32] (it's also a few updates behind the current version) [22:15:57] no, we were told not to mess with it ;) [22:15:59] matanya: ? is not going to do the right thing. can you file a bug with the words that contain "?"? [22:16:09] because I have to escape them, or something [22:16:12] PROBLEM - Puppet freshness on osmium is CRITICAL: Last successful Puppet run was Wed Apr 30 10:04:02 2014 [22:16:16] because they get turned into syntax [22:16:30] " turns into syntax too [22:16:45] matanya: https://gerrit.wikimedia.org/r/#/c/130799/ [22:17:15] <^d> 130799 can be merged other than that one nitpick I left.
[22:17:24] ^d: I replied, I think [22:17:47] manybubbles|away: " != ? [22:17:57] ? [22:17:59] ? doesn't exist in hebrew [22:18:08] only " [22:18:17] and can be used within words [22:18:25] I thought you said that ?? was rabbi [22:18:26] and as a quote [22:18:42] oh, UTF things [22:18:49] matanya: yeah, that is that gerrit thing I sent you, tries to figure out if the " is a quote or inside a word [22:18:51] <^d> manybubbles: Ah duh, you did. [22:18:57] <^d> And you're right [22:18:59] oh, UTF things [22:19:09] <^d> Merged to master. [22:19:11] i typed a word in hebrew [22:19:17] you got question marks [22:19:29] matanya: ah, my terminal normally displays hebrew. I can't read it, but it spits it out [22:19:48] <^d> No wonder I was confused. My client usually has good utf-8 support. [22:19:51] i'll use links instead [22:19:52] <^d> I was like what's with the ?????'s [22:20:08] http://he.wikipedia.beta.wmflabs.org/w/index.php?search=%D7%A8%D7%91%D7%A0%D7%99%D7%9D&title=%D7%9E%D7%99%D7%95%D7%97%D7%93%3A%D7%97%D7%99%D7%A4%D7%95%D7%A9&go=%D7%9C%D7%93%D7%A3 [22:20:19] this is the plural of a rabbi in hebrew [22:20:32] PROBLEM - ElasticSearch health check on elastic1016 is CRITICAL: CRITICAL - Could not connect to server 10.64.48.13 [22:20:35] matanya: I think it worked better this time -- http://he.wikipedia.beta.wmflabs.org/wiki/%D7%9E%D7%99%D7%95%D7%97%D7%93:%D7%A9%D7%99%D7%A0%D7%95%D7%99%D7%99%D7%9D_%D7%90%D7%97%D7%A8%D7%95%D7%A0%D7%99%D7%9D [22:20:56] yeah, totally! thanks bd808 !! [22:20:57] you may want to check if the page you are looking for is indexed yet... [22:21:04] wait, what happened? [22:21:21] i got an unrelated page as first result [22:21:31] not related at all [22:21:48] the second one is a good match though [22:22:30] has 3 inflections of the word [22:22:47] <^d> We should force run all those jobs. [22:22:55] ^d: beta is just slow [22:23:45] <^d> They're all in batches of (1). [22:23:49] <^d> Since it's import jobs.
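The escaping problem in the exchange above is that Lucene's query parser treats characters like `?` (single-character wildcard) and `"` (phrase delimiter) as syntax, which collides with Hebrew, where the gershayim quote legitimately appears inside words. The blunt fix is to backslash-escape Lucene's reserved characters before querying; the actual CirrusSearch change (the gerrit link above) is subtler, trying to tell an in-word quote from a phrase quote. A sketch of the blunt version (`escape_lucene()` is a made-up name, not CirrusSearch's implementation; the character set is Lucene's documented reserved list):

```python
# Backslash-escape Lucene query-syntax characters so user input such as
# Hebrew words containing '"' is searched literally instead of being
# parsed as phrase/wildcard syntax.
LUCENE_RESERVED = set('+-&|!(){}[]^"~*?:\\/')

def escape_lucene(query):
    return ''.join('\\' + ch if ch in LUCENE_RESERVED else ch
                   for ch in query)

escaped = escape_lucene('צה"ל?')  # a Hebrew acronym with an inner quote
```

Blanket escaping makes everything literal, including wildcards a user actually wanted, which is why distinguishing the in-word case is worth the extra effort.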
[22:23:56] this is a great example: http://he.wikipedia.beta.wmflabs.org/w/index.php?search=%D7%9E%D7%95%D7%A8%D7%94&title=%D7%9E%D7%99%D7%95%D7%97%D7%93%3A%D7%97%D7%99%D7%A4%D7%95%D7%A9&go=%D7%9C%D7%93%D7%A3 [22:23:59] ^d: bleh [22:24:09] <^d> Won't take long [22:24:11] <^d> Running them all now [22:24:33] 7 correct findings, the 8th being a nice idea, but not related to the search. has another meaning of the inflections [22:25:03] <^d> Just wait a minute, let's catch up the jobs. [22:25:05] <^d> Then see. [22:25:40] matanya: I wonder if I've misconfigured something with it - I do really have to go be with my family now, but I can have a look at it in the morning. there is a special analyzer for hebrew you are supposed to use for the query (as opposed to the text) maybe I'm not doing that [22:25:46] also, there is a hebrew light analyzer I can try [22:25:55] it'll pull back fewer results but might be less crazy [22:26:17] thanks a lot and good night [22:26:27] see you tomorrow [22:29:51] greg-g: around ? [22:30:54] nvm, i'm off. [22:31:02] <^d> Jobs all done on hewiki beta. [22:31:38] thanks ^d i'll test it tomorrow [22:31:48] <^d> cool cool, have a good night [22:37:01] matanya: sorry, just missed ya [22:37:12] greg-g: still around [22:37:17] heya [22:38:25] hope you are doing well. few things: did you proceed with the volunteer ACL stuff with LCA ? [22:39:28] (03PS4) 10BryanDavis: [WIP] Provision scap scripts using trebuchet [operations/puppet] - 10https://gerrit.wikimedia.org/r/129814 [22:39:45] matanya: still in progress [22:40:27] are we moving to cloudbees-hosted jenkins ? [22:40:57] We are moving off of cloudbees [22:41:09] i meant off, sorrt [22:41:11] y [22:41:19] to SauceLabs ? [22:41:47] we already use sauce, will continue to [22:41:54] but our Jenkins will be local-only [22:42:04] sauce for the browser coverage they have [22:42:37] Maintaining a browser farm for testing is madness.
[22:42:50] It's much nicer to outsource that [22:42:53] RECOVERY - ElasticSearch health check on elastic1015 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2032: active_shards: 6075: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [22:42:56] https://saucelabs.com/ - hah, you know, if you used a real keyboard you wouldn't need those wristbraces [22:43:33] RECOVERY - ElasticSearch health check on elastic1001 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2032: active_shards: 6075: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [22:43:33] RECOVERY - ElasticSearch health check on elastic1003 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2032: active_shards: 6075: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [22:43:33] RECOVERY - ElasticSearch health check on elastic1016 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2032: active_shards: 6075: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [22:43:42] RECOVERY - ElasticSearch health check on elastic1012 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2032: active_shards: 6075: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [22:44:13] and last question, when will Compact Personal Bar come as a testable beta feature ? 
for me it is currently less than alpha [22:44:51] matanya: now on testwikis, tuesday on non-wikipedias, thurs on wikipedias [22:45:03] sorry, not this feature [22:45:13] the one with fixed bar [22:45:23] fixed bar? [22:45:37] not this? https://www.mediawiki.org/wiki/Compact_Personal_Bar [22:45:56] no, that one is great [22:46:05] Fixed header [22:46:13] https://www.mediawiki.org/wiki/Winter [22:46:26] oh, that [22:46:34] Winter is a big'un [22:46:41] no idea on eta with it [22:46:53] i love the idea, but not testable yet [22:47:25] the mock up test page: http://unicorn.wmflabs.org/winter/ [22:47:48] ok, good answers, thanks a lot. will bug you in a few weeks again i guess with the first question. [22:48:00] :) [22:48:10] follow the fun at https://www.mediawiki.org/wiki/Wikimedia_Release_and_QA_Team#April_-_July_.2714_Goals_Progress [22:48:19] (sorry for all the red, new quarter, new not done things) [22:48:41] i peek there from time to time [22:49:41] I'll do the SWAT [22:49:54] MaxSem: yay [22:49:57] welcome back [22:50:03] * greg-g un''s you [22:50:03] ;) [22:50:11] oh, you already did [22:58:06] MaxSem: tgr [22:58:22] !log disabling puppet on holmium; manually overriding completely broken varnish config [22:58:28] Logged the message, Master [22:58:46] err, completon fail:) [22:59:35] hah [23:02:37] paravoid: thank you very much for looking into the blog issues [23:05:16] HaeB, paravoid is looking into the (very messed up) blog config, we may have to disable blog pingbacks [23:05:28] HaeB, Jamesofur but performance should be much improved for now [23:05:47] yes I should have made these comments here :) [23:05:49] thanks Eloquence [23:05:58] thanks Eloquence, paravoid !
[23:06:13] i still wonder if the caching plugin should be updated [23:06:28] pages are being properly cached, but pingbacks are inherently uncacheable [23:06:45] and this is what creates load at this time [23:08:27] !log maxsem synchronized php-1.24wmf2/extensions/CommonsMetadata/ 'https://gerrit.wikimedia.org/r/#/c/130971/' [23:08:33] Logged the message, Master [23:08:43] tgr, ^^^ please verify :) [23:08:51] HaeB: how useful are pingbacks to you folks? [23:08:53] paravoid: i see .. so just to be clear, we are talking about pingbacks our blog sends to others, or vice versa? [23:10:05] MaxSem: all good, thanks! [23:10:56] both are not super essential, and since we're just a few weeks away from moving the blog to third party hosting anyway... [23:12:00] greg-g, ^^^ - I'll keep an eye on logs for a few minutes but otherwise SWAT looks complete [23:12:20] MaxSem: coolio [23:12:38] HaeB, I thought this stuff is mostly used for spam these days:P [23:14:59] HaeB, paravoid - let's turn off pingbacks for now as a precaution so we don't have to worry about things falling over later [23:16:06] MaxSem: that's pretty much all of blogging right? :) [23:18:23] MaxSem: always the rosy-eyed optimist ;) [23:19:22] ok, I removed X-Pingback from varnish, so that's gone [23:19:29] I also removed the source of most of our pingbacks [23:19:33] which may or may not have been malicious [23:19:36] we do get useful information through them about legitimate blogs linking to us, but yes, there is also a large portion of copycat spam blogs who e.g. copy+paste a mashable article (together with a link to blog.wikimedia.org) [23:19:57] What do you mean, paravoid?
[23:20:17] which may or may not have been malicious [23:20:27] I'd rather not expand on that further, as it's sensitive in nature [23:20:35] Okay. [23:21:21] ...but like i said, it's more a nice to have thing [23:21:28] HaeB, Eloquence: with these hotfixes, load has fallen tremendously and we should be ready for much more load [23:21:42] paravoid, thanks for the late night intervention. [23:21:56] (and we made them invisible a while ago, so the blog admins do not have to judge every time whether it's a spammy site that we don't want to link back, or legit) [23:22:03] yes, much appreciated paravoid [23:38:46] puppet noob question: how do I tell puppet what to run when a role is disabled? [23:43:15] !log Restarted logstash on logstash1001; MaxSem noticed that many recursion-guard logs were not being completely reassembled and JVM had one CPU maxed out. [23:43:20] Logged the message, Master [23:44:18] >) bd808 [23:45:41] MaxSem: I think it was sick. Log input volume jumped dramatically. [23:47:29] I'd really really like to change how logstash gets logs. The current system was a hack to prove it could work. [23:50:18] greg-g, to get something in for ld/swat, when would be the next window? if the answer is to read a webpage, sorry! [23:51:45] Monday morning is the next normal deploy window [23:52:44] dr0ptp4kt: There's no table for next week yet, but https://wikitech.wikimedia.org/wiki/Deployments is where it will show up eventually. [23:53:51] 2014-05-05T15:00Z will be the next SWAT window. [23:54:17] what bryan said [23:54:42] * bd808 thinks that page should really be an app of some sort [23:58:15] bd808: yes. [23:59:17] greg-g: Your project for the hackathon? :) [23:59:36] bd808: :) we'll see