[00:00:50] TimStarling: Do you know regarding the header concern thedj raised on https://gerrit.wikimedia.org/r/#/c/127818/3 ? [00:01:16] I can't find any record of ops concern for squid/varnish or some other part of our stack for why that would justify the meta tag. [00:01:47] (03Abandoned) 10BBlack: Varnish should restart on initscript/defaults changes [operations/puppet] - 10https://gerrit.wikimedia.org/r/115637 (owner: 10BBlack) [00:02:51] who added the meta tag? [00:04:11] it was yaron [00:04:40] https://gerrit.wikimedia.org/r/#/c/103387/ [00:04:52] "Re: doing this as an HTTP header instead; I kinda prefer the meta tag as it's more visible -- headers are black magic you have to dig for when debugging. :)" [00:04:57] says brion [00:05:17] oh, and that was in reply to you [00:05:52] * AaronSchulz likes how gerrit just does not work in iceweasel [00:07:28] TimStarling: Nice find. [00:07:49] the wonders of git gui blame [00:08:49] I usually use github's blame, but it had an outdated index causing it to get stuck between two revisions [00:08:53] AaronSchulz: wfm ? [00:09:29] Krinkle: I can't find any mention of varnish in the linked gerrit/bugz stuff, and I can't think of a good reason why varnish would care about this header [00:09:45] greg-g: it just says "working" all the fucking time so I use chromium for gerrit [00:09:47] but that doesn't necessarily mean there isn't one :) it would be nice if we had a record somewhere of whatever that concern was [00:09:48] the latter works fine [00:09:57] no JS errors show up though [00:10:05] god knows what it's trying to do [00:10:38] work [00:10:44] AaronSchulz: :) it definitely takes longer than chrom(e|ium), but it works for me.
Blame Google devs for not caring about anything other than chrome ;) [00:12:14] greg-g: https://groups.google.com/forum/#!topic/repo-discuss/2JszC4nKdvU not new [00:12:33] adding /1 often helps as per that bug...though it's still slow [00:12:49] * greg-g nods [00:16:48] (03PS1) 10Ori.livneh: webperf/deprecate: use 'meter' rather than 'counter' type [operations/puppet] - 10https://gerrit.wikimedia.org/r/130787 [00:18:14] (03PS2) 10Ori.livneh: webperf/deprecate: use 'meter' rather than 'counter' type [operations/puppet] - 10https://gerrit.wikimedia.org/r/130787 [00:20:32] (03CR) 10Ori.livneh: [C: 032] webperf/deprecate: use 'meter' rather than 'counter' type [operations/puppet] - 10https://gerrit.wikimedia.org/r/130787 (owner: 10Ori.livneh) [00:21:20] <^d> GWT -- "Google (and only Google browsers) Web Toolkit (that's totally for Chrome, duh)" [00:23:47] Gerrit also works in Iceweasel 24.4.0 for me; I don't use Gerrit in Chrome enough to compare the performance. [00:30:20] Can someone take a look at the GettingStarted token patches? [00:30:21] https://gerrit.wikimedia.org/r/#/c/130381/ [00:30:56] This is reinstating the reverted code; the issues (hook, cookie name) are also resolved, and it's changed to only add the token if they're on an edit page (including VE), per paravoid's request. [00:32:11] bblack, if you're comfortable, could you review the meta->header patch (https://gerrit.wikimedia.org/r/#/c/127818/1)? [00:32:29] After I stopped the merge (since I didn't know if the Varnish issue was resolved), I saw your comment above and mentioned that on the Gerrit thread. [00:33:56] well, the only bit that bugs me is thedj saying "We should get someone with varnish knowledge to sign off, I remember vaguely that varnish was the reason we preferred the meta element the last time..."
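An aside on the varnish worry thedj raises above: HTTP caches key an object on the URL plus only the request headers listed in Vary, so a new response header simply rides along with whichever body was cached first. A minimal toy-cache sketch of that behaviour (illustrative only, not WMF's actual varnish configuration; `X-New-Header` is a stand-in name for the patch's header):

```python
# Toy HTTP cache: entries are keyed on the URL (plus Vary'd request
# headers, omitted here for brevity).  A *response* header added by the
# origin never changes the key -- old header-less copies keep being
# served until they expire from the cache.
cache = {}

def fetch(url, origin):
    """Return the cached response for url, consulting origin only on a miss."""
    if url not in cache:
        cache[url] = origin(url)
    return cache[url]

old_origin = lambda url: {"body": "<meta ...>", "headers": {}}
new_origin = lambda url: {"body": "<meta ...>", "headers": {"X-New-Header": "1"}}

before = fetch("/wiki/Foo", old_origin)   # page cached before the deploy
after = fetch("/wiki/Foo", new_origin)    # cache hit: origin never consulted
assert "X-New-Header" not in after["headers"]
assert before is after  # the pre-deploy copy is still the one served
```

This is the mechanism behind the "leave both versions in place for ~30 days" suggestion a few lines below: anonymous page views keep hitting the pre-deploy cached copies until they age out.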
[00:34:35] in general random headers that aren't part of the Vary set shouldn't be an issue [00:35:16] (although keep in mind that caching will probably mean your new header won't show up for many anonymous pages for quite some time, as the old versions without the header are in-cache now) [00:35:28] (perhaps *that* was the varnishy concern when it was initially deployed) [00:36:23] in light of that, you might want to leave both versions of the tag in place for ~30 days for caches to expire, before removing the meta-tag [00:36:44] bblack, what's the scenario where neither the meta nor header would show up? [00:36:47] On a given page. [00:37:08] oh, you're right, I guess they'll get one version or the other either way [00:37:32] Okay, I'm going to step out. [00:38:19] I'll go +1 it on varnish issues anyways [00:43:15] (03PS2) 10JGonera: Enable Compact personal bar beta feature on test wikis [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130266 [00:49:38] superm401: re 130381, what's the relationship between that and 130393 ? They seem overlapping (one does the cookie rename, the other does cookie rename + edit-only)? [00:50:00] can we just skip straight to the edit-only version of it? [01:02:13] (03CR) 10Manybubbles: "Looks sane. I can deploy the plugin tomorrow if you'd like. I'll do another review, I think. I have to push out an update to the highli" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130760 (owner: 10Chad) [01:04:52] (03CR) 10Chad: "Would like Aaron/Faidon to chime in as to whether I've got the Swift config right (was mostly guessing, most unsure about 'key')." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130760 (owner: 10Chad) [01:07:12] (03CR) 10MZMcBride: "Neat. I'm excited to try this out. 
:-)" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130266 (owner: 10JGonera) [01:08:40] (03PS1) 10RobH: adding *.zero.wikipedia.org and zero.wikipedia.org to unified cert SANS [operations/puppet] - 10https://gerrit.wikimedia.org/r/130797 [01:09:13] PROBLEM - Puppet freshness on osmium is CRITICAL: Last successful Puppet run was Wed Apr 30 10:04:02 2014 [01:10:32] (03CR) 10RobH: "This is NOT to go live unless someone is babysitting the change and ensuring it doesn't break EVERYTHING. As this is the certificate in u" [operations/puppet] - 10https://gerrit.wikimedia.org/r/130797 (owner: 10RobH) [01:11:13] all caps in my messages, i want folks to think im yelling at them ;] [01:26:04] (03PS1) 10BBlack: add DNS for lvs300x public-vlan addrs [operations/dns] - 10https://gerrit.wikimedia.org/r/130800 [01:26:13] (03PS1) 10BBlack: add public addrs to lvs300x over 802.1q [operations/puppet] - 10https://gerrit.wikimedia.org/r/130801 [01:28:09] (03CR) 10BBlack: [C: 032 V: 032] add DNS for lvs300x public-vlan addrs [operations/dns] - 10https://gerrit.wikimedia.org/r/130800 (owner: 10BBlack) [01:28:58] (03CR) 10BBlack: [C: 032 V: 032] add public addrs to lvs300x over 802.1q [operations/puppet] - 10https://gerrit.wikimedia.org/r/130801 (owner: 10BBlack) [01:34:32] (03CR) 10Aaron Schulz: Configure Swift-backed elasticsearch backups (032 comments) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130760 (owner: 10Chad) [01:37:29] (03CR) 10Faidon Liambotis: [C: 04-1] "I'd like a separate username/password than MediaWiki, so that we can isolate permissions (fault isolation) as well as being able to track " [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130760 (owner: 10Chad) [01:45:26] (03CR) 10Chad: "Inline comments to Aaron." (032 comments) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130760 (owner: 10Chad) [01:58:36] bblack, 381 is essentially a mini-library for the user tokening/bucketing.
It will have a future use case, in addition to the 'token on edit' case (once the experiment starts). [01:58:56] 393 is just to assign a token on edit, then instrument edits based on that. [01:59:51] bblack, there's no overlap that I see. 393 adds the cookie to the server-side (it doesn't rename it), since the server-side instrumentation needs it there. [02:04:24] yeah ok [02:04:54] superm401: 393's last comment was about "no need for review yet", should I wait on that one or do the same there now? [02:05:56] bblack, no, it's ready. Patch set 4 did the update I mentioned. [02:07:43] PROBLEM - ElasticSearch health check on elastic1004 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2032: active_shards: 6075: relocating_shards: 2: initializing_shards: 1: unassigned_shards: 0 [02:08:42] RECOVERY - ElasticSearch health check on elastic1004 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2034: active_shards: 6077: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [02:12:45] (03PS3) 10Chad: Configure Swift-backed elasticsearch backups [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130760 [02:22:47] !log LocalisationUpdate completed (1.24wmf1) at 2014-05-01 02:21:44+00:00 [02:22:54] Logged the message, Master [02:27:02] PROBLEM - Slow CirrusSearch query rate on fluorine is CRITICAL: CirrusSearch-slow.log_line_rate CRITICAL: 3.90333333333 [02:34:38] !log LocalisationUpdate completed (1.24wmf2) at 2014-05-01 02:33:34+00:00 [02:34:45] Logged the message, Master [02:35:52] PROBLEM - ElasticSearch health check on elastic1009 is CRITICAL: CRITICAL - Could not connect to server 10.64.32.141 [02:43:46] (03PS1) 10Springle: Switch analytics-store back to dbstore1002.
[operations/dns] - 10https://gerrit.wikimedia.org/r/130808 [02:44:47] (03CR) 10Springle: [C: 032] Switch analytics-store back to dbstore1002. [operations/dns] - 10https://gerrit.wikimedia.org/r/130808 (owner: 10Springle) [02:45:33] bleh, its sad, I'm turning off most of the reindex jobs and going to start them one by one [02:45:47] looks like elastic1009 actually crashed [02:45:59] puppet must have restarted elasticsearch, because it is coming back [02:47:02] RECOVERY - Slow CirrusSearch query rate on fluorine is OK: CirrusSearch-slow.log_line_rate OKAY: 0.0 [02:47:23] PROBLEM - ElasticSearch health check on elastic1003 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2034: active_shards: 5855: relocating_shards: 0: initializing_shards: 28: unassigned_shards: 195 [02:48:18] (03PS1) 10Withoutaname: Create 'noratelimit' user group on dewiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130809 (https://bugzilla.wikimedia.org/57819) [02:48:39] now its just complaining about it coming back together.... [03:13:23] RECOVERY - ElasticSearch health check on elastic1003 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2041: active_shards: 6084: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [03:13:42] RECOVERY - ElasticSearch health check on elastic1009 is OK: OK - elasticsearch (production-search-eqiad) is running. 
status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2041: active_shards: 6084: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [03:13:44] ^demon|away: should be all calm now [03:18:46] !log LocalisationUpdate ResourceLoader cache refresh completed at Thu May 1 03:17:39 UTC 2014 (duration 17m 38s) [03:18:52] Logged the message, Master [04:10:12] PROBLEM - Puppet freshness on osmium is CRITICAL: Last successful Puppet run was Wed Apr 30 10:04:02 2014 [07:11:12] PROBLEM - Puppet freshness on osmium is CRITICAL: Last successful Puppet run was Wed Apr 30 10:04:02 2014 [08:15:27] !log switching s1-analytics-slave db1047 enwiki to tokudb [08:15:34] Logged the message, Master [08:20:37] springle: can you please peek at https://gerrit.wikimedia.org/r/#/c/127909/ ? [08:28:15] (03PS1) 10Odder: Correct a domain in wgCopyUploadsDomains [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130815 (https://bugzilla.wikimedia.org/64700) [08:29:24] (03CR) 10Springle: [C: 04-1] "Actually this is obsolete, but admittedly that wasn't obvious from the neglected RT ticket :)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/127909 (owner: 10Matanya) [08:29:41] matanya: sorry, forgot about that one [08:32:56] greg-g: :) [09:24:30] _joe_: as the scoping guy, can you please advise on: https://gerrit.wikimedia.org/r/#/c/111787/2 [09:26:27] <_joe_> matanya: "the scoping guy" sounds terrible :P [09:26:37] :D [09:26:48] (03CR) 10Matanya: "Thanks, does the l10nupdate user use this key?" [operations/puppet] - 10https://gerrit.wikimedia.org/r/116936 (owner: 10Matanya) [09:28:15] <_joe_> btw, today is bank holiday here, I'm working on my other (volunteer-only) projects, so don't expect me to be here 100% of the time [09:29:03] <_joe_> matanya: I agree with the comments I see there.
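The ElasticSearch icinga checks scrolling past above flatten the cluster-health JSON into one ': '-separated line. A small parser written against exactly that output format (a sketch inferred from the quoted lines, not from the check's source) makes the fields easy to alert on:

```python
def parse_es_health(check_output):
    """Turn the icinga ElasticSearch check output into a dict of fields.

    The check flattens cluster health into ': '-separated tokens, e.g.
    'status: green: timed_out: false: number_of_nodes: 16: ...'.
    """
    # Drop the human-readable prefix, keep the flattened key/value tokens.
    _, _, fields = check_output.partition("status: ")
    tokens = ("status: " + fields).split(": ")
    pairs = dict(zip(tokens[0::2], tokens[1::2]))
    # Cast the numeric node/shard counts for easy comparison.
    return {k: int(v) if v.isdigit() else v for k, v in pairs.items()}

line = ("OK - elasticsearch (production-search-eqiad) is running. "
        "status: green: timed_out: false: number_of_nodes: 16: "
        "number_of_data_nodes: 16: active_primary_shards: 2041: "
        "active_shards: 6084: relocating_shards: 2: "
        "initializing_shards: 0: unassigned_shards: 0")
health = parse_es_health(line)
assert health["status"] == "green"
assert health["unassigned_shards"] == 0
```

With the fields as a dict, "red with unassigned shards" and "green" become simple comparisons rather than string matching.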
[09:29:22] sure they are right, wondering how to fix [09:29:35] and have a great holiday :) [09:31:41] <_joe_> matanya: it *is* ok as it is right now [09:32:08] (03Abandoned) 10Matanya: cache: puppet 3 compatibility fix: fully qualify variable [operations/puppet] - 10https://gerrit.wikimedia.org/r/111787 (owner: 10Matanya) [09:32:09] <_joe_> matanya: if $something is declared in the node def, you can reference it in classes only as $something [09:33:35] yeah, stupid patch [09:33:35] thanks anyway [09:34:42] <_joe_> that is because variables that are undefined will be looked up in hiera, and hiera variables are usually node-dependent [09:43:09] (03CR) 10Matanya: "I'm a bit confused here. I see manifests/role/ldap.pp and modules/ldap/manifests/role/server.pp It seems like the latter only holds the op" [operations/puppet] - 10https://gerrit.wikimedia.org/r/117698 (owner: 10Matanya) [09:45:18] (03CR) 10Matanya: webserver: fixing duplicate declaration of apache-mpm (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/112423 (owner: 10Matanya) [10:12:12] PROBLEM - Puppet freshness on osmium is CRITICAL: Last successful Puppet run was Wed Apr 30 10:04:02 2014 [10:13:58] <_joe_> why do people not acknowledge the alarms if they're being handled?
[operations/puppet] - 10https://gerrit.wikimedia.org/r/130822 (owner: 10Springle) [12:15:43] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1% data above the threshold [250.0] [12:46:11] (03CR) 10Jgreen: [C: 031] fix apache-fast-test for use in eqiad [operations/puppet] - 10https://gerrit.wikimedia.org/r/130614 (owner: 10Dzahn) [12:51:48] (03Abandoned) 10Manybubbles: Lower udp2log maxage [operations/puppet] - 10https://gerrit.wikimedia.org/r/125247 (owner: 10Manybubbles) [12:57:57] aude: can you link the patch for today's swat deploy? [12:58:04] on the deployments page, I mean [13:11:01] aude: Please link the actual patch needing deployment in the SWAT deploy on [[wikitech:Deployments]] [13:13:12] PROBLEM - Puppet freshness on osmium is CRITICAL: Last successful Puppet run was Wed Apr 30 10:04:02 2014 [13:23:11] (03PS1) 10coren: Tool Labs: support some non-work prefixes in crontabs [operations/puppet] - 10https://gerrit.wikimedia.org/r/130825 [13:26:26] (03CR) 10coren: [C: 032] Tool Labs: support some non-work prefixes in crontabs [operations/puppet] - 10https://gerrit.wikimedia.org/r/130825 (owner: 10coren) [13:35:56] (03PS1) 10Manybubbles: All remaining wikis get cirrus betafeature [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130826 [13:38:48] (03CR) 10Chad: [C: 031] All remaining wikis get cirrus betafeature [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130826 (owner: 10Manybubbles) [13:38:50] anomie: manybubbles shall have quick as gerrit / jenkins allows [13:39:01] if we are not ready in time, we can do later swat [13:39:27] aude: thanks! [13:39:53] manybubbles: I take it then you're going to handle today's SWAT? Fine with me if you want to. [13:40:04] anomie: either way is fine with me [13:40:37] manybubbles: That's how I feel too, on the days you don't have changes of your own going in [13:40:53] I'll do it then. 
[13:41:00] this should be our last one for a while [13:41:25] I'm cutting a release right now which is intellectually taxing so the distraction shouldn't hurt [13:43:08] being a holiday here, our jenkins should supposedly be faster, less busy [13:44:07] (03PS2) 10Anomie: Correct a domain in wgCopyUploadsDomains [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130815 (https://bugzilla.wikimedia.org/64700) (owner: 10Odder) [13:51:44] (03PS1) 10coren: Tool Labs: further tweaks/bugfixes to crontab [operations/puppet] - 10https://gerrit.wikimedia.org/r/130831 [13:53:18] Is Reedy around? [13:54:01] (03CR) 10coren: [C: 032] Tool Labs: further tweaks/bugfixes to crontab [operations/puppet] - 10https://gerrit.wikimedia.org/r/130831 (owner: 10coren) [13:54:30] anomie: Morning. https://bugzilla.wikimedia.org/show_bug.cgi?id=43737#c11 really ought to have someone from ops poke at it, if you know anyone who has time today. [13:55:37] (03CR) 10Anomie: SVG logos for two non-Wikibooks, non-Wikisource wikis (031 comment) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130821 (https://bugzilla.wikimedia.org/52019) (owner: 10Odder) [13:56:24] Gloria: can take a look [13:56:37] Thanks! [13:56:43] Gloria: I'm looking too [13:56:58] (03PS1) 10Cmjohnson: adding dns entries for db10[64-73] and fixing tab/spaces [operations/dns] - 10https://gerrit.wikimedia.org/r/130832 [13:57:55] It's definitely the RSS code that's triggering the mediawiki.org issue. [13:58:09] Weird... you'd think wikimediafoundation.org would be affected as well. [13:59:11] Probably it's only triggered by some bit of wikitext or something. But Extension:RSS seems likely due to the backtrace I see in the log [14:00:37] RSS indeed [14:01:05] https://www.mediawiki.org/wiki/Template:RSSPost [14:01:06] Hmmm. [14:01:13] \o/ anomie [14:01:31] and a deleted template? [14:01:37] Maybe that does it...
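The diagnosis above (Extension:RSS hands its "template" parameter straight to the content fetcher, and neither the extension nor Article::fetchContent() checks that the page exists or was deleted) reduces to a tiny pattern. A hedged sketch of the bug and the guard, with illustrative names rather than MediaWiki's real API:

```python
# Illustrative only: 'pages' stands in for the wiki's page store, and the
# function names are invented, not MediaWiki's actual API.
def fetch_content(pages, title):
    """Fatal-style behaviour: assumes the page exists."""
    return pages[title]  # KeyError when the template was deleted

def fetch_content_checked(pages, title):
    """The fix: check existence first and degrade gracefully."""
    return pages.get(title)  # None instead of a fatal

wiki = {"Template:Foo": "some wikitext"}
assert fetch_content_checked(wiki, "Template:RSSPost") is None
try:
    fetch_content(wiki, "Template:RSSPost")
except KeyError:
    pass  # the unchecked path is the fatal seen on mediawiki.org
```

Undeleting the template (the quick fix proposed here) removes the trigger; the checked lookup removes the class of bug.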
[14:02:42] Gloria: looking at the code, i think so [14:02:47] Gloria, aude: Looks like the problem is that Extension:RSS doesn't check whether the page passed to its "template" parameter actually exists before trying to get the content, and Article::fetchContent() doesn't check either. [14:02:48] it doesn't check if it's deleted [14:03:12] Quick fix would be to undelete the template. [14:03:23] anomie: want to fix the code? [14:03:24] Or kill template="RSSPost"? [14:03:35] * aude needs to prepare my wikidata patch [14:03:51] aude: Yeah, as soon as I figure out whether the fix should be in core or Extension:RSS. Or maybe both. [14:03:58] ok [14:06:52] (03CR) 10Cmjohnson: [C: 032] adding dns entries for db10[64-73] and fixing tab/spaces [operations/dns] - 10https://gerrit.wikimedia.org/r/130832 (owner: 10Cmjohnson) [14:07:19] (03CR) 10Odder: SVG logos for two non-Wikibooks, non-Wikisource wikis (031 comment) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130821 (https://bugzilla.wikimedia.org/52019) (owner: 10Odder) [14:16:09] manybubbles: If we can get https://gerrit.wikimedia.org/r/#/c/130835/ reviewed in time, it might be good to add backporting that to the SWAT. [14:20:26] (03PS1) 10coren: Tool Labs: Make xcrontab smarter about environ [operations/puppet] - 10https://gerrit.wikimedia.org/r/130836 [14:22:23] (03CR) 10coren: [C: 032] Tool Labs: Make xcrontab smarter about environ [operations/puppet] - 10https://gerrit.wikimedia.org/r/130836 (owner: 10coren) [14:34:39] anomie: your wish, and all that [14:36:47] * anomie adds to SWAT list [14:38:42] Not much point in doing 1.24wmf1, since everything using RSS is already on wmf2. [14:39:31] and wmf1 will be done in a couple of hours anyways [14:40:00] That would've been my reasoning if RSS had been enabled on one of the Wikipedias. 
;) [14:40:27] ha [14:47:43] making core submodule patch now [14:54:59] (03PS1) 10Andrew Bogott: Change UIDs for a bunch of users: [operations/puppet] - 10https://gerrit.wikimedia.org/r/130841 [14:55:28] aude: thanks! [14:55:43] twkozlowski: I'll do your changes first because config changes are quicker. five minute warning [14:56:21] James_F|Away or RoanKattouw_away: are either of you available to verify that your swat change worked once I push it? [14:59:13] manybubbles: linked on the wiki [15:00:36] manybubbles: Yes. [15:00:45] aude: wonderf [15:00:48] wonderful [15:01:04] why did I expect tab to autocomplete wonderful [15:01:13] heh [15:01:14] James_F: k. [15:01:25] aude: i'll do yours first since I haven't heard from twkozlowski [15:01:31] * manybubbles has the conch [15:01:34] ok [15:01:44] * aude has tabs open to verify the fixes [15:03:33] jenkins is merging.... [15:03:41] ok [15:04:01] (03CR) 10Andrew Bogott: [C: 032] Change UIDs for a bunch of users: [operations/puppet] - 10https://gerrit.wikimedia.org/r/130841 (owner: 10Andrew Bogott) [15:07:13] aude: syncing [15:07:35] k [15:07:48] taking a while [15:08:06] James_F: you'll be next once aude verifies her fix [15:08:12] !log manybubbles synchronized php-1.24wmf2/extensions/Wikidata/ 'SWAT update for time parsing and formatting' [15:08:12] Kk. [15:08:16] Logged the message, Master [15:08:21] * aude checks [15:08:37] perfect! [15:08:51] great! [15:09:04] * aude done bug fixing and back to regularly assigned stuff  [15:09:28] aude: feels good [15:10:46] wasn't anything too terrible but not things to leave broken for a couple weeks [15:12:41] Gloria: I am now... [15:17:27] twkozlowski: are you around? [15:18:12] (03CR) 10BryanDavis: "> Thanks, does the l10nupdate user use this key?" [operations/puppet] - 10https://gerrit.wikimedia.org/r/116936 (owner: 10Matanya) [15:19:12] looks like it is clear for a merge [15:26:30] James_F: syncing [15:26:42] Thanks. 
[15:27:34] !log manybubbles synchronized php-1.24wmf2/extensions/VisualEditor/ 'SWAT update for firefox focus' [15:27:40] Logged the message, Master [15:27:54] James_F: there you go, please verify everything is ok [15:28:41] anomie: starting the merge process on your patch [15:28:43] Yup, looks grand. Thank you! [15:28:48] manybubbles: ok [15:28:53] twkozlowski: still looking for you so I can push your patches [15:33:45] anomie: syncing [15:34:39] ottomata: looks like elastic1008 is ready for management! [15:34:47] ohwweeoo [15:34:48] ok [15:34:48] !log manybubbles synchronized php-1.24wmf2/includes/Article.php 'SWAT update to prevent fatal in backwards compatibility method' [15:34:54] manybubbles: Appears to work. Thanks! [15:34:55] Logged the message, Master [15:34:59] sweet [15:35:14] manybubbles: should I do that now? [15:35:27] ottomata: hmmm.... sure! [15:35:49] !log reassigning a ton of UIDs in production; running a couple dozen 'find' commands to chown files [15:35:56] Logged the message, Master [15:35:57] ok, manybubbles, moving shards off [15:36:06] yay [15:36:21] hm, elastic1008 already has fewer shards [15:36:25] any particular reason why? [15:36:28] {"length":170,"node":"elastic1008"} [15:36:29] manybubbles: ohaio [15:36:33] twkozlowski: yay! [15:36:52] ottomata: because it has less disk to hold the shards and I've pushed a bunch of oversized shards recently [15:36:58] I need to find a better way to get ping notifications [15:37:00] I'll be fixing the shard sizes today [15:37:09] irssi + screen doesn't work too well.
[15:37:14] (03CR) 10Manybubbles: [C: 032] SVG logos for two non-Wikibooks, non-Wikisource wikis [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130821 (https://bugzilla.wikimedia.org/52019) (owner: 10Odder) [15:37:17] (03CR) 10Manybubbles: [C: 032] Correct a domain in wgCopyUploadsDomains [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130815 (https://bugzilla.wikimedia.org/64700) (owner: 10Odder) [15:37:24] thanks manybubbles [15:37:30] (03Merged) 10jenkins-bot: SVG logos for two non-Wikibooks, non-Wikisource wikis [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130821 (https://bugzilla.wikimedia.org/52019) (owner: 10Odder) [15:37:30] sorry to have made you wait so long [15:37:35] (03Merged) 10jenkins-bot: Correct a domain in wgCopyUploadsDomains [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130815 (https://bugzilla.wikimedia.org/64700) (owner: 10Odder) [15:38:38] twkozlowski: syncing [15:38:41] twkozlowski: its ok. [15:40:07] !log manybubbles synchronized wmf-config/InitialiseSettings.php 'SWAT fix GWtoolset url and add some more logos' [15:40:13] ottomata: Elasticsearch has disk utilization stoppers that kick in at 85% and 95%, respectively. Rather, I've configured them to do that. at 85% it'll refuse more shards and they'll go elsewhere. at 95% it'll start moving things off. [15:40:17] Logged the message, Master [15:40:24] twkozlowski: there you go, please verify [15:40:31] ahhh [15:40:34] makes sense, cool [15:42:32] (03PS1) 10Manybubbles: Update highlighter [operations/software/elasticsearch/plugins] - 10https://gerrit.wikimedia.org/r/130846 [15:42:44] (03PS1) 10Rush: puppet-lint in labs [operations/puppet] - 10https://gerrit.wikimedia.org/r/130847 [15:43:03] manybubbles: The logos didn't change for me yet, but I expect cache [15:43:31] twkozlowski: makes sense to me. 
I don't know that branch of the code at all, though [15:47:37] * manybubbles puts down the conch for 15 minutes [15:49:39] manybubbles: https://noc.wikimedia.org/conf/highlight.php?file=InitialiseSettings.php [15:49:42] can't see it here. [15:50:30] twkozlowski: my fault, syncing again [15:50:33] * manybubbles has the conch [15:51:37] !log manybubbles synchronized wmf-config/InitialiseSettings.php 'SWAT fix GWtoolset url and add some more logos' [15:51:46] twkozlowski: check again please [15:52:20] (03CR) 10Krinkle: "I've addressed Hoo's concerns and rebased the patches into a single mergable stack. He revoked his -1 (though Gerrit doesn't show that)." [operations/puppet] - 10https://gerrit.wikimedia.org/r/130071 (owner: 10Krinkle) [15:53:33] anyone around who can give me a pointer on where to put something puppet? [15:54:28] chasemp: depends on what it is [15:54:49] fair, so I have a defined thing for cpu monitoring w/ diamond [15:55:04] I want it to be included as a base or default or whatever term [15:55:08] * manybubbles puts down the conch [15:55:11] probably similar to how ganglia does it [15:55:16] no idea :/ [15:55:18] but what what I'm seeing makes no sense to me [15:55:24] :) thanks anyway [15:56:06] ottomata might be your best bet. I bother him with monitoring stuff all the time! [15:56:37] ^demon|away: about ready to turn on betafeature everywhere! [15:56:41] wooooooo [15:57:22] greg-g: is now ok for that? I figured it was the same time as the schedule + one day [15:57:34] manybubbles: sure [15:58:40] (03CR) 10Manybubbles: [C: 032] All remaining wikis get cirrus betafeature [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130826 (owner: 10Manybubbles) [15:58:53] (03Merged) 10jenkins-bot: All remaining wikis get cirrus betafeature [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130826 (owner: 10Manybubbles) [15:58:53] * manybubbles has the conch again [15:59:16] ottomata: thoughts? [15:59:19] ok so wha? 
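The 85%/95% disk "stoppers" manybubbles describes a few lines up are Elasticsearch's disk-allocation watermarks. A sketch of the cluster-settings body that would configure them (setting names from Elasticsearch's cluster-update-settings API; the percentages are the values quoted above, not necessarily the production config):

```python
import json

# At the low watermark Elasticsearch stops allocating new shards to a
# node; at the high watermark it starts relocating shards away from it.
settings = {
    "persistent": {
        "cluster.routing.allocation.disk.threshold_enabled": True,
        "cluster.routing.allocation.disk.watermark.low": "85%",
        "cluster.routing.allocation.disk.watermark.high": "95%",
    }
}

# This body would be PUT to the cluster settings endpoint, roughly:
#   curl -XPUT localhost:9200/_cluster/settings -d '<this JSON>'
body = json.dumps(settings)
assert json.loads(body)["persistent"][
    "cluster.routing.allocation.disk.watermark.high"] == "95%"
```

That is why elastic1008, with less disk, ends up holding fewer shards: allocation stops at the low watermark and eviction starts at the high one.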
[15:59:32] what is it? [15:59:34] python script? [15:59:38] so a module for diamond exists [15:59:55] and there is a defined type to add a monitoring collector -- cpu / mem whatever [16:00:02] I have a cpu one I want to add [16:00:10] where in the heck do I put it, not a role per se [16:00:18] and ganglia stuff seems to spread all over creation [16:00:46] ah, hm [16:01:04] so, you need to define a define somewhere? this is for all nodes or just one? or just a type? [16:01:06] !log manybubbles synchronized wmf-config/InitialiseSettings.php 'Enable cirrus as a betafeature on all wikis which did not already have it.' [16:01:09] <_joe_> chasemp: IMO for base metrics you should create a class (not a role) you include in the standard class [16:01:13] Logged the message, Master [16:01:20] "'Enable cirrus as a betafeature on all wikis which did not already have it.' [16:01:29] * greg-g pops champagne [16:01:39] <_joe_> (and happy mayday to everyone) [16:01:56] (a little early, the world might still end and we'll have to revert, but pre-emptive champagne is always nice) [16:02:03] _joe_: ditto [16:02:12] <_joe_> greg-g: can I ask you what is cirrus based on? [16:02:13] _joe_: so a class in puppet/manifests/diamond.pp or something within which is a parameterized diamond defined type? [16:02:24] and that is included in standard in site.pp? [16:02:34] Yay mayday [16:02:48] manybubbles: Oops, sorry, been closing an RfA. [16:02:55] <_joe_> chasemp: directly the class in modules/$diamond_module/manifests/init.pp [16:02:59] manybubbles: Checked it right now, and it works! \o/ [16:03:04] chasemp: so, this will be included on every node? [16:03:10] twkozlowski: sweet [16:03:15] eventually yes [16:03:17] ^demon|away: we're live on at least jawiki [16:03:23] I checked it because I can kinda read it. [16:03:25] kinda [16:03:31] * manybubbles puts down the conch [16:03:35] is the define different for different nodes?
[16:03:35] <_joe_> chasemp: at start we can include it in the single nodes [16:03:40] or is it always the same? [16:03:42] _joe_: elastic search backend, php mw extension [16:03:47] <_joe_> ottomata: it should be the same [16:03:48] like, different parameters? [16:03:51] <_joe_> greg-g: ok, ES [16:03:53] in this case same [16:04:02] then, yeah, i think a class is right, probably a class in your module that uses it [16:04:09] except of course, i don't want it to hit labs in any way [16:04:12] and then that class included from either class standard...or maybe even from base module [16:04:19] manybubbles: you put the conch down, you good/feel safe? [16:04:20] <_joe_> ottomata: the parameters IMO should be defined at node level and be looked up in hiera eventually [16:04:34] in site.pp you mean? [16:04:50] <_joe_> chasemp: IF you need to define parameter values, yes [16:04:58] I think the 'role' abstraction is meant for parameterizing things to keep it out of site.pp [16:05:09] but in this case it being different it's odd [16:05:19] ok so [16:05:23] <^demon> :) [16:05:28] this would be like diamond::collector { 'CPU': [16:05:29] ... [16:05:30] } [16:05:30] so [16:05:30] yeah [16:05:32] true [16:05:32] probably [16:05:33] <_joe_> chasemp: these are *env* variables [16:05:34] a class in [16:05:44] diamond/manifests/cpu.pp [16:05:46] maybe? [16:05:47] or [16:05:48] <^demon> manybubbles: I'm both here and |away [16:05:49] collector/cpu.pp [16:05:51] then you can just [16:05:53] <^demon> (|away is my bouncer) [16:05:57] include diamond::collector::cpu [16:06:00] I tried that and ori yanked it out saying it should be in base, etc [16:06:05] so I wasn't sure what that means I guess [16:06:05] ? [16:06:18] <_joe_> chasemp: what code review please?
[16:06:22] if you have some generic abstractions that are not wmf specific [16:06:31] i think it's ok to put them in their own classes in the module [16:06:33] so this: include diamond::collector::cpu [16:06:34] <_joe_> chasemp: we didn't do a great job organizing puppet here. [16:06:36] manybubbles: ^demon if you feel like you won't need to revert/are good I'll pass the conch to subbu / gwicke [16:06:41] would be under standard in site.pp in that case? [16:06:43] as long as they are used manually by the users [16:06:49] <_joe_> but, I'm out! bank holiday yay! [16:06:54] chasemp, eventually, i think so yeah [16:06:54] or [16:06:55] _joe_: enjoy :) [16:06:58] more likely in base/init.pp somewhere [16:07:02] <_joe_> chasemp: ping me in pvt if you need [16:07:03] which is included by standard [16:07:07] _joe_: thanks man, have a good one! and I will [16:07:21] I think I see what I can do for now at least [16:07:25] actually [16:07:30] trying to model it on ganglia may have been a mistake in this case [16:07:36] maybe base::monitoring??host [16:07:36] dunno [16:07:39] hmm, have there been any recent changes wrt git-deploy? [16:07:51] none of the nodes seem to be syncing [16:07:59] gwicke: i have one coming in soon, but haven't merged it yet [16:08:03] why aren't they syncing? [16:08:06] does this use submodules? [16:08:07] ottomata: I will do diamond::collector::cpu [16:08:16] and then the inclusion for now can be host level to deploy slowl [16:08:17] ottomata, we do use submodules [16:08:21] slowly I guess [16:08:22] <^demon> greg-g: manybubbles' call, I'm just a spectator in all this today :) [16:08:31] give that a whirl, ottomata thanks man [16:08:41] cool, yw [16:08:50] gwicke: submodules have not been working well for me ever. [16:08:57] gwicke, your id got changed today?
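For context on the diamond::collector::cpu discussion above: Diamond collectors are themselves Python classes that read a source like /proc/stat and publish the values. A self-contained sketch of just the parsing half of such a collector (field names and order per the Linux proc(5) man page; the sample counters here are invented, and a real collector would subclass Diamond's Collector and call publish()):

```python
def parse_proc_stat(stat_text):
    """Parse the aggregate 'cpu' line of /proc/stat into named jiffy counters.

    This is the raw data a Diamond CPU collector would turn into metrics;
    percentages come from deltas between two successive reads.
    """
    fields = ("user", "nice", "system", "idle", "iowait", "irq", "softirq")
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":  # aggregate line, not cpu0/cpu1/...
            return dict(zip(fields, (int(v) for v in parts[1:1 + len(fields)])))
    raise ValueError("no aggregate cpu line found")

sample = ("cpu  4705 150 1120 16250856 2290 127 456 0 0 0\n"
          "cpu0 2352 75 560 8125428 1145 64 228 0 0 0")
counters = parse_proc_stat(sample)
assert counters["user"] == 4705
assert counters["idle"] == 16250856
```

Whether the Puppet side lands in diamond/manifests/cpu.pp or base, the collector it deploys is this kind of read-parse-publish loop.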
[16:09:01] my change will maybe help, not entirely sure yet [16:09:06] andrewbogott, ^ [16:09:06] ottomata, they were buggy initially, but have been working since [16:09:09] gwicke: what is the error code from the sync? [16:09:13] 50? [16:09:17] if you get detailed status [16:09:18] wtp1018.eqiad.wmnet: [16:09:18] fetch status: 0 [started: 1134 mins ago, last-return: 1134 mins ago] [16:09:29] 0 is ok...though, no? [16:09:34] fetch? [16:09:38] hm, i haven't seen that problem [16:09:46] usually fetch works, but checkout doesn't for me [16:09:49] subbu, gwicke: I'm still in the process of chowning files. The actual uid stuff should be changed by now... [16:09:55] 0/24 minions completed fetch [16:09:56] greg-g: I seed my time [16:09:56] logging in might be weird if you turn out to not own your homedir yet [16:10:05] I find it unlikely I'm going to have to revert [16:10:24] andrewbogott, could this affect sudo calls to salt? [16:10:24] gwicke: sad:( [16:10:25] !log reedy updated /a/common to {{Gerrit|I832b45db6}}: Correct a domain in wgCopyUploadsDomains [16:10:31] Logged the message, Master [16:10:38] gwicke: I wouldn't think so [16:10:48] Well, actually... [16:11:04] <^demon> manybubbles: You want https://gerrit.wikimedia.org/r/#/c/130846/? [16:11:10] gwicke, i can try the sync and see what happens. [16:11:12] PROBLEM - Apache HTTP on mw1053 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:11:15] I'm not sure. 
Naturally if your user account is scrambled on the target then sudo would fail [16:11:20] subbu, go ahead [16:11:31] andrewbogott, the sudo only happens on tin [16:11:33] (03PS1) 10Reedy: Add symlinks [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130849 [16:11:35] (03PS1) 10Reedy: testwiki to 1.24wmf3 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130850 [16:11:37] (03PS1) 10Reedy: Wikipedias to 1.24wmf2 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130851 [16:11:39] (03PS1) 10Reedy: group0 wikis to 1.24wmf3 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130852 [16:11:41] from there it's all root via salt [16:11:43] ^demon: I'll merge it before I deploy it. If you'd like to give it a ceremonial +1, that'd be cool [16:11:47] (03CR) 10Reedy: [C: 032] Add symlinks [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130849 (owner: 10Reedy) [16:11:49] gwicke: should be obvious then; if tin is happy then your personal uid shouldn't matter [16:11:53] since everything happens as root after that [16:12:19] gwicke: log into one of the nodes [16:12:21] cd to the deploy dir [16:12:24] run [16:12:30] salt-call deploy.fetch [16:12:38] I don't have the rights to do so [16:12:39] see what it says [16:12:41] oh, can you do that? [16:12:42] (03CR) 10Chad: [C: 031] "Merge ahoy when you be ready." [operations/software/elasticsearch/plugins] - 10https://gerrit.wikimedia.org/r/130846 (owner: 10Manybubbles) [16:12:42] ah ok [16:12:43] i will [16:12:48] (03Merged) 10jenkins-bot: Add symlinks [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130849 (owner: 10Reedy) [16:13:02] gwicke: what is deploy dir? [16:13:05] (03CR) 10Reedy: [C: 032] testwiki to 1.24wmf3 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130850 (owner: 10Reedy) [16:13:06] ottomata: 1008 is just about ready for you. 1 still on it [16:13:13] ottomata, moment.. 
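[editor's note] The `fetch status: 0 [started: 1134 mins ago, ...]` lines being debugged above are per-minion entries from git-deploy's detailed status output; status 0 means that minion's fetch returned cleanly, which is why the "0/24 minions completed fetch" summary looks contradictory. A small, hypothetical parser for such lines (the exact format may vary between trebuchet versions):

```python
import re

# Matches per-minion entries like:
#   fetch status: 0 [started: 1134 mins ago, last-return: 1134 mins ago]
# A status of 0 means the fetch call returned successfully on that minion.
STATUS_RE = re.compile(r'fetch status: (\d+) \[started: (\d+) mins ago')

def parse_fetch_status(line):
    """Return (exit_status, minutes_since_start), or None if the line
    is not a fetch-status entry."""
    m = STATUS_RE.search(line)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))
```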
[16:13:15] (03Merged) 10jenkins-bot: testwiki to 1.24wmf3 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130850 (owner: 10Reedy) [16:13:16] gwicke, yes, fetch completed for me. [16:13:23] ^demon: I'm going to go get some lunch I think [16:13:37] !log reedy Started scap: testwiki to 1.24wmf3 [16:13:40] gwicke, continue or are we debugging this now? [16:13:44] Logged the message, Master [16:13:45] <^demon> manybubbles: Enjoy [16:13:46] andrewbogott, /srv/deployment/parsoid/deploy [16:14:03] RECOVERY - Apache HTTP on mw1053 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 400 bytes in 0.080 second response time [16:14:08] subbu, andrewbogott: lets finish the deploy first [16:14:12] PROBLEM - Puppet freshness on osmium is CRITICAL: Last successful Puppet run was Wed Apr 30 10:04:02 2014 [16:14:15] subbu, can you do the restart too? [16:14:19] will do [16:14:25] thx [16:14:27] gwicke: i don't understand [16:14:29] 0 is a good status? [16:14:34] gwicke: I think you changed Andrews midstream [16:14:34] fetch looks fine to me [16:14:48] andrewbogott, yeah sorry [16:14:48] hehe :) [16:14:58] ;) [16:15:23] ottomata, I'm not sure what the issue is / was [16:15:30] gwicke: what happens if you just go ahead and continue? [16:15:34] to checkout [16:15:35] it worked for subbu, so we are finishing the deploy first [16:15:39] oh ok [16:15:42] ok weird [16:16:37] I would not be surprised if this was some issue with the salt returner stuff [16:16:43] !log reinstalling elastic1008 [16:16:50] Logged the message, Master [16:17:17] waiting for parsoid svc to restart on all nodes .. 22 done. [16:17:34] gwicke, ok, all restarted. now to verify [16:17:46] subbu, don't forget the !log [16:17:58] will do. 
[16:18:22] !log deployed parsoid 5e05c585 (with deploy sha ca2db96d) [16:18:29] Logged the message, Master [16:19:12] PROBLEM - Host elastic1008 is DOWN: PING CRITICAL - Packet loss = 100% [16:24:22] RECOVERY - Host elastic1008 is UP: PING OK - Packet loss = 0%, RTA = 0.40 ms [16:26:22] PROBLEM - Disk space on elastic1008 is CRITICAL: Connection refused by host [16:26:42] PROBLEM - ElasticSearch health check on elastic1008 is CRITICAL: CRITICAL - Could not connect to server 10.64.32.140 [16:27:03] PROBLEM - RAID on elastic1008 is CRITICAL: Connection refused by host [16:27:03] PROBLEM - check configured eth on elastic1008 is CRITICAL: Connection refused by host [16:27:13] PROBLEM - SSH on elastic1008 is CRITICAL: Connection refused [16:27:13] PROBLEM - puppet disabled on elastic1008 is CRITICAL: Connection refused by host [16:27:13] PROBLEM - check if dhclient is running on elastic1008 is CRITICAL: Connection refused by host [16:27:22] PROBLEM - DPKG on elastic1008 is CRITICAL: Connection refused by host [16:27:40] Expected ottomata? [16:27:50] yes [16:27:51] I suspect so [16:27:53] it was logged too :) [16:28:35] Just a reinstall, anyway, carry on [16:28:57] greg-g, thanks. our deploy looks good. we are done. /cc gwicke [16:31:56] anyone ever had debian auto install not able to download preseed.cfg from apt.wikimedia.org? [16:32:15] it looks like it is trying to resolve apt.wikimedia.org to IPv6, and for some reason can't download from that [16:32:21] ^demon: didn't go get lunch - waiting on guy to finish building chicken coop [16:32:42] greg-g: can you ping me when the train is finished today? I have some indexes to start rebuilding.... [16:33:09] manybubbles: yessir [16:33:14] thanks! [16:33:38] <^demon> manybubbles: you have chickens? [16:33:56] milimetric: does too! [16:33:56] ottomata: IPv6 on apt is new IIRC [16:34:10] Reedy: ah, so maybe I'm the first one to have this problem [16:34:12] do you know who set that up? 
[16:34:13] ^demon: had for years, didn't for a few years, did for a few months, now don't due to fox. getting more tonight [16:34:23] ottomata: akosiaris I think [16:34:28] akosiaris: yt? [16:34:40] 08:30 akosiaris: Published carbon's IPv6 address in DNS. apt.wikimedia.org and ubuntu.wikimedia.org are now IPv6 enabled [16:34:41] (03CR) 10Tim Landscheidt: [C: 04-1] "The *Tools* project isn't meant for Puppet development, and the admins can always install puppet-lint when they set up a self-hosted puppe" [operations/puppet] - 10https://gerrit.wikimedia.org/r/130847 (owner: 10Rush) [16:34:50] oh today? [16:34:53] ottomata: take your time, we're doing just fine on 15 servers [16:34:57] nope [16:35:06] 29th [16:35:09] aye ok [16:36:27] ^demon: can we try the swift thing in beta first? [16:36:51] <^demon> Prolly should. [16:38:17] <^demon> manybubbles: Once we get the thing in archiva we'll do that. [16:38:26] oh, yeah, archiva! [16:38:32] PROBLEM - NTP on elastic1008 is CRITICAL: NTP CRITICAL: No response from NTP server [16:38:33] let me get some lunch. guy is leaving [16:38:45] maybe, in an hour? [16:39:04] <^demon> manybubbles: Sounds good, ping me :) [16:43:32] !log reedy Finished scap: testwiki to 1.24wmf3 (duration: 29m 54s) [16:43:37] Logged the message, Master [16:45:47] manybubbles: I obsessively seal my chicken coops with 1/2" hardware cloth all around (even the floor). Raccoons and foxes will dig tunnels to your chickens and even grab them through 1" gaps and eat them through the fence! [16:46:50] milimetric: gah! [16:46:57] milimetric: that is a horrible image! [16:48:43] The foxes in my neighborhood mostly eat house cats. [16:48:44] right?
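[editor's note] The preseed failure ottomata describes above (the installer resolving apt.wikimedia.org to its new AAAA record while the install server only listened on IPv4) points at lighttpd's IPv6 switch. Assuming a stock lighttpd setup, the relevant directive is a one-liner:

```
# lighttpd.conf on the install server: also bind the IPv6 address,
# so installers that prefer the AAAA record can fetch preseed.cfg.
server.use-ipv6 = "enable"
```

This is what the "Setting server.use-ipv6" patch (130856) that follows appears to do.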
[16:57:16] (03PS1) 10Ottomata: Setting server.use-ipv6 = "enable" for install-server lighttpd.conf [operations/puppet] - 10https://gerrit.wikimedia.org/r/130856 [17:04:06] (03PS4) 10BryanDavis: Send Vary header on http to http redirect [operations/apache-config] - 10https://gerrit.wikimedia.org/r/111925 [17:25:53] never seen this in a dmesg before: [17:25:54] May 1 13:53:09 ytterbium kernel: [22236968.437105] CPU11: Package power limit notification (total events = 10305894) [17:26:25] cpu nerd pal says it could be something as simple as shifting from code which stalls the pipeline to code which does not [17:26:36] as well as other stuff like exercising the FPU [17:26:53] That's a new one to me too. [17:27:19] or it simply upped its clocks and then hit its TDP and backed off [17:28:02] PROBLEM - Host elastic1008 is DOWN: PING CRITICAL - Packet loss = 100% [17:29:12] RECOVERY - SSH on elastic1008 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.3 (protocol 2.0) [17:29:22] RECOVERY - Host elastic1008 is UP: PING OK - Packet loss = 0%, RTA = 0.29 ms [17:31:13] <_joe_> jgage: wow fgrep the kernel for that [17:36:01] * jgage downloads the kernel source for the first time in far too long [17:36:42] RECOVERY - ElasticSearch health check on elastic1008 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2041: active_shards: 6084: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [17:37:29] ottomata: oh! [17:37:39] gonna do plugins, I imagine [17:38:21] ottomata: shards are going back to it and I don't _think_ it has the plugins [17:39:20] ottomata: if one of the group0 wikis goes to it then search will break for them [17:40:24] wha, shards shouldn't begoing back to it [17:40:26] i was just running puppet [17:40:46] jgage: so you're even worse than https://www.xkcd.com/979/ ? [17:40:55] manybubbles: are shards going back to it? 
[17:41:15] {"length":5,"node":"elastic1008"} [17:41:22] uh oh [17:41:26] "_ip" : "" [17:41:32] just drop the exclude back on it [17:41:54] moving off [17:41:54] yeah [17:42:03] RECOVERY - RAID on elastic1008 is OK: OK: Active: 6, Working: 6, Failed: 0, Spare: 0 [17:42:03] RECOVERY - check configured eth on elastic1008 is OK: NRPE: Unable to read output [17:42:12] RECOVERY - puppet disabled on elastic1008 is OK: OK [17:42:12] RECOVERY - check if dhclient is running on elastic1008 is OK: PROCS OK: 0 processes with command name dhclient [17:42:22] RECOVERY - DPKG on elastic1008 is OK: All packages OK [17:42:22] RECOVERY - Disk space on elastic1008 is OK: DISK OK [17:44:02] Nemo_bis: haha :) [17:44:55] (03CR) 10MaxSem: "According to my estimations, this change is safe to go." [operations/puppet] - 10https://gerrit.wikimedia.org/r/118249 (owner: 10Faidon Liambotis) [17:44:57] ok manybubbles, plugins deployed [17:45:03] ok to move shards back? [17:45:07] paravoid, ^^ :) [17:45:09] ottomata: so long as you bounce it [17:45:18] boucned :) [17:45:54] (03CR) 10Rush: "ah, I did not understand this would call out tools stuff specifically. But I do disagree with puppet-lint being in labs in general. If e" [operations/puppet] - 10https://gerrit.wikimedia.org/r/130847 (owner: 10Rush) [17:45:56] sweet [17:45:58] looks good [17:46:00] ok to more stuff back [17:46:11] cool, its going [17:48:10] ^demon: want to sneek that hangout in before the metrics meeting? [17:48:33] <^demon> Oh I thought we were just gonna do IRC, bad time for a hangout. [17:51:44] ^demon: can do irc [17:54:18] greg-g: any objections to me pushing another version of that highlighter out now? [17:54:25] Its a somewhat convenient time [17:54:27] for me, at least [17:55:11] Reedy: what's your plan re timing? 
[17:55:22] RECOVERY - NTP on elastic1008 is OK: NTP OK: Offset -0.02268815041 secs [17:59:51] greg-g: my thing doesn't sync code, only a git-deploy and a restart on the elasticsearch nodes [18:00:34] manybubbles: ah, I don't really know what the highlighter is ;) [18:00:37] manybubbles: sure, go for it [18:00:49] greg-g: sorry, its an elasticsearch plugin [18:01:25] (03CR) 10Manybubbles: [C: 032 V: 032] Update highlighter [operations/software/elasticsearch/plugins] - 10https://gerrit.wikimedia.org/r/130846 (owner: 10Manybubbles) [18:07:42] !log upgrading highlighter plugin on elasticsearch machines - the cluster will go yellow for a few hours during the rolling restart [18:07:48] Logged the message, Master [18:24:26] greg-g: now ish [18:26:20] Reedy: cool, was just wondering for nik, but he moved on :) [18:30:42] PROBLEM - ElasticSearch health check on elastic1002 is CRITICAL: CRITICAL - Could not connect to server 10.64.0.109 [18:33:35] hah! now I know manybubbles doesn't read Tech News [18:33:43] ? [18:33:45] * twkozlowski slaps manybubbles around a bit with a large trout [18:33:54] https://meta.wikimedia.org/w/index.php?title=Tech/News/2014/19&diff=next&oldid=8347270 [18:34:09] While in https://meta.wikimedia.org/wiki/Tech/News/2014/18 :-) [18:38:31] (03CR) 10Reedy: [C: 032] Wikipedias to 1.24wmf2 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130851 (owner: 10Reedy) [18:40:49] (03Merged) 10jenkins-bot: Wikipedias to 1.24wmf2 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130851 (owner: 10Reedy) [18:42:31] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Wikipedias to 1.24wmf2 [18:42:36] Logged the message, Master [18:48:18] jgage: any luck w/ that statsd package? [18:49:04] (03CR) 10coren: [C: 031] "I have no fundamental objection to linting being done in labs; this sets up the correct expectations as well."
[operations/puppet] - 10https://gerrit.wikimedia.org/r/130847 (owner: 10Rush) [18:49:34] (03CR) 10coren: "(But that patch only does tools)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/130847 (owner: 10Rush) [18:51:58] (03CR) 10Reedy: [C: 032] group0 wikis to 1.24wmf3 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130852 (owner: 10Reedy) [18:52:07] (03Merged) 10jenkins-bot: group0 wikis to 1.24wmf3 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130852 (owner: 10Reedy) [18:52:42] RECOVERY - ElasticSearch health check on elastic1002 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2041: active_shards: 6084: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [18:52:42] PROBLEM - ElasticSearch health check on elastic1013 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 2040: active_shards: 5674: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 410 [18:52:42] PROBLEM - ElasticSearch health check on elastic1006 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 2040: active_shards: 5674: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 410 [18:52:52] PROBLEM - ElasticSearch health check on elastic1015 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 2040: active_shards: 5674: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 410 [18:52:52] PROBLEM - ElasticSearch health check on elastic1011 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. 
status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 2040: active_shards: 5674: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 410 [18:52:52] PROBLEM - ElasticSearch health check on elastic1007 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 2040: active_shards: 5674: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 410 [18:52:52] PROBLEM - ElasticSearch health check on elastic1014 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 2040: active_shards: 5674: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 410 [18:53:03] PROBLEM - ElasticSearch health check on elastic1010 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. 
status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 2040: active_shards: 5674: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 410 [18:55:31] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.24wmf3 [18:55:37] Logged the message, Master [18:59:31] (03PS5) 10Reedy: Fourth batch of pilot sites for Media Viewer [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/125034 (owner: 10MarkTraceur) [18:59:36] (03CR) 10Reedy: [C: 032] Fourth batch of pilot sites for Media Viewer [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/125034 (owner: 10MarkTraceur) [18:59:47] (03Merged) 10jenkins-bot: Fourth batch of pilot sites for Media Viewer [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/125034 (owner: 10MarkTraceur) [19:00:53] (03PS3) 10Reedy: Enable Compact personal bar beta feature on test wikis [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130266 (owner: 10JGonera) [19:01:02] (03CR) 10Reedy: [C: 032] Enable Compact personal bar beta feature on test wikis [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130266 (owner: 10JGonera) [19:01:10] (03Merged) 10jenkins-bot: Enable Compact personal bar beta feature on test wikis [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130266 (owner: 10JGonera) [19:02:42] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 1.66666666667% of data exceeded the critical threshold [500.0] [19:03:27] James_F: About? [19:03:36] manybubbles: ping: other than random config changes, wikis are now at their new versions now [19:03:43] Oh, nvm [19:03:47] greg-g: thanks! [19:04:23] Reedy: Yes. [19:04:26] Reedy: Why? [19:06:08] (03PS2) 10Reedy: Enable VE language editor Beta Feature in whitelist [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130717 (owner: 10Jforrester) [19:06:26] Reedy: Messy rebase? 
:-( [19:07:09] Just other additions above it [19:07:17] And then whitespace changes for the comments [19:08:05] (03CR) 10Reedy: [C: 032] Enable VE language editor Beta Feature in whitelist [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130717 (owner: 10Jforrester) [19:08:13] (03Merged) 10jenkins-bot: Enable VE language editor Beta Feature in whitelist [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130717 (owner: 10Jforrester) [19:08:33] Reedy: Yeah; maybe I should re-do the comments so the whitespace doesn't change. [19:09:39] (03CR) 10Reedy: [C: 04-1] "Needs rebase" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130718 (owner: 10Jforrester) [19:09:55] Bah. [19:15:12] PROBLEM - Puppet freshness on osmium is CRITICAL: Last successful Puppet run was Wed Apr 30 10:04:02 2014 [19:15:42] PROBLEM - ElasticSearch health check on elastic1005 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 2040: active_shards: 5674: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 410 [19:15:42] PROBLEM - ElasticSearch health check on elastic1012 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 2040: active_shards: 5674: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 410 [19:15:42] PROBLEM - ElasticSearch health check on elastic1002 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. 
status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 2040: active_shards: 5674: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 410 [19:15:42] PROBLEM - ElasticSearch health check on elastic1004 is CRITICAL: CRITICAL - Could not connect to server 10.64.0.111 [19:15:42] PROBLEM - ElasticSearch health check on elastic1006 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 2040: active_shards: 5674: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 410 [19:15:43] PROBLEM - ElasticSearch health check on elastic1008 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 2040: active_shards: 5674: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 410 [19:15:43] PROBLEM - ElasticSearch health check on elastic1013 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 2040: active_shards: 5674: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 410 [19:15:44] PROBLEM - ElasticSearch health check on elastic1009 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 2040: active_shards: 5674: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 410 [19:15:52] PROBLEM - ElasticSearch health check on elastic1007 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. 
status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 2040: active_shards: 5674: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 410 [19:15:52] PROBLEM - ElasticSearch health check on elastic1011 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 2040: active_shards: 5674: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 410 [19:15:52] PROBLEM - ElasticSearch health check on elastic1014 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 2040: active_shards: 5674: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 410 [19:15:52] PROBLEM - ElasticSearch health check on elastic1015 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 2040: active_shards: 5674: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 410 [19:16:03] PROBLEM - ElasticSearch health check on elastic1010 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2040: active_shards: 5674: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 410 [19:17:39] !log reedy synchronized wmf-config/ [19:17:47] Logged the message, Master [19:18:29] andrewbogott: quick question - access requests to cluster for WMF employees (stat1 specifically) - they go to ops-request@rt.wikimedia.org or somewhere else? [19:19:09] HappyPanda: ops-requests or access-requests, either one is fine [19:19:18] hm… access-requests is restricted, maybe you can't create one there; not sure :) [19:19:26] andrewbogott: hah. 
let me put it on ops-request then [19:19:34] But I'll re-file it immediately anyway [19:27:10] andrewbogott: sent! :) [19:27:27] andrewbogott: also, can you tell me how someone new at the WMF is supposed to attach key? attach to RT ticket, or put it on officewiki? [19:27:57] HappyPanda: they can submit a patch to admins.pp, or they can put their key on their userpage on the office wiki [19:39:53] (03CR) 10Ottomata: [C: 032] "Looks good to me! Lemme know when you are back from lunch and we can merge." [operations/puppet] - 10https://gerrit.wikimedia.org/r/130211 (owner: 10BryanDavis) [19:40:42] PROBLEM - ElasticSearch health check on elastic1005 is CRITICAL: CRITICAL - Could not connect to server 10.64.0.112 [19:49:26] andrewbogott: hmm, so am I the only person who can respond? the people cc'd aren't receiving your replies or the auto reply [19:50:11] HappyPanda: I don't see any cc's. [19:50:31] andrewbogott: hmm, I cc'd dbrant@wikimedia.org and tfinc@wikimedia.org (and mentioned it) in the original email [19:50:43] Ah, ok, RT must've ignored that. I'll add them [19:50:48] andrewbogott: ty! [19:53:50] andrewbogott: should I forward your replies and the autoreply back to them? [19:54:00] HappyPanda: sure [19:55:52] RECOVERY - ElasticSearch health check on elastic1011 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2041: active_shards: 6084: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [19:55:52] RECOVERY - ElasticSearch health check on elastic1007 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2041: active_shards: 6084: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [19:55:52] RECOVERY - ElasticSearch health check on elastic1014 is OK: OK - elasticsearch (production-search-eqiad) is running. 
status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2041: active_shards: 6084: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [19:55:52] RECOVERY - ElasticSearch health check on elastic1015 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2041: active_shards: 6084: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [19:56:02] andrewbogott: ty! [19:58:44] (03PS2) 10Jforrester: Remove Nearby BF from whitelist [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/130718 [20:00:42] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1% data above the threshold [250.0] [20:04:59] ottomata: I'm around now if you want to merge the trebuchet patch [20:05:53] andrewbogott: any idea if my wikitech username/password (username is dr0ptp4kt, password is different than ssh passphrase) is supposed to work on logstash? i can't seem to log in there [20:06:00] (at logstash web interface) [20:06:04] cool [20:06:17] (03PS2) 10Ottomata: Add scap/scap trebuchet target [operations/puppet] - 10https://gerrit.wikimedia.org/r/130211 (owner: 10BryanDavis) [20:06:22] (03CR) 10Ottomata: [C: 032 V: 032] Add scap/scap trebuchet target [operations/puppet] - 10https://gerrit.wikimedia.org/r/130211 (owner: 10BryanDavis) [20:06:27] dr0ptp4kt: You have to be in the "wmf" ldap group to login to logstash [20:06:48] dr0ptp4kt: I don't know anything about logstash [20:07:03] bd808, andrewbogott, thx, what's the best way to make the request for access? [20:07:17] "Ask ^demon|away" [20:07:42] Chad usually handles adding folks to that group [20:10:22] !log Deployed scap 92ea0e9 via trebuchet (not actively used yet) [20:10:29] Logged the message, Master [20:17:28] ottomata: Looks like it's working. The number of registered minions keeps ticking up as puppet runs around the cluster. 
[20:19:03] ok awesome [20:20:03] Now I need to apply the second part in beta and make sure that works too. [20:21:30] <_joe_> bd808: are you the right person to bug if I want to understand better how we deploy software? [20:22:04] Sure. I think I have a TODO here to talk to you about such things [20:22:20] so many things to do, so little attention span [20:22:23] <_joe_> bd808: oh ok [20:22:26] * matanya is listening to [20:22:28] o [20:23:13] <_joe_> bd808: then I will bug you about this :) [20:23:34] The short answer is that we have 2 different systems: scap and trebuchet [20:23:44] scap is used to deploy MediaWiki [20:24:10] trebuchet is used to deploy parsoid and various other things [20:24:45] <_joe_> trebuchet is something we built internally? [20:24:57] <_joe_> scap, I get it's bash over ssh, right? [20:25:41] trebuchet was made by RyanLane. It has fairly good docs at https://wikitech.wikimedia.org/wiki/Trebuchet [20:25:58] <_joe_> sorry, it's just curiosity, it's pretty late in the evening here and I won't have the attention span to follow everything [20:26:06] He's making it into a real open source project which is cool [20:26:22] scap was mostly bash. Now it's mostly pythong [20:26:26] python [20:26:50] The best docs for it right now are at https://doc.wikimedia.org/mw-tools-scap/docs/_build/html/ [20:28:00] Trebuchet uses salt and scap uses ssh + ssh-agent (yuck) [20:28:02] RECOVERY - ElasticSearch health check on elastic1010 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2041: active_shards: 6084: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [20:28:32] PROBLEM - ElasticSearch health check on elastic1003 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. 
status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 2040: active_shards: 5667: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 417 [20:28:32] PROBLEM - ElasticSearch health check on elastic1001 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 2040: active_shards: 5667: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 417 [20:28:32] PROBLEM - ElasticSearch health check on elastic1016 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 2040: active_shards: 5667: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 417 [20:28:42] PROBLEM - ElasticSearch health check on elastic1002 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 2040: active_shards: 5667: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 417 [20:28:42] PROBLEM - ElasticSearch health check on elastic1012 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 2040: active_shards: 5667: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 417 [20:28:42] PROBLEM - ElasticSearch health check on elastic1005 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. 
status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 2040: active_shards: 5667: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 417 [20:28:42] PROBLEM - ElasticSearch health check on elastic1009 is CRITICAL: CRITICAL - Could not connect to server 10.64.32.141 [20:28:42] PROBLEM - ElasticSearch health check on elastic1013 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 2040: active_shards: 5667: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 417 [20:28:43] PROBLEM - ElasticSearch health check on elastic1004 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 2040: active_shards: 5667: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 417 [20:28:43] PROBLEM - ElasticSearch health check on elastic1008 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 2040: active_shards: 5667: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 417 [20:28:44] PROBLEM - ElasticSearch health check on elastic1006 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 2040: active_shards: 5667: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 417 [20:28:51] _joe_: We should schedule a time to talk that's not in the middle of the night for you. :) [20:28:53] PROBLEM - ElasticSearch health check on elastic1014 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. 
status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 2040: active_shards: 5667: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 417 [20:28:53] PROBLEM - ElasticSearch health check on elastic1007 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 2040: active_shards: 5667: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 417 [20:28:53] PROBLEM - ElasticSearch health check on elastic1011 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 2040: active_shards: 5667: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 417 [20:28:53] PROBLEM - ElasticSearch health check on elastic1015 is CRITICAL: CRITICAL - elasticsearch (production-search-eqiad) is running. status: red: timed_out: false: number_of_nodes: 15: number_of_data_nodes: 15: active_primary_shards: 2040: active_shards: 5667: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 417 [20:28:56] <_joe_> this doesn't look good [20:29:08] bleh [20:29:14] its under control.... [20:29:17] stupid thing [20:29:18] <_joe_> is this expected? [20:29:20] <_joe_> ok [20:29:21] <_joe_> :) [20:29:23] (03PS2) 10Rush: puppet-lint in labs [operations/puppet] - 10https://gerrit.wikimedia.org/r/130847 [20:29:24] ES is a pain in butt [20:29:31] rolling restart rolled too far? [20:29:35] its the check that is busted [20:29:46] bd808: some index doesn't have replicas, I imagine. [20:29:51] probably labs [20:29:55] because why listen to me [20:30:31] * bd808 always listens to Nik [20:30:43] <_joe_> bd808: I'll read the docs first, then set up a meeting. I'm pretty head-down in puppet at the moment [20:31:17] _joe_: Sounds good. [20:31:52] <_joe_> what TZ are you in?
[20:33:58] _joe_: MST/MDT. I'm UTC-6 this time of year [20:35:24] <_joe_> ok, noted [20:35:47] <_joe_> I'll try to find a time that's not too uncomfortable for either of us [20:36:24] _joe_: My early morning should match up with your late afternoon I think [20:36:41] <_joe_> yes, it's an 8 hour difference [20:37:04] bd808: its actually that some of the reindexes that I did last night failed to complete and they left indexes with 0 replicas lying around. I should have checked.... [20:37:46] <_joe_> manybubbles: ugh, is ES the pain in the ass I always thought it is then? [20:38:24] _joe_: its a distributed system we don't have all the right tooling around - so yes [20:38:34] <_joe_> It *is* fancier than solr, but it always seemed less solid to me. [20:38:37] _joe_: I'm usually at my keyboard by 15:00Z but can easily be on at 14:00Z or a little earlier with some notice. [20:39:29] Elasticsearch beats the crap out of solr for realtime replication; at least in the last head to head I put them through [20:39:30] manybubbles: when cirrus will be primary search on large weeks ? [20:39:33] <_joe_> bd808: no problem at all, I'm available well after 17 UTC [20:39:43] *wikis [20:39:57] boy am i tired [20:40:47] matanya: on most large wikis? I'm not sure. technically the only thing holding us back from that is worry that the users will hate it. they probably won't. its primary on _one_ large wiki now. [20:40:55] <_joe_> bd808: I never had a problem at the scale of what we have here. and I never considered realtime replication to be as important as solidity in failover and general performance on search [20:41:09] on _all_ large wikis - when I can get it performant enough [20:41:29] which one wiki manybubbles ?
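The alert storm above follows directly from Elasticsearch's `_cluster/health` response: an index left with 0 replicas whose shard copies go missing turns the whole cluster red, and red reads as CRITICAL. A minimal sketch of that classification, assuming a parsed health dict (`check_health()` is a hypothetical helper, not the actual Icinga plugin):

```python
def check_health(health):
    """Classify a parsed _cluster/health response the way the alerts above read."""
    status = health.get("status", "unknown")
    msg = "status: %s: unassigned_shards: %s" % (
        status, health.get("unassigned_shards", "?"))
    if status == "green":
        # all primaries and replicas assigned
        return ("OK", msg)
    if status == "yellow":
        # primaries fine, some replicas unassigned
        return ("WARNING", msg)
    # red (or anything unexpected): primary shards are missing
    return ("CRITICAL", msg)

# The failed-reindex scenario described above: 0-replica indexes whose
# primaries dropped off a restarting node leave unassigned shards behind.
red = {"status": "red", "number_of_nodes": 15, "unassigned_shards": 417}
green = {"status": "green", "number_of_nodes": 16, "unassigned_shards": 0}
```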
[20:41:37] matanya: itwiki [20:41:46] its top 10 or so [20:41:49] <_joe_> oh wow what an honour [20:41:58] enwikisource is pretty big [20:42:03] we're primary there [20:42:06] <_joe_> it's top 5 for size (in number of articles) [20:42:22] manybubbles: and what is the problem with the 5 wikis you mentioned in your mail ? [20:43:03] matanya: these ones: Japanese, Hebrew, Polish, and Chinese [20:43:08] _joe_: The use case can make all the difference. The ES cluster I built was indexing credit card transactions in real-time. Not at the scale we take in new edits here but a few thousand per minute. [20:43:15] yes those manybubbles [20:43:27] _joe_: they are pretty happy with it, for the most part, so I'm happy [20:43:31] <_joe_> bd808: ooh OK [20:43:48] matanya: we have access to better analyzers that we _should_ be able to just plug in and deploy [20:43:59] which should make finding things better in those languages [20:44:00] <_joe_> manybubbles: I'm doing weird searches right now :) (I'm italian, btw) [20:44:06] not just the wikipedias [20:44:15] _joe_: cool! have fun. let me know if it breaks [20:44:46] enwiki is the only one that we can't handle the load for yet - I've been working on it [20:46:21] <_joe_> manybubbles: out of curiosity, why did we decide to move to ES? [20:46:31] manybubbles: can you please elaborate on the better analyzers ? [20:47:28] matanya: better means better able to find the words that users are looking for. or, supposed to be better. anyway, take japanese, for example. [20:47:42] RECOVERY - ElasticSearch health check on elastic1005 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2032: active_shards: 6075: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [20:47:42] RECOVERY - ElasticSearch health check on elastic1012 is OK: OK - elasticsearch (production-search-eqiad) is running.
status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2032: active_shards: 6075: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [20:47:42] RECOVERY - ElasticSearch health check on elastic1002 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2032: active_shards: 6075: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [20:47:42] RECOVERY - ElasticSearch health check on elastic1004 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2032: active_shards: 6075: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [20:47:42] RECOVERY - ElasticSearch health check on elastic1009 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2032: active_shards: 6075: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [20:47:43] RECOVERY - ElasticSearch health check on elastic1008 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2032: active_shards: 6075: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [20:47:43] RECOVERY - ElasticSearch health check on elastic1013 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2032: active_shards: 6075: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [20:47:44] RECOVERY - ElasticSearch health check on elastic1006 is OK: OK - elasticsearch (production-search-eqiad) is running. 
status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2032: active_shards: 6075: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [20:47:54] I'd prefer to take hebrew manybubbles :) [20:47:58] ガリレオは、物体の運動の研究をする時に might look like two words, but is more like 5 [20:48:02] PROBLEM - ElasticSearch health check on elastic1010 is CRITICAL: CRITICAL - Could not connect to server 10.64.32.142 [20:48:07] andrewbogott: another access-request sent. can you re-add the two in cc? [20:48:09] ohh, hebrew then [20:48:22] HappyPanda: I can… did the last one work the way you expected? [20:48:24] andrewbogott: dammit, forgot to edit subject. It is for mhurd, not dbrant [20:48:41] I bet you can edit it in rt [20:48:42] andrewbogott: seems to. tfinc's approval came to me and dbrant [20:48:47] if you created the ticket, I think you can... [20:49:43] andrewbogott: let me log in [20:50:36] any example in hebrew manybubbles ? i'd like to bring this subject to the community, in order to promote cirrus [20:50:44] andrewbogott: can't find an 'edit' button in https://rt.wikimedia.org/Ticket/Display.html?id=7401 [20:51:05] matanya: sorry, trying to find one. [20:51:14] a specific one, that we're not accidentally getting for free [20:51:15] HappyPanda: try under 'Basics'?
[20:51:20] HappyPanda: just mail 7401@rt.wikimedia.org [20:53:07] matanya: found one: this finds stuff: [20:53:08] https://he.wikisource.org/w/index.php?search=%D7%91%D7%A8%D7%95%D7%9A+%D7%A7%D7%95%D7%A8%D7%A6%D7%95%D7%95%D7%99%D7%99%D7%9C&title=%D7%9E%D7%99%D7%95%D7%97%D7%93%3A%D7%97%D7%99%D7%A4%D7%95%D7%A9&go=%D7%9C%D7%93%D7%A3 [20:53:14] but this finds nothing: [20:53:16] https://he.wikisource.org/w/index.php?search=%D7%91%D7%A8%D7%95%D7%9A+%D7%A7%D7%95%D7%A8%D7%A6%D7%95%D7%95%D7%B4%D7%99%D7%9C&title=%D7%9E%D7%99%D7%95%D7%97%D7%93%3A%D7%97%D7%99%D7%A4%D7%95%D7%A9&go=%D7%9C%D7%93%D7%A3 [20:53:27] when it really ought to find the same stuff [20:53:59] the latter should really not find anything [20:54:00] matanya: though, that is an example of something that is getting better in cirrus [20:54:47] matanya: sorry, I'm copy and pasting blindly from a language I can't read. [20:54:57] the example i was trying to find [20:55:16] do you want some help from me? [20:55:33] matanya: sure! I'd love some examples of things that don't work but ought to [20:55:58] can i enable cirrus on some hebrew wiki ? [20:56:06] matanya: its a betafeature [20:56:28] yea, enabled [20:56:33] or you can search and add &srbackend=CirrusSearch to the url [20:56:41] that'll let yo compare them side by side [20:57:01] oh, that isn't good [20:57:15] it doesn't find exact matching [20:58:00] matanya: if you can send me an example I'll look [20:58:09] the thing we're looking to plug in is http://code972.com/hebmorph [20:58:35] https://he.wikipedia.org/w/index.php?search=%D7%91%D7%A8%D7%95%D7%9A+%D7%A7%D7%95%D7%A8%D7%A6%D7%95%D7%95%D7%99%D7%9C&title=%D7%9E%D7%99%D7%95%D7%97%D7%93%3A%D7%97%D7%99%D7%A4%D7%95%D7%A9&go=%D7%9C%D7%A2%D7%A8%D7%9A [20:59:53] matanya: I can't believe it. 
a good friend of mine wrote this [20:59:54] matanya: it is screwing up "קורצוויל" somehow [21:00:20] well, we'll plug it in and if it isn't better you can bother him for me [21:00:43] I sure will :) [21:00:53] PROBLEM - ElasticSearch health check on elastic1011 is CRITICAL: CRITICAL - Could not connect to server 10.64.32.143 [21:01:20] HappyPanda: looks like you got that RT ticket organized the way you want it? [21:01:30] that one is because it is restarting, I'll bet [21:01:42] yup [21:02:11] matanya: let me deploy his plugin to beta and we'll see what it does [21:02:25] ok, great [21:02:30] andrewbogott: yeah, think so [21:02:54] andrewbogott: can you respond with the standard 'ssh key and manager approval please' email? [21:02:54] on the plus side, it finds things i didn't believe it could [21:03:21] HappyPanda: yep, one sec [21:08:40] (03PS1) 10Manybubbles: Add hebrew analyzer [operations/software/elasticsearch/plugins] - 10https://gerrit.wikimedia.org/r/130969 [21:09:02] matanya: ..... that doesn't sound good? [21:09:15] (03CR) 10Manybubbles: [C: 04-1] "-1 until we love it in beta" [operations/software/elasticsearch/plugins] - 10https://gerrit.wikimedia.org/r/130969 (owner: 10Manybubbles) [21:09:35] (03CR) 10Manybubbles: "Deploying to beta" [operations/software/elasticsearch/plugins] - 10https://gerrit.wikimedia.org/r/130969 (owner: 10Manybubbles) [21:12:32] i'm looking manybubbles [21:13:15] matanya: not yet deployed, but if you want to copy some text into http://he.wikipedia.beta.wmflabs.org/ it'll give us something to find [21:13:48] looking at the code, not the wiki [21:14:05] i poked bd808 to get some content [21:14:47] ah, I'm not sure how that part works:) [21:15:26] matanya: Did I miss a question? [21:15:57] about a week ago i asked you how to import some portion of he.wiki into he.wikipedia.beta.wmflabs.org [21:16:03] RECOVERY - ElasticSearch health check on elastic1010 is OK: OK - elasticsearch (production-search-eqiad) is running.
status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2032: active_shards: 6075: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [21:16:28] you said you'll get back to me, i guess it was lost in the gazillion tasks you have [21:16:31] Apparently I did miss that, or just spaced out [21:16:42] PROBLEM - ElasticSearch health check on elastic1012 is CRITICAL: CRITICAL - Could not connect to server 10.64.32.144 [21:17:13] I tried to use my on wiki powers to do so, but failed [21:17:14] matanya: I've rebuilt the index using the new analyzer - anything added ought to show up there pretty soon. [21:17:24] thank you [21:17:40] i'll check it out, or just write a bot to add content [21:17:58] though i really prefer a normal wiki import [21:18:12] My wiki fu is low. The only way I'd know to do it is Special:Export/Import [21:18:41] yeah, Special:Export/Import [21:18:50] i have the export [21:18:53] however, Special:Import usually times out on anything big enough, so perhaps Special:Export [21:18:56] and then import from commandline [21:18:57] but can't import [21:18:58] that sounds more doable [21:19:05] <^d> Everyone has export. [21:19:11] <^d> Import we could easily grant to beta. [21:19:14] got to step away for a bit [21:19:29] that is what i was looking for, an import within shell [21:19:41] i have import rights on beta [21:19:48] but files are too big [21:19:50] yeah. [21:19:59] someone with shell needs to do it [21:19:59] <^d> Then export smaller pages :) [21:20:05] <^d> Or without full history. [21:20:10] <^d> (History's boring for search) [21:20:21] i used dump.wikimedia.org [21:20:38] s [21:21:06] Ah. I could learn how to run the right maintenance script I suppose. It seems like I should know how to do that anyway [21:21:42] <^d> bd808: Easy script to run. [21:21:45] matanya: Is your dump in labs somewhere?
[21:21:51] bd808: https://www.mediawiki.org/wiki/Manual:Importing_XML_dumps [21:21:54] <^d> `mwscript importDump.php --wiki=hewiki ...` [21:22:24] it is mounted on nfs in labs [21:22:52] matanya: Which project? [21:23:02] just grab : https://dumps.wikimedia.org/hewiki/20140415/hewiki-20140415-pages-meta-current.xml.bz2 [21:23:12] I'm away, but I should point out that we don't normally import whole wikis into labs [21:23:13] it is internal lan anyway [21:23:23] * bd808 nods [21:23:23] matanya: all of hewiki? that doesn't sound like the best of ideas, and I don't know if betalabs will live with that [21:23:41] How big is it? [21:23:44] matanya: I remember hashar deleting a full simplewiki dump on betalabs saying it took up too much disk space on the mysql hosts [21:23:59] 1.1 GB [21:24:06] When expanded? [21:24:10] yes [21:24:12] <^d> We don't need so much content just for testing this. [21:24:21] <^d> Why is generating a small partial export not possible? [21:24:24] Export a few big categories? [21:24:31] i don't know how to strip down a dump [21:24:40] <^d> matanya: Use Special:Export, like I said before. [21:24:41] matanya: Special:Export lets you pick categories [21:24:44] ok Reedy good idea [21:25:08] <^d> Grab a category or two full of pages, with their templates, no history. [21:25:19] <^d> Should give us enough to test without overloading things. [21:26:53] RECOVERY - ElasticSearch health check on elastic1011 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2032: active_shards: 6084: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [21:26:53] RECOVERY - ElasticSearch health check on elastic1007 is OK: OK - elasticsearch (production-search-eqiad) is running.
status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2032: active_shards: 6075: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [21:26:53] RECOVERY - ElasticSearch health check on elastic1015 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2032: active_shards: 6075: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [21:26:53] RECOVERY - ElasticSearch health check on elastic1014 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2032: active_shards: 6075: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [21:27:29] ^d: /home/matanya on tools login [21:27:36] file named he.wiki [21:27:46] he.wiki.xml [21:27:48] <^d> I don't have a tools account lolol :p [21:28:00] i can move it to labs [21:28:09] what project you have access to? [21:28:20] puppet? [21:28:34] * bd808 can get it from tools [21:29:35] cd: /home/matanya/: Permission denied [21:29:58] can't help that [21:30:23] matanya: Copy it over to the logstash project. I can get it from there. [21:30:41] bd808: fqdn? [21:30:56] matanya: logstash-dev.eqiad.wmflabs [21:31:29] in /tmp there bd808 [21:32:39] andrewbogott: oh, so the new request (mhurd) was apparently already filed. 7345. Can you close the new one? [21:34:37] matanya: Got it. Now just a few config problems to work through on deployment-bastion... [21:34:45] HappyPanda: looks like we never got a key for mhurd? [21:36:47] matanya: {{done}} (I think) [21:36:55] Thanks! [21:37:40] bd808: nope, don't see new pages [21:38:10] hmmm... [21:38:21] and nothing in import log [21:39:03] matanya: http://he.wikipedia.beta.wmflabs.org/wiki/%D7%A7%D7%98%D7%92%D7%95%D7%A8%D7%99%D7%94:%D7%90%D7%99%D7%A9%D7%99%D7%9D ? 
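The "strip down a dump" step discussed above (a full hewiki dump being too big for beta) can also be done offline before the import. A rough standard-library sketch, assuming a pages whitelist; `filter_dump()` and `local()` are hypothetical helpers, not a maintained tool, and in practice Special:Export or `dumpBackup.php` filters do this better:

```python
import xml.etree.ElementTree as ET

def local(tag):
    # MediaWiki export files carry an xmlns on every element; drop it.
    return tag.rsplit('}', 1)[-1]

def filter_dump(xml_text, keep_titles):
    """Keep only <page> elements whose <title> is in keep_titles."""
    root = ET.fromstring(xml_text)
    for page in list(root):
        if local(page.tag) != 'page':
            continue
        title = next((c.text for c in page if local(c.tag) == 'title'), None)
        if title not in keep_titles:
            root.remove(page)
    return ET.tostring(root, encoding='unicode')

# Tiny stand-in for a real export file (real dumps are streamed, not
# loaded whole; iterparse would be used for a 1.1 GB file).
sample = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/">
  <page><title>Keep me</title><revision><text>hello</text></revision></page>
  <page><title>Drop me</title><revision><text>bye</text></revision></page>
</mediawiki>"""
filtered = filter_dump(sample, {"Keep me"})
```

The filtered file would then go through `mwscript importDump.php --wiki=hewiki` as ^d showed above.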
[21:39:24] oh, sigh [21:39:25] HappyPanda: if you want to speed up the process you can write a puppet patch and link me to it. Make sure that it uses the same username and uid as labs though. (that is: mhurd, 3010) [21:39:41] only the categories, no articles [21:41:07] Are there articles in the dump? If not, just make a new one and I'll slurp it in too [21:42:21] doing bd808 [21:46:47] there bd808 same name, same location [21:46:47] andrewbogott: I'm just going to let it resolve whenever mhurd has the time to resolve it. [21:47:34] HappyPanda: ok [21:49:53] PROBLEM - ElasticSearch health check on elastic1014 is CRITICAL: CRITICAL - Could not connect to server 10.64.48.11 [21:50:12] PROBLEM - Puppet freshness on hafnium is CRITICAL: Last successful Puppet run was Thu May 1 18:49:24 2014 [21:55:22] matanya: Still loading.... 800 so far [21:55:36] hmm, just 10 mb [21:56:05] beta is not fast. :) [21:56:33] it's loading at 2 pages/second [22:05:12] * AaronSchulz grrs at https://github.com/nicolasff/phpredis/issues/440 [22:06:03] manybubbles|away: when you are back, did some tests, some improvement, some worse. [22:07:51] matanya: sad [22:08:19] is the index rebuilt already ? [22:08:53] RECOVERY - ElasticSearch health check on elastic1014 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2032: active_shards: 6075: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [22:08:53] PROBLEM - ElasticSearch health check on elastic1015 is CRITICAL: CRITICAL - Could not connect to server 10.64.48.12 [22:09:05] looks like the blog is down... 
[22:10:00] ping is ok [22:10:06] Error 503 Service Unavailable Service Unavailable Guru Meditation: XID: 76548248 [22:10:22] wfm [22:10:25] back up now [22:10:39] I was getting the 503 for about 3-4 minutes from when I first saw it [22:11:42] it was loading slow for me too, but it did load [22:11:43] ^^ RobH [22:12:06] yeah, back to loading very slow but haven't got the varnish error again yet [22:12:07] is the w3 caching plugin live? [22:12:22] and back to the 503 error [22:12:33] I don't know... I can't login [22:12:46] I thought we had turned it off a while ago.. [22:12:54] matanya: I'll check [22:13:40] manybubbles|away: e.g ?? which is a rabbi, returns any word that contains those two letters [22:14:24] matanya: ? is wildcard syntax in most languages [22:14:29] Jamesofur: well, the plugin is activated in the admin panel [22:14:34] rather, it is normally [22:14:35] but on the other hand ?'???? which is the word jihad in hebrew returns nothing though i know there are articles containing the word [22:14:37] we might have to replace it [22:14:48] ...i meant "live" in the sense of "working" ;) [22:14:50] HaeB: if one of us can get in should we deactivate it? [22:14:57] ssh deployment-bastion.eqiad.wmflabs ' mwscript maintenance/showJobs.php --wiki hewiki --group' [22:14:57] fair [22:15:08] that says there are articles left to be indexed [22:15:14] 2877 of them [22:15:32] (it's also a few updates behind the current version) [22:15:57] no, we were told not to mess with it ;) [22:15:59] matanya: ? is not going to do the right thing. can you file a bug with the words that contain "?"? [22:16:09] because I have to escape them, or something [22:16:12] PROBLEM - Puppet freshness on osmium is CRITICAL: Last successful Puppet run was Wed Apr 30 10:04:02 2014 [22:16:16] because they get turned into syntax [22:16:30] " turns into syntax too [22:16:45] matanya: https://gerrit.wikimedia.org/r/#/c/130799/ [22:17:15] <^d> 130799 can be merged other than that one nitpick I left.
[22:17:24] ^d: I replied, I think [22:17:47] manybubbles|away: " != ? [22:17:57] ? [22:17:59] ? doesn't exist in hebrew [22:18:08] only " [22:18:17] and can be used within words [22:18:25] I thought you said that ?? was rabbi [22:18:26] and as a quote [22:18:42] oh, UTF things [22:18:49] matanya: yeah, that is that gerrit thing I sent you, tries to figure out if the " is a quote or inside a word [22:18:51] <^d> manybubbles: Ah duh, you did. [22:18:57] <^d> And you're right [22:18:59] oh, UTF things [22:19:09] <^d> Merged to master. [22:19:11] i typed a word in hebrew [22:19:17] you got question marks [22:19:29] matanya: ah, my terminal normally displays hebrew. I can't read it, but it spits it out [22:19:48] <^d> No wonder I was confused. My client usually has good utf-8 support. [22:19:51] i'll use links instead [22:19:52] <^d> I was like what's with the ?????'s [22:20:08] http://he.wikipedia.beta.wmflabs.org/w/index.php?search=%D7%A8%D7%91%D7%A0%D7%99%D7%9D&title=%D7%9E%D7%99%D7%95%D7%97%D7%93%3A%D7%97%D7%99%D7%A4%D7%95%D7%A9&go=%D7%9C%D7%93%D7%A3 [22:20:19] this is the plural of a rabbi in hebrew [22:20:32] PROBLEM - ElasticSearch health check on elastic1016 is CRITICAL: CRITICAL - Could not connect to server 10.64.48.13 [22:20:35] matanya: I think it worked better this time -- http://he.wikipedia.beta.wmflabs.org/wiki/%D7%9E%D7%99%D7%95%D7%97%D7%93:%D7%A9%D7%99%D7%A0%D7%95%D7%99%D7%99%D7%9D_%D7%90%D7%97%D7%A8%D7%95%D7%A0%D7%99%D7%9D [22:20:56] yeah, totally! thanks bd808 !! [22:20:57] you may want to check if the page you are looking for is indexed yet... [22:21:04] wait, what happened? [22:21:21] i got an unrelated page as first result [22:21:31] not related at all [22:21:48] the second one is a good match though [22:22:30] has 3 inflections of the word [22:22:47] <^d> We should force run all those jobs. [22:22:55] ^d: beta is just slow [22:23:45] <^d> They're all in batches of (1). [22:23:49] <^d> Since it's import jobs.
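The escaping problem in the exchange above is that Lucene's query parser treats characters like `?` (single-character wildcard) and `"` (phrase delimiter) as syntax, which collides with Hebrew, where the gershayim quote legitimately appears inside words. The blunt fix is to backslash-escape Lucene's reserved characters before querying; the actual CirrusSearch change (the gerrit link above) is subtler, trying to tell an in-word quote from a phrase quote. A sketch of the blunt version (`escape_lucene()` is a made-up name, not CirrusSearch's implementation; the character set is Lucene's documented reserved list):

```python
# Backslash-escape Lucene query-syntax characters so user input such as
# Hebrew words containing '"' is searched literally instead of being
# parsed as phrase/wildcard syntax.
LUCENE_RESERVED = set('+-&|!(){}[]^"~*?:\\/')

def escape_lucene(query):
    return ''.join('\\' + ch if ch in LUCENE_RESERVED else ch
                   for ch in query)

escaped = escape_lucene('צה"ל?')  # a Hebrew acronym with an inner quote
```

Blanket escaping makes everything literal, including wildcards a user actually wanted, which is why distinguishing the in-word case is worth the extra effort.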
[22:23:56] this is a great example: http://he.wikipedia.beta.wmflabs.org/w/index.php?search=%D7%9E%D7%95%D7%A8%D7%94&title=%D7%9E%D7%99%D7%95%D7%97%D7%93%3A%D7%97%D7%99%D7%A4%D7%95%D7%A9&go=%D7%9C%D7%93%D7%A3 [22:23:59] ^d: bleh [22:24:09] <^d> Won't take long [22:24:11] <^d> Running them all now [22:24:33] 7 correct findings, the 8th being a nice idea, but not related to the search. has another meaning of the inflections [22:25:03] <^d> Just wait a minute, let's catch up the jobs. [22:25:05] <^d> Then see. [22:25:40] matanya: I wonder if I've misconfigured something with it - I do really have to go be with my family now, but I can have a look at it in the morning. there is a special analyzer for hebrew you are supposed to use for the query (as opposed to the text) maybe I'm not doing that [22:25:46] also, there is a hebrew light analyzer I can try [22:25:55] it'll pull back fewer results but might be less crazy [22:26:17] thanks a lot and good night [22:26:27] see you tomorrow [22:29:51] greg-g: around ? [22:30:54] nvm, i'm off. [22:31:02] <^d> Jobs all done on hewiki beta. [22:31:38] thanks ^d i'll test it tomorrow [22:31:48] <^d> cool cool, have a good night [22:37:01] matanya: sorry, just missed ya [22:37:12] greg-g: still around [22:37:17] heya [22:38:25] hope you are doing well. few things: did you proceed with the volunteer ACL stuff with LCA ? [22:39:28] (03PS4) 10BryanDavis: [WIP] Provision scap scripts using trebuchet [operations/puppet] - 10https://gerrit.wikimedia.org/r/129814 [22:39:45] matanya: still in progress [22:40:27] are we moving to cloudbees-hosted jenkins ? [22:40:57] We are moving off of cloudbees [22:41:09] i meant off, sorrt [22:41:11] y [22:41:19] to SauceLabs ? [22:41:47] we already use sauce, will continue to [22:41:54] but our Jenkins will be local-only [22:42:04] sauce for the browser coverage they have [22:42:37] Maintaining a browser farm for testing is madness.
[22:42:50] It's much nicer to outsource that [22:42:53] RECOVERY - ElasticSearch health check on elastic1015 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2032: active_shards: 6075: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [22:42:56] https://saucelabs.com/ - hah, you know, if you used a real keyboard you wouldn't need those wristbraces [22:43:33] RECOVERY - ElasticSearch health check on elastic1001 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2032: active_shards: 6075: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [22:43:33] RECOVERY - ElasticSearch health check on elastic1003 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2032: active_shards: 6075: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [22:43:33] RECOVERY - ElasticSearch health check on elastic1016 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2032: active_shards: 6075: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [22:43:42] RECOVERY - ElasticSearch health check on elastic1012 is OK: OK - elasticsearch (production-search-eqiad) is running. status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 2032: active_shards: 6075: relocating_shards: 2: initializing_shards: 0: unassigned_shards: 0 [22:44:13] and last question, when will Compact Personal Bar come as a testable beta feature ? 
for me it is currently less than alpha [22:44:51] matanya: now on testwikis, tuesday on non-wikipedias, thurs on wikipedias [22:45:03] sorry, not this feature [22:45:13] the one with fixed bar [22:45:23] fixed bar? [22:45:37] not this? https://www.mediawiki.org/wiki/Compact_Personal_Bar [22:45:56] no, that one is great [22:46:05] Fixed header [22:46:13] https://www.mediawiki.org/wiki/Winter [22:46:26] oh, that [22:46:34] Winter is a big'un [22:46:41] no idea on eta with it [22:46:53] i love the idea, but not testable yet [22:47:25] the mock up test page: http://unicorn.wmflabs.org/winter/ [22:47:48] ok, good answers, thanks a lot. will bug you in a few weeks again i guess with the first question. [22:48:00] :) [22:48:10] follow the fun at https://www.mediawiki.org/wiki/Wikimedia_Release_and_QA_Team#April_-_July_.2714_Goals_Progress [22:48:19] (sorry for all the red, new quarter, new not done things) [22:48:41] i peek there from time to time [22:49:41] I'll do the SWAT [22:49:54] MaxSem: yay [22:49:57] welcome back [22:50:03] * greg-g un''s you [22:50:03] ;) [22:50:11] oh, you already did [22:58:06] MaxSem: tgr [22:58:22] !log disabling puppet on holmium; manually overriding completely broken varnish config [22:58:28] Logged the message, Master [22:58:46] err, completon fail:) [22:59:35] hah [23:02:37] paravoid: thank you very much for looking into the blog issues [23:05:16] HaeB, paravoid is looking into the (very messed up) blog config, we may have to disable blog pingbacks [23:05:28] HaeB, Jamesofur but performance should be much improved for now [23:05:47] yes I should have made these comments here :) [23:05:49] thanks Eloquence [23:05:58] thanks Eloquence, paravoid !
[23:06:13] i still wonder if the caching plugin should be updated [23:06:28] pages are being properly cached, but pingbacks are inherently uncacheable [23:06:45] and this is what creates load at this time [23:08:27] !log maxsem synchronized php-1.24wmf2/extensions/CommonsMetadata/ 'https://gerrit.wikimedia.org/r/#/c/130971/' [23:08:33] Logged the message, Master [23:08:43] tgr, ^^^ please verify :) [23:08:51] HaeB: how useful are pingbacks to you folks? [23:08:53] paravoid: i see .. so just to be clear, we are talking about pingbacks our blog sends to others, or vice versa? [23:10:05] MaxSem: all good, thanks! [23:10:56] both are not super essential, and since we're just a few weeks away from moving the blog to third party hosting anyway... [23:12:00] greg-g, ^^^ - I'll keep an eye on logs for a few minutes but otherwise SWAT looks complete [23:12:20] MaxSem: coolio [23:12:38] HaeB, I thought this stuff is mostly used for spam these days:P [23:14:59] HaeB, paravoid - let's turn off pingbacks for now as a precaution so we don't have to worry about things falling over later [23:16:06] MaxSem: that's pretty much all of blogging right? :) [23:18:23] MaxSem: always the rosy-eyed optimist ;) [23:19:22] ok, I removed X-Pingback from varnish, so that's gone [23:19:29] I also removed the source of most of our pingbacks [23:19:33] which may or may not have been malicious [23:19:36] we do get useful information through them about legitimate blogs linking to us, but yes, there is also a large portion of copycat spam blogs who e.g. copy+paste a mashable article (together with a link to blog.wikimedia.org) [23:19:57] What do you mean, paravoid?
[23:20:17] which may or may not have been malicious [23:20:27] I'd rather not expand on that further, as it's sensitive in nature [23:20:35] Okay. [23:21:21] ...but like i said, it's more a nice to have thing [23:21:28] HaeB, Eloquence: with these hotfixes, load has fallen tremendously and we should be ready for much more load [23:21:42] paravoid, thanks for the late night intervention. [23:21:56] (and we made them invisible a while ago, so the blog admins do not have to judge every time whether it's a spammy site that we don't want to link back, or legit) [23:22:03] yes, much appreciated paravoid [23:38:46] puppet noob question: how do I tell puppet what to run when a role is disabled? [23:43:15] !log Restarted logstash on logstash1001; MaxSem noticed that many recursion-guard logs were not being completely reassembled and JVM had one CPU maxed out. [23:43:20] Logged the message, Master [23:44:18] >) bd808 [23:45:41] MaxSem: I think it was sick. Log input volume jumped dramatically. [23:47:29] I'd really really like to change how logstash gets logs. The current system was a hack to prove it could work. [23:50:18] greg-g, to get something in for ld/swat, when would be the next window? if the answer is to read a webpage, sorry! [23:51:45] Monday morning is the next normal deploy window [23:52:44] dr0ptp4kt: There's no table for next week yet, but https://wikitech.wikimedia.org/wiki/Deployments is where it will show up eventually. [23:53:51] 2014-05-05T15:00Z will be the next SWAT window. [23:54:17] what bryan said [23:54:42] * bd808 thinks that page should really be an app of some sort [23:58:15] bd808: yes. [23:59:17] greg-g: Your project for the hackathon? :) [23:59:36] bd808: :) we'll see